Robust Power Management via Learning and Game Design



This article was downloaded by: [132.174.251.2] On: 24 December 2020, At: 13:02. Publisher: Institute for Operations Research and the Management Sciences (INFORMS). INFORMS is located in Maryland, USA.

Operations Research

Publication details, including instructions for authors and subscription information: http://pubsonline.informs.org

Robust Power Management via Learning and Game Design
Zhengyuan Zhou, Panayotis Mertikopoulos, Aris L. Moustakas, Nicholas Bambos, Peter Glynn

To cite this article: Zhengyuan Zhou, Panayotis Mertikopoulos, Aris L. Moustakas, Nicholas Bambos, Peter Glynn (2020) Robust Power Management via Learning and Game Design. Operations Research.

Published online in Articles in Advance 24 Dec 2020. https://doi.org/10.1287/opre.2020.1996

Full terms and conditions of use: https://pubsonline.informs.org/Publications/Librarians-Portal/PubsOnLine-Terms-and-Conditions

This article may be used only for the purposes of research, teaching, and/or private study. Commercial use or systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisher approval, unless otherwise noted. For more information, contact [email protected].

The Publisher does not warrant or guarantee the article’s accuracy, completeness, merchantability, fitness for a particular purpose, or non-infringement. Descriptions of, or references to, products or publications, or inclusion of an advertisement in this article, neither constitutes nor implies a guarantee, endorsement, or support of claims made of that product, publication, or service.

Copyright © 2020, INFORMS


With 12,500 members from nearly 90 countries, INFORMS is the largest international association of operations research (O.R.) and analytics professionals and students. INFORMS provides unique networking and learning opportunities for individual professionals, and organizations of all types and sizes, to better understand and use O.R. and analytics tools and methods to transform strategic visions and achieve better outcomes. For more information on INFORMS, its publications, membership, or meetings visit http://www.informs.org


OPERATIONS RESEARCH, Articles in Advance, pp. 1–15

http://pubsonline.informs.org/journal/opre ISSN 0030-364X (print), ISSN 1526-5463 (online)

Methods

Robust Power Management via Learning and Game Design
Zhengyuan Zhou,a Panayotis Mertikopoulos,b Aris L. Moustakas,c Nicholas Bambos,d Peter Glynnd

aDepartment of Technology, Operations, and Statistics, Stern School of Business, New York University, New York, New York 10012; bUniversity Grenoble Alpes, CNRS, Grenoble INP, Inria, LIG, F-38000 Grenoble, France; cDepartment of Physics, University of Athens, Athens, Greece; dDepartment of Management Science and Engineering, Stanford University, Stanford, California 94305
Contact: [email protected], https://orcid.org/0000-0002-0005-9411 (ZZ); [email protected], https://orcid.org/0000-0003-2026-9616 (PM); [email protected] (ALM); [email protected] (NB); [email protected] (PG)

Received: August 22, 2017
Accepted: September 18, 2020
Published Online in Articles in Advance: December 24, 2020

Subject Classifications: computer science: artificial intelligence; games/group decisions
Area of Review: Machine Learning

https://doi.org/10.1287/opre.2020.1996

Copyright: © 2020 INFORMS

Abstract. We consider the target-rate power management problem for wireless networks, and we propose two simple, distributed power management schemes that regulate power in a provably robust manner by efficiently leveraging past information. Both schemes are obtained via a combined approach of learning and “game design” where we (1) design a game with suitable payoff functions such that the optimal joint power profile in the original power management problem is the unique Nash equilibrium of the designed game; (2) derive distributed power management algorithms by directing the network’s users to employ a no-regret learning algorithm to maximize their individual utility over time. To establish convergence, we focus on the well-known online eager gradient descent learning algorithm in the class of weighted strongly monotone games. In this class of games, we show that when players only have access to imperfect stochastic feedback, multiagent online eager gradient descent converges to the unique Nash equilibrium in mean square at an O(1/T) rate.

In the context of power management in static networks, we show that the designed games are weighted strongly monotone if the network is feasible (i.e., when all users can concurrently attain their target rates). This allows us to derive a geometric convergence rate to the joint optimal transmission power. More importantly, in stochastic networks where channel quality fluctuates over time, the designed games are also weighted strongly monotone and the proposed algorithms converge in mean square to the joint optimal transmission power at an O(1/T) rate, even when the network is only feasible on average (i.e., users may be unable to meet their requirements with positive probability). This comes in stark contrast to existing algorithms (like the seminal Foschini–Miljanic algorithm and its variants) that may fail to converge altogether.

Funding: P. Mertikopoulos is partially supported by the COST Action [CA16228] European Network for Game Theory (GAMENET) for this work, and is grateful for financial support by the French National Research Agency (ANR) in the framework of the starting grant ORACLESS [ANR-16-CE33-0004-01], the Investissements d’avenir program [ANR-15-IDEX-02], the LabEx PERSYVAL [ANR-11-LABX-0025-01], and MIAI@Grenoble Alpes [ANR-19-P3IA-0003]. A. Moustakas is supported by the U.S. Office of Naval Research through the grant [N62909-18-1-2141]. N. Bambos is partially supported by the Koret Foundation under the Digital Living 2030 project.

Keywords: power management • wireless network • online learning • Nash equilibrium

1. Introduction
Viewed abstractly, power management (or power control) is a collection of techniques that allows the users of a wireless network to achieve their performance requirements (e.g., in terms of throughput) while minimizing the power consumed by their equipment. Thus, given the key part played by transmitted power in increasing battery life and network capacity, power control has been a core aspect of network design ever since the early development stages of wireless networks.

In this general context, distributed power management has proven to be the predominant power management paradigm, and with good reason: centralized coordination is extremely difficult to achieve in large-scale wireless networks; a single point of failure in a centralized allocator could have devastating network-wide effects; the communication overhead alone becomes unmanageable in cellular networks; and the list goes on (Rappaport 2001, Goldsmith 2005). Consequently, considerable effort has been devoted to designing distributed power management algorithms that are provably capable of attaining various performance guarantees required by the network’s users while leaving a sufficiently small footprint at the device level.

The problem of distributed power management becomes even more important and challenging in the emerging era of the Internet of Things (Bradley et al. 2013), which paints the compelling vision (in part already under way) of embedding uniquely identifiable wireless devices in the world around us and connecting these devices/sensors to the existing internet infrastructure to form an intelligent and coherently functional entity—as in “smart cities” (Deakin 2013), patient monitoring (Byrne and Lim 2007), and digital health (Au-Yeung et al. 2010). Thus, as the exponentially growing number of wireless communicating “things” has been pushing the entire wireless ecosystem from the low-traffic, low-interference regime to the high-traffic, high-interference limit, the power management question grows ever more urgent: how can the power of battery-driven (and hence energy-constrained) devices be regulated in a distributed, real-time manner so as to achieve the quality-of-service guarantees using minimum power in the presence of the inherent stochastic fluctuations of the underlying wireless network?

The current gold standard for power management is the seminal method of Foschini and Miljanic (1993), which, owing to its elegance, simplicity, and strong convergence properties, is still widely deployed (in one variant or another). The original Foschini–Miljanic (FM) algorithm was expanded upon in a series of subsequent works (Mitra 1994, Yates 1996, Ulukus and Yates 1998) that considered different variants of the problem (e.g., working with maximum power constraints, allowing devices to asynchronously update their power, etc.). Thereafter, various other objectives related to power management have been considered in wireless networks (as well as in the closely related wireline networks), resulting in more sophisticated models and more complex algorithms addressing issues related to throughput (El Gamal et al. 2006a, b; Reddy et al. 2008; Seferoglu et al. 2008), fairness (Eryilmaz et al. 2006), delays (Eryilmaz et al. 2008, Altman et al. 2010), backlog (Gitzenis and Bambos 2002, Reddy et al. 2012, Gopalan et al. 2015), and weighted sum of rates (Candogan et al. 2010, Weeraddana et al. 2012).1

Importantly, most of the existing (distributed) power management/control schemes tend to rely implicitly only on the previous power iterate when selecting the power for the current iteration (especially when pertaining to the target-rate power management problem that we consider here). In some respects, there is good reason to do so: the resulting power control algorithm is simple to use and implement; it does not need to use up memory to keep track of the entire radiated power history (a scarce resource in small wireless devices); and, as a pleasant by-product, the algorithm becomes much easier to analyze theoretically.

Notwithstanding, a crucial and well-known downside to this approach is that such algorithms tend to be unstable because a single power iterate is the sole power selection criterion. In particular, despite the various convergence and optimality guarantees observed when the underlying wireless network is static/slowly varying (see, e.g., Chiang et al. 2008, Tan 2015, and references therein), this instability is acutely manifested in time-varying, stochastic networks, especially when the number of mobile devices is large and/or power control is occasionally infeasible due to device mobility (a case that is much less studied and understood; cf. Section 2.3 for a detailed discussion). As we discuss in the rest of this paper, this instability is in some sense the root cause for the lack of both convergence and optimality results when the network is stochastic and time varying. Under this light, the explicit use of all past iterates holds great promise for the overall stability of a power management policy, as the influence of the last iterate cannot have a dominating effect over the algorithm’s previous iterates (provided they are utilized in an intelligent, memory-efficient way).

Our aim in this paper is to provide a distributed power control algorithm satisfying the above desiderata. This task faces two key challenges from a practical perspective: First, such an algorithm cannot take for granted that each transmitter has access to the power characteristics of all other transmitters, as such information is typically unavailable in practical scenarios. This dictates that any given transmitter can only make explicit use of link-specific information, such as its signal-to-interference-plus-noise ratio (SINR) and/or the total interference and noise at the receiver (i.e., information that can be sensed by each individual link in the network). Second (and perhaps more stringently), a transmitter should not be required to store all past information (including past power, SINR, and/or interference measurements) in an explicit fashion. If met, this requirement is highly desirable for a memory-constrained wireless transmitter where such bookkeeping is in general infeasible—in other words, past information must be exploited in an implicit, parsimonious manner.

1.1. Related Work and Our Contributions
Our contributions are three-fold, and we discuss them in the context of existing work below.

First, we propose two novel power control algorithms (Variants A and B in Algorithm 2), both of which require meager, O(1) operational overhead while leveraging past information in an efficient way. In particular, the information on past power iterates is represented in the most parsimonious form possible: it takes only a constant amount of memory independent of the number of time steps involved. In fact, the amount of memory required is the same as that of the algorithms using only the last power iterates (e.g., FM), even though the latter do not make explicit use of past power information. The proposed algorithms are quite simple and lend themselves easily to being implemented on wireless devices.



Second, we provide theoretic performance guarantees that establish the robustness and optimality of the proposed algorithms. More precisely, in the case of a static network environment that is feasible (i.e., when all users can concurrently attain their target rates), the proposed algorithms converge to the joint optimum transmission power vector at a geometric rate O(κ^(−T)) (for some κ > 1). More importantly, in a stochastic i.i.d. network environment (where the fluctuating network is sometimes feasible and sometimes not), we show that the proposed algorithms still converge to a deterministic power vector in mean square at an O(1/T) rate, provided only that the network is feasible in the mean, that is, even if the network is infeasible with positive probability. We find this guarantee particularly appealing because it incorporates elements of both stability and optimality. The former (stability) is because the proposed algorithm converges to a fixed constant power vector despite the persistent, random fluctuations in the network (and, in particular, even if power control is not feasible for a given channel realization). The latter (optimality) is because the algorithms’ end state is an optimal solution to the network’s power management problem with respect to the channels’ mean value statistics. This comes in sharp contrast to the FM algorithm, which, when the channel is feasible on average, may fail to converge altogether or, at best, only converges in distribution to a power profile that is not optimal in any way (Zhou et al. 2016c).

We obtain these theoretic guarantees via a combined approach of learning and “game design” where we design a game with suitable reward functions such that the optimal joint power profile in the original power management problem is the unique Nash equilibrium of the designed game. We then show that if the network environment is feasible, the designed game is weighted strongly monotone (WSM): this is an important class of games where convergence to a Nash equilibrium can be characterized under suitable no-regret online learning algorithms. We then establish the equivalence between the proposed power control algorithms and multiagent online eager gradient descent (EGD) as applied to the designed games: by showing that the latter converges to the unique Nash equilibrium of a WSM game, and because the unique Nash equilibrium by design is the optimal joint power profile, we readily obtain all the desired results for the proposed power control algorithms.

Importantly, establishing the algorithms’ convergence takes us through a distinct (albeit related) line of research, namely, utility-based models of power management in wireless networks (Alpcan et al. 2002, 2006; Fan et al. 2006; Menache and Ozdaglar 2010; Han et al. 2014; Zhou and Bambos 2015; Zhou et al. 2016a, b, 2018a). This flourishing literature has taken the view that each device is a self-interested entity with its individual objective depending on how much power it uses as well as how much power all the other devices use (via the interference that they create). In this literature, the cost function of each device (as a function of, say, signal-to-interference ratio) is modeled explicitly and pertains to the utility of each user of that device: in other words, the resulting game is a priori assumed to model the underlying reality. In contrast, the game design approach that we take in this paper leads to a virtual game that only enters the problem as a theoretic tool to aid the design of robust power management algorithms (which are implemented at the device level and are not subject to game-theoretic rationality postulates). Nevertheless, the resulting games do admit a natural interpretation as they effectively measure the distance between the users’ achieved throughput and their target rates.

The idea of game design has been explored before, and our approach here is inspired by the work of Candogan et al. (2010) and Li and Marden (2013). In more detail, Candogan et al. (2010) designed a near-potential game for the maximum weighted-sum-of-rates problem and used best-response dynamics to derive a power control algorithm. Li and Marden (2013) designed a potential game and used a particular method (called gradient-play) to derive a distributed optimization algorithm for the network consensus problem. In addition to considering a different problem altogether (power minimization subject to rate constraints), our work here differs from the above in several key aspects: First, our game-theoretic analysis is not limited to potential games, but it instead applies to WSM games. Second, by focusing on online gradient descent (see below), our results can also be framed in the context of no-regret learning in games. Finally, to tackle the stochastic regime, we consider a more general framework with imperfect feedback, which is only accurate up to a bounded-variance error. The above elements cannot be treated with the techniques of Candogan et al. (2010) and/or Li and Marden (2013), leading us to introduce a new set of tools.

Our third contribution is to establish quantitative Nash equilibrium convergence results for multiagent no-regret learning in the class of WSM games. Specifically, we consider a general game-theoretic learning framework where each agent is endowed with a continuous action set and does not have the knowledge of its (or any other agent’s) reward function. Instead, each agent only has access to an imperfect gradient feedback during the learning process.2 In this setting, we focus on online eager gradient descent, a learning algorithm with tight regret bounds: when the sequence of reward functions is concave (as a function of an agent’s own action), then up to a multiplicative constant related to the problem’s dimensionality, it



achieves the O(√T) regret bound that is min-max optimal; the stronger O(log T) regret bound can be achieved if the sequence of reward functions is strongly concave (see Shalev-Shwartz 2012, Hazan 2016).
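For intuition on the update rule itself: “eager” gradient descent projects back onto the feasible set after every gradient step (as opposed to lazy projection, which projects accumulated gradients). The following one-player sketch on a box constraint is purely illustrative and not the paper’s Algorithm 2; the reward function and step-size schedule are invented for the example:

```python
import numpy as np

def egd(grad, x0, lo, hi, steps, step_size):
    """Online eager gradient descent (ascent form): x <- Proj_[lo,hi](x + gamma_t * g_t)."""
    x = np.asarray(x0, dtype=float)
    for t in range(1, steps + 1):
        g = grad(x)                                  # (possibly noisy) first-order feedback
        x = np.clip(x + step_size(t) * g, lo, hi)    # eager projection at every round
    return x

# Toy strongly concave reward u(x) = -(x - 0.7)^2 on [0, 1]; the gradient is exact here,
# but the scheme is meant to tolerate bounded-variance noise in the feedback g.
x_T = egd(grad=lambda x: -2.0 * (x - 0.7),
          x0=np.array([0.0]), lo=0.0, hi=1.0,
          steps=2000, step_size=lambda t: 1.0 / t)
print(x_T)  # last iterate, close to the maximizer 0.7
```

The paper’s results concern the last iterate x_T of such a scheme (run simultaneously by all players), rather than a time-average of the trajectory.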

In this context, we show that if each agent employs online eager gradient descent to maximize their cumulative reward, the last iterate of the joint action profile of all agents (now a random variable as a result of stochastic first-order feedback) converges in mean square to the unique Nash equilibrium at an O(1/T) rate, provided the underlying game is WSM. In comparison, much of the existing literature on no-regret game-theoretic learning (including other no-regret algorithms) focuses on the convergence of the time-averaged sequence of the joint action x̄t = ∑_{k=1}^t γk xk / ∑_{k=1}^t γk (where γt is the step size and xt is the joint action at time t) in different special classes of games, such as convex potential games, zero-sum games, routing games, and so on (Cesa-Bianchi and Lugosi 2006, Krichene et al. 2015, Balandat et al. 2016, Lam et al. 2016). Such time-average convergence results, although useful in their own right, are weaker than the corresponding last-iterate convergence results (i.e., convergence of xt) and are not fine grained enough to characterize the actual joint evolution of the agents’ sequence of play.

Other related considerations date as far back as the work of Arrow and Hurwicz (1960), who considered a special class of multiagent games that generalizes two-player zero-sum games and showed that if each agent applies gradient descent (without projection), then the continuous-time approximation of the discrete-time dynamics (modelled by a differential equation) converges to the unique Nash equilibrium of the game. However, whether the discrete-time iterates converge to Nash in that class is unclear; our convergence analysis can be seen as a partial answer to this question. More recently, Antipin (2002) considered a class of two-player games with a generalized potential function and established that the gradient prediction-type projection method, a centralized algorithm the authors developed, converges to a Nash equilibrium at a geometric rate. However, this gradient prediction-type projection method cannot be decentralized: in each iteration, one player applies a partial gradient to update its action, and the second player applies its own partial gradient but using the already updated action value from the first player. Consequently, this cannot be used in an online learning setting where each player updates its own action simultaneously.

At around the same time as Antipin (2002), Flam (2002) considered a class of multiagent games called evolutionary stable games (a very broad class of games that includes WSM games as a special case) and established local convergence results. In particular, they showed that if each player applies eager gradient descent with perfect feedback, convergence to a Nash equilibrium is guaranteed if one starts in the vicinity of a Nash equilibrium. Facchinei and Pang (2007) strengthened both the assumption and the conclusion and gave explicit convergence rates: translating their results (originally in variational inequalities) into games, they established that in strongly monotone games, if each player applies eager gradient descent, then convergence to the unique Nash equilibrium is guaranteed at a geometric rate, a result we apply to give sharper performance guarantees for our designed algorithms in the static network environment case. Cui et al. (2008) considered a particular game in medium access control and investigated a distributed algorithm called gradient-play (in addition to other algorithms, such as best response and Jacobi play) and established geometric convergence to the unique Nash equilibrium of the game when each agent applies the gradient-play algorithm. However, this gradient-play algorithm is different from gradient descent in that each player applies a partial gradient only to part of one’s reward function; because of this, it is unclear whether this algorithm is no-regret if applied in an online learning setting.

Importantly, all works discussed above are concerned with the case where agents receive perfect gradient feedback on their actions (sometimes with observability of a nonindividual gradient). From a dynamical system viewpoint, Kar et al. (2012) considered the more general case where the gradient is corrupted by a stochastic but unbiased noise and established that if the game is strongly monotone and each agent applies gradient descent with noise (without projection), then almost sure convergence to the unique Nash equilibrium is guaranteed. Recently, Zhou et al. (2017b) and Mertikopoulos and Zhou (2019) established that in variationally stable games (a broad class of games that includes strongly monotone games), if each agent applies gradient descent with lazy projection, then even with stochastic (unbiased) noise, almost sure convergence to Nash is guaranteed. Zhou et al. (2017a, 2018b) further considered other types of imperfect feedback in variationally stable games that include delay and loss and characterized the robustness of the algorithms therein. However, all these convergence results are asymptotic, and finite-time rates are impossible at such generality. Further, to the best of our knowledge, a quantitative O(1/T) convergence rate is not known for multiagent online gradient descent (with or without projection) in WSM games. Our work fills in this gap in the literature with a relatively simple proof.

For convenience, we summarize this thread of findings in Table 1.



2. Model, Background, Motivation
We describe below the target-rate power management problem on wireless networks (Weeraddana et al. 2012, Tan 2014). After introducing the problem in Section 2.1, we discuss in Section 2.2 the well-known FM power control algorithm (Foschini and Miljanic 1993). This discussion will provide an account of some of the drawbacks of the FM algorithm, both quantitative and qualitative, and will serve as the motivation of the paper (cf. Section 2.3).

2.1. Setup
Consider a wireless network of N communication links, each link consisting of a transmitter and an intended receiver. Assume further that the i-th transmitter transmits with power pi and let p = (p1, . . . , pN) ∈ R+^N denote the joint power profile of all users (transmitters) in the network. In this context, the most commonly used measure of link service quality is SINR. Intuitively, link i’s SINR depends not only on how much power its transmitter is employing but also on how much power all the other transmitters are concurrently employing. Specifically, link i’s SINR, which we denote by ri(p), is given by the following ratio:

ri(p) = Gii pi / (∑_{j≠i} Gij pj + ηi),   (1)

where ηi is the thermal noise associated with the receiver of link i and Gij ≥ 0 is the power gain between transmitter j and receiver i, representing the interference caused to receiver i by transmitter j per unit transmission power used. We further assume throughout the paper that Gii > 0, for otherwise transmission for link i is meaningless. We collect all the power gains Gij into the gain matrix G and all the thermal noises into the noise vector η. Note that the power gain matrix G depends on the underlying network topology of the wireless links. Each link has a target SINR threshold r∗i > 0: the minimum acceptable service quality threshold for that link.
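As a concrete illustration of Equation (1) (the code and the two-link channel values below are our own made-up example, not from the paper), the SINR of every link can be computed in a few lines:

```python
import numpy as np

def sinr(p, G, eta):
    """SINR of each link, per Equation (1): r_i(p) = G_ii p_i / (sum_{j != i} G_ij p_j + eta_i)."""
    G = np.asarray(G, dtype=float)
    p = np.asarray(p, dtype=float)
    signal = np.diag(G) * p            # G_ii * p_i
    interference = G @ p - signal      # sum over j != i of G_ij * p_j
    return signal / (interference + np.asarray(eta, dtype=float))

# Hypothetical 2-link network: diagonal entries are direct gains,
# off-diagonal entries are cross-link (interference) gains.
G = np.array([[1.0, 0.1],
              [0.2, 1.0]])
eta = np.array([0.05, 0.05])
print(sinr(np.array([1.0, 1.0]), G, eta))
```

Note how raising one transmitter’s power raises its own SINR but lowers everyone else’s, which is the coupling that makes power control a multiagent problem.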

The target-rate power management problem (Weeraddana et al. 2012, Tan 2014) then lies in finding a power assignment p such that the following quality-of-service constraints hold:

ri(p) ≥ r∗i, ∀i.   (2)

In order to find a joint transmission power p that satisfies the quality-of-service constraints in Equation (2), such a p must exist in the first place: the notion of wireless network channel feasibility, formalized in the next definition, characterizes such scenarios.

Definition 1. The channel given by (G, η) is feasible with respect to a target SINR vector r∗ = (r∗1, . . . , r∗N) if there exists a p satisfying Equation (2). The channel is otherwise said to be infeasible.

In their original paper, Foschini and Miljanic (1993) presented a simple, necessary, and sufficient condition for deciding when a channel is feasible. To state it, we first need a convenient and equivalent characterization of a wireless network channel:

Definition 2. A wireless network channel (or channel for short) specified by (G, η) can be alternatively represented by the pair (W, γ) consisting of the following components:

1. The reweighted gain matrix W, where for i, j ∈ {1, 2, . . . , N}:

   Wij := 0               if i = j,
   Wij := r∗i Gij / Gii   if i ≠ j.   (3)

2. The reweighted noise vector γ, where

   γi = r∗i ηi / Gii,   i ∈ {1, 2, . . . , N}.   (4)

3. Define λmax(W) to be the spectral radius of the matrix W: λmax(W) is the eigenvalue with the largest absolute value. As noted in Tan (2014), the reweighted gain matrix W is a nonnegative (and, without loss of generality, irreducible) matrix; thus, by the Perron–Frobenius theorem, there is a unique positive real eigenvalue ρ∗ that has the largest magnitude.

Theorem 1 (Tan 2014). A channel is feasible with respect to r∗ if and only if the largest eigenvalue λmax(W) of the reweighted gain matrix W satisfies λmax(W) < 1.

Fact 1. Given a feasible channel (G, η), we have λmax(W)< 1, which implies that (I −W)−1 exists and iscomponent-wise strictly positive. This further impliesthat the joint transmission power p∗ � (I −W)−1γ sat-isfies the quality-of-service constraints given inEquation (2)and it is component-wise strictly positive. Finally, p∗ isthe unique vector that satisfies the following property:

Table 1. Number of Iterations Required by the FM and EGDAlgorithms to Reach an ε-Optimal State in WirelessNetworks with Fixed (Deterministic) orFluctuating (Stochastic) Channels

Deterministic Stochastic

Foschini–Miljanic algorithm log(1/ε) ______Eager gradient descent (our paper) log(1/ε) O(1/ε)Note. In the static case, both algorithms converge at a geometric rate;in the stochastic case, the FM algorithm may fail to stabilize (even inprobability), whereas the proposed EGD policy converges withinO(1/ε) iterations.

Zhou et al.: Robust Power ManagementOperations Research, Articles in Advance, pp. 1–15, © 2020 INFORMS 5


if p is any vector satisfying Equation (2), then p* ≤ p (component-wise).

In other words, in a feasible channel, p* is the “smallest” joint transmission power that satisfies the quality-of-service constraints. To highlight the importance of this quantity and to recognize the fact that the results in this paper will mostly pertain to p*, we have the following definition:

Definition 3. In a feasible channel (G, η) (or, equivalently, (W, γ)), p* defined in Fact 1 is called the optimal joint transmission power vector.
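To make Fact 1 and Definition 3 concrete, the following sketch computes p* = (I − W)^{−1}γ for a small 2-link channel and checks that every quality-of-service constraint is met with equality. All channel numbers are invented for illustration; they are not from the paper.

```python
# Hypothetical 2-link channel, chosen to be feasible.
G = [[1.0, 0.1],
     [0.2, 1.0]]              # G[i][j]: gain from transmitter j to receiver i
eta = [0.05, 0.05]            # thermal noise at each receiver
r_star = [1.0, 1.0]           # target SINRs
N = 2

# Reweighted gain matrix W (Equation (3)) and noise vector gamma (Equation (4)).
W = [[0.0 if i == j else r_star[i] * G[i][j] / G[i][i] for j in range(N)]
     for i in range(N)]
gamma = [r_star[i] * eta[i] / G[i][i] for i in range(N)]

# For a 2x2 W with zero diagonal, the spectral radius is sqrt(W_01 * W_10).
lam_max = (W[0][1] * W[1][0]) ** 0.5
assert lam_max < 1.0          # the channel is feasible (Theorem 1)

# p* = (I - W)^{-1} gamma, solved in closed form for the 2x2 case.
det = 1.0 - W[0][1] * W[1][0]
p_star = [(gamma[0] + W[0][1] * gamma[1]) / det,
          (gamma[1] + W[1][0] * gamma[0]) / det]

# p* meets every QoS constraint with equality: SINR_i(p*) = r*_i.
for i in range(N):
    interference = sum(G[i][j] * p_star[j] for j in range(N) if j != i)
    assert abs(G[i][i] * p_star[i] / (interference + eta[i]) - r_star[i]) < 1e-12
```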

2.2. Foschini-Miljanic Power Control Algorithm
We now present the well-known FM power control algorithm, which finds the optimal joint transmission power if one exists (i.e., in a feasible channel). Following the standard convention in the wireless communications literature (Han et al. 2011), the transmission power p_i for transmitter i is assumed to lie in a compact interval P_i = [0, p_i^max]. Therefore, p is constrained to lie in the feasible support set P := ∏_{i=1}^N P_i = ∏_{i=1}^N [0, p_i^max]. We shall adopt this convention for the rest of the paper. The FM algorithm is formally given in Algorithm 1:

Algorithm 1 (FM Algorithm: Bounded Power Support)
1: Each link i chooses an initial power p_i^0 ∈ [0, p_i^max].
2: for t = 0, 1, 2, . . . do
3:   for i = 1, . . . , N do
4:     p_i^{t+1} = min(p_i^t · r*_i / r_i(p^t), p_i^max)
5:   end for
6: end for
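The FM iteration can be sketched in a few lines. The 2-link channel below is hypothetical, chosen to be feasible so that the first case of Theorem 2 applies.

```python
def fm_step(p, G, eta, r_star, p_max):
    """One synchronous FM update: p_i <- min(p_i * r*_i / r_i(p), p_i^max)."""
    new_p = []
    for i in range(len(p)):
        interference = sum(G[i][j] * p[j] for j in range(len(p)) if j != i)
        sinr = G[i][i] * p[i] / (interference + eta[i])   # r_i(p)
        new_p.append(min(p[i] * r_star[i] / sinr, p_max[i]))
    return new_p

G = [[1.0, 0.1], [0.2, 1.0]]     # hypothetical feasible 2-link channel
eta = [0.05, 0.05]
r_star = [1.0, 1.0]
p_max = [1.0, 1.0]

p = [0.5, 0.9]                   # arbitrary strictly positive initial powers
for _ in range(100):
    p = fm_step(p, G, eta, r_star, p_max)

# Theorem 2, first case: p^t converges to p* = (I - W)^{-1} gamma.
assert abs(p[0] - 0.055 / 0.98) < 1e-9 and abs(p[1] - 0.060 / 0.98) < 1e-9
```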

In the classical power control setting, the channel (G, η) is assumed to be deterministic and time-invariant; that is, (G, η) remains the same from iteration to iteration. By a monotonicity argument, we can leverage the results in Foschini and Miljanic (1993) to give the following characterization:

Theorem 2 (Foschini and Miljanic 1993). Let the channel (G, η) be deterministic and time-invariant.

• If the channel is feasible with respect to r* and if the power support includes the optimal power vector (i.e., p* ∈ P), then the joint power iterate p^t in Algorithm 1 converges to the optimal joint transmission power p*, irrespective of the initial point p^0.

• If the channel is infeasible with respect to r* or if the power support does not include the optimal power vector (i.e., p* ∉ P), then the joint power iterate in Algorithm 1 converges to p^max, irrespective of the initial point p^0.

Remark 1. The original result in Foschini and Miljanic (1993) assumes there is no constraint on the maximum power (and hence, in their original setting, the joint power iterate in Algorithm 1 would diverge to infinity if the channel is not feasible). However, the corresponding results for a constrained feasible power support set (as given in Theorem 2) are an almost immediate corollary via a simple monotonicity argument. Because, in practice, maximum power is always bounded, and because the subsequent power management literature in wireless networks (Tan 2014) does assume bounded power, we follow this convention in this paper too.

2.3. Motivation of the Paper
Although FM enjoys good convergence properties in deterministic channels (as given in Theorem 2), it quickly loses its appeal in stochastic channels (i.e., when (G^t, η^t)_{t=0}^∞ are stochastic processes): this can occur when the transmitters and receivers in the wireless network are moving while communicating with each other, thereby causing the channel gain matrix (and potentially the thermal noises) to fluctuate from iteration to iteration. This is because FM, in using only p^t to determine the next power iterate p^{t+1}, has a certain inherent instability, which manifests itself particularly when the underlying channel is stochastic. In stochastic channels, we use W^t and γ^t to denote the random reweighted gain matrix and the random reweighted noise at iteration t, respectively, and P^t denotes the random power vector at iteration t. More concisely, denoting by Π_P(y) = argmin_{x∈P} ‖x − y‖ the projection operator onto P, the FM update can be written as

P^{t+1} = Π_P(W^t P^t + γ^t).  (5)

In a stochastic channel, the power iterates generated by FM are random variables and may fail to converge altogether. Even when FM does converge, it will at best, under certain conditions on the channel, converge to a stationary distribution (Zhou et al. 2016c) rather than to a deterministic quantity. In fact, convergence to a stationary distribution is already the best one can hope for when using FM. This reveals two main drawbacks of FM. First, as mentioned previously, when operating in a stochastic channel, FM is not very stable. Second, convergence to a limiting stationary distribution is not fully desirable: even when FM does converge to a stationary distribution, it is not clear what performance guarantees are achieved by that limiting power distribution. Specifically, the stationary distribution that FM converges to is not optimal in any sense.

These two drawbacks lead to the problem of designing a distributed power control algorithm that has both stability and performance guarantees. More specifically, in light of the root cause of the instability of FM, it is natural to ask whether it is possible to incorporate all the past power iterates to synthesize a


distributed power control scheme so as to stabilize the power iterate more quickly, such that the resulting power iterate converges (almost surely) to a fixed vector even in the presence of a stochastic channel? If so, would this vector be optimal in some (average) sense? As we shall see in the next section, both questions have affirmative answers.

3. Proposed Power Control Algorithms
In this section, we present two new and closely related distributed power control algorithms that utilize all the past power iterates to achieve better stability and optimality guarantees. The design of such a distributed algorithm (that uses past information) faces at least two challenges. First, such an algorithm cannot assume that each transmitter has access to the power used by all the other transmitters, as such communication is infeasible in practice.3 This dictates that a transmitter can only use aggregate information, such as the SINR and/or the total interference-plus-noise (i.e., information that can be sensed by each individual link), as opposed to the individual powers. Second, and more stringently, one should not expect a transmitter to store all the past information that is available to it, which could include its own past transmission powers, past SINRs, etc. This additional constraint is highly desirable and necessary in practice because, for a memory-constrained wireless transmitter, such bookkeeping is in general infeasible. In fact, the popularity of the FM algorithm stems in large part from the very limited memory it requires in performing each update. Consequently, for a new algorithm to be practically useful, an economical representation that incorporates all such information from the past in a compact way is needed.

Here we propose a robust power control algorithm that satisfies those two constraints. The proposed algorithm has two variants, which we subsequently label as Variant A and Variant B. For each variant, the past information is represented and stored in the most economical form possible: it takes only a constant amount of memory, independent of the number of time steps. For space concerns, we directly consider the stochastic channel case (specializing the results to the deterministic channel case is straightforward). Before stating the algorithm, we first present the model of the stochastic channel within which the algorithm operates, which is the same model as in Zhou et al. (2016c):

Assumption 1. (G^t, η^t) is drawn iid from an arbitrary (discrete or continuous, bounded or unbounded) support on R_+^{N×N} × R_+^N, satisfying the following assumptions:

1. Finite mean: ∀i, j, ∀t, E[G^t_ij] < ∞, E[η^t_i] < ∞.

2. Finite variance: ∀i, j, ∀t, Var[G^t_ij] < ∞, Var[η^t_i] < ∞.

Note that, under this model, (G^t_ij, η^t_k) can be arbitrarily correlated with (G^t_{i′j′}, η^t_{k′}), and G^t can be correlated with η^t. Algorithm 2 gives the description of Variant A in the stochastic and time-varying channel case. Here we use upper case for gradients and powers to highlight the fact that all the quantities are now random variables.

Algorithm 2 (Robust Power Control in Stochastic Channels)
1: Initialize γ_0 > 0.
2: Each link i chooses an initial P_i^1 ∈ P_i.
3: for t = 1, 2, . . . do
4:   for each link i do
5:     Variant A: P̃_i^{t+1} = P_i^t − (γ_0/t)(G^t_ii P_i^t − r*_i(Σ_{j≠i} G^t_ij P_j^t + η^t_i))
6:     Variant B: P̃_i^{t+1} = P_i^t − (γ_0/t)(P_i^t − r*_i (Σ_{j≠i} G^t_ij P_j^t + η^t_i) / G^t_ii)
7:     P_i^{t+1} = Π_{P_i}(P̃_i^{t+1})
8:   end for
9: end for

Remark 2. First, note that the P̃_i^t's serve to keep a compact representation that aggregates all the past information at any given time t, thereby eliminating the need to keep track of the past P_i^t's. Second, as we shall make precise later, each P̃_i^t is a weighted average of the gradients of a certain cost function. Consequently, P̃^t (the vector of all the individual P̃_i^t's) resides in the dual space of the space P of feasible joint transmission powers. In fact, line 7 shows that the projection transforms a point in this dual space into a point in the action space (i.e., P). Third, performing the update in line 5 (Variant A) does not require transmitter i to know the transmission powers used by others: it need only know the interference-plus-noise as a whole, as well as the SINR. For Variant B, only the SINR

G^t_ii P_i^t / (Σ_{j≠i} G^t_ij P_j^t + η_i)

is needed, because from this SINR and P_i^t, the ratio (Σ_{j≠i} G^t_ij P_j^t + η_i) / G^t_ii can be recovered. Consequently, the requirement to implement Variant B is even less stringent than that of Variant A. However, as we shall see, Variant B needs a slightly stronger assumption on the channel gains for convergence and optimality.

Remark 3. Even though we state the stochastic channel environment as in Assumption 1, Algorithm 2 also applies in an arbitrary time-varying environment: an environment where the channel gains G^t_ij and the noise η^t do not follow a stationary stochastic process and can change arbitrarily. It turns out that even in that case, the designed algorithm inherits the no-regret guarantee with respect to a particular cost function. Although the arbitrary time-varying environment is not the focus of this paper, this guarantee is still desirable and will become clear once we have made the connection to online learning in games.


4. Eager Gradient Descent Learning in Weighted Strongly Monotone Games
We present the framework of EGD learning on the class of WSM games. The results in this section serve as the foundation for the game-design approach to power control and enable us to subsequently establish the theoretical guarantees of the proposed power control algorithm in Algorithm 2. Wherever possible, we use notation that matches the power control setting to make explicit the connection between EGD learning on WSM games and power control.

4.1. Weighted Strongly Monotone Games
We start with the definition that will set the stage for both the theoretical study in this section and the practical design and application in the subsequent sections.

Definition 4. A (λ, β)-weighted strongly monotone game (hereafter referred to as (λ, β)-WSM for short) is a tuple G = (N, X = ∏_{i=1}^N X_i, {u_i}_{i=1}^N), where N is the set of N players {1, 2, . . . , N}, X is the joint action space with X_i being the action space for player i, and u_i : X → R is the utility function for player i, such that

1. Each X_i is a compact and convex subset of R^{d_i}, and each u_i is continuous in x and continuously differentiable in x_i, with ∂u_i(x)/∂x_i continuous in x.

2. There exists some λ ∈ R^N_{++} such that Σ_{i=1}^N λ_i⟨v_i(x) − v_i(y), x_i − y_i⟩ ≤ −(β/2) Σ_{i=1}^N λ_i‖x_i − y_i‖²_2 for all x, y ∈ X, where v_i(x) := ∂u_i(x)/∂x_i.

Remark 4. The class of WSM games is a proper subclass of the diagonally strictly concave games first introduced in Rosen (1965). G is a diagonally strictly concave game if, instead of the second requirement in Definition 4, the weaker assumption Σ_{i=1}^N λ_i⟨v_i(x) − v_i(y), x_i − y_i⟩ ≤ 0 (with equality if and only if x = y) is imposed. Note that even this weaker condition implies that each u_i is concave in x_i. Next, a few words on notation: x_{−i} denotes the joint action of all players but player i. Consequently, the joint action x will frequently be written as (x_i, x_{−i}). Further, we denote v(x) = (v_1(x), . . . , v_N(x)). Note that, per the definition, the joint partial gradient v(x) always exists and is a continuous function on the joint action space X. Finally, recall that x* ∈ X is a Nash equilibrium if, for each i ∈ N, u_i(x*_i, x*_{−i}) ≥ u_i(x_i, x*_{−i}) for all x_i ∈ X_i. The celebrated result of Rosen (1965) is that every diagonally strictly concave (DSC) game has a unique Nash equilibrium. Consequently, every (λ, β)-WSM game admits a unique Nash equilibrium.

We conclude this subsection with a simple sufficient condition ensuring that a game is (λ, β)-WSM. In Rosen (1965), a simple sufficient condition is given to ascertain whether a game is DSC. By an adaptation of that sufficient condition, we obtain a sufficient condition for checking whether a given game is a WSM game.

Lemma 1. Let G = (N, X = ∏_{i=1}^N X_i, {u_i}_{i=1}^N) be a game where each u_i is twice continuously differentiable. For each x ∈ X, define the λ-weighted Hessian matrix H^λ(x) as follows:

H^λ_ij(x) = (1/2) λ_i ∂v_i(x)/∂x_j + (1/2) λ_j ∂v_j(x)/∂x_i.  (6)

If there exists some β > 0 such that λ_max(H^λ(x)) ≤ −β for all x ∈ X, then G is (λ, β)-WSM.

Remark 5. Note that, by definition, the λ-weighted Hessian matrix is always symmetric, and hence all of its eigenvalues are real. Using the same notation as in the preceding power control section, λ_max(H^λ(x)) refers to its maximum eigenvalue.

4.2. Multiagent EGD Learning with Noisy Gradient
When players repeatedly interact with one another and the rewards are given by a fixed (but possibly unknown) stage game, it is an important question what learning dynamics the players would adopt. One answer provided by the online learning literature is EGD, which enjoys the no-regret property (Shalev-Shwartz 2012) when the utility function is concave: each agent, when interacting with the environment (consisting of the other agents), will achieve performance comparable to the best fixed action in hindsight, irrespective of what the other agents do. When each agent adopts EGD, we obtain the multiagent learning dynamics given in Algorithm 3.

Algorithm 3 (Multiagent Eager Gradient Descent)
1: Each player i chooses an initial x_i^1 ∈ X_i.
2: for t = 1, 2, . . . do
3:   for i = 1, . . . , N do
4:     x̃_i^{t+1} = x_i^t + α_t v_i(x^t)
5:     x_i^{t+1} = argmin_{x_i ∈ X_i} ‖x̃_i^{t+1} − x_i‖²_2
6:   end for
7: end for

Note that x̃_i^{t+1} takes a gradient step (with respect to the i-th player's partial gradient only) from the previous action and is then immediately projected back to the feasible action space. This is called eager projection, as the intermediate x̃_i^{t+1} is immediately thrown away after the projection. This type of projection stands in contrast with lazy projection (Nemirovski et al. 2009), where x̃_i^{t+1} (which can potentially lie outside the feasible action space) is always kept and projected only when needed.
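A minimal sketch of Algorithm 3 on a hypothetical 2-player game invented for illustration: utilities u_i(x) = −x_i² + x_i(1 − 0.5 x_{−i}) on X_i = [0, 1], which make the game strongly monotone with unique Nash equilibrium x* = (0.4, 0.4).

```python
def v(x):
    """Joint partial gradient: v_i(x) = du_i/dx_i = -2*x_i + 1 - 0.5*x_{-i}."""
    return [-2 * x[0] + 1 - 0.5 * x[1],
            -2 * x[1] + 1 - 0.5 * x[0]]

def project(z):
    """Eager Euclidean projection onto [0, 1]^2 (line 5 of Algorithm 3)."""
    return [min(max(zi, 0.0), 1.0) for zi in z]

alpha = 0.2                      # constant step-size, small enough for contraction
x = [1.0, 0.0]                   # arbitrary initial actions
for _ in range(200):
    g = v(x)
    x = project([x[i] + alpha * g[i] for i in range(2)])

# geometric convergence to the unique Nash equilibrium x* = (0.4, 0.4)
assert abs(x[0] - 0.4) < 1e-9 and abs(x[1] - 0.4) < 1e-9
```

Each player moves only along its own partial gradient and is projected back immediately; the intermediate unprojected point is discarded, which is exactly the "eager" behavior described above.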

Remark 6. The convergence of multiagent EGD to Nash equilibria (and the corresponding convergence rate) in weighted strongly monotone games follows straightforwardly from the literature. Following a classical result in variational inequalities (Facchinei and Pang 2007), we have a geometric convergence rate: let x* be the unique Nash equilibrium of a (λ, β)-WSM game and let x^t be the iterates generated by Algorithm 3. Then ‖x^t − x*‖²_2 ≤ ε after O(log(1/ε)) iterations if 0 < inf_t α_t < 2β/L², where L is the Lipschitz constant of the gradient v(x). This linear rate means that convergence to Nash is exponentially fast. When v(x) is not Lipschitz, convergence is still guaranteed, provided a diminishing step-size is chosen (note that, in the Lipschitz case, the step-size α_t cannot decrease to zero: this is necessary in order for the contraction to kick in, which ensures the geometric convergence rate). However, in the absence of a Lipschitz gradient, exponential convergence is not achievable. Nevertheless, O(1/T) is still achievable, which in fact follows as a special case of our subsequent result for the stochastic case in Theorem 3.

Next, we consider the case where the gradient is imperfect and noisy.

Algorithm 4 (Multiagent Noisy Eager Gradient Descent)
1: Initialize γ_0 > 0.
2: Each player i chooses an initial X_i^1 ∈ X_i.
3: for t = 1, 2, . . . do
4:   for i = 1, . . . , N do
5:     X̃_i^{t+1} = X_i^t + (γ_0/t) v̂_i(X^t)
6:     X_i^{t+1} = argmin_{x_i ∈ X_i} ‖X̃_i^{t+1} − x_i‖²_2
7:   end for
8: end for
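The noisy dynamics can be sketched on a hypothetical 2-player strongly monotone game (utilities u_i(x) = −x_i² + x_i(1 − 0.5 x_{−i}) on [0, 1], invented for illustration; its unique Nash equilibrium is x* = (0.4, 0.4)), with bounded zero-mean noise added to each partial gradient.

```python
import random

random.seed(1)

def noisy_v(x):
    """v_i(x) = -2*x_i + 1 - 0.5*x_{-i}, plus independent zero-mean U(-0.5, 0.5) noise."""
    return [-2 * x[0] + 1 - 0.5 * x[1] + random.uniform(-0.5, 0.5),
            -2 * x[1] + 1 - 0.5 * x[0] + random.uniform(-0.5, 0.5)]

gamma0 = 1.0                      # the theory asks for gamma0 > 1/beta (beta = 1.5 here)
x = [1.0, 0.0]
for t in range(1, 100001):
    g = noisy_v(x)
    # gradient step with gamma0/t, then eager projection onto [0, 1]
    x = [min(max(x[i] + (gamma0 / t) * g[i], 0.0), 1.0) for i in range(2)]

# a single run lands close to the unique Nash equilibrium x* = (0.4, 0.4)
assert abs(x[0] - 0.4) < 0.05 and abs(x[1] - 0.4) < 0.05
```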

In Algorithm 4, we have used the capital letters X̃_i^t and X_i^t because these iterates are now random variables as a result of the noisy gradients v̂_i. Of course, in order for convergence to be guaranteed, v̂_i(X^t) cannot be just any noisy perturbation of the gradient. Here we make a rather standard assumption on the noisy gradient:

Assumption 2. Let F_t be the canonical filtration induced by the (random) iterates up to time t: X^0, X^1, . . . , X^t. We assume the noisy gradients are

1. conditionally unbiased: ∀i ∈ N, ∀t = 0, 1, . . . , E[v̂_i(X^t) | F_t] = v_i(X^t), a.s.;

2. bounded in mean square: ∀i ∈ N, ∀t = 0, 1, . . . , E[‖v̂_i(X^t)‖² | F_t] ≤ V_i, a.s., for some constant V_i > 0, where ‖·‖ is some finite-dimensional norm (note that all finite-dimensional norms are equivalent up to a multiplicative constant).

Remark 7. An equivalent and useful characterization of Assumption 2 is that the noisy gradient can be decomposed as v̂_i(X^t) = v_i(X^t) + ξ_i^{t+1}, where the noise (ξ_i^t)_{i=1}^N satisfies (for each i):

1. ∀t = 0, 1, . . . , E[ξ_i^{t+1} | F_t] = 0 (a.s.);
2. ∀t = 0, 1, . . . , E[‖ξ_i^{t+1}‖² | F_t] ≤ V_i (a.s.).

Note that here we only need the noise to be martingale noise, rather than iid noise.

4.3. Convergence of Multiagent Noisy EGD to Nash Equilibrium
We tackle the convergence issue for multiagent EGD with noisy gradients and characterize the convergence rate explicitly. Our main result is that, in a WSM game, the multiagent noisy EGD given in Algorithm 4 converges in last-iterate to the unique Nash equilibrium in mean square at an O(1/T) rate. To that end, we start by recalling two preliminary results from the literature: the first is a fact about sequences first established in Chung (1954) (although it does not appear to be widely known); the second is a simple and well-known variational characterization of a Nash equilibrium.

Lemma 2 (Chung 1954). Let a_t be a nonnegative sequence such that a_{t+1} ≤ a_t(1 − P/t^p) + Q/t^{p+q}, where P > q > 0, 0 < p ≤ 1, and Q > 0. Then

a_t ≤ (Q/P)(1/t^q) if 0 < p < 1,  and  a_t ≤ (Q/(P − q))(1/t^q) if p = 1.  (7)
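A quick numeric sanity check of the p = q = 1 case (the one used later in Theorem 3), with illustrative constants P = 2 and Q = 1; the recursion is iterated at equality, the worst case the lemma permits.

```python
P, Q = 2.0, 1.0                  # illustrative constants with P > q = 1 > 0, p = 1
a = 0.5                          # a_1, below the claimed bound Q/(P - 1) = 1
for t in range(1, 1000):
    assert a <= (Q / (P - 1.0)) / t + 1e-12       # a_t <= (Q/(P-q)) * (1/t^q)
    a = a * (1.0 - P / t) + Q / t ** 2            # recursion taken at equality
```

The 1/t envelope is exactly the mechanism that turns the per-step contraction (1 − βγ_0/t) plus O(1/t²) noise into the O(1/T) rate of Theorem 3.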

Lemma 3 (Facchinei and Pang 2003). Let G = (N, X = ∏_{i=1}^N X_i, {u_i}_{i=1}^N) be a continuous game that has x* as a Nash equilibrium. Then ⟨v_i(x*), x_i − x*_i⟩ ≤ 0 for all x_i ∈ X_i and each i = 1, 2, . . . , N.

Remark 8. Lemma 3 follows directly from Facchinei and Pang (2003) and is a consequence of the definition of a Nash equilibrium: because no player has any incentive to unilaterally deviate at x*, if any player i deviates to x_i, the inner product between his or her gradient and the deviation direction is nonpositive, indicating that the individual deviation could not have done better.

Before stating the final convergence result, we make note of a few constants. First, because each v_i(·) is a continuous function over the convex and compact set X, v_i(x) is bounded, and we denote G := Σ_{i=1}^N λ_i sup_{x∈X} ‖v_i(x)‖²_2. Next, per Assumption 2 and Remark 7, we denote V := Σ_{i=1}^N λ_i V_i, which is an upper bound for the (weighted) sum of all the variances of the noises. We are now ready to state and prove the convergence result.

Theorem 3. Let G be a (λ, β)-WSM game with x* its unique Nash equilibrium, and let the X^t's be the sequence of iterates generated by the multiagent noisy EGD in Algorithm 4. If γ_0 > 1/β, then X^t converges to x* in mean square at an O(1/T) rate: for any T,

E[‖X^T − x*‖²_2] ≤ (G + V)γ_0² / ((min_i λ_i)(βγ_0 − 1)) · (1/T).

Proof. The proof is not long, so we present it here. Consider the energy function E_t = (1/2) Σ_{i=1}^N λ_i ‖X_i^t − x*_i‖²_2.


Note that E_t is a random variable, and we will look at how E_{t+1} relates to E_t. By definition, we have

E_{t+1} = (1/2) Σ_{i=1}^N λ_i ‖X_i^{t+1} − x*_i‖²_2
        = (1/2) Σ_{i=1}^N λ_i ‖Proj_{X_i}(X̃_i^{t+1}) − x*_i‖²_2
        = (1/2) Σ_{i=1}^N λ_i ‖Proj_{X_i}(X̃_i^{t+1}) − Proj_{X_i}(x*_i)‖²_2  (8)
        ≤ (1/2) Σ_{i=1}^N λ_i ‖X̃_i^{t+1} − x*_i‖²_2 = (1/2) Σ_{i=1}^N λ_i ‖X_i^t + (γ_0/t) v̂_i(X^t) − x*_i‖²_2  (9)
        = (1/2) Σ_{i=1}^N λ_i (‖X_i^t − x*_i‖²_2 + ‖(γ_0/t) v̂_i(X^t)‖²_2 + 2(γ_0/t)⟨v̂_i(X^t), X_i^t − x*_i⟩)  (10)
        = E_t + (γ_0²/(2t²)) Σ_{i=1}^N λ_i ‖v̂_i(X^t)‖²_2 + (γ_0/t) Σ_{i=1}^N λ_i ⟨v̂_i(X^t), X_i^t − x*_i⟩.  (11)

Taking the expectation of both sides (by first conditioning on X^t and then averaging over X^t) yields

E[E_{t+1}] ≤ E[E_t] + (γ_0²/(2t²)) Σ_{i=1}^N λ_i E[‖v̂_i(X^t)‖²_2] + (γ_0/t) Σ_{i=1}^N λ_i E[⟨v̂_i(X^t), X_i^t − x*_i⟩]  (12)
           = E[E_t] + (γ_0²/(2t²)) Σ_{i=1}^N λ_i E[‖v̂_i(X^t)‖²_2] + (γ_0/t) Σ_{i=1}^N λ_i E[⟨E[v̂_i(X^t) | X^t], X_i^t − x*_i⟩]  (13)
           = E[E_t] + (γ_0²/(2t²)) Σ_{i=1}^N λ_i E[‖v_i(X^t) + ξ_i^{t+1}‖²_2] + (γ_0/t) E[Σ_{i=1}^N λ_i ⟨v_i(X^t), X_i^t − x*_i⟩]  (14)
           ≤ E[E_t] + (γ_0²/t²) Σ_{i=1}^N λ_i (E[‖v_i(X^t)‖²_2] + E[‖ξ_i^{t+1}‖²_2]) + (γ_0/t) E[Σ_{i=1}^N λ_i ⟨v_i(X^t), X_i^t − x*_i⟩]  (15)
           ≤ E[E_t] + Cγ_0²/t² + (γ_0/t) E[Σ_{i=1}^N λ_i ⟨v_i(X^t), X_i^t − x*_i⟩],  (16)

where the first equality follows from Assumption 2 and the last inequality follows from the fact that both E[‖v_i(X^t)‖²_2] and E[‖ξ_i^{t+1}‖²_2] are bounded: the former is bounded because X is bounded and v(·) is continuous; the latter is bounded as a result of Assumption 2. Consequently, we use a constant C to denote the total upper bound. Note that it can be easily checked that C ≤ G + V.

Next, because the game G is (λ, β)-WSM, we have Σ_{i=1}^N λ_i⟨v_i(x) − v_i(y), x_i − y_i⟩ ≤ −(β/2) Σ_{i=1}^N λ_i‖x_i − y_i‖²_2 for all x, y ∈ X. This immediately implies that

Σ_{i=1}^N λ_i⟨v_i(x) − v_i(x*), x_i − x*_i⟩ = Σ_{i=1}^N λ_i⟨v_i(x), x_i − x*_i⟩ − Σ_{i=1}^N λ_i⟨v_i(x*), x_i − x*_i⟩ ≤ −(β/2) Σ_{i=1}^N λ_i‖x_i − x*_i‖²_2, ∀x ∈ X.  (17)

Consequently, plugging X^t into Equation (17) yields

Σ_{i=1}^N λ_i⟨v_i(X^t), X_i^t − x*_i⟩ ≤ Σ_{i=1}^N λ_i⟨v_i(x*), X_i^t − x*_i⟩ − (β/2) Σ_{i=1}^N λ_i‖X_i^t − x*_i‖²_2.

Because x* is a Nash equilibrium, it follows from Lemma 3 that Σ_{i=1}^N λ_i⟨v_i(x*), X_i^t − x*_i⟩ ≤ 0 and hence Σ_{i=1}^N λ_i⟨v_i(X^t), X_i^t − x*_i⟩ ≤ −(β/2) Σ_{i=1}^N λ_i‖X_i^t − x*_i‖²_2. Note that even though X^t is random, the inequality holds surely. Consequently, it follows that

E[E_{t+1}] ≤ E[E_t] + Cγ_0²/t² + (γ_0/t) E[Σ_{i=1}^N λ_i⟨v_i(X^t), X_i^t − x*_i⟩]
           ≤ E[E_t] − (β/2)(γ_0/t) E[Σ_{i=1}^N λ_i‖X_i^t − x*_i‖²_2] + Cγ_0²/t²  (18)
           = E[E_t] − (βγ_0/t) E[E_t] + Cγ_0²/t² = (1 − βγ_0/t) E[E_t] + Cγ_0²/t².  (19)

Consequently, applying Lemma 2 with p = q = 1 and P = βγ_0, Q = Cγ_0² (noting that γ_0 > 1/β) immediately yields

E[E_T] ≤ Cγ_0²/(βγ_0 − 1) · (1/T).  (20)

As a result, it follows that E[‖X^T − x*‖²_2] ≤ (1/min_i λ_i) E[E_T] ≤ (1/min_i λ_i) · Cγ_0²/(βγ_0 − 1) · (1/T). ∎


5. Theoretical Guarantees of Robust Power Control
In this section, we establish the theoretical guarantees of the proposed power control algorithms. Our approach lies in designing two weighted strongly monotone games such that each variant of the algorithm can be interpreted as a special instance of the multiagent EGD learning dynamics for the corresponding game. We emphasize that it is not the case that the transmitters are playing a repeated game in which their utilities are prescribed by the designed functions; rather, this is merely an analytical framework for studying the proposed algorithms. In fact, it is the analytical framework we used to design and derive the proposed power control algorithms in the first place.

5.1. Designed WSM Games
Under the notation introduced in Section 2, we consider the following two games G1, G2, where the set N of players is the set of links in the power control context:

1. G1 = (N, P, {u_i}_{i=1}^N), where u_i(p) = −(1/(2G_ii))(G_ii p_i − r*_i(Σ_{j≠i} G_ij p_j + η_i))².

2. G2 = (N, P, {ũ_i}_{i=1}^N), where ũ_i(p) = −(1/(2G²_ii))(G_ii p_i − r*_i(Σ_{j≠i} G_ij p_j + η_i))².

When the channel is feasible, per Fact 1, a (necessarily unique) optimal joint transmission power p* exists. Because the optimal joint transmission power matches the quality-of-service constraints exactly, every player's utility will be zero if they transmit according to p*. This implies that p* must be a Nash equilibrium, because zero is the highest utility that can possibly be achieved by any given player (link). In fact, at p*, not only will any player fail to obtain better utility by unilaterally deviating from p*, but the players also cannot achieve better utility through collusion of any type. Furthermore, p* is the unique Nash equilibrium, because if p were any other Nash equilibrium, then necessarily some player's utility would be below zero; that player would then have an incentive to transmit at a higher power than the currently prescribed transmission power so as to achieve better utility.

The preceding discussion essentially establishes that, when the channel is feasible, both of these games admit the unique Nash equilibrium p*, which is the optimal joint transmission power. However, it remains a question whether they are WSM. The following lemma presents a rather intriguing result: the feasibility of the channel not only guarantees the existence of a unique Nash equilibrium but also, more importantly, implies that both games are WSM.
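The zero-utility argument can be checked numerically. The 2-link channel below is hypothetical (same illustrative numbers as the earlier sketches: G diagonal 1, cross-gains 0.1 and 0.2, noise 0.05, unit target SINRs), with p* = (I − W)^{−1}γ written out in closed form.

```python
G = [[1.0, 0.1], [0.2, 1.0]]             # hypothetical 2-link channel
eta = [0.05, 0.05]
r_star = [1.0, 1.0]
p_star = [0.055 / 0.98, 0.060 / 0.98]    # (I - W)^{-1} gamma for this channel

def u1(i, p):
    """Player i's utility in G1: -(1/(2*G_ii)) * residual_i(p)^2, always <= 0."""
    residual = G[i][i] * p[i] - r_star[i] * (
        sum(G[i][j] * p[j] for j in range(2) if j != i) + eta[i])
    return -residual ** 2 / (2 * G[i][i])

for i in range(2):
    assert abs(u1(i, p_star)) < 1e-12            # u_i(p*) = 0, the best possible
    for dev in (0.0, 0.02, 0.5, 1.0):            # a few unilateral deviations
        p = list(p_star)
        p[i] = dev
        assert u1(i, p) <= 1e-12                 # deviating never helps
```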

Lemma 4. Fix λ = (1/G_11, 1/G_22, . . . , 1/G_NN). Assume that the channel (G, η) is feasible, and let p* ∈ P be the optimal joint transmission power as defined in Definition 3. Then we have:

1. β := −λ_max((1/2)(W − I) + (1/2)(Wᵀ − I)) > 0, where W is the reweighted gain matrix in Equation (3).

2. The designed game G1 = (N, P, {u_i}_{i=1}^N) is (λ, β)-WSM with the unique Nash equilibrium p*.

3. The designed game G2 = (N, P, {ũ_i}_{i=1}^N) is (1, β)-WSM with the unique Nash equilibrium p*.

Proof. We first establish the first statement. For each i,

v_i(p) = ∂u_i(p)/∂p_i = −(G_ii p_i − r*_i(Σ_{j≠i} G_ij p_j + η_i)),  (21)

which is easily seen to be affine in p_i, and hence u_i is concave in p_i, with all the smoothness assumptions satisfied. For each i, j, ∂v_i(p)/∂p_i = −G_ii and ∂v_i(p)/∂p_j = r*_i G_ij. Computing the λ-weighted Hessian matrix of the designed game, we obtain H^λ_ij(p) = (1/(2G_ii)) ∂v_i(p)/∂p_j + (1/(2G_jj)) ∂v_j(p)/∂p_i.

If i = j, then H^λ_ii(p) = 2 · (1/(2G_ii)) · ∂v_i(p)/∂p_i = −1.

If i ≠ j, then H^λ_ij(p) = (1/(2G_ii)) r*_i G_ij + (1/(2G_jj)) r*_j G_ji = (1/2)(r*_i G_ij/G_ii + r*_j G_ji/G_jj).

Let W be the reweighted gain matrix as defined in Equation (3) and I ∈ R^{N×N} be the identity matrix. From the previous calculations, it follows that

H^λ = (1/2)(W − I) + (1/2)(Wᵀ − I).

Because the channel (G, η) is feasible, per Theorem 1, λ_max(W) < 1. Consequently, (W − I) is negative definite (although not necessarily symmetric, i.e., it may have complex eigenvalues): for all p ∈ R^N,

pᵀ(W − I)p = pᵀWp − ‖p‖² ≤ (λ_max(W) − 1)‖p‖² < 0.

Similarly, (Wᵀ − I) is negative definite, thereby implying that H^λ is negative definite. Because H^λ(p) is negative definite for every p (it is in fact independent of p), Lemma 1 establishes that the game is (λ, β)-WSM. Because p* yields the maximum utility for every player, u_i(p*) = 0, p* must be a Nash equilibrium and hence the unique Nash equilibrium. Note that H^λ(p) does not depend on p and is hence uniformly negative definite, with the uniform upper bound −β on its largest eigenvalue.

Statement 3 follows by a similar argument, noting that in this case

ṽ_i(p) = ∂ũ_i(p)/∂p_i = −(p_i − r*_i (Σ_{j≠i} G_ij p_j + η_i)/G_ii).  (22)

Computing the λ-weighted Hessian for λ = (1, 1, . . . , 1) yields

H^λ = (1/2)(W − I) + (1/2)(Wᵀ − I),

thereby establishing the conclusion. ∎
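The identity H^λ = (1/2)(W − I) + (1/2)(Wᵀ − I) and the resulting β can be checked numerically for a hypothetical 2-link channel whose reweighted gain matrix is W = [[0, 0.1], [0.2, 0]] (illustrative numbers).

```python
W = [[0.0, 0.1], [0.2, 0.0]]             # illustrative reweighted gain matrix

# H = (1/2)(W - I) + (1/2)(W^T - I) is symmetric, with diagonal -1 and
# off-diagonal entries (W_ij + W_ji)/2.
off = 0.5 * (W[0][1] + W[1][0])
lam_max_H = -1.0 + abs(off)              # eigenvalues of [[-1, off], [off, -1]] are -1 +/- off
beta = -lam_max_H

# spectral radius of W is sqrt(W_01 * W_10) < 1, so the channel is feasible
# and beta comes out strictly positive (Lemma 4, statement 1)
assert (W[0][1] * W[1][0]) ** 0.5 < 1.0
assert beta > 0.0
assert abs(beta - 0.85) < 1e-12
```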


With the above two designed games, we can connect the proposed power control algorithms to the multiagent EGD learning dynamics and immediately derive fast convergence results in deterministic channel environments. We start with some notation. Let M¹ and M² be the matrices with M¹_ii = G_ii and M¹_ij = −r*_i G_ij for i ≠ j, and M²_ii = 1 and M²_ij = −r*_i G_ij/G_ii for i ≠ j. Denote

ρ_1 := −2λ_max((1/2)(W − I) + (1/2)(Wᵀ − I)) / λ²_max(M¹),  ρ_2 := −2λ_max((1/2)(W − I) + (1/2)(Wᵀ − I)) / λ²_max(M²)

(note that both ρ_1 and ρ_2 are positive). One thing to note here is that, in order to achieve a geometric convergence rate, instead of using a diminishing step-size as in Algorithm 2, we need a small constant step-size.

Corollary 1. Let p^t be the iterates generated by Algorithm 2 under a deterministic channel environment (i.e., G^t = G, η^t = η) and a constant step-size α_t = α. Then:

1. If 0 < α < ρ_1, then ‖p^T − p*‖²_2 = O(κ^{−T}) under Variant A, for some κ > 1.

2. If 0 < α < ρ_2, then ‖p^T − p*‖²_2 = O(κ^{−T}) under Variant B, for some κ > 1.

Remark 9. With some algebra, Corollary 1 follows as a consequence of the framework we have presented so far. In particular, Variant A and Variant B are multiagent EGD for the designed games G1 = (N, P, {u_i}_{i=1}^N) and G2 = (N, P, {ũ_i}_{i=1}^N), respectively, with α_t = α. To see this, note that ∂u_i(p)/∂p_i = −(G_ii p_i − r*_i(Σ_{j≠i} G_ij p_j + η_i)) and ∂ũ_i(p)/∂p_i = −(p_i − r*_i (Σ_{j≠i} G_ij p_j + η_i)/G_ii). Further, from Equation (21), it is clear that v(·) is λ_max(M¹)-Lipschitz, because ‖v(p) − v(p̃)‖₂ = ‖M¹p − M¹p̃‖₂ ≤ λ_max(M¹)‖p − p̃‖₂. Similarly, per Equation (22), ṽ(·) is λ_max(M²)-Lipschitz. Consequently, per Lemma 4 and Remark 6, we have the convergence results.

5.2. Channel Feasibility in Stochastic Environments
Before we move on to establish the theoretical guarantees in stochastic channel environments, we need a notion that characterizes channel feasibility in such cases. Because the channel is fluctuating, it would be too strong to require that each channel realization at any given time step be feasible. Instead, we only impose the mild requirement that a channel be feasible on average (it can hence be feasible at some times and infeasible at others). The next definition formalizes this:

Definition 5. A channel (G, η) (or, equivalently, (W, γ)) is

1. Type-I mean-feasible if (E[G], E[η]) is feasible, where the expectation is taken component-wise;

2. Type-II mean-feasible if (E[W], E[γ]) is feasible, where the expectation is taken component-wise.

The two types of mean-feasible channels are closely related: although in general neither implies the other, in the important and commonly occurring case where the G_ij's are independent of G_ii, Type-I mean-feasibility is weaker than Type-II mean-feasibility, as formalized by the following lemma:

Lemma 5. If, for each i ∈ {1, 2, . . . , N}, G_ij and G_ii are pairwise independent for each j ≠ i, then a channel that is Type-II mean-feasible is Type-I mean-feasible.

Proof. Let a channel (G, η) be Type-II mean-feasible. Define the matrix W̄ by

W̄_ij := 0 if i = j, and W̄_ij := r*_i E[Gij] / E[Gii] if j ≠ i. (23)

We have

E[Gij / Gii] = E[Gij] E[1 / Gii] ≥ E[Gij] / E[Gii], (24)

where the equality follows from independence and the inequality follows from Jensen's inequality and the fact that f(x) = 1/x is convex when x is positive. This immediately implies

E[W] ≥ W̄,

where the inequality holds component-wise. Because each entry in both matrices is nonnegative, we have

λmax(E[W]) = max_{u ∈ R^N : ‖u‖_2 = 1} uᵀ E[W] u = max_{u ∈ R^N_+ : ‖u‖_2 = 1} uᵀ E[W] u,
λmax(W̄) = max_{u ∈ R^N : ‖u‖_2 = 1} uᵀ W̄ u = max_{u ∈ R^N_+ : ‖u‖_2 = 1} uᵀ W̄ u.

Further, for each u ∈ R^N_+, Equation (24) implies uᵀ E[W] u ≥ uᵀ W̄ u, which then leads to

λmax(E[W]) = max_{u ∈ R^N_+ : ‖u‖_2 = 1} uᵀ E[W] u ≥ max_{u ∈ R^N_+ : ‖u‖_2 = 1} uᵀ W̄ u = λmax(W̄).

Because the channel is Type-II mean-feasible, Theorem 1 gives λmax(E[W]) < 1; hence λmax(W̄) < 1 and, since W̄ is precisely the interference matrix of the mean channel (E[G], E[η]), Theorem 1 implies that the channel must be Type-I mean-feasible. ∎
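The two inequalities driving this proof, Jensen's bound E[1/Gii] ≥ 1/E[Gii] and the entrywise monotonicity of λmax over nonnegative matrices, are easy to sanity-check numerically. The gain distribution and matrices below are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
g = rng.uniform(1.0, 3.0, size=200_000)    # iid positive direct gains G_ii, E[G_ii] = 2
lhs = np.mean(1.0 / g)                     # Monte Carlo estimate of E[1/G_ii]
rhs = 1.0 / np.mean(g)                     # 1 / E[G_ii]
print(lhs >= rhs)                          # Jensen: f(x) = 1/x is convex for x > 0

# Entrywise E[W] >= W_bar implies lambda_max(E[W]) >= lambda_max(W_bar) for
# nonnegative matrices (the quadratic-form argument in the proof):
W_bar = np.array([[0.0, 0.15], [0.13, 0.0]])
E_W = W_bar * (lhs / rhs)                  # an entrywise-larger nonnegative matrix
lam = lambda M: np.max(np.abs(np.linalg.eigvals(M)))
print(lam(E_W) >= lam(W_bar))
```

Both checks print True; the slack lhs − rhs is exactly the Jensen gap that makes Type-II mean-feasibility the stronger condition.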

Remark 10. These two types of mean-feasible environments correspond to the respective environments under which Variant A and Variant B of the proposed algorithm admit theoretical performance guarantees. As it turns out, Type-I mean-feasibility is required for Variant A and Type-II mean-feasibility is required for Variant B. Per the previous lemma, Type-II mean-feasibility is a slightly stronger condition than Type-I mean-feasibility: this is to be expected because Variant B of the algorithm requires less information in choosing the power iterates than Variant A does.

5.3. Convergence of Robust Power Control: Stability and Optimality

We are now ready to state the main result. We obtain this result by casting the proposed power control algorithms in the multiagent noisy EGD learning framework.

Zhou et al.: Robust Power Management. Operations Research, Articles in Advance, pp. 1–15, © 2020 INFORMS


Theorem 4. Given a stochastic channel (G, η) (or equivalently (W, γ)) according to Assumption 1, let γ0 in Algorithm 2 be chosen such that

γ0 > 1 / (−λmax((1/2)(W − I) + (1/2)(Wᵀ − I))).

1. If the channel is Type-I mean-feasible, with p* ∈ P being the optimal joint transmission power for (E[G], E[η]), then E[‖P^t − p*‖_2²] = O(1/t), where P^t is given by Variant A in Algorithm 2.

2. If the channel is Type-II mean-feasible, with p* ∈ P being the optimal joint transmission power for (E[W], E[γ]), and for each i there exists some g_i > 0 such that Gii ≥ g_i a.s., then E[‖P^t − p*‖_2²] = O(1/t), where P^t is given by Variant B in Algorithm 2.
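To see the flavor of this result, here is a small simulation in the spirit of Variant A. This is a sketch, not the paper's Algorithm 2: the box projection, channel statistics, and noise model are assumptions of the illustration; only the step size γ0/t and the γ0 lower bound follow the theorem statement:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 2, 5000
r_star = np.array([1.0, 1.0])
G_mean = np.array([[2.0, 0.3], [0.2, 1.5]])      # hypothetical mean channel
eta_mean = np.array([0.1, 0.1])

# Mean-channel interference matrix W and noise vector gamma
W = r_star[:, None] * G_mean / np.diag(G_mean)[:, None]
np.fill_diagonal(W, 0.0)
gamma = r_star * eta_mean / np.diag(G_mean)
p_star = np.linalg.solve(np.eye(N) - W, gamma)   # optimal joint power for (E[G], E[eta])

# Step-size scale chosen just above the bound in Theorem 4
sym = 0.5 * (W - np.eye(N)) + 0.5 * (W.T - np.eye(N))
gamma0 = 1.1 / (-np.max(np.linalg.eigvalsh(sym)))

P = np.full(N, 1.0)                              # initial power profile
err0 = np.sum((P - p_star) ** 2)
for t in range(1, T + 1):
    G = G_mean * rng.uniform(0.8, 1.2, (N, N))   # iid channel realization
    eta = eta_mean * rng.uniform(0.8, 1.2, N)
    off = G - np.diag(np.diag(G))
    grad = np.diag(G) * P - r_star * (off @ P + eta)   # stochastic residual, cf. Eq. (27)
    P = np.clip(P - (gamma0 / t) * grad, 0.0, 5.0)     # projected step onto a power box
errT = np.sum((P - p_star) ** 2)
print(errT < err0)   # prints "True": squared error shrinks despite persistent noise
```

Despite the channel fluctuating at every step, the iterate settles near the constant profile p*, which is the stability phenomenon discussed in Remark 11.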

Remark 11. Two things to note here. First, the mean-square convergence to a constant joint transmission power in the presence of persistent stochastic channel fluctuations is a manifestation of the stability of the proposed algorithms. The intuition behind this stability is that, as past powers are incorporated into the current power via a weighted sum, the random environments have less and less impact on the current power iterate because the step sizes are decreasing. Second, Gii ≥ g_i a.s. is a rather mild assumption, as it means that the power gain between each transmitter and its intended receiver is lower bounded by some positive constant.

Proof. Because (G, η) (or (W, γ)) is random, we consider the following two games G1, G2, where the set N of players is again the set of wireless links in the power control context:

1. G1 = (N, P, {u_i}^N_{i=1}) with u_i(p) = E[−(1/(2Gii))(Gii p_i − r*_i(Σ_{j≠i} Gij p_j + ηi))²].
2. G2 = (N, P, {u_i}^N_{i=1}) with u_i(p) = E[−(1/(2Gii²))(Gii p_i − r*_i(Σ_{j≠i} Gij p_j + ηi))²].

Under the above two designed games, it is straightforward to verify that, for G1,

∂u_i(p)/∂p_i = −E[Gii p_i − r*_i(Σ_{j≠i} Gij p_j + ηi)]
             = −(E[Gii] p_i − r*_i(Σ_{j≠i} E[Gij] p_j + E[ηi])), (25)

and, for G2,

∂u_i(p)/∂p_i = −E[p_i − r*_i(Σ_{j≠i} Gij p_j + ηi)/Gii]
             = −(p_i − r*_i(Σ_{j≠i} E[Gij/Gii] p_j + E[ηi/Gii]))
             = −(p_i − (Σ_{j≠i} E[Wij] p_j + E[γi])). (26)
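As a sanity check on Equation (26), one can differentiate the designed utility of G2 numerically and compare against the closed form on the right-hand side. This is a toy Monte Carlo experiment with made-up channel statistics, not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
N, S = 2, 100_000
r_star = np.array([1.0, 1.0])
G_mean = np.array([[2.0, 0.3], [0.2, 1.5]])
eta_mean = np.array([0.1, 0.1])
p = np.array([0.5, 0.7])                   # an arbitrary power profile
i = 0                                      # player whose gradient we check

G = G_mean[None] * rng.uniform(0.5, 1.5, size=(S, N, N))   # S channel draws
eta = eta_mean[None] * rng.uniform(0.5, 1.5, size=(S, N))
Gii = G[:, i, i]

def u_i(p):
    """Designed utility of game G2, estimated as a sample average over the S draws."""
    off = G[:, i] @ p - G[:, i, i] * p[i]          # sum_{j != i} Gij p_j
    res = Gii * p[i] - r_star[i] * (off + eta[:, i])
    return np.mean(-res ** 2 / (2.0 * Gii ** 2))

# Central finite difference of u_i in p_i ...
h = 1e-5
pp, pm = p.copy(), p.copy()
pp[i] += h
pm[i] -= h
grad_fd = (u_i(pp) - u_i(pm)) / (2 * h)

# ... versus the closed form of Eq. (26), with E[W], E[gamma] from the same draws
EW = np.mean(r_star[i] * G[:, i] / Gii[:, None], axis=0)
EW[i] = 0.0                                # W_ii = 0 by definition
Eg = np.mean(r_star[i] * eta[:, i] / Gii)
grad_closed = -(p[i] - (EW @ p + Eg))
print(abs(grad_fd - grad_closed) < 1e-6)   # prints "True"
```

Because the utility is quadratic in p_i and the same channel draws are reused, the central difference matches the closed form essentially to machine precision.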

For the first claim, per Assumption 1, we can write G^t_ij = E[Gij] + Ĝ^t_ij and η^t_i = E[ηi] + η̂^t_i, where Ĝ^t_ij and η̂^t_i are both sequences of iid, zero-mean, and finite-variance random variables. Consequently, the gradient update (line 5) in Algorithm 2 can be equivalently written as

X^{t+1}_i = X^t_i − (γ0/t){(E[Gii] + Ĝ^t_ii)P^t_i − r*_i(Σ_{j≠i}(E[Gij] + Ĝ^t_ij)P^t_j + E[ηi] + η̂^t_i)} (27)
          = X^t_i − (γ0/t){E[Gii]P^t_i − r*_i(Σ_{j≠i} E[Gij]P^t_j + E[ηi]) + Ĝ^t_ii P^t_i − r*_i(Σ_{j≠i} Ĝ^t_ij P^t_j + η̂^t_i)}. (28)

Denoting ξ^{t+1}_i = Ĝ^t_ii P^t_i − r*_i(Σ_{j≠i} Ĝ^t_ij P^t_j + η̂^t_i), it follows that E[ξ^{t+1}_i | P^0, . . . , P^t] = 0 a.s. and Var[ξ^{t+1}_i | P^0, . . . , P^t] < ∞ a.s. Because P is bounded, there exists a constant B > 0 such that Var[ξ^{t+1}_i | P^0, . . . , P^t] ≤ B a.s. for all t. Consequently, the martingale noise ξ^t satisfies Assumption 2 per Remark 7. This implies that Algorithm 2 is a special case of Algorithm 4. Further, because the channel is Type-I mean-feasible, Lemma 4 implies that G1 is a WSM game with p* being the unique Nash equilibrium. The result therefore follows by directly applying Theorem 3.

The second claim follows from a similar line of reasoning: the only thing to note is that the bounded-second-moments assumption holds because Gii ≥ g_i a.s. ∎
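The martingale-difference property of ξ used in the proof can also be checked empirically, again with hypothetical fluctuation distributions: conditioning on a fixed power profile P, the noise averages to zero and has bounded variance:

```python
import numpy as np

rng = np.random.default_rng(4)
S = 200_000
r_star = 1.0
P = np.array([0.6, 0.8])           # fixed (conditioned-on) power profile
i = 0

# Zero-mean fluctuations Ghat, etahat around a hypothetical mean channel
G_hat = np.array([[2.0, 0.3], [0.2, 1.5]])[None] * rng.uniform(-0.5, 0.5, (S, 2, 2))
eta_hat = np.array([0.1, 0.1])[None] * rng.uniform(-0.5, 0.5, (S, 2))

# xi^{t+1}_i = Ghat_ii P_i - r*_i (sum_{j != i} Ghat_ij P_j + etahat_i)
xi = G_hat[:, i, i] * P[i] - r_star * (G_hat[:, i, 1 - i] * P[1 - i] + eta_hat[:, i])
print(abs(xi.mean()) < 0.01, xi.var() < 1.0)   # prints "True True"
```

The variance bound B here depends only on the extent of the power box and the fluctuation scale, which is exactly why boundedness of P suffices in the argument above.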

6. Conclusion
We close with a few remarks. First, although we have focused on stochastic iid environments in this paper, the designed algorithm given in Algorithm 2 can still operate in an arbitrary time-varying environment. We believe our convergence results would generalize to the stationary and ergodic environment case, although making that fully rigorous is beyond the scope of the paper and requires new analysis techniques; we hence leave that for future work. Further, the recent empirical work of Ward et al. (2018) indicates that variants of the dual averaging power control algorithm can be made robust to delayed feedback, broadening its applicability even further. Second, our theoretical investigation of multiagent EGD learning with imperfect first-order feedback falls within the broader inquiry of game-theoretic learning, an area that stands at the intersection of learning and game theory and that seeks to answer the following question: what is the evolution of play when every player adopts a no-regret learning algorithm? In particular, if all players of a repeated game employ an updating rule that



leads to no regret, do their actions converge to a Nash equilibrium of the one-shot game? We aim to obtain quantitative convergence results for other no-regret learning algorithms in different classes of games in future work, with a particular goal of understanding when the rate O(1/T) is achievable. Another interesting theoretical direction is to obtain convergence results when only zeroth-order feedback is available. Flaxman et al. (2005) characterized regret guarantees for online gradient descent in such cases. Studying the convergence question in the multiagent setup would be interesting.

Finally, as mentioned in the introduction, the landscape of power management is vast and includes many other objectives. We believe that this game-design approach will be fruitful in those other problems: in order to design distributed algorithms that converge to some optimal/desired action, one can design a WSM game whose unique Nash equilibrium corresponds to the optimal action and thereby derive an algorithm immediately via the online learning algorithm. Consequently, in this approach, EGD, and more broadly any no-regret online learning algorithm, can be viewed as a meta-algorithm that can be instantiated via the design of specific games.

Acknowledgments
The authors are grateful to the associate editor and the three reviewers for their extremely valuable comments and constructive feedback that led to significant improvement of the current paper.

Endnotes
1. The literature on power control is too broad to review here; for a comprehensive survey, we refer the reader to Chiang et al. (2008).
2. Motivated by the random nature of network feasibility in realistic wireless environments, we frame all of the above in a bona fide stochastic setting where exact gradient information is not available, either because the players' payoffs are themselves stochastic in nature or because the players' feedback is contaminated by noise, observation errors, and/or other exogenous stochastic effects. To model all this, we consider a noisy feedback model where players only have access to a first-order oracle providing unbiased, bounded-variance estimates of their payoff gradients at each step. Apart from this, players are assumed to operate in a "black box" setting, without any knowledge of the game's structure or their payoff functions (or even that they are playing a game).
3. In practice, only access to some aggregated statistics of the power used by all the other transmitters is feasible. The statistics typically take the form of some function of all the powers used by the transmitters, such as the one given in Algorithm 2.

References
Alpcan T, Basar T, Dey S (2006) A power control game based on outage probabilities for multicell wireless data networks. IEEE Trans. Wireless Comm. 5(4):890–899.
Alpcan T, Basar T, Srikant R, Altman E (2002) CDMA uplink power control as a noncooperative game. Wireless Networks 8(6):659–670.
Altman E, Basar T, De Pellegrini F (2010) Optimal monotone forwarding policies in delay tolerant mobile ad-hoc networks. Performance Evaluation 67(4):299–317.
Antipin A (2002) Gradient approach of computing fixed points of equilibrium problems. J. Global Optim. 24(3):285–309.
Arrow KJ, Hurwicz L (1960) Stability of the gradient process in n-person games. J. Soc. Indust. Appl. Math. 8(2):280–294.
Au-Yeung KY, Robertson T, Hafezi H, Moon G, DiCarlo L, Zdeblick M, Savage G (2010) A networked system for self-management of drug therapy and wellness. Wireless Health (ACM, New York), 1–9.
Balandat M, Krichene W, Tomlin C, Bayen A (2016) Minimizing regret on reflexive Banach spaces and learning Nash equilibria in continuous zero-sum games. Preprint, submitted June 3, https://arxiv.org/abs/1606.01261.
Bradley J, Barbier J, Handler D (2013) Embracing the internet of everything to capture your share of $14.4 trillion. White Paper, Cisco Systems, Inc., San Jose, CA.
Byrne C, Lim CL (2007) The ingestible telemetric body core temperature sensor: A review of validity and exercise applications. British J. Sports Medicine 41(3):126–133.
Candogan UO, Menache I, Ozdaglar A, Parrilo PA (2010) Near-optimal power control in wireless networks: A potential game approach. Proc. 29th IEEE Conf. Comput. Comm. (IEEE, Piscataway, NJ), 1–9.
Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).
Chiang M, Hande P, Lan T, Tan CW (2008) Power control in wireless cellular networks. Foundations Trends Networking 2(4):381–533.
Chung KL (1954) On a stochastic approximation method. Ann. Math. Statist. 25(3):463–483.
Cui T, Chen L, Low SH (2008) A game-theoretic framework for medium access control. IEEE J. Selected Areas Comm. 26(7):1116–1127.
Deakin M (2013) Smart Cities: Governing, Modelling and Analysing the Transition (Taylor & Francis, London).
El Gamal A, Mammen J, Prabhakar B, Shah D (2006a) Optimal throughput-delay scaling in wireless networks, part I: The fluid model. IEEE Trans. Inform. Theory 52(6):2568–2592.
El Gamal A, Mammen J, Prabhakar B, Shah D (2006b) Optimal throughput-delay scaling in wireless networks, part II: Constant-size packets. IEEE Trans. Inform. Theory 52(11):5111–5116.
Eryilmaz A, Modiano E, Ozdaglar A (2006) Randomized algorithms for throughput-optimality and fairness in wireless networks. Proc. 45th IEEE Conf. Decision Control (IEEE, Piscataway, NJ), 1936–1941.
Eryilmaz A, Ozdaglar A, Médard M, Ahmed E (2008) On the delay and throughput gains of coding in unreliable networks. IEEE Trans. Inform. Theory 54(12):5511–5524.
Facchinei F, Pang J-S (2007) Finite-Dimensional Variational Inequalities and Complementarity Problems (Springer Science & Business Media).
Fan X, Alpcan T, Arcak M, Wen TJ, Basar T (2006) A passivity approach to game-theoretic CDMA power control. Automatica 42(11):1837–1847.
Flam SD (2002) Equilibrium, evolutionary stability and gradient dynamics. Internat. Game Theory Rev. 4(4):357–370.
Flaxman AD, Kalai AT, McMahan HB (2005) Online convex optimization in the bandit setting: Gradient descent without a gradient. Proc. 16th Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 385–394.
Foschini GJ, Miljanic Z (1993) A simple distributed autonomous power control algorithm and its convergence. IEEE Trans. Vehicular Tech. 42(4):641–646.
Gitzenis S, Bambos N (2002) Power-controlled data prefetching/caching in wireless packet networks. Proc. 21st Annual Joint Conf. IEEE Comput. Comm. Soc., vol. 3 (IEEE, Piscataway, NJ), 1405–1414.
Goldsmith A (2005) Wireless Communications (Cambridge University Press, Cambridge, UK).
Gopalan A, Caramanis C, Shakkottai S (2015) Wireless scheduling with partial channel state information: Large deviations and optimality. Queueing Systems 80(4):293–340.
Han Z, Niyato D, Saad W, Basar T, Hjørungnes A (2014) Game Theory in Wireless and Communication Networks (Cambridge University Press, Cambridge, UK).
Han Z, Niyato D, Saad W, Basar T, Hjørungnes A (2011) Game Theory in Wireless and Communication Networks: Theory, Models, and Applications (Cambridge University Press, Cambridge, UK).
Hazan E (2016) Introduction to online convex optimization. Foundations Trends Optim. 2(3–4):157–325.
Kar S, Moura JMF, Ramanan K (2012) Distributed parameter estimation in sensor networks: Nonlinear observation models and imperfect communication. IEEE Trans. Inform. Theory 58(6):3575–3605.
Krichene S, Krichene W, Dong R, Bayen A (2015) Convergence of heterogeneous distributed learning in stochastic routing games. 53rd Annual Allerton Conf. Comm., Control, Comput. (Allerton) (IEEE, Piscataway, NJ), 480–487.
Lam K, Krichene W, Bayen A (2016) On learning how players learn: Estimation of learning dynamics in the routing game. 2016 ACM/IEEE 7th Internat. Conf. Cyber-Physical Systems (IEEE, Piscataway, NJ), 1–10.
Li N, Marden JR (2013) Designing games for distributed optimization. IEEE J. Selected Topics Signal Processing 7(2):230–242.
Menache I, Ozdaglar A (2010) Network Games: Theory, Models, and Dynamics (Morgan & Claypool, San Rafael, CA).
Mertikopoulos P, Zhou Z (2019) Learning in games with continuous action sets and unknown payoff functions. Math. Programming 173(1–2):465–507.
Mitra D (1994) An asynchronous distributed algorithm for power control in cellular radio systems. Holtzman JM, Goodman DJ, eds. Wireless and Mobile Communications, Springer Series in Engineering and Computer Science, vol. 277 (Springer, Boston), 177–186.
Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.
Rappaport T (2001) Wireless Communications: Principles and Practice, 2nd ed. (Prentice Hall PTR, Upper Saddle River, NJ).
Reddy AA, Banerjee S, Gopalan A, Shakkottai S, Ying L (2012) On distributed scheduling with heterogeneously delayed network-state information. Queueing Systems 72(3–4):193–218.
Reddy A, Shakkottai S, Ying L (2008) Distributed power control in wireless ad hoc networks using message passing: Throughput optimality and network utility maximization. 42nd Annual Conf. Inform. Sci. Systems (IEEE, Piscataway, NJ), 770–775.
Rosen JB (1965) Existence and uniqueness of equilibrium points for concave n-person games. Econometrica 33(3):520–534.
Seferoglu H, Lakshmikantha A, Ganesh A, Key P (2008) Dynamic decentralized multi-channel MAC protocols. Inform. Theory Appl. Workshop (IEEE, Piscataway, NJ), 100–110.
Shalev-Shwartz S (2012) Online learning and online convex optimization. Foundations Trends Machine Learn. 4(2):107–194.
Tan CW (2014) Wireless network optimization by Perron-Frobenius theory. 48th Annual Conf. Inform. Sci. Systems (CISS) (IEEE, Piscataway, NJ), 1–6.
Tan CW (2015) Wireless network optimization by Perron-Frobenius theory. Foundations Trends Networking 9(2–3):107–218.
Ulukus S, Yates RD (1998) Stochastic power control for cellular radio systems. IEEE Trans. Comm. 46(6):784–798.
Ward A, Zhou Z, Mertikopoulos P, Bambos N (2018) Power control with random delays: Robust feedback averaging. 2018 IEEE Conf. Decision Control (CDC) (IEEE, Piscataway, NJ), 7040–7045.
Weeraddana PC, Codreanu M, Latva-aho M, Ephremides A, Fischione C (2012) Weighted sum-rate maximization in wireless networks: A review. Foundations Trends Networking 6(1–2):1–163.
Yates RD (1996) A framework for uplink power control in cellular radio systems. IEEE J. Selected Areas Comm. 13(7):1341–1347.
Zhou Z, Bambos N (2015) Wireless communications games in fixed and random environments. 54th IEEE Conf. Decision Control (CDC) (IEEE, Piscataway, NJ), 1637–1642.
Zhou Z, Bambos N, Glynn P (2016a) Dynamics on linear influence network games under stochastic environments. Zhu Q, Alpcan T, eds. 7th Internat. Conf. Decision Game Theory Security, vol. 9996 (Springer-Verlag, Berlin, Heidelberg), 114–126.
Zhou Z, Bambos N, Glynn P (2018a) Deterministic and stochastic wireless network games: Equilibrium, dynamics, and price of anarchy. Oper. Res. 66(6):1498–1516.
Zhou Z, Glynn P, Bambos N (2016b) Repeated games for power control in wireless communications: Equilibrium and regret. 55th IEEE Conf. Decision Control (CDC) (IEEE, Piscataway, NJ), 3603–3610.
Zhou Z, Miller D, Glynn P, Bambos N (2016c) A stochastic stability characterization of the Foschini-Miljanic algorithm in random wireless networks. 2016 IEEE Global Commun. Conf. (GLOBECOM) (IEEE, Piscataway, NJ).
Zhou Z, Mertikopoulos P, Bambos N, Glynn PW, Tomlin C (2017a) Countering feedback delays in multi-agent learning. von Luxburg U, Guyon I, eds. Proc. 31st Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 6171–6181.
Zhou Z, Mertikopoulos P, Moustakas AL, Bambos N, Glynn P (2017b) Mirror descent learning in continuous games. 56th Annual IEEE Conf. Decision Control (CDC) (IEEE, Piscataway, NJ), 5776–5783.
Zhou Z, Mertikopoulos P, Athey S, Bambos N, Glynn PW, Ye Y (2018b) Learning in games with lossy feedback. Bengio S, Wallach HM, eds. Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 5140–5150.

Zhengyuan Zhou is an assistant professor in the Stern School of Business at New York University. His research interests lie in learning, optimization, game theory, and stochastic systems.

Panayotis Mertikopoulos has been a CNRS researcher at the Laboratoire d'Informatique de Grenoble since 2011. His research interests lie in the areas of game theory, online and stochastic optimization, and algorithmic learning, as well as in their applications to operations research and wireless networks.

Aris Moustakas is currently a faculty member with the Physics Department at the National and Kapodistrian University of Athens, Greece. His research interests include statistical physics, random matrix theory, and wireless communications.

Nicholas Bambos is the current chair and the R. Weiland Professor of the Department of Management Science and Engineering at Stanford University. His research interests lie in communication networks, queueing theory, and applied probability.

Peter Glynn is the Thomas Ford Professor of Engineering in the Department of Management Science and Engineering at Stanford University. His research interests lie in simulation, computational probability, queueing theory, statistical inference for stochastic processes, and stochastic modeling.
