
arXiv:1505.00391v4 [cs.GT] 22 May 2020

Learning and Efficiency in Games with Dynamic Population

Thodoris Lykouris∗ Vasilis Syrgkanis† Éva Tardos‡

First version: May 2015. Current version: May 2020.§

Abstract

We study the quality of outcomes in repeated games when the population of players is

dynamically changing and participants use learning algorithms to adapt to the changing

environment. Game theory classically considers Nash equilibria of one-shot games, while in

practice many games are played repeatedly, and in such games players often use algorithmic

tools to learn to play in the given environment. Most previous work on learning in repeated

games assumes that the population playing the game is static over time.

We analyze the efficiency of repeated games in dynamically changing environments, motivated by application domains such as Internet ad-auctions and packet routing. We prove

that, in many classes of games, if players choose their strategies in a way that guarantees low

adaptive regret, then high social welfare is ensured, even under very frequent changes. In

fact, in large markets learning players achieve asymptotically optimal social welfare despite

high turnover. Previous work has only shown that high welfare is guaranteed for learning outcomes in static environments. Our work extends these results to more realistic settings where participation is drastically evolving over time.

∗Microsoft Research NYC, [email protected]. Work mostly conducted while the author was a Ph.D. student at Cornell University and was supported by a Google Ph.D. fellowship.
†Microsoft Research NE, [email protected].
‡Cornell University, [email protected]. Work supported in part by NSF grants CCF-1563714 and CCF-1408673, ONR grant N00014-08-1-0031, and AFOSR grant F5684A1.
§A preliminary version appeared in the ACM-SIAM Symposium on Discrete Algorithms 2016 (SODA 2016). This version adds a major new result: asymptotic optimality of simultaneous second-price auctions with dynamic population. The presentation is significantly simplified and all results are presented parametrically.


1 Introduction

Quantifying the inefficiency caused by the selfish behavior of agents in systems of economic significance lies at the heart of algorithmic game theory. The Price of Anarchy [KP99] measures this inefficiency by comparing the welfare of the society where agents follow the instructions of a benevolent planner to the one where they behave in a selfish way. This notion has received great attention in both computer science and operations research, leading to a productive line of work in a plethora of application domains. For instance, routing games [RT02, CSSM03], resource allocation [JT04], and online ad-auctions [CKK+15] have bounds on the efficiency loss due to incentives. This enables the platform operator to quantify the potential improvement in altering the system to better address incentives and make informed decisions on the trade-offs involved.

The recent emergence of repeated multi-player games has cast doubt on the aforementioned efficiency guarantees that rely on modeling selfish behavior via the equilibria of the one-shot game at each step. Unlike one-shot games, in settings such as online advertising, each individual advertising opportunity has negligible effect, and the significance stems from the fact that these transactions occur repeatedly at very high frequency. Modeling the outcome of such interactions as the Nash equilibrium of the corresponding one-shot game is unrealistic for many reasons. Participants would need an unrealistic amount of information to know who else is participating at each moment in order to compute a Nash equilibrium of the game. We note that players repeatedly best responding to the current environment does not lead to the Nash equilibria of the game, even without the evolving participant list. The need for an unrealistic amount of information is even more severe if we aim to model the outcome as an equilibrium of the repeated game.

To overcome the concerns about the Nash equilibrium notion, a more attractive model of player behavior is to assume that players use a form of algorithmic learning. In environments where the value or cost of a single interaction is low (a single ad opportunity or routing a single packet), using learning to find good strategies can be more attractive than precomputing effective strategies, and learning also enables the participants to adapt to changes in the environment. Blum et al. [BEDL06, BHLR08] suggest no-regret learning as a behavioral model of players, and study the outcome of such player behavior assuming that the exact same set of players engages repeatedly in the same game. The notion of no-regret learning assumes that players do at least as well as any single fixed strategy with hindsight. This assumption is guaranteed by many simple decentralized dynamics [HMC12, FS97] but crucially does not require that players use a particular dynamic; it only posits that their behavior approximately satisfies the no-regret condition. This is a broad behavioral assumption as it is satisfied by repeated occurrences of the one-shot Nash equilibrium outcomes, and it even allows players to outperform the no-regret condition and do better than any single strategy; it only assumes that players avoid playing suboptimal actions in the presence of a consistently better alternative. As a result, it models well situations where participants use data from past interactions to adjust their strategies. The no-regret assumption is supported by evidence in Bing ad-auction data by Nekipelov et al. [NST15] for bidders with large budgets and frequent bid changes, who are likely using a form of learning to adjust their behavior.¹

On the efficiency front, most price of anarchy results, initially developed for Nash equilibria, extend to the broader learning behavioral assumption as long as the repeated game is completely identical across rounds. This is thanks to extension theorems [Rou09, ST13] for a large class of games that satisfy the so-called smoothness property and include routing games and simultaneous multi-item auctions.

¹Nisan and Noti [NN16] study this assumption with human players, and their findings are more mixed. Bidders given low values, who may have been best off withdrawing from the game, behaved irrationally, while players with higher assigned values did satisfy the assumption.

Although this learning-based analysis has improved the relevance of efficiency guarantees, there is still a major concern for the platform operator, as the bounds only hold when players remain identical across all rounds of the game, which rarely holds in applications of interest. In ad-auctions, advertisers may change their value for different keywords based on recent trends or marketing decisions. In packet routing, when a video conference ends, the configuration of data transmission alters. In transportation, when people switch jobs or take vacations, similar changes in the routing patterns arise. Our work addresses this important consideration, highlighting the robustness of the learning-based analysis to changing environments and making price of anarchy guarantees relevant even when the population is constantly evolving.

1.1 Our contribution

We provide a framework of dynamic-population repeated games. Consider a game with n players, such as ad-auctions with n advertisers, or traffic routing with n traffic streams. We refer to this as the stage game. The game is repeated each time step t = 1, 2, . . . , T, where after each step, each player may leave with some probability p and be replaced with a different player. While we assume that players leave with a probability p at random, the replacement player is worst case. This model of churn in the game combines two important features: with a probability p of departure, each player is expected to be in the game for 1/p time steps. We need to guarantee that 1/p is long enough that the player can learn to achieve small regret. At the same time, the worst-case nature of the replacement of players ensures that learning behavior is necessary to perform well. For example, a routing choice may be great at the start, but due to newly arriving players may become congested as time goes on.

The model of learning behavior we use in this paper is that of adaptive learning, a form of learning well-suited for changing environments. Learning as a behavior has the great advantage that it can adjust to changes in the environment. Classical no-regret learning compares the player's outcome to a fixed strategy with hindsight. This is appropriate in games with short time spans, such as the data of [NST15, NN16] above. For repeated games played for longer periods, no regret against a fixed strategy can be a very weak benchmark: it may be necessary for the learner to adjust her strategy due to the changing environment. Natural extensions of no-regret learning can guarantee small regret also against a slowly changing sequence of strategies. We formally define this adaptive learning assumption and discuss algorithms that achieve it in Section 2.2.

Our results. We provide a framework that transforms static-population price of anarchy guarantees such as [Rou09, ST13] to corresponding dynamic-population guarantees. Our guarantees are parametric and gracefully degrade with the amount of player turnover but, surprisingly, can afford even a high rate of change with a significant fraction of the population turning over each step. Our framework relies on two properties of the underlying game, which are satisfied in multiple applications (such as simultaneous multi-item auctions and routing games). The first is that the game satisfies a variant of the smoothness property of [Rou09, ST13], which is crucial for the extension theorems even with a static population. The second property is that the underlying optimization problem satisfies a stability condition: there exists a sequence of β-approximately optimal solutions that is not drastically changing; at a player turnover, the solution of few other players alters.²

As we explain below, we derive such stable approximately optimal solution sequences via two techniques and apply them to our applications of interest. The first technique (Section 4) relies on greedy arguments, is related to the dynamic matching literature, and induces a decay in performance of β = 0.5. The second technique (Section 5) draws a nice connection to the differential privacy literature and results in a parametric degradation that tends to β → 1 as the game becomes larger. Our framework is described in detail in Section 3.

²At a random player's arrival, at most a polylogarithmic number of other players is affected.

Our main result is to prove an α · β · γ bound on the price of anarchy of the repeated game, assuming players' strategy choices approximately satisfy the dynamic no-regret condition, the stage game satisfies the smoothness assumption with price of anarchy α for the one-shot game, the underlying optimization problem has a β-optimal sequence of stable solutions, and γ is a parameter allowing us to charge the players' regret error as a small fraction of the obtained social value, which gracefully degrades with the turnover probability and goes to 1 as p → 0. A dependence on p is expected as, if the game is always abruptly and completely changing, we cannot expect to have stable solutions, and learning from the past cannot provide insightful guidelines for the future.
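Schematically, and stated informally for welfare-maximization settings (the precise statements are Theorems 3.1 and 3.2), the three parameters compose as
\[ \frac{1}{T}\sum_{t=1}^{T}\mathbf{E}\big[W(s^t; v^t)\big] \;\gtrsim\; \alpha \cdot \beta \cdot \gamma \cdot \frac{1}{T}\sum_{t=1}^{T}\mathbf{E}\big[\mathrm{Opt}(v^t)\big], \]
where α is the static-population smoothness-based price of anarchy, β the loss from comparing to a stable rather than optimal benchmark sequence, and γ the fraction of value that remains after charging the adaptive-learning regret.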

Applying our framework to particular settings, we obtain guarantees for two important applica-tions: multi-item auctions and routing. Most of our results are about simultaneous multi-itemauctions (such as first-price or second-price). We provide three results with different strengths.

• Our first guarantee (Theorem 4.3) assumes that players are unit-demand and provides a dynamic-population price of anarchy of 0.25(1 − η) when the turnover probability is p ≲ η³. This bound stems from using the dynamic-matching stability approach leading to β = 0.5, learning-based static-population price of anarchy guarantees of α = 0.5 (such as the one for simultaneous second-price auctions [CKS16]), and can afford a turnover regret term of γ ≈ 1 − ∛p (up to logarithmic terms), allowing for close to a constant fraction of the population turning over each time step.

• Our second guarantee (Theorem 6.2) allows players to have more complex valuations satisfying the so-called gross substitutes condition but requires each item to come with a large supply B; it provides a dynamic-population price of anarchy of 0.5(1 − η) when the turnover probability is p ≲ η⁵B/n. This bound uses the same static-population price of anarchy guarantees of α = 0.5 but instead applies a differentially-private stability technique that leads to an effective β = 1 − η, while the turnover term is only γ ≈ 1 − η − √(pn/B). As a result, the guarantee gives an improvement when the market is thick, e.g., B = Θ(n), but can be meaningless in small-supply settings, where the first guarantee still applies.

• Our third guarantee (Theorem 6.3) provides asymptotic optimality of 1 − η when the turnover probability is p ≲ η⁵B/n and when there exists some uncertainty in the arrival of each player at every round (each player does not show up with probability q even if she is in the game). With this additional assumption, we can use enhanced static-population guarantees of α ≈ 1 − 1/√(q(1−q)B) [FIL+16] while keeping the parameters β and γ as before. As a result, when the market is sufficiently thick, the price of anarchy goes to 1 as the turnover probability p becomes small.

The second application is atomic unsplittable routing; since this is a cost-minimization setting, all the approximation factors are above 1. For this class, the learning-based analysis [Rou09] provides static-population price of anarchy guarantees (e.g. α = 2.5 for affine latency functions [CK05]). Our guarantee (Theorem 6.6) extends this to a dynamic-population price of anarchy of 2.5 + η when p ≲ η⁴/(m¹⁰ · polylog(n, m)), where m is the number of edges in the network and the number of players is relatively large. Note that again this turnover probability depends only logarithmically on the number of players, so it allows for a nearly constant turnover when the number of players is relatively large.

Our techniques: To achieve the claimed welfare guarantees we assume that all players are learning and take advantage of the power of no-regret learning. Classical no-regret learning

guarantees that the learner achieves utility as good as any fixed strategy with hindsight. With the changing participants of a dynamic population game, the competitors over time can change drastically, and participants need to adjust to the changing environment; a fixed strategy with hindsight is therefore not strong enough to ensure good outcomes. However, learning can do better: rather than using a single fixed strategy as a benchmark, shifting regret can guarantee to do well against an occasionally changing benchmark [HW98, LS15]. As one would expect, the effectiveness of learning degrades with the number of changes in the benchmark.

In proving welfare guarantees for learning in repeated games with a fixed and unchanging set of participants, traditional learning-based smoothness approaches [Rou15, ST13] rely on the fact that the set of players is static. In this case, the optimal solution is the same at each iteration, and the strategy s⋆_i associated with player i's part of this fixed optimal solution is therefore a good fixed benchmark; hence classical no-regret learning achieves at least as good performance.

In a dynamic population setting, the optimal solution changes as the player set evolves. To guarantee high welfare in these evolving games, one idea would be to use the fact that, by the strengthened shifting learning assumption, players have no regret against a shifting sequence of s⋆,t_i associated with the optimal solutions Opt^t at each time t. To understand whether this is effective, we need to investigate how fast this benchmark actually changes as a function of the turnover probability. Unfortunately, the sequence of optimal solutions can be highly unstable for any player. Every time a player turns over, the new optimal solution can change drastically (e.g., for an assignment problem the change is along an augmenting path that can be arbitrarily long). With frequent changes in the benchmark, learning is ineffective (the learning error dominates), so for learning to be effective against the benchmark of the optimal solution, one would need to require that new players arrive extremely rarely.

To make the price of anarchy guarantees robust to a high turnover, we need to avoid comparing to the unstable optimal solution sequence and instead use a very stable benchmark, i.e., one that does not drastically alter with a type change of a single player. We offer two techniques to show that many optimization problems have close-to-optimal stable solutions that can be used as benchmarks. In Section 4 we show that, for the assignment problem, the greedy algorithm can be adapted to result in a stable solution. Using this greedy solution as a benchmark loses a factor of β = 0.5 in efficiency compared to the optimal allocation; however, this benchmark allows for a close-to-constant fraction of the population to turn over every single round (p only depends logarithmically on the number of players) and still guarantees high social welfare.

In Section 5 we show an interesting connection between stable solution sequences and differential privacy, allowing guarantees where the extra decay β caused by the requirement of using a stable benchmark parametrically degrades with some notion of largeness of the game. Joint differentially private algorithms [KPRU14] guarantee that, if one player's input changes, the probability that the output for other players changes is small. This is in accordance with our stability requirement: the expected number of changes that a single turnover causes is small (since all players are affected with a small probability). We can therefore use the existence of such joint differentially private algorithms for multi-item allocation problems [HHR+16] and online routing [RRUW15] to derive efficiency guarantees for the corresponding games that tend to the price of anarchy α of the static-population game as the game becomes larger. Crucially, the results again allow for a close-to-constant fraction of the population to turn over every single round.

We should note that the benchmark solutions discussed here are never explicitly computed and the players do not need to know anything about them or about the changes occurring in the game; the players just use their learning algorithms. The existence of such stable solution sequences implies that learning players have no regret against them, and this allows us to extend the efficiency guarantees to dynamic population settings even with a large churn in the player set.

1.2 Related work

Our work lies in the interplay of learning and game theory in dynamic environments. In what follows, we briefly discuss relevant related work in each area, and clarify the key differences.

Dynamic games. Dynamic games have a long history in economics, dynamical systems, and operations research; see for example the survey books of Başar and Olsder [BO98] and Van Long [VL10]. The classic approach to analyzing behavior in such dynamic games is to assume that players have prior beliefs about their competitors and that their behavior will constitute a perfect Bayesian equilibrium or refinements of it, such as the sequential equilibrium of Fudenberg and Tirole [FT91] or the Markov perfect equilibrium of Maskin and Tirole [MT01]. The informational and rationality assumptions that such equilibrium solution concepts impose on the players, and their computational intractability, cast doubt on whether players in practice, and in complex game-theoretic environments such as packet routing or internet ad-auctions, would behave as prescribed by such equilibrium behavior.

Equilibrium-like behavior might be more plausible in large game approximations (see e.g. the work of Kalai and Shmaya [KS15]). A natural approximation to equilibrium behavior in large game situations that has recently been extensively analyzed in economics and in operations research (and particularly in auction settings) is that of the mean field equilibrium [WBVR08, AJ13, IJS14, AJW15, BBW15] and relaxations thereof, such as the oblivious equilibrium [WBVR06, WBVR10]. These advances bring equilibrium theory much closer to practice and make more realistic assumptions than traditional sequential equilibrium theory. Moreover, they allow for long-range dependencies in the player's learning (such as budget constraints), which we do not consider here (see also the relevant discussion in subsequent paragraphs on learning in games). However, even these large game approximations require the players to form almost correct beliefs about the competition and to exactly best-respond to these approximate large-market beliefs. Moreover, the approach requires that the environment either is stochastically stable or evolves in a known stochastic manner. On the contrary, our dynamic-population model allows for adversarial changes, and our analysis attempts to analyze even constantly evolving and never converging behavior. Moreover, our assumption that players satisfy a form of no-regret condition for adaptive learning does not impose that players possess or form any beliefs about the competition. Most of the algorithms that achieve this guarantee only require that the player is able to see the utility that each of her strategic options would have given, in retrospect, in past time steps. Furthermore, our approach also applies to small markets, although our results improve in the large-market regime. Finally, we note that in the large market regime, prior work of [BG19] shows a strong connection between no-regret learning and mean field equilibria, even when there are budget constraints (albeit in a stationary world). It is an interesting avenue for future work whether such connections extend to the non-stationary setting we consider here.

There is a large literature on truthful mechanisms in a dynamic setting analogous to our dynamic player population model, where the goal is to truthfully implement a desired outcome with dynamically changing populations of users with private values. This line of work goes back to [PS03] in the computer science literature, but has also been considered much earlier with queuing models by [Dol78]. In subsequent work, Cavallo et al. [CPS10] offer a generalized VCG mechanism in an environment very similar to the one we are considering, with departures and arrivals, and also provide a nice overview of work on truthful mechanisms in a dynamic setting; the reader is also referred to the survey on dynamic auctions by [BS10]. More recently, there has been significant work in understanding dynamic mechanism design in more complicated settings, such as not knowing the buyer distributions in advance [MPLTZ18] or budget constraints [BBW19]. In contrast to these works, we do not design the mechanism but rather analyze the outcomes of fixed, broadly used mechanisms such as first and second price auctions, or other games such as routing.

Learning in games. In recent years, learning has received attention as a behavioral assumption in games, especially with the rise of online advertising which, as discussed before, makes its deployment widespread and effective. There is a large literature analyzing learning in games, dating back to the work on fictitious play by Brown and Robinson [Bro51, Rob51]. For an overview of this area, the reader is referred to the books of Fudenberg and Levine [FL98] and Cesa-Bianchi and Lugosi [CBL06]. A standard notion of learning in games is that of no-regret learning. The notion of no regret against the best fixed action in hindsight dates back to the work of Hannan [Han57], and is also referred to as Hannan consistency. There are many learning algorithms achieving this guarantee, such as regret matching [HMC00] and multiplicative weights updates [FS97]. In the context of learning to bid in repeated auctions, these learning algorithms work well when players are bidding only on a few items each round (even if there may be many possible items to bid on), but finding good learning strategies is computationally hard when applied to general multi-item auctions [CP14, DS16]. In our auction results, we assume that there is a limit on the number of items that any player can bid on, which makes learning effective.

Most efficiency guarantees originally developed for Nash equilibria extend to static-population games via the smoothness framework [Rou15, ST13]. Our work extends this to more realistic settings that can be drastically evolving over time.

Another line of work involves mechanism design questions when the participants in the setting use no-regret learning algorithms [BMSW18, DSS19]; these papers suggest that classical no-regret learning against a fixed benchmark can be easily attacked by revenue-maximizing auctioneers, further supporting the need for more sophisticated learning algorithms such as the ones we study in this work.

Learning in dynamic games. The notion of no-regret learning against time-varying benchmarks, as opposed to fixed actions, traces back to Herbster and Warmuth [HW98], who provided guarantees compared to the best sequence of k experts where k is known in advance. The stronger notion of adaptive regret which we consider in this work was formalized by Hazan and Seshadhri [HS07], and near-optimal adaptive regret guarantees were achieved through a series of algorithms [Leh03, BM07, CBGLS12, LS15] (we further discuss this in Section 2.2). One important trait of these algorithms is that they display some sort of recency bias, in the sense that the influence of past steps decays as time goes by; this is also supported by experimental evidence [FP14], which suggests that humans display such forms of recency bias when making repeated decisions.

Finally, subsequent to the preliminary version of this paper, Foster et al. [FLL+16] showed that the efficiency guarantees we present in this paper hold under a weaker behavioral assumption. The adaptive learning assumption we analyze in this paper is one way to achieve low regret against an adaptive benchmark with an unknown number of changes in the comparator sequence. Foster et al. [FLL+16] show that a multiplicative relaxation of the regret notion, similar to the one used by Blum and Mansour [BM07], suffices for the efficiency guarantees we prove, and is satisfied by minor modifications of most no-regret algorithms. The latter strengthens the results of this paper by weakening the required behavioral assumption, since our efficiency guarantees hold even if players can select algorithms from a broader class. In the remainder of the paper, we use the original adaptive learning behavioral assumption and refer the reader to [FLL+16] for further discussion of the weaker assumption.

2 Preliminaries

In this section, we formalize the model we consider, provide background from learning theory and game theory that will be essential for our results, and discuss the applications we consider.

2.1 Model

Games and mechanisms. We consider game settings played repeatedly, where the population of players is evolving over time. Let G be an n-player normal-form stage game and assume that game G is played repeatedly for T rounds.³ Each player i has a strategy space S_i, with max_i |S_i| = N, a type v_i ∈ V_i, and a cost function c_i(s; v_i) that depends on the strategy profile s ∈ ×_i S_i and on her type. We denote with C(s; v) = ∑_{i∈[n]} c_i(s; v_i) the social cost, where s is a strategy profile and v a type profile. We also analyze the case when the stage game is a utility-maximization mechanism M, which takes as input a strategy profile and outputs an allocation X_i(s) for each player and a payment P_i(s). We assume that players have quasi-linear utility u_i(s; v_i) = v_i(X_i(s)) − P_i(s) and the welfare is the sum of valuations (sum of utilities of bidders and revenue of the auctioneer): W(s; v) = ∑_{i∈[n]} v_i(X_i(s)).

In all the settings we study, one can define the optimal solution of the underlying optimization problem. Let X^n be the "feasible solution space" of the setting disregarding incentives. In network congestion games this corresponds to the set of feasible integral flows, while in combinatorial auctions it is the set of feasible partitions of items to bidders. We overload the social cost and welfare notations and, for a feasible solution (or allocation) x ∈ X^n, use C(x; v) and W(x; v) to denote the social cost or welfare of the solution.⁴ We denote the optimal social cost or welfare for a type profile v as Opt(v) = min_{x∈X^n} C(x; v) and Opt(v) = max_{x∈X^n} W(x; v), respectively.

Repeated games and mechanisms with dynamic population. We introduce the study of repeated game settings where the population of players evolves over time. Our model is formalized in the following definition.

Definition 2.1 (Repeated game/mechanism with dynamic population). A repeated game with dynamic population consists of a stage game G played for T rounds. Let P^t denote the set of players at round t, where each player i ∈ P^t has a private type v_i^t. After each round, every player independently exits the game with a (small) probability p > 0 and is replaced by a new player with an arbitrary type. The utility of players is additive across rounds. This repeated game is denoted by Γ = (G, T, p); similarly, M = (M, T, p) denotes a corresponding mechanism.

Our model of dynamic population assumes that after each round every player independently exits the game with a turnover probability p > 0; each player is expected to participate in 1/p rounds. To keep our model simple, we assume that when a player exits, she is replaced by a new participant. This assumption guarantees that there are exactly n players in each iteration, with a p fraction of the population changing each round in expectation. We make no assumption on the types of the newly arriving players, which can be selected adversarially. Most of our results can be extended assuming only an analogous bound on the expected number of departures, allowing departures to be correlated or even chosen adversarially.

To simplify notation, we use player i to denote the current i-th player, where this player is replaced by a new i-th player with probability p each round. An alternate view of the dynamic player population is to think of players as changing types after each iteration with a small probability p. We refer to such a change by saying that player i switches or turns over.

³Although the game rules remain unaltered across rounds, the changes in participants affect the payoff matrix.
⁴Overloading the notation does not create ambiguity between C(x; v) and C(s; v), assuming the strategy sets S_i are disjoint from the possible solutions X.
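For concreteness, the following is a minimal simulation sketch of the population dynamics of Definition 2.1; the adversarial replacement is abstracted by a caller-supplied new_type callback (a hypothetical name used only for illustration, not part of the paper's model).

```python
import random

def simulate_population(initial_types, p, T, new_type, rng=random.Random(0)):
    """Yield the type profile of the n players at each of T rounds.

    After each round, every player independently exits with probability p and
    is replaced; the replacement's type comes from the caller-supplied
    `new_type` callback, which may be chosen adversarially.
    """
    types = list(initial_types)
    for t in range(T):
        yield list(types)
        for i in range(len(types)):
            if rng.random() < p:      # player i turns over after round t
                types[i] = new_type(t, i)
```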

Basic notation. For any quantity x we denote with x^{1:T} the sequence x¹, . . . , x^T. For instance, v_i^{1:T} denotes the sequence of types of player i produced by the random choice of departing players and the choices of the adversary regarding arriving players.

2.2 Background from learning theory and game theory

Adaptive Learning in Dynamic Environments. No-regret learning provides guidance on how to decide in an online manner across a set of alternatives. Consider an arbitrary loss vector across the possible actions. For a cost game, the loss of an action corresponds to the cost the player incurs if she follows this action; for a utility game, the loss is the difference between the maximum possible utility and the player's utility. Consider a player who has N possible actions. In defining regret and no-regret learning, we focus on a single player and hence temporarily drop the index i for the player from the notation. We use ℓ(s, t) to denote the loss (cost or lost utility) of the player if she plays strategy s at round t. We assume without loss of generality that ℓ(s, t) is a value in [0, 1], but make no further assumption about the sequence of losses. Note that even with a stable set of players the value ℓ(s, t) may vary over time, depending on the strategies chosen by other players. A player achieves no regret if she does at least as well over a period of time as the best choice s⋆ with hindsight.

Definition 2.2 (Regret). The regret of a strategy sequence s^{1:T} is defined as:
\[ R(1, T) = \max_{s^\star} \sum_{t=1}^{T} \big( \ell(s^t, t) - \ell(s^\star, t) \big). \]

There are many simple algorithms (for example, see the book of Cesa-Bianchi and Lugosi [CBL06]) that achieve regret O(√(T log N)) against any (adversarial) sequence of loss values ℓ(s, t).

In dealing with changing environments, we require a stronger assumption on the learning of the players, since the players need to adapt their strategies to the environment. We use the notion of adaptive regret introduced by Hazan and Seshadhri [HS07], which captures the regret over (long) time intervals [τ₁, τ₂), in addition to the regret of the whole sequence.

Definition 2.3 (Adaptive Regret). The adaptive regret of a strategy sequence s^{1:T} in the time interval [τ₁, τ₂) is defined as:
\[ R(\tau_1, \tau_2) = \max_{s^\star} \sum_{t=\tau_1}^{\tau_2 - 1} \big( \ell(s^t, t) - \ell(s^\star, t) \big). \]

We assume that all players use learning algorithms that achieve low adaptive regret. For simplicity of presentation we assume that, for a constant C_R, E[R(τ₁, τ₂)] ≤ C_R √((τ₂ − τ₁) ln(Nτ₂)), and refer to this assumption as "The players use adaptive learning algorithms with constant C_R".⁵ This assumption can be achieved by the algorithms of Luo and Schapire [LS15], or Daniely, Gonen, and Shalev-Shwartz [DGSS15]. In fact, any algorithm that achieves a sleeping-experts regret guarantee [Blu97, FSSW97, BM07] automatically satisfies this assumption, as noted by Luo and Schapire [LS15]. For simplicity of presentation, we use C_R = 1 and refer to the behavioral assumption as "The players use adaptive learning algorithms".

⁵Our results smoothly degrade if we instead assume that players achieve adaptive regret that is some other sublinear concave function of the interval's length (τ₂ − τ₁).
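As an illustration of the kind of algorithm that can track a changing comparator, here is a minimal sketch of a fixed-share-style multiplicative-weights update in the spirit of Herbster and Warmuth [HW98]; it is an illustrative sketch only, the parameters eta and alpha would need appropriate tuning, and it is not by itself a certificate of the exact adaptive-regret constant assumed above.

```python
import numpy as np

def fixed_share_weights(losses, eta=0.1, alpha=0.01):
    """Run a fixed-share (shifting-experts) update over a T x N loss matrix.

    Returns the T x N sequence of probability distributions played. The
    uniform mixing step with weight alpha lets the learner recover quickly
    after the best comparator changes, which is what tracking a shifting
    benchmark requires.
    """
    T, N = losses.shape
    w = np.full(N, 1.0 / N)
    plays = np.empty((T, N))
    for t in range(T):
        plays[t] = w / w.sum()
        w = w * np.exp(-eta * losses[t])           # multiplicative-weights step
        w = (1 - alpha) * w + alpha * w.sum() / N  # fixed-share mixing step
    return plays
```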


Solution-based Smoothness in Games and Mechanisms. Smooth games were introduced by Roughgarden [Rou09] as a general framework for bounding the price of anarchy in games. He also showed that smoothness-based price of anarchy bounds extend to outcomes in repeated games where the set of players is fixed throughout the period and all players use no-regret learning.

We require a somewhat more general variant of smooth games that compares the cost or utility resulting from a strategy choice to the social welfare of a specific solution, rather than comparing to the social optimum. For two strategy vectors s and s⋆ we use (s⋆_i, s_{−i}) to denote the vector where player i uses strategy s⋆_i and all other players j use their strategy s_j.

Definition 2.4 (Solution-based smooth game). A cost-minimization game G is solution-based (λ, µ)-smooth for λ > 0 and µ < 1, if for any type profile v, for any feasible solution x ∈ X^n and any player i, there exists a strategy s⋆_i ∈ S_i depending only on her type v_i and her part of the solution x_i, such that for any strategy profile s:
\[ \sum_i c_i\big(s^\star_i(v_i, x_i), s_{-i}; v_i\big) \le \lambda\, C(x; v) + \mu\, C(s; v). \]

The definition of smoothness in games used in the conference paper [Rou09] required that for any pair of strategies s and s⋆ we have ∑_i c_i(s⋆_i, s_{−i}; v_i) ≤ λ C(s⋆; v) + µ C(s; v). This definition turned out to be too strong; for example, it fails in the important application of the Generalized Second Price Auction (GSP) [PLT10]. The definition used in the journal version [Rou15] and since then requires only the existence of a strategy vector s⋆ satisfying ∑_i c_i(s⋆_i, s_{−i}; v_i) ≤ λ Opt(v) + µ C(s; v), effectively requiring the above condition only for a strategy vector s⋆ associated with the optimal solution x⋆. Our definition above is between the two: it requires the condition to hold for a strategy vector s⋆(v; x) associated with any solution x, not only the optimal one. Note that, when x is the optimal solution, we recover the traditional examples of smooth games (including GSP), as the deviating strategy s⋆ usually depends on other players' types through her part of the optimal solution x⋆_i(v).

If a game is solution-based (λ, µ)-smooth, then it is also (λ, µ)-smooth in the sense of [Rou09] (using solution-based smoothness for the optimal solution x⋆(v)), the game has price of anarchy bounded by λ/(1 − µ), and the average social cost of no-regret learning outcomes is also bounded by (λ/(1 − µ)) · Opt. More generally, for static-population games:

Proposition 2.1 (slightly extending [Rou15]). If a game is solution-based (λ, µ)-smooth and players have regret at most √T for the strategy s⋆_i(v_i, x_i) corresponding to the solution x ∈ X^n, the average expected cost is at most
\[ \frac{\lambda}{1-\mu}\left( C(x; v) + \frac{n}{\sqrt{T}} \right). \]

To familiarize the reader with the smoothness analysis, we provide the proof of both this proposition and the next one (Proposition 2.2) in Appendix A.
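For intuition, the static-population argument (written out in Appendix A) is a single chain of inequalities: sum the no-regret guarantee for the fixed strategies s⋆_i(v_i, x_i) over players, then apply the solution-based smoothness inequality at every round,
\[ \sum_{t=1}^{T} C(s^t; v) \le \sum_{i}\Big(\sum_{t=1}^{T} c_i\big(s^\star_i(v_i, x_i), s^t_{-i}; v_i\big) + \sqrt{T}\Big) \le \lambda T\, C(x; v) + \mu \sum_{t=1}^{T} C(s^t; v) + n\sqrt{T}, \]
and rearranging and dividing by T yields the average cost bound of Proposition 2.1 (up to how the n/√T error term is folded into the leading constant).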

Observe that, when T ≥ (n/(ε · C(x;v)))², the above bound becomes (λ(1+ε)/(1−µ)) · C(x;v). Note also that it was crucial that the strategy s⋆_i was fixed across rounds, since we used the fact that player i does not regret not deviating to it. Since s⋆_i is a function of the underlying optimization problem, this is no longer the case in an evolving population, as departures of players change the underlying optimization problem and may affect s⋆_i. In Section 3, we show how the stronger behavioral assumption of adaptive learning can help extend the aforementioned analysis.

The main result of our paper is about mechanisms. For mechanisms we use the version of the smoothness definition of Syrgkanis and Tardos [ST13], which assumes that players have quasi-linear utilities. We again define a mechanism to be solution-based smooth and allow the choice of strategy s⋆ to depend only on the player's part of the solution x_i and her type v_i. More formally:

Definition 2.5 (Solution-based smooth mechanism). A mechanism M is solution-based (λ, µ)-smooth for λ, µ ≥ 0, if for any valuation profile v, any solution x ∈ X^n, and any player i, there exists a deviating strategy s*_i ∈ S_i depending only on v_i and x_i such that for any strategy profile s,
\[ \sum_i u_i\big(s^*_i(v_i, x_i), s_{-i}; v_i\big) \ge \lambda\, W(x; v) - \mu\, R(s), \]
where R(s) = ∑_{i=1}^{n} P_i(s).
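As a standard illustration (a well-known example in this literature, sketched here for unit-demand bidders and unit supply), a simultaneous first-price auction is solution-based (1/2, 1)-smooth: let the deviation s*_i(v_i, x_i) bid v_i(x_i)/2 on the item player i receives in x. If she wins it she pays v_i(x_i)/2 and keeps utility v_i(x_i)/2; if she loses it, the revenue collected on that item is already at least v_i(x_i)/2. Summing over players, and using that each item is assigned to at most one player in x,
\[ \sum_i u_i\big(s^*_i(v_i, x_i), s_{-i}; v_i\big) \ge \tfrac{1}{2}\, W(x; v) - R(s), \]
which is the condition of Definition 2.5 with λ = 1/2 and µ = 1, matching the static-population guarantee α = 0.5 discussed in the introduction.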

Syrgkanis and Tardos [ST13] proved that a (λ, µ)-smooth mechanism has price of anarchy bounded by max(µ, 1)/λ, and the average social welfare of no-regret learning outcomes is also at least (λ/max(µ, 1)) · Opt(v). Analogously, for static-population mechanisms, we obtain:

Proposition 2.2 (slightly extending [ST13]). If a mechanism is solution-based (λ, µ)-smooth and players have regret at most √T for a strategy s⋆_i(x_i, v_i) for a solution x ∈ X^n, then the average expected social welfare is at least
\[ \frac{\lambda}{\max(\mu, 1)}\left( W(x; v) - \frac{n}{\sqrt{T}} \right). \]

2.3 Applications

We consider a welfare-maximization mechanism (simultaneous multi-item auctions) and a cost-minimization game (atomic congestion game) as the main applications of our techniques.

Simultaneous multi-item auctions. The auction settings we consider include repeated auctions of m distinct item types, each of which appears with a limited supply B (of identical copies); the total number of items is therefore m · B. In Section 4, we assume that the supply is B = 1 (identical copies are treated as distinct items), while in Section 5, we assume that we have access to a larger supply B. In all these settings, each player submits bids for each item type and the highest bidder receives the item (ties are broken arbitrarily). The settings differ by the payment that this player ends up paying: in first-price auctions she pays her bid, in threshold-price auctions she pays the minimum price that would have her win the item (the second bid when B = 1 and, more generally, the (B + 1)-th bid for general B), while in hybrid auctions she pays a combination of the two. The results of Section 4 apply to any of these settings and are provided for simplicity for first-price auctions, while the results of Sections 5 and 6 require smoothness results that only hold for threshold-price auctions and so only apply in that case. For our asymptotically optimal efficiency guarantee (Section 6.2), we also assume some uncertainty in the participation of each existing player at every round, similar to the work of Feldman et al. [FIL+16].

Each player is assumed to have a valuation function for each set of items she obtains. Valuations are additive across time, which models perishable items such as advertising opportunities, where a player aims to repeatedly win items in each period she participates. For each round t, we use v_i^t(A) to denote the valuation of the i-th player if she obtains set A ⊆ [m]. For all our results, we assume that the valuations are non-negative and normalized to be at most 1; we also assume that values are expressed as multiples of ρ > 0; in particular, the minimum non-zero value for an item is ρ. In Section 4, we assume that the players are unit-demand (their value is equal to the maximum value across items they obtained). In Section 5 we consider players in large markets with more complex valuations (satisfying the gross substitutes property, formally defined in that section).

Finally, since the learnability of the setting depends on the number of strategies available to the player, we restrict the players to bid on at most d item types at each round and discretize their bids to multiples of ζρ for a small constant ζ > 0. As a result, the number of strategies available to the learner is equal to
\[ N = \binom{m}{d} \left( \frac{1}{\zeta\rho} \right)^{d} \le \left( \frac{m}{\zeta\rho} \right)^{d}. \]
For ease of presentation, in Section 4, we assume that d = 1.
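To make the stage mechanism concrete, the following is a minimal sketch of one round of simultaneous second-price (threshold-price with B = 1) auctions with unit-demand bidders; it is an illustration only, not code from the paper, and ties are broken by the ordering of player identifiers.

```python
def simultaneous_second_price_round(bids, values):
    """One round of simultaneous second-price auctions with supply B = 1.

    bids:   dict player -> dict item -> bid.
    values: dict player -> dict item -> value.
    Returns (allocation, payments, utilities), where a unit-demand player's
    value is the maximum value among the items she wins.
    """
    allocation = {i: set() for i in bids}
    payments = {i: 0.0 for i in bids}
    items = {item for b in bids.values() for item in b}
    for item in items:
        offers = sorted(((b.get(item, 0.0), i) for i, b in bids.items()), reverse=True)
        top_bid, winner = offers[0]
        second_bid = offers[1][0] if len(offers) > 1 else 0.0
        if top_bid > 0:
            allocation[winner].add(item)
            payments[winner] += second_bid   # threshold (second-price) payment
    utilities = {
        i: max((values[i].get(item, 0.0) for item in allocation[i]), default=0.0)
           - payments[i]
        for i in bids
    }
    return allocation, payments, utilities
```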

Atomic congestion game. In the atomic congestion game, we assume that we have a set of congestible elements E (and let m = |E|); each element e has a latency function ℓ_e(x), or cost, that is monotone non-decreasing in the congestion x. Given some selection of sets s_i ⊆ E for each player i, the congestion on an element e is the number of players that have selected it, x_e(s) = |{i : e ∈ s_i}|, and the cost of player i is then the sum ∑_{e∈s_i} ℓ_e(x_e(s)).

A player's type v_i^t denotes the possible subsets of the element set she can select. For example, in the routing game on a graph, the type of a player i is a source-sink pair (o_i, d_i), and her strategy is the choice of a path from o_i to d_i in the graph. We assume that a player's cost is infinity if her selected set is not one of the subsets allowed by her type. Thus the number of strategies available to the player is the number of (o_i, d_i) paths in the graph, and thereby N is the maximum number of such possible paths across possible source-sink pairs.
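As a small illustration of this cost structure (an illustrative sketch, not code from the paper), player costs for a given strategy profile can be computed as follows.

```python
from collections import Counter

def congestion_costs(strategies, latency):
    """Compute each player's cost in an atomic congestion game.

    strategies: dict player -> set of elements (the player's selected subset,
                e.g. the edges of her chosen path).
    latency:    dict element -> function mapping congestion (an int) to a cost.
    """
    load = Counter(e for subset in strategies.values() for e in subset)
    return {
        i: sum(latency[e](load[e]) for e in subset)
        for i, subset in strategies.items()
    }

# Example: two parallel links with affine latencies and three players.
costs = congestion_costs(
    {1: {"top"}, 2: {"top"}, 3: {"bottom"}},
    {"top": lambda x: x, "bottom": lambda x: 2 * x},
)
# costs == {1: 2, 2: 2, 3: 2}
```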

3 Price of Anarchy with Dynamic Population via Stable Sequences

In this section, we provide our general efficiency theorems both for games and mechanisms with dynamic population. Specifically, we formalize the connection between adaptive learning, solution-based smoothness of the game, and the existence of approximately optimal and stable sequences for the underlying optimization problem. The efficiency theorems serve as meta-theorems that reduce the quest for efficiency guarantees with dynamic population to the existence of approximately optimal stable solution sequences. Note that such stable sequences just need to exist and are only used in the proofs; they do not need to be known by the players, or even be constructed. In the following sections, we instantiate this meta-theorem to specific applications by showing how one can derive such stable approximately optimal sequences from greedy algorithms (Section 4) as well as from joint differentially private algorithms in the case of large games (Section 5).

3.1 Efficiency theorem for cost-minimization games with dynamic population

We first provide the efficiency theorem for cost-minimization games by extending the proof of Proposition 2.1 to account for the dynamic population. Although our main applications are about mechanisms, we begin with cost-minimization games for pedagogical purposes, as the proof is simpler in this case. In Section 6.3, we apply this theorem to the application of atomic congestion games (such as routing). Before presenting our efficiency theorem, we formalize the notion of stability we require from solution sequences of the underlying optimization problem.

Definition 3.1 (k-stable sequence for cost-minimization games). A randomized sequence of solutions x^{1:T} and types v^{1:T} is k-stable for games if the cumulative (across players) expected number of changes in each individual player's solution or type is at most k. More formally, if k_i(v_i^{1:T}, x_i^{1:T}) is the number of times that player i has x_i^t ≠ x_i^{t+1} or v_i^t ≠ v_i^{t+1}, then:
\[ \sum_{i=1}^{n} \mathbf{E}\Big[ k_i\big(v_i^{1:T}, x_i^{1:T}\big) \Big] \le k, \]
where the expectation is taken over the randomness in the sequence of types, as well as the associated solution sequence.⁶

⁶Note that the solution sequence x^{1:T} necessarily depends on the sequence of player types v^{1:T}.
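For concreteness, the per-player quantity k_i in Definition 3.1 can be computed from a given trajectory as follows (an illustrative sketch; the stable sequences are only used in the analysis and are never materialized by the players).

```python
def num_changes(types, solutions):
    """Count the rounds t at which player i's type or solution differs at t+1.

    types, solutions: length-T lists holding player i's type v_i^t and her
    part x_i^t of the benchmark solution at each round t.
    """
    return sum(
        1
        for t in range(len(types) - 1)
        if types[t] != types[t + 1] or solutions[t] != solutions[t + 1]
    )
```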


Theorem 3.1 (efficiency theorem for cost-minimization games). Consider a repeated cost game with dynamic population Γ = (G, T, p), such that the stage game G is solution-based (λ, µ)-smooth and costs are bounded in [0, 1]. Suppose that there exists a k-stable sequence (v^{1:T}, x^{1:T}) such that x^t is feasible (pointwise) and β-approximately (in-expectation) optimal for each t, i.e. E[C(x^t; v^t)] ≤ β · E[Opt(v^t)]. If players use adaptive learning algorithms then:
\[ \sum_{t} \mathbf{E}\big[C(s^t; v^t)\big] \;\le\; \frac{\lambda\beta}{1-\mu} \sum_{t} \mathbf{E}\big[\mathrm{Opt}(v^t)\big] \;+\; \frac{n}{1-\mu}\sqrt{T \cdot \Big(\frac{k}{n} + 1\Big) \cdot \ln(NT)}. \]

Proof. In a dynamic population game, the underlying optimization problem is changing over time. Therefore the classical smoothness analysis described in the proof of Proposition 2.1 requires a stronger learning property. To deal with this, we define time-dependent deviating strategies that are related to the underlying optimization problem of the round. We then use the adaptive learning property to show that the players do not have much regret against the sequence of such deviating strategies.

Let s_i^{⋆,t} be the deviation s⋆_i(v_i^t, x_i^t) defined by the smoothness property and s_i^{⋆,1:T} be the sequence of these deviations. Let K_i be the number of rounds where s_i^{⋆,t} ≠ s_i^{⋆,t+1}, and let r_i(s_i^{⋆,1:T}, s^{1:T}; v^{1:T}) be the regret that player i has compared to selecting s_i^{⋆,t} at every round, i.e.:
\[ r_i\big(s_i^{\star,1:T}, s^{1:T}; v^{1:T}\big) = \sum_{t=1}^{T} \Big( c_i(s^t; v^t) - c_i(s_i^{\star,t}, s^t_{-i}; v^t) \Big). \tag{1} \]
For shorthand, we denote this by r_i^⋆ in this proof. Observe that since s_i^{⋆,t} is uniquely determined by v_i^t and x_i^t, K_i is a random variable that is equal to the k_i(v_i^{1:T}, x_i^{1:T}) of Definition 3.1, for each instantiation of the sequences v^{1:T} and x^{1:T}.

Let τ_{i,r} be the round at which the strategy s_i^{⋆,t} of player i changes for the r-th time. For any period [τ_{i,r}, τ_{i,r+1}) in which the strategy s_i^{⋆,t} is fixed, adaptive learning guarantees that the player's regret for this strategy is bounded by
\[ R_i(\tau_{i,r}, \tau_{i,r+1}) \le \sqrt{(\tau_{i,r+1} - \tau_{i,r}) \ln(NT)}. \tag{2} \]
Summing over the K_i + 1 periods [τ_{i,r}, τ_{i,r+1}) in which the strategy for player i is fixed and using the Cauchy-Schwarz inequality, we can bound the total regret of each player i:
\[ r_i^\star \le \sqrt{(K_i + 1)\sum_{r=1}^{K_i+1} (\tau_{r+1} - \tau_r) \ln(NT)} = \sqrt{(K_i + 1)\, T \ln(NT)}. \tag{3} \]

Thus, for each instantiation of x^{1:T} and v^{1:T}, we have:
\[ \sum_{t=1}^{T} c_i(s^t; v^t) = \sum_{t=1}^{T} c_i(s_i^{\star,t}, s^t_{-i}; v^t) + r_i^\star \le \sum_{t=1}^{T} c_i(s_i^{\star,t}, s^t_{-i}; v^t) + \sqrt{(K_i + 1)\, T \ln(NT)}. \tag{4} \]

Adding over all players, and using the smoothness property, we get that
\[ \sum_{t} C(s^t; v^t) \le \lambda \sum_{t} C(x^t; v^t) + \mu \sum_{t} C(s^t; v^t) + \sum_{i} \sqrt{(K_i + 1)\, T \ln(NT)}. \]

By Cauchy-Schwarz, ∑_i √((K_i + 1) T ln(NT)) ≤ √(n · T · ln(NT) · ∑_{i=1}^{n}(K_i + 1)). Taking expectation over the allocation and valuation sequence and using the β-approximate optimality and Jensen's inequality:
\[ \sum_{t} \mathbf{E}\big[C(s^t; v^t)\big] \le \lambda\beta \sum_{t} \mathbf{E}\big[\mathrm{Opt}(v^t)\big] + \mu \sum_{t} \mathbf{E}\big[C(s^t; v^t)\big] + n\sqrt{T \ln(NT)\Big(1 + \frac{1}{n}\sum_{i=1}^{n}\mathbf{E}[K_i]\Big)}. \]
By the k-stability of the sequence, we have that ∑_{i=1}^{n} E[K_i] ≤ k. By rearranging we obtain the claimed bound.


Remark 3.1. The above result smoothly degrades if the players use adaptive learning algorithms with worse guarantees. More concretely, if their adaptive regret guarantee for a time interval [τ₁, τ₂) is E[R(τ₁, τ₂)] ≤ C_R (τ₂ − τ₁)^q for some constant C_R and some q ∈ [0, 1], the theorem would still hold with the last term replaced by (n/(1−µ)) · C_R T^q (k + 1)^{1−q}. This can be proved by replacing the use of the Cauchy-Schwarz inequality in the proof with the more general Hölder inequality, and the same technique can also be applied to all our later results. For ease of presentation, we limit the discussion about the use of weaker adaptive learning algorithms to this remark.

3.2 Efficiency theorem for mechanisms with dynamic population

We now describe the efficiency meta-theorem for mechanisms with dynamic population that we use in our main application: simultaneous multi-item auctions (Sections 4 and 6.1). This meta-theorem again reduces proving efficiency guarantees of learning outcomes with dynamic population to showing the existence of stable and approximately optimal solution sequences. Since, in mechanisms, the relevant players for a solution are those that are allocated a resource in the corresponding allocation, we first define an allocation-based analog of the stability Definition 3.1. If a player i is not allocated any resource in an allocation x, we denote her allocation as the empty allocation: x_i = ∅.

Definition 3.2 (k-stable sequence for mechanisms). A randomized sequence of allocations x^{1:T} and types v^{1:T} is k-stable for mechanisms if the cumulative (across players) expected number of changes in the allocation or type of individual players with nonempty allocation is at most k, i.e., if κ_i(v_i^{1:T}, x_i^{1:T}) is the number of times that player i has x_i^t ≠ x_i^{t−1} or (x_i^t ≠ ∅ and v_i^t ≠ v_i^{t−1}), then:
\[ \sum_{i=1}^{n} \mathbf{E}\Big[ \kappa_i\big(v_i^{1:T}, x_i^{1:T}\big) \Big] \le k. \]

Observe that, unlike the definition of k_i(v_i^{1:T}, x_i^{1:T}) in Definition 3.1, κ_i(v_i^{1:T}, x_i^{1:T}) does not account for changes in the type of players that are not currently allocated an item in solution x_i^t.

Before providing the efficiency meta-theorem for mechanisms, we stress a property that needs to be satisfied by the mechanism for the efficiency guarantee to hold. This asks that a player who is not allocated any item gets zero utility, and that players always bid in a way that gives them nonnegative utility. This holds in most auction formats, such as first-price, second-price, and hybrid auctions, under a no-overbidding assumption.⁷

Assumption 3.3. Any player i who is not allocated a resource has zero utility, i.e. u_i(s*_i(v_i, ∅), s_{−i}) = 0; the utility is also always nonnegative, i.e. u_i(s; v_i) ≥ 0 for any strategy used by the players.

This assumption posits that players with no items in the feasible allocation will have no regret against the deviating strategy that attempts to "win" the empty allocation. We now provide the meta-theorem for mechanisms.

Theorem 3.2 (efficiency theorem for mechanisms). Consider a repeated mechanism with dynamic population M = (M, T, p), such that the stage mechanism M is solution-based (λ, µ)-smooth, satisfies Assumption 3.3, and utilities are in [0, 1]. Suppose that there exists a k-stable sequence (v^{1:T}, x^{1:T}) such that x^t is feasible (pointwise) and β-approximately optimal (in-expectation) for each t, i.e. β · E[W(x^t; v^t)] ≥ E[Opt(v^t)]. If players use adaptive learning algorithms then:
\[ \sum_{t} \mathbf{E}\big[W(s^t; v^t)\big] \;\ge\; \frac{\lambda}{\beta \max\{1, \mu\}} \sum_{t} \mathbf{E}\big[\mathrm{Opt}(v^t)\big] \;-\; \sqrt{T \cdot m \cdot (k + m) \cdot \ln(NT)}, \]
where m is such that, for any feasible allocation x, |{i : x_i ≠ ∅}| ≤ m.

⁷A notable exception to the property is the all-pay auction, as losing bidders pay their bid and therefore obtain negative utility despite not being allocated anything.

Proof. Let s_i^{⋆,1:T} be defined exactly as in the proof of Theorem 3.1 and let r_i(s_i^{⋆,1:T}, s^{1:T}; v^{1:T}), or r_i^⋆ as shorthand, be defined similarly as:
\[ r_i\big(s_i^{\star,1:T}, s^{1:T}; v^{1:T}\big) = \sum_{t=1}^{T} \Big( u_i(s_i^{\star,t}, s^t_{-i}; v^t) - u_i(s^t; v^t) \Big). \]

For any period [τr, τr+1) that the strategy s⋆,ti is fixed, adaptive learning guarantees that theplayer’s regret for this strategy is bounded by

Ri(τr, τr+1) ≤√

(τr+1 − τr) ln(NT ),

Moreover, if in period r, xti = ∅, then by Property 3.3 we have that: Ri(τr, τr+1) ≤ 0. Thus, ifwe denote with Xi,r the indicator of whether in period r, xti = ∅, we obtain:

Ri(τr, τr+1) ≤√

X2i,r(τr+1 − τr) ln(NT ),

Summing over the $K_i + 1$ periods in which the strategy is fixed and using the Cauchy-Schwarz inequality, we can bound the total regret of each player $i$:
\[
r_i^\star \leq \sum_{r=1}^{K_i+1} \sqrt{X_{i,r}} \cdot \sqrt{X_{i,r}(\tau_{r+1} - \tau_r)\ln(NT)} \leq \sqrt{\sum_{r=1}^{K_i+1} X_{i,r}} \cdot \sqrt{\sum_{r=1}^{K_i+1} X_{i,r}(\tau_{r+1} - \tau_r)\ln(NT)}
\]
Let $Y_i^t = \mathbf{1}\{x_i^t \neq \emptyset\}$. Then observe that:
\[
\sum_{r=1}^{K_i+1} X_{i,r}(\tau_{r+1} - \tau_r) = \sum_{t=1}^{T} Y_i^t.
\]

Replacing in the previous inequality, summing over all players, and using Cauchy-Schwarz again:
\[
\sum_{i=1}^{n} r_i^\star \leq \sum_{i=1}^{n} \sqrt{\sum_{r=1}^{K_i+1} X_{i,r}} \cdot \sqrt{\sum_{t=1}^{T} Y_i^t \ln(NT)} \leq \sqrt{\sum_{i=1}^{n} \sum_{r=1}^{K_i+1} X_{i,r}} \cdot \sqrt{\sum_{i=1}^{n} \sum_{t=1}^{T} Y_i^t \ln(NT)}
\]
Since each $x^t$ is a feasible allocation, $\sum_{i=1}^{n} Y_i^t \leq m$. Hence, $\sum_{i=1}^{n} \sum_{t=1}^{T} Y_i^t \leq mT$. Moreover:
\[
\sum_{i=1}^{n} \sum_{r=1}^{K_i+1} X_{i,r} \leq \sum_{i=1}^{n} \sum_{r=1}^{K_i} X_{i,r} + \sum_{i=1}^{n} X_{i,K_i+1} = \sum_{i=1}^{n} \sum_{r=1}^{K_i} X_{i,r} + \sum_{i=1}^{n} Y_i^T \leq m + \sum_{i=1}^{n} \sum_{r=1}^{K_i} X_{i,r}
\]

Now observe that for each instance of $(v^{1:T}, x^{1:T})$: $\sum_{r=1}^{K_i} X_{i,r} \leq \kappa_i(v^{1:T}, x^{1:T})$, since the left-hand summation counts the changes in type or allocation, ranging from $r = 1$ to $K_i$, such that the allocation $x_i^{\tau_r}$ in the period right before the $r$-th change is non-empty; each such change is also accounted for in $\kappa_i(v^{1:T}, x^{1:T})$. The inequality can be strict: there can be an index $r$ at which both the type and the allocation change, and the summation accounts for it only once while $\kappa_i(v^{1:T}, x^{1:T})$ counts it twice; and there can be changes where $x_i^t \neq x_i^{t-1}$ and $x_i^t \neq \emptyset$ but the allocation in the preceding period was empty, which are not accounted for in the summation above but are accounted for in $\kappa_i(v^{1:T}, x^{1:T})$. Combining all the above we get:
\[
\sum_{i=1}^{n} r_i^\star \leq \sqrt{m + \sum_{i=1}^{n} \kappa_i(v^{1:T}, x^{1:T})} \cdot \sqrt{mT\ln(NT)}
\]

By the no-regret property of each player, for each instance of $x^{1:T}$ and $v^{1:T}$, we have:
\[
\sum_{t=1}^{T} u_i(s^t; v^t) \geq \sum_{t=1}^{T} u_i(s_i^{\star,t}, s_{-i}^t; v^t) - r_i^\star
\]
Adding over all players, and using the smoothness property and the bound on the sum of regrets, we get that
\[
\sum_t \sum_i u_i(s^t; v^t) \geq \lambda \sum_t W(x^t; v^t) - \mu \sum_t R(s^t) - \sqrt{m + \sum_{i=1}^{n} \kappa_i(v^{1:T}, x^{1:T})} \cdot \sqrt{mT\ln(NT)}
\]

Taking expectation over the allocation and valuation sequence and using the $\beta$-approximate optimality and Jensen's inequality:
\[
\sum_t \sum_i E[u_i(s^t; v^t)] \geq \frac{\lambda}{\beta} \sum_t E[Opt(v^t)] - \mu \sum_t E[R(s^t)] - \sqrt{m + \sum_{i=1}^{n} E[\kappa_i(v^{1:T}, x^{1:T})]} \cdot \sqrt{mT\ln(NT)}.
\]
By rearranging and using the fact that $W(s^t; v^t) = \sum_i u_i(s^t; v^t) + R(s^t)$ and that $R(s^t) \leq W(s^t; v^t)$ (since utilities are nonnegative by Assumption 3.3), we obtain the claimed bound.

Remark 3.2. The bounds presented in Theorems 3.1 and 3.2 have a logarithmic dependence on the time horizon $T$, which makes them weak when the overall horizon $T$ is very long. However, this dependence can be replaced by a logarithmic dependence on $T_{\max}$, at the expense of having $\max(p, 1/T_{\max})$ instead of just $p$ in the bounds, assuming players achieve regret that is no worse than that of algorithms which restart learning after at most $T_{\max}$ rounds; with $T_{\max} \sim 1/p$, the $\log(T)$ factor can then be replaced by $\log(1/p)$.
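As a concrete illustration of the restart trick in this remark, the following is a minimal sketch of a multiplicative-weights (Hedge) learner over $N$ strategies that wipes its weights every $T_{\max}$ rounds; this is not one of the adaptive-regret algorithms referenced in the paper, and the class name, the loss range $[0,1]$, and the step-size choice are our own illustrative assumptions.

```python
import math
import random

class RestartingHedge:
    """Hedge (multiplicative weights) over n strategies, restarted every t_max rounds.

    Each block of t_max rounds has regret O(sqrt(t_max * log n)) against any fixed
    strategy, which is the restart behavior assumed in Remark 3.2. Full-information
    feedback (the loss of every strategy) is assumed after each round.
    """

    def __init__(self, n_strategies, t_max):
        self.n = n_strategies
        self.t_max = t_max
        # step size tuned for a block of length t_max
        self.eta = math.sqrt(math.log(n_strategies) / t_max)
        self.round = 0
        self.weights = [1.0] * self.n

    def play(self):
        """Sample a strategy index proportionally to the current weights."""
        total = sum(self.weights)
        r = random.random() * total
        for i, w in enumerate(self.weights):
            r -= w
            if r <= 0:
                return i
        return self.n - 1

    def update(self, losses):
        """losses[i] in [0, 1] is the loss strategy i would have incurred this round."""
        for i, loss in enumerate(losses):
            self.weights[i] *= math.exp(-self.eta * loss)
        self.round += 1
        if self.round % self.t_max == 0:
            self.weights = [1.0] * self.n  # restart: forget the past block entirely
```

With $T_{\max} \sim 1/p$, the restart schedule matches the expected lifetime of a player's type, which is why the remark can trade the $\log T$ factor for $\log(1/p)$.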

4 Stable Sequences via Greedy Algorithms

In this section we offer direct arguments to show the existence of stable solution sequences, and hence good efficiency results, for the case of simultaneous first-price auctions with unit-demand bidders. Our method is based on a combination of using the greedy algorithm and rounding the input parameters. We consider a repeated mechanism with dynamic population $\mathcal{M} = (M, T, p)$, where the stage mechanism consists of simultaneous first-price auctions with $n$ unit-demand bidders (matching markets) and $m$ items. Recall from Section 2.3 that the value of each bidder for any item is either 0 or at least a constant $\rho > 0$ (for example, a cent of a dollar), and that the bidders are only allowed to bid multiples of $\zeta\rho$ for a small constant $\zeta > 0$ (so the bidding space is discrete). To apply our main theorem, Theorem 3.2, we need two properties: i) that the mechanism is solution-based smooth, and ii) that there exists a relatively stable sequence of approximately optimal solutions for the optimization problem.


4.1 Stability via layered greedy algorithm.

In order to be able to apply our meta-theorem from the previous section, we need to also show that there exists a sequence of approximately optimal solutions that is relatively stable to changes in the population. When bidders are unit-demand, the underlying optimization problem is finding a matching of maximum value between bidders and items. Unfortunately, the optimal algorithm is not stable to changes in the population, since a single change in one player's valuation may alter the allocation of all other players in the optimal solution, as the new optimal solution may come from an augmenting path.

To obtain a stable and approximately optimal solution sequence, we use a layered version of the greedy algorithm. The greedy matching algorithm initially does not allocate any item to any player, and subsequently considers item valuations $v_i^t(j)$ in decreasing order and assigns item $j$ to player $i$ if, when $v_i^t(j)$ is considered, neither item $j$ nor player $i$ is matched. To make this algorithm more stable we define the layered-greedy matching algorithm, which works as follows. Recall that $\rho > 0$ denotes the smallest non-zero value that a player has for any item. For a positive $\epsilon \leq 1/3$, we round each player's value down to the closest number of the form $\rho(1 + \epsilon)^\ell$ for some integer $\ell$, and run the greedy algorithm with these rounded values. It is well known that the greedy algorithm guarantees a solution that is within a factor of 2 of optimal. We lose an additional factor of $(1 + \epsilon)$ by working with the rounded values.

The greedy algorithm may have many ties, and we resolve them so as to make the output stable. In particular, among all the player-item pairs with the same rounded value at round $t$, we break ties in favor of pairs that were matched in the previous round $t - 1$. Note that this neither affects anything in the mechanism nor makes any assumption on how the mechanism breaks ties; it is used only inside the proof to show the existence of an approximately optimal stable solution sequence. A sketch of one round of this layered-greedy procedure is given below.
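The following is a minimal sketch of one round of the layered-greedy matching described above, assuming unit-demand bidders; the function and variable names are illustrative, and the layer index is shifted by one relative to the layer numbering used in the proof of Lemma 4.1 below.

```python
import math

def layered_greedy_matching(values, prev_match, rho, eps):
    """One round of layered-greedy matching (an analysis device, not the mechanism).

    values     : dict (player, item) -> value, each value either 0 or at least rho
    prev_match : dict item -> player from the previous round, used only to break ties
    Returns a dict item -> player.
    """
    def layer(v):
        # exponent of the rounded-down value rho * (1 + eps) ** layer(v)
        return math.floor(math.log(v / rho, 1 + eps))

    pairs = [(i, j, layer(v)) for (i, j), v in values.items() if v >= rho]
    # consider pairs in decreasing order of rounded value; within a layer,
    # prefer pairs that were matched in the previous round (stable tie-breaking)
    pairs.sort(key=lambda t: (-t[2], 0 if prev_match.get(t[1]) == t[0] else 1))

    match, used_players = {}, set()
    for i, j, _ in pairs:
        if j not in match and i not in used_players:
            match[j] = i
            used_players.add(i)
    return match
```

Running this procedure round after round and counting how often a player's assignment changes is exactly the quantity that the potential-function argument in Lemma 4.1 controls.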

We now provide the key ingredient for our efficiency guarantee: a lemma showing that the above algorithm provides a sequence of approximately optimal and stable solutions, and therefore such a sequence exists for the underlying optimization problem. We will use the stability definition for mechanisms (Definition 3.2).

Lemma 4.1 (Stability via layered-greedy matching). For any $\epsilon > 0$, there exists a sequence of solutions for the underlying matching problem such that a) the total welfare at any round is near-optimal: $W(x^t; v^t) \geq \frac{1-\epsilon}{2} Opt(v^t)$, and b) the sequence is $k$-stable for mechanisms (as in Definition 3.2) with $k = 2m \cdot (1 + p \cdot T) \cdot \log_{1+\epsilon}(1/\rho)$.

Proof. The solutions in the lemma come from applying the layered-greedy matching algorithm with parameter $\epsilon > 0$ (to be set at the end of the section). The approximation result holds as we lose a factor of 2 due to the greedy algorithm and a factor of $(1 + \epsilon)$ due to the layers. Since $\frac{1}{1+\epsilon} > 1 - \epsilon$, result (a) follows.

To show the stability, let $\ell(v_i^t(j))$ be the highest integer $\ell$ such that $v_i^t(j) \geq \rho(1 + \epsilon)^{\ell-1}$, i.e., the rounded-down version of $v_i^t(j)$ is $\rho(1 + \epsilon)^{\ell(v_i^t(j))-1}$, which we call the layer of this value. For example, any value in the range $[\rho, \rho(1 + \epsilon))$ is in layer 1. Let $\ell^t(j)$ denote $\ell(v_i^t(j))$ if item $j$ is assigned to player $i$ at time $t$, and let $\ell^t(j) = 0$ if item $j$ is not assigned at time $t$. We use the potential function
\[
\Phi(x^t) = \sum_j \ell^t(j)
\]
to show stability. As all values are upper bounded by 1, each layer $\ell^t(j)$ is at most $\log_{1+\epsilon}(1/\rho)$, and hence the potential function is bounded by $m \log_{1+\epsilon}(1/\rho)$.


The crux of the proof lies in the following steps. We first show that changes in the assignments of non-departing players correspond to increases in the potential function. Moreover, the potential function can only decrease due to departures: when a player assigned to item $j$ leaves at time $t$, this immediately decreases the potential function by $\ell^t(j) \leq \log_{1+\epsilon}(1/\rho)$. Up to end effects, the aggregate increase of the potential function equals the aggregate decrease. Hence, the expected number of changes is upper bounded by bounding the expected decrease of the potential function. This argument is formalized below.

The allocation of the solution can change only due to a turnover: either a player that holds an item in the greedy solution departs and leaves her previously assigned item open for current players, or a new player arrives and gets assigned to an item. Every time a player that holds an item departs, she leaves that item temporarily free in the greedy solution. Unless this is the last time that the item has a holder in the greedy solution, at some point (either in the same round or in the future) some player $i$ will get assigned to this item. Due to the layered version of the greedy algorithm, this increases the potential function by at least 1. If player $i$ was previously assigned to another item, that item becomes free and some other player may move to it, again increasing the potential function by at least 1. As a result, each unit of increase in the potential function caused by someone getting an unassigned item is associated with at most 2 changes in the allocation. The case where an item remains unassigned for all future rounds contributes in total at most $m$ extra changes in allocation (over the whole time horizon).

Now consider the scenario where an arriving player displaces another player from the greedy solution (instead of getting an unassigned item). For this to happen, her rounded value must be higher than that of the current owner; hence, the potential function increases by at least 1. This affects the allocation of the previous holder of the item, who may either cease being allocated or displace a player from another item, again increasing the potential function by at least 1 (and this may propagate across subsequently displaced players as well). As a result, each unit of increase in the potential function caused by someone getting a previously assigned item contributes at most 2 changes in the allocation of other players.

Combined with the previous point, the total number of changes is at most:
\[
\sum_i K_i(x^{1:T}) \leq 2 \cdot \text{Total Increase in } \Phi + m \tag{5}
\]
Since the potential function only takes nonnegative integer values and is bounded by $m \cdot \log_{1+\epsilon}(1/\rho)$, and taking into account end-game effects, the total increase in the potential function is:
\[
\text{Total Increase in } \Phi \leq \text{Total Decrease in } \Phi + m \cdot \log_{1+\epsilon}(1/\rho). \tag{6}
\]
We are therefore left to bound the expected total decrease in the potential function. This can only happen when a player who holds an item departs. Each such player departs with probability $p$, and hence the expected number of such players departing at each round is at most $m \cdot p$ (which is independent of the number of players and depends only on the number of items). Whenever this happens, the potential function can decrease by at most $\log_{1+\epsilon}(1/\rho)$, since this is the maximum layer the corresponding item can be in. As a result, the expected decrease in the potential function is at most:
\[
E[\text{Total Decrease in } \Phi] \leq p \cdot m \cdot T \cdot \log_{1+\epsilon}(1/\rho) \tag{7}
\]

The lemma follows by combining (5), (6), and (7).


4.2 Smoothness with discrete bidding spaces.

We now show that this mechanism with the discrete bidding space is smooth for the settings we consider. This section focuses on auctions with unit-demand buyers, which is a special case of submodular player valuations. A valuation function $v(S)$ over subsets of items $S$ is submodular if for all sets $A \subseteq B$ and every item $j \notin B$ we have
\[
v(A \cup \{j\}) - v(A) \geq v(B \cup \{j\}) - v(B).
\]
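The following brute-force check of this diminishing-returns inequality on a small example may help make the definition concrete; the helper name and the unit-demand example are our own and purely illustrative.

```python
import itertools

def is_submodular(v, items):
    """Check v(A + j) - v(A) >= v(B + j) - v(B) for all A subset of B and j not in B.

    v maps frozensets of `items` to values; intended only for small examples.
    """
    subsets = [frozenset(c) for r in range(len(items) + 1)
               for c in itertools.combinations(items, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for j in items:
                if j in B:
                    continue
                if v[A | {j}] - v[A] < v[B | {j}] - v[B] - 1e-9:
                    return False
    return True

# Example: a unit-demand valuation v(S) = max value in S is submodular.
item_values = {"a": 3.0, "b": 2.0}
v = {frozenset(S): max((item_values[j] for j in S), default=0.0)
     for r in range(len(item_values) + 1)
     for S in itertools.combinations(item_values, r)}
assert is_submodular(v, list(item_values))
```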

$(1/2, 1)$-smoothness of the simultaneous first-price auction with submodular bidders and continuous bids was shown by Syrgkanis and Tardos [ST13]. A simple modification of the result of [ST13] shows that if the discretization is fine enough, then the mechanism is approximately $(1/2, 1)$ solution-based smooth. We provide the more general result for submodular valuations, as we re-use this fact in Section 6.1, where we consider more general valuations. Since the techniques are similar to the ones used in [ST13], we defer this proof to Appendix B.

Lemma 4.2 (Smoothness of simultaneous first-price auction with discrete bidding space). The simultaneous first-price mechanism where players are restricted to bid on at most $d$ items and on each item submit a bid that is a multiple of $\zeta \cdot \rho$, is a solution-based $(\frac{1}{2} - \zeta, 1)$-smooth mechanism, when players have submodular valuations such that all marginals are either 0 or at least $\rho$ and such that each player wants at most $d$ items, i.e., $v_i(S) = \max_{T \subseteq S: |T| \leq d} v_i(T)$.

4.3 Efficiency guarantee

We now provide our efficiency guarantee, which has several nice properties. First, it is parametric; we lose an extra factor of 2 due to the use of the greedy algorithm in the proof but, other than that, the additional error goes to 0 as $p \to 0$. Second, the turnover probability can be very high without a large loss in efficiency, as the guarantee has no dependence on the number of players or items and depends only on the range of item valuations. In particular, even if a constant fraction of the players changes at every round, we still do not see much loss in efficiency, which makes the price of anarchy guarantee meaningful even with high player turnover.

Theorem 4.3 (Main theorem for first-price auctions with unit-demand bidders). In simultaneous first-price auctions with dynamic population and unit-demand bidders, assume that the average optimal welfare in each round is at least $m\rho$, that is, $\frac{1}{T}\sum_{t=1}^{T} E[Opt(v^t)] \geq m\rho$ (all items can be allocated to someone for the minimum marginal value). If players use adaptive learning algorithms such that Assumption 3.3 holds and $T \geq \frac{1}{p}$, it holds that:
\[
\sum_t E[W(s^t; v^t)] \geq \left( \frac{1-2\zeta}{4} - \sqrt[3]{\frac{16\, p \log(1/\rho) \ln(NT)}{\rho^2}} \right) \sum_t E[Opt(v^t)]
\]
where $N$ is the number of different strategies considered by a player.

Proof. We apply our efficiency meta-theorem (Theorem 3.2) with $x^t$ being the allocation of the layered-greedy algorithm with parameter $\epsilon > 0$, to be defined later in the proof (note that we never run this algorithm; it is used only inside the proof, and therefore it is fine to define the parameter ex post). To do so, we use the $(1/2 - \zeta, 1)$ solution-based smoothness of the first-price auction with discrete bidding space established in Lemma 4.2 and the stability of the solution sequence produced by the layered-greedy algorithm established in Lemma 4.1. Using that $pT \geq 1$, we


therefore obtain:

\[
\sum_t E[W(s^t; v^t)] \geq \frac{(1-2\zeta)(1-\epsilon)}{4} \sum_t E[Opt(v^t)] - \sqrt{T \cdot m \cdot \big(2m(1 + pT)\log_{1+\epsilon}(1/\rho) + m\big) \cdot \ln(NT)}
\]
\[
\geq \frac{(1-2\zeta)(1-\epsilon)}{4} \sum_t E[Opt(v^t)] - 2mT \cdot \sqrt{p \cdot \frac{\log(1/\rho)}{\log(1+\epsilon)} \cdot \ln(NT)}
\]

We now optimize $\epsilon$ to minimize the extra error added due to turnover and layering, by equalizing the contribution of the regret term and of the error due to the layering. To facilitate computations, we lower bound the optimum in the first term by $m\rho T$, due to the assumption we made on the optimal welfare. Assuming that $\epsilon < 1$ (which we will verify later), the denominator $\log(1+\epsilon)$ of the regret term can be lower bounded by $\epsilon/2$ via a Taylor expansion. Setting $\epsilon = \sqrt[3]{p \cdot 128 \log(1/\rho) \ln(NT)/\rho^2}$, the two terms become equal. Using this, the result holds as the extra loss in the factor is at most $2 \cdot \epsilon/4$; note that if $\epsilon > 1$ then the result is trivially true, as the right-hand side is a negative quantity while the left-hand side is nonnegative by Assumption 3.3.

Remark 4.1. For the regret term involving the turnover probability $p$ to become of the order of $\eta$, we need $p \approx \frac{\rho^2 \eta^3}{\log(NT) \cdot \log(1/\rho)}$, which allows a constant fraction of the players to turn over at every round.

5 Stable Sequences via Differential Privacy

In this section we use the notion of differential privacy for constructing the stable sequences needed by our main Theorems 3.1 and 3.2. Differential privacy offers a general framework to find solutions that are close to optimal, yet more stable to changes in the input than the optimum itself. To guarantee privacy, the output of the algorithm is required to depend only minimally on any player's input. This is exactly what we need in our framework.

5.1 Background on differential privacy.

Differential privacy has been developed for databases storing private information for a population. A database $D \in V^n$ is a vector of inputs, one for each user. Two databases are $i$-neighbors if they differ just in the $i$-th coordinate, i.e., only in the input of the $i$-th user. If two databases are $i$-neighbors for some $i$, they are called neighboring databases. Dwork et al. [DMNS06] define an algorithm as differentially private if one user's information has little influence on the outcome.

In the setting of a game or mechanism, the outcome for player $i$ clearly should depend on player $i$'s input (her claimed valuation, or source-destination pair), so the outcome cannot be differentially private. The notion of joint differential privacy has been developed by Kearns et al. [KPRU14] to adapt differential privacy to such settings. We use $X$ to denote the set of possible outcomes for one player, so an algorithm in this context is a function $A : V^n \to X^n$. The algorithm is jointly differentially private if, for all players $i$, the output for all other players is differentially private in the input of player $i$. More formally:

Definition 5.1 ([KPRU14]). An algorithm $A : V^n \to X^n$ is $(\epsilon, \delta)$-jointly differentially private if for every $i$, for every pair of $i$-neighbors $D, D' \in V^n$, and for every subset of outputs $S \subseteq X^{n-1}$:
\[
\Pr[A(D)_{-i} \in S] \leq \exp(\epsilon) \Pr[A(D')_{-i} \in S] + \delta \tag{8}
\]
If $\delta = 0$, we say that $A$ is $\epsilon$-jointly differentially private.


Over recent years, a number of algorithms have been developed that solve problems close to optimally in a differentially private way; see the book of Dwork and Roth [DR14] for a survey. In this paper, we take advantage of such algorithms, including the algorithms for solving matching problems [HHR+16] and for finding socially optimal routing [RRUW15].

5.2 Stability via differential privacy

We now show that differentially private solutions imply stability. This is the main building block for our efficiency guarantees.

Lemma 5.1 (Stable sequences via privacy). Suppose there exists an algorithm $A : V^n \to \Delta(X^n)$ that is $(\epsilon, \delta)$-jointly differentially private, takes as input a valuation profile $v$, and outputs a distribution over solutions such that a sample from this distribution is feasible with probability $1 - \gamma$ and is $\beta$-approximately efficient in expectation (for $0 \leq \epsilon \leq 1/2$, $\beta > 1$, $\delta > 0$, and $0 < \gamma < 1$).8 Consider the sequence $v^{1:T}$ of valuations produced in a repeated cost-minimization game with dynamic population $\Gamma = (G, T, p)$ or mechanism with dynamic population $\mathcal{M} = (M, T, p)$. Then there exists a randomized sequence of solutions $x^{1:T}$ for the sequence $v^{1:T}$, such that for each $1 \leq t \leq T$, $x^t$ conditional on $v^t$ is a $\beta$-approximation to $Opt(v^t)$ in expectation, and the joint randomized sequence $(v^{1:T}, x^{1:T})$ is $k$-stable (as in either Definition 3.1 or Definition 3.2) with $k = pnT(1 + n(2\epsilon + 2\gamma + \delta))$.

To prove this main lemma, we require two auxiliary lemmas. First, we bound the total variation distance between the outputs of a differentially private algorithm on two inputs that differ only in one coordinate (the type of one player). Total variation distance is a general measure of the distance between distributions. For two distributions $\mu$ and $\eta$ on some finite probability space $\Omega$, the following are two equivalent versions of the total variation distance:
\[
d_{tv}(\mu, \eta) = \frac{1}{2}\|\mu - \eta\|_1 = \max_{A \subseteq \Omega}(\mu(A) - \eta(A)), \tag{9}
\]
where in the 1-norm in the middle we think of $\mu$ and $\eta$ as vectors of probabilities over the possible outcomes.

Lemma 5.2. Suppose that $A : V^n \to \Delta(X^n)$ is an $(\epsilon, \delta)$-jointly differentially private algorithm with failure probability $\gamma$ (for $0 \leq \epsilon \leq 1/2$, $\delta > 0$, and $0 < \gamma < 1$) that takes as input a valuation profile $v$ and outputs a distribution over feasible solutions $\sigma$. Let $\sigma$ and $\sigma'$ be the algorithm's outputs on two inputs $v$ and $v'$ that differ only in coordinate $i$. Then we can bound the total variation distance between $\sigma_{-i}$ and $\sigma'_{-i}$ by $d_{tv}(\sigma_{-i}, \sigma'_{-i}) \leq 2\epsilon + \delta$.

Proof. Condition (8) of joint differential privacy guarantees that if we let $S \subseteq X^{n-1}$ be a subset of possible solutions for players other than $i$, and $\sigma_{-i}(S)$ and $\sigma'_{-i}(S)$ be the probabilities that the two distributions assign to $S$, then for any $S$: $\sigma_{-i}(S) \leq e^{\epsilon} \cdot \sigma'_{-i}(S) + \delta$. Since $\epsilon \leq 1/2$, we can use the bound $e^{\epsilon} \leq 1 + 2\epsilon$ to get that $\sigma_{-i}(S) - \sigma'_{-i}(S) \leq 2\epsilon\,\sigma'_{-i}(S) + \delta \leq 2\epsilon + \delta$. Thus, by the second definition of the total variation distance in Equation (9), we get that $d_{tv}(\sigma_{-i}, \sigma'_{-i}) \leq 2\epsilon + \delta$.

Second, we use a simple lemma from basic probability theory.

Lemma 5.3 (Coupling Lemma). Let $\mu$ and $\eta$ be two probability measures over a finite set $\Omega$. There is a coupling $\omega$ of $(\mu, \eta)$ such that if the random variable $(X, Y)$ is distributed according to $\omega$, then the marginal distribution of $X$ is $\mu$, the marginal distribution of $Y$ is $\eta$, and
\[
\Pr[X \neq Y] = d_{tv}(\mu, \eta).
\]
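For concreteness, the standard maximal-coupling construction behind Lemma 5.3 can be sketched as follows (a sampling routine we add purely for illustration; the function names are our own): the two samples agree on the shared mass of the two distributions and are drawn independently from the leftover mass otherwise.

```python
import random

def _draw(dist):
    """Sample an outcome from a dict outcome -> probability."""
    r, acc = random.random(), 0.0
    for outcome, prob in dist.items():
        acc += prob
        if r <= acc:
            return outcome
    return outcome  # numerical slack

def maximal_coupling(mu, eta):
    """Sample (X, Y) with X ~ mu, Y ~ eta and P[X != Y] = d_tv(mu, eta).

    mu, eta: dicts mapping outcomes of a finite space to probabilities.
    """
    outcomes = set(mu) | set(eta)
    overlap = {w: min(mu.get(w, 0.0), eta.get(w, 0.0)) for w in outcomes}
    same_mass = sum(overlap.values())          # equals 1 - d_tv(mu, eta)
    if random.random() < same_mass:
        # with probability 1 - d_tv, draw a common value from the overlap
        x = _draw({w: q / same_mass for w, q in overlap.items()})
        return x, x
    # otherwise draw X and Y from the (normalized) leftover masses
    dtv = 1.0 - same_mass
    x = _draw({w: (mu.get(w, 0.0) - overlap[w]) / dtv for w in outcomes})
    y = _draw({w: (eta.get(w, 0.0) - overlap[w]) / dtv for w in outcomes})
    return x, y
```

This is exactly the mechanism by which the proof of Lemma 5.1 below couples consecutive output distributions $\sigma^t$ and $\sigma^{t+1}$ so that the sampled solutions rarely differ.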

8 $\gamma$ is used here locally to match the notation used in differential privacy and is not related to the $\gamma$ of the efficiency loss due to regret error in our overall parametric guarantee.


Proof of Lemma 5.1. Suppose that $A : V^n \to \Delta(X^n)$ is an $(\epsilon, \delta)$-jointly differentially private algorithm as described in the statement of the lemma. The differentially private algorithm fails with probability $\gamma$. We denote by $\sigma$ the output distribution over solutions for an input $v$, where we use the optimal solution in the low-probability event that the algorithm fails. (Equivalently, $A$ could be a randomized algorithm and $\sigma$ its implicit distribution over solutions.)

Let $\sigma^1, \ldots, \sigma^T$ be the sequence of distributions output by $A$ when run on a deterministic sequence of valuation profiles $v^1, \ldots, v^T$ with the modification described in the paragraph above. To simplify the discussion, we assume that only one player changes valuation at each time step $t$. Essentially, we break every transition from time step $t$ to $t + 1$ into many sequential transitions where only one player changes at each step, and then delete the solutions in the resulting sequence that correspond to the added steps. Thus the number of steps within this proof should be thought of as equal to $n \cdot p \cdot T$ in expectation.

By Lemma 5.2, we know that the total variation distance between two consecutive distributions, without the modification of replacing failures with the optimal solution, is at most $2\epsilon + \delta$. Since, by the union bound, the probability that either of the two consecutive runs of the algorithm fails is at most $2\gamma$, we can show that the total variation distance of the modified output is at most $2\epsilon + \delta + 2\gamma$, i.e., for any $t \in [T]$: $d_{tv}(\sigma_{-i}^{t+1}, \sigma_{-i}^t) \leq 2\epsilon + \delta + 2\gamma$ (see Lemma 5.4 for a formal proof).

We can turn the sequence of distributions $\sigma^1, \ldots, \sigma^T$ into a distribution over sequences of allocations $x^{1:T}$ by coupling the randomness used to select the solutions in the different distributions $\sigma^t$. To do this, we take advantage of the coupling lemma from probability theory (Lemma 5.3). If at step $t$ no player changes values, then $\sigma^t = \sigma^{t+1}$, and we select the same outcome from the two distributions, so we get $\Pr[x_{-i}^t \neq x_{-i}^{t+1}] = 0$.

Now consider a step in which a player $i$ changes her private type $v_i$. We use Lemma 5.3 to couple $x_{-i}^{t+1}$ and $x_{-i}^t$ so that9
\[
\Pr[x_{-i}^{t+1} \neq x_{-i}^t] = d_{tv}(\sigma_{-i}^{t+1}, \sigma_{-i}^t) \leq 2\epsilon + \delta + 2\gamma. \tag{10}
\]
Note that this couples the $i$-th coordinates $x_i^{t+1}$ and $x_i^t$ in an arbitrary manner, which is fine, as we assumed that the valuation of player $i$ changes at this step.

We have defined a probability distribution over sequences $x^{1:T}$ for every fixed sequence of valuations $v^{1:T}$. We extend this definition to random valuation sequences in the natural way, by also drawing the valuations $v^{1:T}$ from their distribution.

We claim that the resulting random sequence of (valuation, solution) pairs satisfies the statement of the lemma: the $\beta$-approximation follows from the guarantees of the private algorithm and from the fact that we use the optimal solution when the algorithm fails. Next we argue about the stability of the sequence. Consider a player $i$, and the distribution of her sequence $(v_i^{1:T}, x_i^{1:T})$. In each step $t$ her valuation $v_i^t$ changes with probability $p$, contributing $pT$ in expectation to the number of changes. In a step $t$ when the value of some other player $j \neq i$ changes, we use (10) to bound the probability that $x_i^t \neq x_i^{t+1}$ by $2\epsilon + \delta + 2\gamma$. Thus any change in the value of some other player $j$ contributes at most $(2\epsilon + 2\gamma + \delta)$ to the expected number of changes for player $i$. The expected number of such changes in other values is $(n-1)pT$ over the sequence, showing that
\[
E[K_i] \leq pT + (n-1)pT(2\epsilon + 2\gamma + \delta) \leq pT(1 + n(2\epsilon + 2\gamma + \delta)).
\]
Summing over players, we obtain the lemma.

9 One can think of this as sampling $x^{t+1}$ conditional on $x^t$, assuming the joint distribution of $x^t$ and $x^{t+1}$ is as prescribed by the coupling lemma applied to $\sigma^t$ and $\sigma^{t+1}$. This addresses the concern that $x^t$ is already coupled with $x^{t-1}$ from the previous step.


Lemma 5.4. Let $q$ and $q'$ be the output distributions of an $(\epsilon, \delta)$-jointly differentially private algorithm with failure probability $\gamma$, on two valuation profiles $v$ and $v'$ that differ only in coordinate $i$. Let $\sigma$ and $\sigma'$ be the modified outputs where the outcome is replaced with the optimal outcome when the algorithm fails. Then:
\[
d_{tv}(\sigma, \sigma') \leq 2\epsilon + \delta + 2\gamma
\]
Proof. Consider two coupled random variables $y, y'$ as implied by Lemma 5.3 applied to the distributions $q$ and $q'$, such that $y \sim q$, $y' \sim q'$, and $\Pr[y \neq y'] = d_{tv}(q, q') \leq 2\epsilon + \delta$ (by Lemma 5.2). Now consider two other random variables $x$ and $x'$, where $x = y$ except when $y$ is the outcome of a failure, in which case $x$ equals the welfare-optimal outcome, and similarly for $x'$ and $y'$. Obviously $x \sim \sigma$ and $x' \sim \sigma'$, thus $(x, x')$ is a valid coupling of the distributions $\sigma$ and $\sigma'$. Thus, if we show that $\Pr[x \neq x'] \leq 2\epsilon + \delta + 2\gamma$, then by the properties of total variation distance $d_{tv}(\sigma, \sigma') \leq \Pr[x \neq x'] \leq 2\epsilon + \delta + 2\gamma$, which is the property we want to show.

Let fail be the event that either $y$ or $y'$ is the outcome of a failed run of the algorithm. Then by the union bound $\Pr[\text{fail}] \leq 2\gamma$. Thus we have:
\[
\Pr[x \neq x'] = \Pr[x \neq x' \mid \neg\text{fail}] \cdot \Pr[\neg\text{fail}] + \Pr[x \neq x' \mid \text{fail}] \cdot \Pr[\text{fail}]
\]
\[
\leq \Pr[x \neq x' \mid \neg\text{fail}] \cdot \Pr[\neg\text{fail}] + 2\gamma = \Pr[y \neq y' \mid \neg\text{fail}] \cdot \Pr[\neg\text{fail}] + 2\gamma
\]
\[
\leq \Pr[y \neq y'] + 2\gamma \leq d_{tv}(q, q') + 2\gamma \leq 2\epsilon + \delta + 2\gamma
\]
This completes the proof of the lemma.

5.3 Efficiency guarantee for dynamic games via differential privacy

We can now provide the resulting efficiency guarantee, which we instantiate in the next section for simultaneous auctions in large markets as well as for routing games.

Theorem 5.5. Consider a repeated cost game with dynamic population $\Gamma = (G, T, p)$, such that the stage game $G$ is solution-based $(\lambda, \mu)$-smooth and $T \geq \frac{1}{p}$. Assume that there exists an $(\epsilon, \delta)$-jointly differentially private algorithm $A : V^n \to X^n$ with approximation factor $\beta$ that satisfies the conditions of Lemma 5.1. If all players use adaptive learning algorithms in the repeated game, then the overall cost of the solution is at most:
\[
\sum_t E[C(s^t; v^t)] \leq \frac{\lambda\beta}{1-\mu} \sum_t Opt(v^t) + \frac{nT}{1-\mu} \sqrt{p\big(1 + 2n(\epsilon + \gamma + \delta)\big)\ln(NT)}
\]
Similarly, for a $(\lambda, \mu)$-smooth mechanism $\mathcal{M} = (M, T, p)$ we obtain:
\[
\sum_t E[W(s^t; v^t)] \geq \frac{\lambda}{\beta \max\{1, \mu\}} \sum_t E[Opt(v^t)] - mT\sqrt{\Big(p + p \cdot \frac{n}{m} \cdot \big(1 + 2n(\epsilon + \gamma + \delta)\big)\Big)\ln(NT)}
\]
Proof. The proof follows directly by combining Lemma 5.1 with Theorems 3.1 and 3.2, respectively.

6 Applications of differential privacy in dynamic-population games

In this section, we combine Theorem 5.5 with differentially private algorithms to obtain enhanced efficiency guarantees for simultaneous-item auctions as well as guarantees for congestion games. The results of this section require the game to be large enough for the privacy results to apply.


6.1 Improved efficiency for simultaneous-item auctions in large markets

We first apply the guarantee of the previous section to provide an efficiency result that improves on Theorem 4.3 in two ways: the guarantee is better by a factor of 2, and the result allows the players to have a broader class of valuations; in particular, the valuations only need to satisfy the gross-substitutes property (Definition 6.1). On the negative side, the result of this section is effective only when the supply of each item is relatively large, while Theorem 4.3 applies even when the supply of each item is just one.

Definition 6.1 (Gross-substitutes valuations). For a price vector $p$, let $p(A) = \sum_{j \in A} p_j$ denote the total price of a set $A$, and let $\omega(p)$ denote the collection of the player's most desirable sets of goods, that is, $\omega(p) = \arg\max_A v(A) - p(A)$. The valuation satisfies the gross-substitutes condition if for every pair of price vectors $(p, p')$ such that $p_j \leq p'_j$ for all items $j$, and for every set of goods $S \in \omega(p)$: if $S' \subseteq S$ satisfies $p'_j = p_j$ for every $j \in S'$, then there is a set $S^* \in \omega(p')$ with $S' \subseteq S^*$.
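The condition in Definition 6.1 can be checked by brute force on small examples; the following sketch (our own illustrative helper, which restricts the search to a finite price grid rather than all price vectors) looks for a violating pair of price vectors.

```python
import itertools

def demand_sets(v, prices, items):
    """All utility-maximizing bundles at the given prices (the correspondence omega(p))."""
    best, argmax = None, []
    for r in range(len(items) + 1):
        for S in itertools.combinations(items, r):
            u = v(frozenset(S)) - sum(prices[j] for j in S)
            if best is None or u > best + 1e-9:
                best, argmax = u, [frozenset(S)]
            elif abs(u - best) <= 1e-9:
                argmax.append(frozenset(S))
    return argmax

def gross_substitutes_violation(v, items, price_grid):
    """Search the grid for price vectors p <= p' violating Definition 6.1; None if no violation found."""
    for p in itertools.product(price_grid, repeat=len(items)):
        for q in itertools.product(price_grid, repeat=len(items)):
            if not all(qj >= pj for pj, qj in zip(p, q)):
                continue                      # need p <= p' coordinatewise
            pr, qr = dict(zip(items, p)), dict(zip(items, q))
            for S in demand_sets(v, pr, items):
                kept = frozenset(j for j in S if qr[j] == pr[j])
                if not any(kept <= T for T in demand_sets(v, qr, items)):
                    return pr, qr, S          # witness of a violation
    return None

# A unit-demand valuation passes, while a pure-complements valuation fails on this grid.
unit_demand = lambda S: max((3.0 if j == "a" else 2.0) for j in S) if S else 0.0
complements = lambda S: 2.0 if S == frozenset({"a", "b"}) else 0.0
grid = [0.0, 0.5, 1.5, 3.0]
assert gross_substitutes_violation(unit_demand, ["a", "b"], grid) is None
assert gross_substitutes_violation(complements, ["a", "b"], grid) is not None
```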

In order to obtain a stable solution sequence, we apply the differentially private algorithm PAlloc of Hsu et al. [HHR+16] and make the following assumptions, which are required for its efficiency guarantee.

Assumption 6.2 (Large market assumptions). Recall that $m$ is the number of distinct items and $B$ is the supply of each item. We posit the following large market assumptions:

1. The total number of item copies $\bar{m} = m \cdot B$ is at least the number of players $n$, i.e., $\bar{m} \geq n$.

2. Each player is interested in at most one copy of each item and is $d$-demand, i.e., gets 0 additional value from obtaining more than $d$ items.

3. The average optimal welfare in each round is at least $\bar{m}\rho$, that is, $\frac{1}{T}\sum_{t=1}^{T} E[Opt(v^t)] \geq \bar{m}\rho$ (the value if all copies can be allocated to someone for the minimum marginal value $\rho$).

We now provide the privacy and approximation guarantee (Lemma 6.1) from [HHR+16], which we use in our main efficiency guarantee (Theorem 6.2).

Lemma 6.1 (Theorems 20 and 21 in [HHR+16]). Let $0 < \alpha < n/\bar{m}$ and let $x$ be the allocation computed by PAlloc$(\alpha/3, \alpha/3, \epsilon)$. Then, if Assumption 6.2 holds and in addition the supply of each distinct item is at least $B \geq \frac{12E' + 3}{\alpha}$, where $E' = \frac{360\sqrt{2}}{\alpha^2 \epsilon}\big(\log\frac{90n}{\alpha^2}\big)^{5/2}\log\frac{4m}{\gamma} + 1$, then the algorithm is $(\epsilon, 0)$-jointly differentially private and, with probability $1 - \gamma$, the allocation satisfies
\[
W(x) \geq Opt(v) - \alpha \bar{m}.
\]

Theorem 6.2 (Main theorem for large markets with gross-substitutes $d$-demand bidders). In simultaneous first-price auctions with dynamic population and $d$-demand bidders with gross-substitutes valuations (Definition 6.1), if Assumption 6.2 holds, players use adaptive learning algorithms such that Assumption 3.3 holds, and $T \geq \frac{1}{p}$, it holds that, for any $\eta > 0$:
\[
\sum_t E[W(s^t; v^t)] \geq \frac{1-2\zeta}{2} \cdot (1-\eta)\left(1 - \frac{1}{\rho}\sqrt{p \cdot (3 + 2n\epsilon)\ln(NT)}\right) \sum_t E[Opt(v^t)],
\]
where $\epsilon = \frac{12}{\rho^3\eta^3 B}\Big(360\sqrt{2}\,\big(\log\frac{90n}{\rho^2\eta^2}\big)^{5/2}\log(4mn^2) + 4\Big)$ and $N$ is the number of strategies considered by a player, provided the supply $B$ is large enough, given all other parameters, to ensure that $\epsilon < 1$.

Proof. By Lemma 4.2, the mechanism is $(\frac{1}{2} - \zeta, 1)$-smooth. Moreover, by Lemma 6.1 and by condition 3 of Assumption 6.2, with probability $1 - \gamma$, the allocation $x$ computed by PAlloc has welfare satisfying $\beta \cdot W(x) \geq Opt(v)$ for $\beta = \frac{1}{1 - \alpha/\rho}$. We set $\gamma = \frac{1}{2n}$. Then, since PAlloc is $(\epsilon, 0)$-jointly differentially private, when the supply requirement in Lemma 6.1 is met, Theorem 5.5 implies:
\[
\sum_t E[W(s^t; v^t)] \geq \frac{1-2\zeta}{2} \cdot \Big(1 - \frac{\alpha}{\rho}\Big) \sum_t E[Opt(v^t)] - mT\sqrt{\Big(p + p \cdot \frac{n}{m} \cdot (2 + 2n\epsilon)\Big)\ln(NT)}.
\]
Setting $\epsilon = \frac{12}{\alpha^3 B}\Big(360\sqrt{2}\,\big(\log\frac{90n}{\alpha^2}\big)^{5/2}\log(4mn^2) + 4\Big)$ satisfies the supply requirement in Lemma 6.1. Setting $\eta = \alpha/\rho$ and using conditions 1 and 3 of Assumption 6.2 concludes the final bound.

Remark 6.1. For the regret term involving the turnover probability $p$ to become of the order of $\eta$, we need $p \approx \frac{\rho^5 \eta^5 B}{n \log(NT)\,\mathrm{polylog}(m, n, 1/\eta, \rho)}$, which still allows a near-constant fraction of the players to turn over at every round when $B = \Theta(n)$.

Remark 6.2. Although we presented the result for first-price auctions, the same result extends to second-price auctions under a no-overbidding assumption. The second-price auction is not a smooth mechanism; it is only weakly $(1, 0, 1)$-smooth in the sense of [ST13], and is also $(1, 1)$-smooth in the sense of [Rou15] assuming that players do not bid above their value, as shown by Christodoulou et al. [CKS16]. These alternate smoothness definitions also imply a factor-$1/2$ loss of social welfare at Nash equilibrium. This can also be extended to solution-based smoothness and then used in the framework above to obtain the same bounds as in Theorem 6.2. In the next subsection, we show how arrival uncertainty helps enhance the bounds for the natural extension of the second-price item auction to many copies of each item.

6.2 Asymptotic optimality under participation uncertainty

We now show that, if there is some amount of participation uncertainty at every round, then the efficiency guarantees can be significantly improved for some classes of auctions when the game is sufficiently large, resulting in asymptotic optimality when the turnover is also sufficiently small. In order to do so, we use the model of Feldman et al. [FIL+16] and extend their efficiency guarantee to dynamic-population games. In this model, at every round, each player may fail to participate with probability $q$ (which can be thought of as a small constant). Note that this is different from the turnover probability $p$ that determines whether the player switches her type; each player stays in the game with unchanged type for $1/p$ rounds in expectation, and participates in $(1-q)/p$ rounds in expectation.

We denote by $z^t$ the random variables determining the players who participate: $z_i^t = 1$ if player $i$ participates at round $t$ and $0$ otherwise. The welfare at round $t$ with player strategies $s^t$ is correspondingly denoted by $W(s^t, z^t; v^t)$ (implicitly assuming that players $i$ who did not show up also have strategies $s_i^t$), and the optimum by $Opt(z^t; v^t)$. We note that both are random variables; the optimum's randomness comes from the dynamic population and from the randomness of the participation uncertainty $z^t$ in each round, while the welfare of the game outcome $W(s^t, z^t; v^t)$ also involves the randomness in the strategies of the players.

The auction format we consider is a large-supply extension of the second-price auction, which we refer to as the uniform-price auction. In particular, by the above large market assumptions (Assumption 6.2), there are $m$ distinct items, each with a supply of $B$; therefore there are $\bar{m} = m \cdot B$ item copies in total. Each player $i$ places a bid $b_{ij}$ simultaneously for each distinct item $j \in [m]$ and appears in the round with probability $1 - q$. Among the players that appear, each item $j$ is awarded to the $B$ players with the highest bids on it, and each of them pays a uniform price equal to the $(B+1)$-th highest bid among the bidders that appeared. Uniform-price auctions are natural extensions of first- and second-price auctions to the large market setting. In particular, Theorem 6.2 and Remark 6.2 extend to markets with item supply $B$, assuming the price charged is the $B$-th or the $(B+1)$-th highest bid, respectively. Participation uncertainty improves both efficiency bounds by a factor of 2. We illustrate this focusing on the version with the $(B+1)$-th price; a sketch of the allocation and pricing rule for a single item is given below.
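The following is a minimal sketch of the uniform-price rule described above for a single item with $B$ identical copies; the function name, the tie-breaking, and the zero price when fewer than $B+1$ bidders appear are our own illustrative choices. In the simultaneous auction this rule is applied to each distinct item separately.

```python
def uniform_price_outcome(bids, B, participates):
    """Allocation and uniform price for one item with B identical copies.

    bids         : dict player -> bid on this item
    participates : dict player -> bool, the realization of z_i^t for this round
    Winners are the B highest bids among players who showed up; each winner pays
    the (B+1)-th highest such bid (0 if fewer than B+1 bidders appeared).
    """
    present = sorted(((b, i) for i, b in bids.items() if participates[i]), reverse=True)
    winners = [i for _, i in present[:B]]
    price = present[B][0] if len(present) > B else 0.0
    return winners, price
```

The key quantity in the analysis below is the gap between the $B$-th and $(B+1)$-th highest bids: participation uncertainty makes this gap small in expectation (Lemma C.2), which is what lets the analysis use the stronger smoothness-like inequality of Lemma C.1 instead of paying the usual factor of 2.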

24

Page 26: Learning and Efficiency in Games with Dynamic Population · Game theory classically considers Nash equilibria of one-shot games, while in practice many games are played repeatedly,

Theorem 6.3 (Asymptotic optimality for large markets with participation uncertainty). In simultaneous uniform-price auctions with dynamic population and $d$-demand bidders with gross-substitutes valuations, if Assumption 6.2 holds, players fail to show up with probability $q$ and use adaptive learning algorithms such that Assumption 3.3 holds, and $T \geq \frac{1}{p}$, then as $p \to 0$ and $B \to \infty$ the resulting social welfare is asymptotically optimal. Concretely, it holds that, for any $\eta > 0$:
\[
\sum_t E\big[W(s^t, z^t; v^t)\big] \geq \left((1-\zeta)(1-\eta) - \frac{4}{\rho\sqrt{q(1-q)B}} - \frac{1}{\rho}\sqrt{p \cdot (3 + 2n\epsilon)\ln(NT)}\right) \sum_t E\big[Opt(z^t; v^t)\big]
\]
where $\epsilon = \frac{12}{\rho^3\eta^3 B}\Big(360\sqrt{2}\,\big(\log\frac{90n}{\rho^2\eta^2}\big)^{5/2}\log(4mn^2) + 4\Big)$ and $N$ is the number of strategies considered by a player, provided the supply $B$ is large enough, given all other parameters, to ensure that $\epsilon < 1$.

Proof sketch. We now provide the key ideas of the proof; the full proof is deferred to Appendix C. If there were no uncertainty about which players show up, then we could directly obtain a smoothness-like inequality that has the $B$-th bid instead of the actual price, which is the $(B+1)$-th bid (Lemma C.1); this is similar to $(1, 1)$-smoothness and therefore leads to a $(1-\eta)$ multiplicative loss in efficiency. The uncertainty, combined with the largeness of the game, enables us to show that the difference between the $B$-th and $(B+1)$-th bids is small in expectation (Lemma C.2); this leads to the $\frac{4}{\rho\sqrt{q(1-q)B}}$ term in the bound. For each subset of players who may show up, we obtain a stable close-to-optimal solution to the sequence of subproblems involving this subset using differential privacy (similar to Theorem 6.2), which leads to the last term in the bound. Finally, applying the Bayesian extension theorem [Rou12, Syr12], we combine the bounds for the subproblems with the expected difference between the $B$-th and $(B+1)$-th bids, leading to the final guarantee.

For the regret term involving the turnover probability to become of the order of $\eta$, we again need $p \approx \frac{\rho^5 \eta^5 B}{n \log(NT)\,\mathrm{polylog}(m, n, 1/\eta, \rho)}$. For the second-to-last term to become of the order of $\eta$, we need the supply to be at least $B \approx \frac{1}{\eta^2 \rho^2 \cdot q(1-q)}$. The above imply the claim that, as $p \to 0$ and $B \to \infty$, we have asymptotic optimality.

Remark 6.3. Although we presented the result for uniform-price auctions using the $(B+1)$-th price, the version of the uniform-price auction used by [FIL+16], the same proof idea also extends to the version of the uniform-price auction using the $B$-th bid, resulting in a similar bound on social welfare and showing asymptotic optimality, assuming $T \geq \frac{1}{p}$, as $p \to 0$ and $B \to \infty$.

6.3 Efficiency of congestion games with dynamic population

Our final application of privacy-induced stability is atomic congestion games, defined in Section 2.3. Rogers et al. [RRUW15] provide a jointly differentially private algorithm for finding an optimal solution in congestion games, called the private gradient descent algorithm. They focus on routing games due to their paper's focus on tolls as mediators, but their algorithm works in full generality for any atomic congestion game. Their results rely on the following assumptions on the edge latencies.

Assumption 6.3 (Latency assumptions). The latency functions $\ell_e(x)$ are non-decreasing, convex, and twice differentiable. The latency on each edge is bounded by 1, that is, $\ell_e(x) \leq 1$. The functions are $L$-Lipschitz, that is, $|\ell_e(x) - \ell_e(x')| \leq L|x - x'|$ for some parameter $0 < L < 1$.

Under these assumptions on the setting, Rogers et al. [RRUW15] show that their algorithm is approximately optimal and differentially private.

Lemma 6.4 (Theorems 4.2 and 4.10 in [RRUW15]). When Assumption 6.3 holds, the algorithm of [RRUW15] is $(\epsilon, \delta)$-jointly differentially private and, with probability at least $1 - \gamma$, outputs an integer solution $x$ that satisfies
\[
E[C(x; v)] \leq Opt(v) + \frac{\sqrt{n}\, m^{5/4}}{\sqrt{\epsilon}} \cdot \mathrm{polylog}(1/\epsilon, 1/\delta, 1/\gamma, n, m).
\]

We start by illustrating our technique with linear latencies $\ell_e(x) = a_e x + b_e$. The linear-latency case admits $(5/3, 1/3)$ solution-based smoothness:

Lemma 6.5 (due to [CK05, AAE13, Rou09]). Congestion games with linear latencies $\ell_e(x) = a_e x + b_e$ for $a_e, b_e \geq 0$ are $(5/3, 1/3)$ solution-based smooth with respect to any solution $x$.
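For intuition, the solution-based smoothness inequality of Lemma 6.5 can be verified by brute force on small congestion games; the sketch below (our own illustrative helper, which checks the standard congestion-game form of the inequality where each player deviates to her path in the fixed solution $x$) reports a violating profile if one exists.

```python
import itertools

def total_cost(profile, latency):
    """Total cost of a strategy profile; latency[e](k) is the per-player latency of edge e under load k."""
    load = {}
    for path in profile:
        for e in path:
            load[e] = load.get(e, 0) + 1
    return sum(latency[e](load[e]) for path in profile for e in path)

def smoothness_violation(strategies, latency, x, lam, mu):
    """Search for a profile s with sum_i c_i(x_i, s_{-i}) > lam * C(x) + mu * C(s).

    strategies[i] : list of feasible edge-sets (paths) for player i
    x             : the fixed solution, one path per player
    Returns a violating profile, or None if the (lam, mu) inequality holds everywhere.
    """
    n = len(strategies)
    for s in itertools.product(*strategies):
        lhs = 0.0
        for i in range(n):
            deviated = s[:i] + (x[i],) + s[i + 1:]
            load = {}
            for path in deviated:
                for e in path:
                    load[e] = load.get(e, 0) + 1
            lhs += sum(latency[e](load[e]) for e in x[i])  # player i's cost after switching to x_i
        if lhs > lam * total_cost(x, latency) + mu * total_cost(s, latency) + 1e-9:
            return s
    return None
```

For example, with two parallel edges of latency $\ell(k) = k$, two players whose strategies are the two single-edge paths, and $x$ routing one player on each edge, the checker returns `None` for $(\lambda, \mu) = (5/3, 1/3)$, consistent with Lemma 6.5.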

Combining the above smoothness (Lemma 6.5) and privacy (Lemma 6.4) results with the general efficiency guarantee (Theorem 5.5) leads to the following guarantee.

Theorem 6.6 (Main theorem for large congestion games with linear latencies). In linear-latency congestion games with dynamic population, if players use adaptive learning algorithms and $T \geq \frac{1}{p}$, it holds that, for any $\eta > 0$:
\[
\sum_t E[C(s^t; v^t)] \leq \frac{5}{2} \cdot \left(1 + \eta + \frac{m^2 \max_e(a_e + b_e)}{\min_e a_e} \cdot \sqrt{p(1 + 6n\epsilon)\ln(NT)}\right) \sum_t Opt(v^t)
\]
where $\epsilon = \frac{1}{\eta^2} \cdot \frac{m^{9/2} \cdot \mathrm{polylog}(n, m)}{n} \cdot \left(\frac{\max_e(a_e + b_e)}{\min_e a_e}\right)^2$ and $N$ is the number of strategies considered by a player, provided $n$ is large enough, given all other parameters, to ensure that $\epsilon < 1$.

Proof. We assume $a_e > 0$ for all $e \in E$ and $b_e \geq 0$. We also scale down all latencies by $nm\max_e(a_e + b_e)$, which makes the loss incurred by each player at most 1 (needed for the online learning algorithms); this also helps satisfy Assumption 6.3, as it makes the latencies $\ell_e(x) \leq 1$ and $L$-Lipschitz for $L = 1/n < 1$. As a result, all the additive bounds should be renormalized by $nm\max_e(a_e + b_e)$. From Theorem 5.5, it therefore holds that:
\[
\sum_t E[C(s^t; v^t)] \leq \frac{\lambda\beta}{1-\mu}\sum_t Opt(v^t) + \frac{nT}{1-\mu}\sqrt{p\big(1 + 2n(\epsilon + \gamma + \delta)\big)\ln(NT)} \cdot nm\max_e(a_e + b_e)
\]
where $\beta$ is the approximation factor of the jointly differentially private solution. Since in the optimal solution we need to serve all players (and each uses at least one edge), the optimal solution is at least $Opt(v^t) \geq \frac{n^2}{m}\min_e a_e$. As a result, to transform the bound in Lemma 6.4 into a multiplicative bound of $\beta = 1 + \eta$, it should hold that:
\[
\eta\, Opt(v^t) \geq \eta \frac{n^2}{m}\min_e a_e \geq \frac{\sqrt{n}\, m^{5/4}}{\sqrt{\epsilon}} \cdot \mathrm{polylog}(1/\epsilon, 1/\delta, 1/\gamma, n, m) \cdot mn\max_e(a_e + b_e).
\]
This is guaranteed by setting $\epsilon = \delta = \gamma = \frac{1}{\eta^2} \cdot \frac{m^{9/2} \cdot \mathrm{polylog}(n, m)}{n} \cdot \left(\frac{\max_e(a_e + b_e)}{\min_e a_e}\right)^2$. The proof concludes by bounding the regret term, via a similar argument, by
\[
\frac{\sum_t Opt(v^t)}{1-\mu}\sqrt{p\big(1 + 2n(\epsilon + \delta + \gamma)\big)\ln(NT)} \cdot \frac{m^2\max_e(a_e + b_e)}{\min_e a_e}.
\]

Remark 6.4. For the regret term involving the turnover probability $p$ to become of the order of $\eta$, we need
\[
p \approx \frac{\eta^4}{m^{10} \cdot \log(NT)\,\mathrm{polylog}(m, n)} \cdot \left(\frac{\min_e a_e}{\max_e(a_e + b_e)}\right)^3.
\]
Note that this only depends on the number of congestible elements (edges) and is independent (up to logarithms) of the number of players, allowing for a near-constant fraction of players to turn over at every round.

Remark 6.5. Above we used linear latencies, but we note that more general latencies would only affect the smoothness factor; as a result, similar techniques can extend the results for polynomial latencies due to [Rou03, Olv06, ADG+11] to dynamic population.


Acknowledgements

We would like to thank Karthik Sridharan for pointing us to the relevant adaptive regret learning literature and Janardhan Kulkarni for suggesting the direction of asymptotic optimality under participation uncertainty. Finally, part of this work was conducted while the authors were visiting the Simons Institute.

References

[AAE13] Baruch Awerbuch, Yossi Azar, and Amir Epstein. The price of routing unsplittable flow. SIAM J. Comput., 42(1):160–177, 2013. Preliminary version appeared in STOC'05.

[ADG+11] S. Aland, D. Dumrauf, M. Gairing, B. Monien, and F. Schoppmann. Exact price of anarchy for polynomial congestion games. SIAM Journal on Computing, 40(5):1211–1233, 2011.

[AJ13] Sachin Adlakha and Ramesh Johari. Mean field equilibrium in dynamic games with strategic complementarities. Operations Research, 61(4):971–989, 2013.

[AJW15] Sachin Adlakha, Ramesh Johari, and Gabriel Y. Weintraub. Equilibria of dynamic games with many players: Existence, approximation, and market structure. Journal of Economic Theory, 156:269–316, 2015.

[BBW15] Santiago R. Balseiro, Omar Besbes, and Gabriel Y. Weintraub. Repeated auctions with budgets in ad exchanges: Approximations and design. Management Science, 61(4):864–884, 2015.

[BBW19] Santiago R. Balseiro, Omar Besbes, and Gabriel Y. Weintraub. Dynamic mechanism design with budget constrained buyers under limited commitment. Operations Research, 2019.

[BEDL06] Avrim Blum, Eyal Even-Dar, and Katrina Ligett. Routing without regret: On convergence to Nash equilibria of regret-minimizing algorithms in routing games. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Principles of Distributed Computing, PODC '06, pages 45–52, New York, NY, USA, 2006. ACM.

[BG19] Santiago R. Balseiro and Yonatan Gur. Learning in repeated auctions with budgets: Regret minimization and equilibrium. Management Science, 65(9), 2019.

[BHLR08] Avrim Blum, MohammadTaghi Hajiaghayi, Katrina Ligett, and Aaron Roth. Regret minimization and the price of total anarchy. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC '08, pages 373–382, New York, NY, USA, 2008. ACM.

[Blu97] Avrim Blum. Empirical support for winnow and weighted-majority algorithms: Results on a calendar scheduling domain. Machine Learning, 26(1):5–23, 1997.

[BM07] Avrim Blum and Yishay Mansour. From external to internal regret. J. Mach. Learn. Res., 8:1307–1324, December 2007.

[BMSW18] Mark Braverman, Jieming Mao, Jon Schneider, and Matt Weinberg. Selling to a no-regret buyer. In Proceedings of the 2018 ACM Conference on Economics and Computation, EC '18, pages 523–538, New York, NY, USA, 2018. Association for Computing Machinery.


[BO98] T. Başar and G. Olsder. Dynamic Noncooperative Game Theory, 2nd Edition. Society for Industrial and Applied Mathematics, 1998.

[Bro51] G. W. Brown. Iterative solutions of games by fictitious play. In Activity Analysis of Production and Allocation, pages 374–376. Wiley, 1951.

[BS10] Dirk Bergemann and Maher Said. Dynamic Auctions: A Survey. Technical Report 1757R, Cowles Foundation for Research in Economics, Yale University, 2010.

[CBGLS12] Nicolò Cesa-Bianchi, Pierre Gaillard, Gabor Lugosi, and Gilles Stoltz. Mirror Descent Meets Fixed Share (and feels no regret). In NIPS 2012, volume 25, page Paper 471, Lake Tahoe, United States, December 2012.

[CBL06] Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, NY, USA, 2006.

[CK05] George Christodoulou and Elias Koutsoupias. The price of anarchy of finite congestion games. In Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing, STOC '05, pages 67–73, New York, NY, USA, 2005. ACM.

[CKK+15] Ioannis Caragiannis, Christos Kaklamanis, Panagiotis Kanellopoulos, Maria Kyropoulou, Brendan Lucier, Renato Paes Leme, and Éva Tardos. Bounding the inefficiency of outcomes in generalized second price auctions. Journal of Economic Theory, 156:343–388, 2015. Special issue on Computer Science and Economic Theory.

[CKS16] George Christodoulou, Annamária Kovács, and Michael Schapira. Bayesian combinatorial auctions. J. ACM, 63(2):11:1–11:19, April 2016.

[CP14] Yang Cai and Christos Papadimitriou. Simultaneous Bayesian auctions and computational complexity. In Proceedings of the Fifteenth ACM Conference on Economics and Computation, EC '14, pages 895–910, New York, NY, USA, 2014. Association for Computing Machinery.

[CPS10] Ruggiero Cavallo, David C. Parkes, and Satinder Singh. Efficient mechanisms with dynamic populations and dynamic types. Technical report, Harvard University, 2010.

[CSSM03] Jose R. Correa, A. S. Schulz, and N. E. Stier-Moses. Selfish routing in capacitated networks. Mathematics of Operations Research, 29, 2004.

[DGSS15] Amit Daniely, Alon Gonen, and Shai Shalev-Shwartz. Strongly adaptive online learning. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning (ICML), volume 37 of Proceedings of Machine Learning Research, pages 1405–1411, Lille, France, 07–09 Jul 2015. PMLR.

[DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Third Conference on Theory of Cryptography, TCC'06, pages 265–284, Berlin, Heidelberg, 2006. Springer-Verlag.

[Dol78] Robert J. Dolan. Incentive mechanisms for priority queuing problem. Bell Journal of Economics, 9:421–436, 1978.

[DR14] C. Dwork and A. Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science. Now Publishers Incorporated, 2014.


[DS16] Constantinos Daskalakis and Vasilis Syrgkanis. Learning in auctions: Regret is hard, envy is easy. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 219–228, 2016.

[DSS19] Yuan Deng, Jon Schneider, and Balasubramanian Sivan. Strategizing against no-regret learners. In Advances in Neural Information Processing Systems 32, 2019.

[FIL+16] Michal Feldman, Nicole Immorlica, Brendan Lucier, Tim Roughgarden, and Vasilis Syrgkanis. The price of anarchy in large games. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, STOC '16, 2016.

[FL98] Drew Fudenberg and David K. Levine. The Theory of Learning in Games. MIT Press, Cambridge, MA, USA, 1998.

[FLL+16] Dylan J. Foster, Zhiyuan Li, Thodoris Lykouris, Karthik Sridharan, and Éva Tardos. Learning in games: Robustness of fast convergence. In NIPS, 2016.

[FP14] Drew Fudenberg and Alexander Peysakhovich. Recency, records and recaps: Learning and non-equilibrium behavior in a simple decision problem. In Proceedings of the Fifteenth ACM Conference on Economics and Computation, EC '14, pages 971–986, New York, NY, USA, 2014. ACM.

[FS97] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55(1):119–139, August 1997.

[FSSW97] Yoav Freund, Robert E. Schapire, Yoram Singer, and Manfred K. Warmuth. Using and combining predictors that specialize. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing (STOC), pages 334–343. ACM, 1997.

[FT91] Drew Fudenberg and Jean Tirole. Perfect Bayesian equilibrium and sequential equilibrium. Journal of Economic Theory, 53(2):236–260, 1991.

[Han57] James Hannan. Approximation to Bayes risk in repeated plays. Contributions to the Theory of Games, 3:97–139, 1957.

[HHR+16] Justin Hsu, Zhiyi Huang, Aaron Roth, Tim Roughgarden, and Zhiwei Steven Wu. Private matchings and allocations. SIAM Journal on Computing, 45(6):1953–1984, 2016.

[HMC00] Sergiu Hart and Andreu Mas-Colell. A Simple Adaptive Procedure Leading to Correlated Equilibrium. Econometrica, 68(5):1127–1150, September 2000.

[HMC12] Sergiu Hart and Andreu Mas-Colell. Simple Adaptive Strategies: From Regret-Matching to Uncoupled Dynamics. World Scientific Publishing Co., Inc., River Edge, NJ, USA, 2012.

[HS07] Elad Hazan and C. Seshadhri. Adaptive algorithms for online decision problems. Electronic Colloquium on Computational Complexity (ECCC), 14(088), 2007.

[HW98] Mark Herbster and Manfred K. Warmuth. Tracking the best expert. Mach. Learn., 32(2):151–178, August 1998.

[IJS14] Krishnamurthy Iyer, Ramesh Johari, and Mukund Sundararajan. Mean field equilibria of dynamic auctions with learning. Management Science, 60(12):2949–2970, 2014.

[JT04] Ramesh Johari and John N. Tsitsiklis. Efficiency loss in a network resource allocation game. Mathematics of Operations Research, 29(3):407–435, 2004.


[KP99] Elias Koutsoupias and Christos Papadimitriou. Worst-case equilibria. In Proceedings of the 16th Annual Conference on Theoretical Aspects of Computer Science, STACS'99, pages 404–413, Berlin, Heidelberg, 1999. Springer-Verlag.

[KPRU14] Michael Kearns, Mallesh M. Pai, Aaron Roth, and Jonathan Ullman. Mechanism design in large games: incentives and privacy. In Innovations in Theoretical Computer Science, ITCS'14, Princeton, NJ, USA, January 12–14, 2014, pages 403–410, 2014.

[KS15] Ehud Kalai and Eran Shmaya. Learning and stability in big uncertain games. Technical report, 2015.

[Leh03] Ehud Lehrer. A wide range no-regret theorem. Games and Economic Behavior, 42:101–115, 2003.

[LLN01] Benny Lehmann, Daniel Lehmann, and Noam Nisan. Combinatorial auctions with decreasing marginal utilities. In Proceedings of the 3rd ACM Conference on Electronic Commerce, EC '01, pages 18–28, New York, NY, USA, 2001. ACM.

[LS15] Haipeng Luo and Robert E. Schapire. Achieving all with no parameters: AdaNormalHedge. In COLT, 2015.

[MPLTZ18] Vahab Mirrokni, Renato Paes Leme, Pingzhong Tang, and Song Zuo. Non-clairvoyant dynamic mechanism design. In Proceedings of the 2018 ACM Conference on Economics and Computation, EC '18, page 169, New York, NY, USA, 2018. Association for Computing Machinery.

[MT01] Eric Maskin and J. Tirole. Markov perfect equilibrium, I: Observable actions. Journal of Economic Theory, 100:191–219, 2001.

[NN16] Noam Nisan and Gali Noti. An experimental evaluation of regret-based econometrics. CoRR, abs/1605.03838, 2016.

[NST15] Denis Nekipelov, Vasilis Syrgkanis, and Eva Tardos. Econometrics for learning agents. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, EC '15, pages 1–18, New York, NY, USA, 2015. ACM.

[Olv06] N. Olver. The price of anarchy and a priority-based model of routing. Master's thesis, McGill University, 2006.

[PLT10] Renato Paes Leme and Éva Tardos. Pure and Bayes-Nash price of anarchy for generalized second price auction, 2010.

[PS03] David C. Parkes and Satinder P. Singh. An MDP-Based Approach to Online Mechanism Design. In NIPS, 2003.

[Rob51] Julia Robinson. An iterative method of solving a game. Annals of Mathematics, 54:296–301, 1951.

[Rou03] Tim Roughgarden. The price of anarchy is independent of the network topology. Journal of Computer and System Sciences, 67(2):341–364, 2003.

[Rou09] Tim Roughgarden. Intrinsic robustness of the price of anarchy. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, STOC '09, pages 513–522, New York, NY, USA, 2009. ACM.

[Rou12] Tim Roughgarden. The price of anarchy in games of incomplete information. In Proceedings of the 13th ACM Conference on Electronic Commerce, EC '12, pages 862–879, New York, NY, USA, 2012. ACM.


[Rou15] Tim Roughgarden. Intrinsic robustness of the price of anarchy. Journal of the ACM, 62, 2015.

[RRUW15] Ryan M. Rogers, Aaron Roth, Jonathan Ullman, and Zhiwei Steven Wu. Inducing approximately optimal flow using truthful mediators. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, EC '15. ACM, 2015.

[RT02] Tim Roughgarden and Éva Tardos. How bad is selfish routing? J. ACM, 49(2):236–259, March 2002.

[ST13] Vasilis Syrgkanis and Éva Tardos. Composable and efficient mechanisms. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, STOC '13, pages 211–220, New York, NY, USA, 2013. ACM.

[Syr12] Vasilis Syrgkanis. Bayesian games and the smoothness framework. CoRR, abs/1203.5155, 2012.

[VL10] Ngo Van Long. A Survey of Dynamic Games in Economics, volume 1. World Scientific, 2010.

[WBVR06] Gabriel Y. Weintraub, Lanier Benkard, and Benjamin Van Roy. Oblivious equilibrium: A mean field approximation for large-scale dynamic games. In Y. Weiss, B. Schölkopf, and J.C. Platt, editors, Advances in Neural Information Processing Systems 18, pages 1489–1496. MIT Press, 2006.

[WBVR08] Gabriel Y. Weintraub, C. Lanier Benkard, and Benjamin Van Roy. Markov perfect industry dynamics with many firms. Econometrica, 76(6):1375–1411, 2008.

[WBVR10] Gabriel Y. Weintraub, C. Lanier Benkard, and Benjamin Van Roy. Computational methods for oblivious equilibrium. Operations Research, 58(4-part-2):1247–1265, 2010.

A Classical static-population smoothness proofs (Section 2)

Proof of Proposition 2.1. Let $s^t$ be the strategy vector used by the players at round $t$. Since each player $i$ has low regret subject to the strategy $s^\star_i(v_i, x_i)$, it holds that
\[
\sum_t c_i(s^t; v_i) \;\le\; \sum_t c_i\big(s^\star_i(v_i, x_i), s^t_{-i}; v_i\big) + R_i(1, T).
\]
Adding up these inequalities and using the smoothness property, we obtain:
\begin{align*}
\frac{1}{T}\sum_t C(s^t;\mathbf{v}) \;=\; \frac{1}{T}\sum_t \sum_{i\in[n]} c_i(s^t; v_i)
&\le \frac{1}{T}\sum_t \sum_i c_i\big(s^\star_i(v_i, x_i), s^t_{-i}; v_i\big) + \frac{1}{T}\sum_i R_i(1, T)\\
&\le \lambda\, C(x;\mathbf{v}) + \mu\, \frac{1}{T}\sum_t C(s^t;\mathbf{v}) + \frac{1}{T}\sum_i R_i(1, T)
\end{align*}
The claimed bound follows by rearranging the terms and since $\frac{1}{T}\sum_i R_i(1, T) \le n/\sqrt{T}$.
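For readers who want the final rearrangement spelled out (this is only a restatement of the last step, under the standard assumption $\mu < 1$ for cost-minimization smoothness): moving the $\mu$-term to the left-hand side and dividing by $1-\mu$ gives
\[
\frac{1}{T}\sum_t C(s^t;\mathbf{v}) \;\le\; \frac{\lambda}{1-\mu}\, C(x;\mathbf{v}) \;+\; \frac{n}{(1-\mu)\sqrt{T}}.
\]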

Proof of Proposition 2.2. Let $s^t$ be the strategy vector used by players at round $t$. Since each player $i$ has low regret subject to strategy $s^\star_i(v_i, x_i)$, it holds that
\[
\sum_t u_i(s^t; v_i) \;\ge\; \sum_t u_i\big(s^\star_i(v_i, x_i), s^t_{-i}; v_i\big) - R_i(1, T).
\]
Adding up these inequalities and using the smoothness property, we obtain:
\begin{align*}
\frac{1}{T}\sum_t \sum_{i\in[n]} u_i(s^t; v_i)
&\ge \frac{1}{T}\sum_t \sum_i u_i\big(s^\star_i(v_i, x_i), s^t_{-i}; v_i\big) - \frac{1}{T}\sum_i R_i(1, T)\\
&\ge W(x;\mathbf{v}) - \mu\, \frac{1}{T}\sum_t R(s^t) - \frac{1}{T}\sum_i R_i(1, T)
\end{align*}
The claimed bound follows by rearranging the terms and since $\frac{1}{T}\sum_i R_i(1, T) \le n/\sqrt{T}$.
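As a hedged reading of the corresponding rearrangement for welfare (the exact statement of Proposition 2.2 appears in Section 2, not here): if, as in the smooth-mechanism framework of [ST13], welfare decomposes as $W(s;\mathbf{v}) = \sum_i u_i(s;v_i) + R(s)$ with revenue $R(s) \le W(s;\mathbf{v})$ and $\mu \ge 1$, then adding $\frac{1}{T}\sum_t R(s^t)$ to both sides and rearranging yields
\[
\frac{1}{T}\sum_t W(s^t;\mathbf{v}) \;\ge\; \frac{1}{\mu}\, W(x;\mathbf{v}) \;-\; \frac{n}{\mu\sqrt{T}}.
\]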

B Smoothness of first-price auctions with discrete bid spaces

In this section, we show the smoothness of the first-price auction with discrete bid spaces (Lemma 4.2); the proof is similar to that of Syrgkanis and Tardos [ST13] but needs to be adapted to work with discrete bids, and is therefore provided for completeness.

Proof of Lemma 4.2. Consider a valuation profile $\mathbf{v} = (v_1, \ldots, v_n)$ for the $n$ players and a bid profile $b = (b_1, \ldots, b_n)$. Each valuation $v_i$ is submodular and thereby also falls into the class of XOS valuations [LLN01], i.e. it can be expressed as a maximum over additive valuations. More formally, for some index set $L_i$:
\[
v_i(S) \;=\; \max_{\ell\in L_i} \sum_{j\in S} a^{\ell}_{ij}.
\]
Furthermore, when the player has value for at most $d$ types of items, it can also be shown that for any $\ell \in L_i$ at most $d$ of the $(a^{\ell}_{ij})_{j\in[m]}$ will be non-zero.
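As a small illustrative example (ours, not from the original text): the submodular valuation on two item types with $v(\{1\}) = 3$, $v(\{2\}) = 2$ and $v(\{1,2\}) = 4$ admits the XOS representation
\[
v(S) \;=\; \max_{\ell\in\{1,2\}} \sum_{j\in S} a^{\ell}_{j}, \qquad a^{1} = (3, 1),\quad a^{2} = (2, 2),
\]
since $\max(3,2)=3$, $\max(1,2)=2$ and $\max(3+1,\,2+2)=4$ recover the three set values.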

Consider a feasible allocation $x = (x_1, \ldots, x_n)$ of the items to the bidders, where $x_i$ is the set of types of items allocated to player $i$ (the allocation is feasible if no item is ever allocated more than its supply). We assume without loss of generality that each allocation $x_i$ is minimal, in the sense that deleting any item from $x_i$ decreases the value for player $i$. Let $m_i$ denote the number of items allocated to player $i$. Observe that we must have $v_i(x_i) \ge \rho\, m_i$: the function is submodular and, by minimality, adding the items of $x_i$ one by one, each item must increase the value, and by the assumption that each allocated item has marginal value at least $\rho$, it increases it by at least $\rho$.
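Spelling out the telescoping argument behind $v_i(x_i) \ge \rho\, m_i$, with $x_i = \{j_1,\ldots,j_{m_i}\}$ ordered arbitrarily:
\[
v_i(x_i) \;=\; \sum_{k=1}^{m_i} \Big( v_i(\{j_1,\ldots,j_k\}) - v_i(\{j_1,\ldots,j_{k-1}\}) \Big) \;\ge\; \rho\, m_i,
\]
since, by submodularity, each marginal term is at least the marginal value of that item with respect to the rest of $x_i$, which is at least $\rho$ by minimality and the minimum-marginal-value assumption.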

Consider the following deviation $b^*_i(v_i, x_i)$ that is related to the valuation $v_i$ of player $i$ and to the allocation $x_i$: Let $\ell^*(x_i) = \arg\max_{\ell\in L_i} \sum_{j\in x_i} a^{\ell}_{ij}$. Then on each item $j \in x_i$ with $a^{\ell^*(x_i)}_{ij} > 0$, submit $\big\lfloor a^{\ell^*(x_i)}_{ij}/2 \big\rfloor_{\zeta\cdot\rho}$, where we denote by $\lfloor \xi \rfloor_{\zeta\cdot\rho}$ the highest multiple of $\zeta\cdot\rho$ that is less than or equal to $\xi$. On each $j \notin x_i$, submit a zero bid; this submits at most $d$ non-zero bids. This deviation implies the solution-based smoothness property. Let $p_j(b)$ be the lowest winning bid on item $j$ under bid profile $b$. For each $j$, if $p_j(b) < \big\lfloor a^{\ell^*(x_i)}_{ij}/2 \big\rfloor_{\zeta\cdot\rho}$, player $i$ wins item $j$ and pays $\big\lfloor a^{\ell^*(x_i)}_{ij}/2 \big\rfloor_{\zeta\cdot\rho}$.



Thus we obtain:
\begin{align*}
u_i\big(b^*_i(v_i, x_i), b_{-i}; v_i\big)
&\ge \sum_{j\in x_i} \Big( a^{\ell^*(x_i)}_{ij} - \big\lfloor a^{\ell^*(x_i)}_{ij}/2 \big\rfloor_{\zeta\cdot\rho} \Big) \cdot \mathbf{1}\Big\{ p_j(b) < \big\lfloor a^{\ell^*(x_i)}_{ij}/2 \big\rfloor_{\zeta\cdot\rho} \Big\}\\
&\ge \sum_{j\in x_i} \big\lfloor a^{\ell^*(x_i)}_{ij}/2 \big\rfloor_{\zeta\cdot\rho} \cdot \mathbf{1}\Big\{ p_j(b) < \big\lfloor a^{\ell^*(x_i)}_{ij}/2 \big\rfloor_{\zeta\cdot\rho} \Big\}\\
&\ge \sum_{j\in x_i} \Big( \big\lfloor a^{\ell^*(x_i)}_{ij}/2 \big\rfloor_{\zeta\cdot\rho} - p_j(b) \Big)\\
&\ge \sum_{j\in x_i} \Big( \frac{a^{\ell^*(x_i)}_{ij}}{2} - \zeta\cdot\rho - p_j(b) \Big)\\
&= \frac{1}{2} \sum_{j\in x_i} a^{\ell^*(x_i)}_{ij} - \zeta\cdot\rho\cdot m_i - \sum_{j\in x_i} p_j(b)\\
&\ge \Big( \frac{1}{2} - \zeta \Big)\, v_i(x_i) - \sum_{j\in x_i} p_j(b).
\end{align*}
Summing over all players and observing that $R(b) \ge \sum_i \sum_{j\in x_i} p_j(b)$, we get the theorem.
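A minimal sketch of the deviation bid $b^*_i(v_i, x_i)$ used above, in Python. The encoding of the XOS valuation as a list of additive clauses and all identifiers are illustrative assumptions on our part, not notation from the paper; the deviation is only a thought experiment used to lower-bound utilities, not a prescribed bidding strategy.
\begin{verbatim}
import math

def deviation_bid(additive_clauses, x_i, zeta, rho):
    """Sketch of b*_i(v_i, x_i) from the proof of Lemma 4.2.

    additive_clauses: list of dicts {item: a_ij}, one per XOS clause
                      (an assumed encoding of the valuation v_i).
    x_i:              set of item types allocated to player i.
    Returns a dict item -> bid: half of the maximizing clause's value,
    rounded down to a multiple of zeta * rho; items outside x_i get no bid.
    """
    # Clause l*(x_i) maximizing the additive value of the allocated set x_i.
    best = max(additive_clauses, key=lambda a: sum(a.get(j, 0.0) for j in x_i))
    grid = zeta * rho
    bids = {}
    for j in x_i:
        a_ij = best.get(j, 0.0)
        if a_ij > 0:
            # Highest multiple of zeta*rho that is <= a_ij / 2.
            bids[j] = math.floor((a_ij / 2) / grid) * grid
    return bids

# Illustrative usage with the two-clause XOS valuation from the earlier example.
clauses = [{1: 3.0, 2: 1.0}, {1: 2.0, 2: 2.0}]
print(deviation_bid(clauses, x_i={1, 2}, zeta=0.25, rho=1.0))  # expected: {1: 1.5, 2: 0.5}
\end{verbatim}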

C Asymptotic optimality with uncertain participation

In this section, we provide the proof of the asymptotically optimal efficiency guarantee in the presence of participation uncertainty (Theorem 6.3).

For a participation vector $z$, such that $z_i = 1$ if player $i$ shows up and $0$ otherwise, let $x(z)$ denote a feasible solution to the problem with participation $z$ and $SW(x(z),\mathbf{v})$ be the social welfare of this solution with participant values $\mathbf{v}$. We first provide an inequality very similar to $(1,1)$-solution-based smoothness that holds for any such subset of players that end up showing up. In what follows, we use $b_j(k, z)$ to denote the $k$-th highest bid on item $j$ among the players with $z_i = 1$, and $u_i(b, z_{-i}, v_i)$ the utility of player $i$ with value $v_i$ and bids $b$, where $z_{-i}$ indicates the other participants.

Lemma C.1. For any valuations $\mathbf{v}$ and any participation vector $z$ there is a bid vector $b^\star_i(x(z))$ for all bidders $i$, depending on the solution $x(z)$, such that for any bid vector $b$ and any participation vector $z$ the following holds:
\[
\sum_i z_i\, u_i\big(b^\star_i(x(z)), b_{-i}, z_{-i}, v_i\big) \;\ge\; SW(x(z),\mathbf{v}) - \sum_j n_j(x(z))\, b_j(B, z)
\]
where $n_j(x(z))$ is the number of copies of item $j$ allocated by solution $x(z)$.

If the proposed $b^\star_i$ needs to be restricted to only use multiples of $\zeta\cdot\rho$ then we have
\[
\sum_i z_i\, u_i\big(b^\star_i(x(z)), b_{-i}, z_{-i}, v_i\big) \;\ge\; (1-\zeta)\, SW(x(z),\mathbf{v}) - \sum_j n_j(x(z))\, b_j(B, z)
\]

Proof. Recall from the proof of Lemma 4.2 (Appendix B) that, since valuations are gross substitutes (and hence XOS), they can be expressed as a maximum over additive valuations. More formally, for each player $i$ and some index set $L_i$:
\[
v_i(S) \;=\; \max_{\ell\in L_i} \sum_{j\in S} a^{\ell}_{ij}.
\]
Consider a feasible allocation $x(z) = (x_1, \ldots, x_n)$ of items to bidders, with each $x_i$ minimal, as in the proof of Lemma 4.2; we temporarily drop the $z$ from the notation for $x_i$ in this proof. Let $\ell$ be the index such that $v_i(x_i) = \sum_{j\in x_i} a^{\ell}_{ij}$, and define $b^\star_{ij}(x(z)) = a^{\ell}_{ij}$ for all items $j \in x_i$ and $0$ otherwise. For a bidder $i$, assume that with bid $b^\star_i(x(z))$ the bidder wins a set of items $x'_i$ among the items in $x_i$. Notice that for all participants with $z_i = 1$ we have:
\[
u_i\big(b^\star_i(x(z)), b_{-i}, z_{-i}, v_i\big) \;\ge\; \sum_{j\in x'_i} \big( a^{\ell}_{ij} - b_j(B, z) \big) \;\ge\; \sum_{j\in x_i} \big( a^{\ell}_{ij} - b_j(B, z) \big) \;=\; v_i(x_i) - \sum_{j\in x_i} b_j(B, z)
\]
where the first inequality follows as $v_i(x'_i) \ge \sum_{j\in x'_i} a^{\ell}_{ij}$: winning extra items beyond $x_i$ at zero price can only improve the value, and with the extra bid $b^\star_i(x(z))$, the price can be as high as $b_j(B, z)$. The second inequality follows since for any item $j \in x_i \setminus x'_i$ we must have that the bid $b^\star_{ij}(x(z)) \le b_j(B, z)$.

Now add the utility bounds over all players to get the claimed bound:
\[
\sum_i z_i\, u_i\big(b^\star_i(x(z)), b_{-i}, z_{-i}, v_i\big) \;\ge\; \sum_i z_i \Big( v_i(x_i) - \sum_{j\in x_i} b_j(B, z) \Big) \;=\; SW(x(z),\mathbf{v}) - \sum_j n_j(x(z))\, b_j(B, z)
\]
If the bidders are restricted to only use multiples of $\zeta\cdot\rho$ then we use $b^\star_{ij}(x(z)) = \zeta\rho \big\lfloor a^{\ell}_{ij}/(\zeta\rho) \big\rfloor$, and get for a participating player $i$
\[
u_i\big(b^\star_i(x(z)), b_{-i}, z_{-i}, v_i\big) \;\ge\; v_i(x_i) - \sum_{j\in x_i} \big( b_j(B, z) + \zeta\rho \big)
\]
where the extra $\zeta\rho$ per item accounts for the rounding of the bids. Summing over all players gives us
\[
\sum_i z_i\, u_i\big(b^\star_i(x(z)), b_{-i}, z_{-i}, v_i\big) \;\ge\; SW(x(z),\mathbf{v}) - \sum_j n_j(x(z)) \big( b_j(B, z) + \zeta\rho \big) \;\ge\; (1-\zeta)\, SW(x(z),\mathbf{v}) - \sum_j n_j(x(z))\, b_j(B, z)
\]
where the last inequality follows by the assumption that each item allocated to a player has marginal value at least $\rho$.

Note that $(1-\zeta, 1)$-smoothness of the mechanism would be implied if we had $b_j(B+1, z)$ in place of $b_j(B, z)$ in the above expression, as $b_j(B+1, z)$ is the uniform price charged for each of the $B$ copies of the item in the auction with bids $b$ and participation $z$. The main insight in using the above smoothness-like claim is that, due to the participation uncertainty, the bids $b_j(B, z)$ and $b_j(B+1, z)$ have almost the same distribution, as was shown by Lemma 16 in [FIL+16].

Lemma C.2 (Lemma 16 in [FIL+16]). The expected difference between the $B$-th and $(B+1)$-st bids is at most
\[
\mathbb{E}_z\big[ b_j(B, z) \big] - \mathbb{E}_z\big[ b_j(B+1, z) \big] \;\le\; \frac{4}{\sqrt{q(1-q)B}}.
\]
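A small Monte Carlo sketch (illustrative only; the bid values, the participation probability, and the estimator are our assumptions, not part of the paper) of the gap bounded in Lemma C.2: each of a set of potential bidders shows up independently with probability $q$, and we average the difference between the $B$-th and $(B+1)$-st highest bids among those present.
\begin{verbatim}
import random

def order_stat_gap(bids, q, B, trials=20000, seed=0):
    """Estimate E[b_(B)] - E[b_(B+1)] over random participation.

    bids: hypothetical fixed bid of each potential participant.
    q:    probability that each bidder shows up (independently).
    B:    number of copies of the item (missing bids count as 0).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        present = sorted((b for b in bids if rng.random() < q), reverse=True)
        present += [0.0] * (B + 1)            # pad so both order statistics exist
        total += present[B - 1] - present[B]  # B-th minus (B+1)-st highest bid
    return total / trials

# Illustrative usage: 200 potential bidders with assumed bids, q = 1/2, B = 50 copies.
bids = [i / 200 for i in range(1, 201)]
print(order_stat_gap(bids, q=0.5, B=50))
\end{verbatim}
The point of the lemma is that this gap vanishes as $B$ grows, so charging the $B$-th instead of the $(B+1)$-st bid in the smoothness-like inequality costs little in expectation.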



Proof of Theorem 6.3. Let $x^t(z)$ be the allocation returned by PAlloc and $Opt^t(z)$ be the optimum for the auction at time $t$ with participants $z$.

At a high level, the proof combines Lemma C.1, applied to the solutions $x^t(z)$, and Lemma C.2 with the Bayesian extension of [Rou12, Syr12] that was also used by [FIL+16]. Fix the set of players that appear at any particular round $t$ and call it $z^t$. For any alternate participation vector $z$, since each player $i$ uses an adaptive learning algorithm and the allocation of PAlloc is $(\epsilon, 0)$-differentially private, applying Lemma 5.1 (as in Theorem 5.5), it holds that:
\[
\mathbb{E}\Big( \sum_t z^t_i\, u_i(b^t, z^t_{-i}, v^t_i) \Big) \;\ge\; \mathbb{E}\Big( \sum_t z^t_i\, u_i\big( b^\star_i(x^t(1, z_{-i})), b^t_{-i}, z^t_{-i}, v^t_i \big) \Big) - Reg_i \tag{11}
\]
where $Reg_i$ is the expected regret of player $i$'s learning algorithm, and the expectation is over the randomization of the learning algorithm.

Next we take expectation over $z^t$ as well as over $z$ on both sides of the inequality. One term on the right hand side will be
\[
\mathbb{E}_{z^t, z}\Big( z^t_i\, u_i\big( b^\star_i(x^t(1, z_{-i})), b^t_{-i}, z^t_{-i}, v^t_i \big) \Big) \;=\; \mathbb{E}_{z^t, z}\Big( z_i\, u_i\big( b^\star_i(x^t(z)), b^t_{-i}, z^t_{-i}, v^t_i \big) \Big) \tag{12}
\]
where the equation follows by noting that the expression on the left does not depend on $z_i$, so we can rename $z^t_i$ as $z_i$ as the two have the same distribution, and replacing $1$ by $z_i$ has no effect due to the multiplier $z_i$.

By Lemma C.1, for any fixed participation vector $z$, for all deviating bids $b^\star_i(x^t(z))$ with valuations $\mathbf{v}^t$:
\begin{align}
\sum_i z_i\, u_i\big( b^\star_i(x^t(z)), b^t_{-i}, z^t_{-i}, v^t_i \big)
&\ge (1-\zeta)\, SW(x^t(z),\mathbf{v}^t) - \sum_j n_j(x^t(z))\, b^t_j(B, z^t) \notag\\
&\ge (1-\zeta)\, SW(x^t(z),\mathbf{v}^t) - \sum_j B \cdot b^t_j(B, z^t) \tag{13}
\end{align}

Next we sum equation (11) over all players and take expectation over $z^t$ and $z$. Finally, using Eqs. (12) and (13) for each $z$, and combining with Lemma 6.1, we obtain:
\[
\sum_t \sum_i \mathbb{E}\big[ z^t_i\, u_i(b^t, z^t_{-i}, v^t_i) \big] + \sum_t \sum_j \mathbb{E}\big[ B \cdot b^t_j(B, z^t) \big] \;\ge\; \sum_t (1-\zeta)\, \mathbb{E}\big( Opt(z;\mathbf{v}^t) \big) - (1-\zeta)\alpha m T - \sum_i Reg_i \tag{14}
\]
where the expectation on the left hand side is over the bidding strategies of the players as well as the randomness due to the dynamic population and the participation $z^t$, while the expectation on the right hand side is over $z$ and the randomness due to the dynamic population.

Combining Theorem 3.2 and Lemma 5.1 (as in Theorem 5.5), the total expected regret is at most
\[
\sum_i Reg_i \;\le\; m T \sqrt{ \Big( p + p\cdot\frac{n}{m}\cdot\big( 1 + 2n(\epsilon + \gamma) \big) \Big) \ln(NT) } \tag{15}
\]
Now letting $E' = \frac{360\sqrt{2}}{\alpha^2 \epsilon} \log\!\big( \tfrac{90n}{\alpha^2} \big)^{5/2} \log\!\big( \tfrac{4m}{\gamma} \big) + 1$ as in Lemma 5.1, when $B \ge \frac{12E' + 3}{\alpha}$, it holds that:
\[
\sum_t \sum_i \mathbb{E}\big( z^t_i\, u_i(b^t, z^t_{-i}, v^t_i) \big) + \sum_t \sum_j \mathbb{E}\big[ B \cdot b^t_j(B, z^t) \big] \;\ge\; \sum_t (1-\zeta)\, \mathbb{E}\big( Opt(z;\mathbf{v}^t) \big) - (1-\zeta)\alpha m T - m T \sqrt{ \Big( p + p\cdot\frac{n}{m}\cdot\big( 1 + 2n(\epsilon + \gamma) \big) \Big) \ln(NT) } \tag{16}
\]

Using Lemma C.2 for the participation vectors $z^t$, the LHS of Eq. (16) can be upper bounded by the expected welfare $\sum_t \mathbb{E}\big( W(s^t, z^t; \mathbf{v}^t) \big)$ with a small error term, where the expectation is over the dynamic population, the participation, as well as the randomness in the learning algorithms, as follows:
\[
\sum_t \mathbb{E}\big[ W(s^t, z^t; \mathbf{v}^t) \big] + \frac{4mT}{\sqrt{q(1-q)B}} \;\ge\; \mathbb{E}\Big[ \sum_t \sum_i z^t_i\, u_i(b^t, z^t_{-i}, v^t_i) + \sum_t \sum_j B \cdot b^t_j(B, z^t) \Big]. \tag{17}
\]

Similar to the proof of Theorem 6.2, Assumption 6.2 combined with Eqs. (16) and (17) implies:
\[
\sum_t \mathbb{E}\big[ W(s^t, z^t; \mathbf{v}^t) \big] \;\ge\; \Bigg( (1-\zeta)\Big(1 - \frac{\alpha}{\rho}\Big) - \frac{4}{\rho\sqrt{q(1-q)B}} - \frac{1}{\rho}\sqrt{ \Big( p + p\cdot\frac{n}{m}\cdot(2 + 2n\epsilon) \Big) \ln(NT) } \Bigg)\, \sum_t \mathbb{E}\big[ Opt(z^t;\mathbf{v}^t) \big]
\]
with $\epsilon = \frac{12}{\alpha^3 B}\cdot\Big( 360\sqrt{2}\, \log\!\big( \tfrac{90n}{\alpha^2} \big)^{5/2} \log(4mn^2) + 4 \Big)$, concluding the proof.
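To read the final bound (our paraphrase, not an additional claim): the multiplicative factor in front of $\sum_t \mathbb{E}\big[Opt(z^t;\mathbf{v}^t)\big]$ is
\[
(1-\zeta)\Big(1-\frac{\alpha}{\rho}\Big) \;-\; \frac{4}{\rho\sqrt{q(1-q)B}} \;-\; \frac{1}{\rho}\sqrt{\Big(p + p\cdot\frac{n}{m}\,(2+2n\epsilon)\Big)\ln(NT)},
\]
where the second term is the loss from using the $B$-th rather than the $(B+1)$-st bid (Lemma C.2) and the third collects the adaptive-regret and differential-privacy stability losses; the guarantee approaches the optimum as the discretization $\zeta$, the approximation loss $\alpha/\rho$, and these two error terms all become small (large supply $B$ and slow enough turnover $p$).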
