Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place...

17
MANAGEMENT SCIENCE Vol. 65, No. 9, September 2019, pp. 39523968 http://pubsonline.informs.org/journal/mnsc ISSN 0025-1909 (print), ISSN 1526-5501 (online) Learning in Repeated Auctions with Budgets: Regret Minimization and Equilibrium Santiago R. Balseiro, a Yonatan Gur b a Graduate School of Business, Columbia University, New York, New York 10027; b Graduate School of Business, Stanford University, Stanford, California 94305 Contact: [email protected], http://orcid.org/0000-0002-0012-3292 (SRB); [email protected], http://orcid.org/0000-0003-0764-3570 (YG) Received: February 21, 2017 Revised: January 9, 2018; July 5, 2018 Accepted: July 10, 2018 Published Online in Articles in Advance: August 1, 2019 https://doi.org/10.1287/mnsc.2018.3174 Copyright: © 2019 INFORMS Abstract. In online advertising markets, advertisers often purchase ad placements through bidding in repeated auctions based on realized viewer information. We study how budget- constrained advertisers may compete in such sequential auctions in the presence of un- certainty about future bidding opportunities and competition. We formulate this problem as a sequential game of incomplete information, in which bidders know neither their own valuation distribution nor the budgets and valuation distributions of their competitors. We introduce a family of practical bidding strategies we refer to as adaptive pacing strategies, in which advertisers adjust their bids according to the sample path of expenditures they exhibit, and analyze the performance of these strategies in different competitive settings. We es- tablish the asymptotic optimality of these strategies when competitorsbids are independent and identically distributed over auctions, but also when competing bids are arbitrary. When all the bidders adopt these strategies, we establish the convergence of the induced dynamics and characterize a regime (well motivated in the context of online advertising markets) under which these strategies constitute an approximate Nash equilibrium in dynamic strategies: the benet from unilaterally deviating to other strategies, including ones with access to complete information, becomes negligible as the number of auctions and competitors grows large. This establishes a connection between regret minimization and market stability, by which ad- vertisers can essentially follow approximate equilibrium bidding strategies that also ensure the best performance that can be guaranteed off equilibrium. History: Accepted by Noah Gans, stochastic models and simulation. Supplemental Material: Supplemental material is available at https://doi.org/10.1287/mnsc.2018.3174. Keywords: sequential auctions online advertising online learning stochastic optimization stochastic approximation incomplete information regret analysis dynamic games 1. Introduction Online ad spending has grown dramatically in recent years, reaching more than $70 billion in the United States in 2016 (eMarketer 2016b). A substantial part of this spend ing is generated through auction platforms where adver- tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange platforms, such as Googles DoubleClick and Yahoos RightMedia, and more centralized platforms, such as Facebook Exchange and Twitter Ads, as well as spon- sored search platforms, such as Googles AdWords and Microsoft s Bing Ads. In such markets, advertisers may participate in many thousands of auctions per day, whereby competitive interactions are interconnected by budgets that limit the total expenditures of advertisers throughout their campaign. The complexity of this competition, to- gether with the frequency of opportunities and the time scale on which decisions are made, are key drivers in the rise of automated bidding algorithms that have become common practice among advertisers (eMarketer 2016a). One operational challenge that is fundamental for man- aging budgeted campaigns in online ad markets is to balance present and future bidding opportunities effec- tively. For example, advertisers face the analytical and computational challenge of timing the depletion of their budget with the planned termination of their campaign because running out of budget before the campaigns planned termination (or alternatively, reaching the end of the campaign with unutilized funds) may result in sig- nicant opportunity loss. The challenge of accounting for future opportunities throughout the bidding process has received increasing attention in the online advertising industry: platforms now recommend advertisers to pacethe rate at which they spend their budget and provide budget-pacing services that are based on a variety of heuristics. 1 Budget pacing is particularly challenging owing to the many uncertainties that characterize the competitive landscape in online ad markets. First, although adver- tisers can evaluate, just before each auction, the value of 3952

Transcript of Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place...

Page 1: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

MANAGEMENT SCIENCEVol. 65, No. 9, September 2019, pp. 3952–3968

http://pubsonline.informs.org/journal/mnsc ISSN 0025-1909 (print), ISSN 1526-5501 (online)

Learning in Repeated Auctions with Budgets: Regret Minimizationand EquilibriumSantiago R. Balseiro,a Yonatan Gurb

aGraduate School of Business, Columbia University, New York, New York 10027; bGraduate School of Business, Stanford University, Stanford,California 94305Contact: [email protected], http://orcid.org/0000-0002-0012-3292 (SRB); [email protected],

http://orcid.org/0000-0003-0764-3570 (YG)

Received: February 21, 2017Revised: January 9, 2018; July 5, 2018Accepted: July 10, 2018Published Online in Articles in Advance:August 1, 2019

https://doi.org/10.1287/mnsc.2018.3174

Copyright: © 2019 INFORMS

Abstract. In online advertising markets, advertisers often purchase ad placements throughbidding in repeated auctions based on realized viewer information. We study how budget-constrained advertisers may compete in such sequential auctions in the presence of un-certainty about future bidding opportunities and competition.We formulate this problem asa sequential game of incomplete information, in which bidders know neither their ownvaluation distribution nor the budgets and valuation distributions of their competitors. Weintroduce a family of practical bidding strategies we refer to as adaptive pacing strategies, inwhich advertisers adjust their bids according to the sample path of expenditures they exhibit,and analyze the performance of these strategies in different competitive settings. We es-tablish the asymptotic optimality of these strategies when competitors’ bids are independentand identically distributed over auctions, but also when competing bids are arbitrary. Whenall the bidders adopt these strategies, we establish the convergence of the induced dynamicsand characterize a regime (wellmotivated in the context of online advertisingmarkets) underwhich these strategies constitute an approximateNash equilibrium indynamic strategies: thebenefit fromunilaterally deviating to other strategies, including oneswith access to completeinformation, becomes negligible as the number of auctions and competitors grows large. Thisestablishes a connection between regret minimization and market stability, by which ad-vertisers can essentially follow approximate equilibrium bidding strategies that also ensurethe best performance that can be guaranteed off equilibrium.

History: Accepted by Noah Gans, stochastic models and simulation.Supplemental Material: Supplemental material is available at https://doi.org/10.1287/mnsc.2018.3174.

Keywords: sequential auctions • online advertising • online learning • stochastic optimization • stochastic approximation •incomplete information • regret analysis • dynamic games

1. IntroductionOnline ad spending has grown dramatically in recentyears, reaching more than $70 billion in the United Statesin 2016 (eMarketer 2016b). A substantial part of this spending is generated through auction platforms where adver-tisers sequentially submit bids to place their ads at slotsas these become available. Important examples includead exchange platforms, such as Google’s DoubleClick andYahoo’s RightMedia, and more centralized platforms, suchas Facebook Exchange and Twitter Ads, as well as spon-sored search platforms, such as Google’s AdWords andMicrosoft’s Bing Ads. In such markets, advertisers mayparticipate inmany thousands of auctionsperday,wherebycompetitive interactions are interconnected by budgetsthat limit the total expenditures of advertisers throughouttheir campaign. The complexity of this competition, to-gether with the frequency of opportunities and the timescale on which decisions are made, are key drivers in therise of automated bidding algorithms that have becomecommon practice among advertisers (eMarketer 2016a).

One operational challenge that is fundamental for man-aging budgeted campaigns in online ad markets is tobalance present and future bidding opportunities effec-tively. For example, advertisers face the analytical andcomputational challenge of timing the depletion of theirbudget with the planned termination of their campaignbecause running out of budget before the campaign’splanned termination (or alternatively, reaching the end ofthe campaign with unutilized funds) may result in sig-nificant opportunity loss. The challenge of accounting forfuture opportunities throughout the bidding process hasreceived increasing attention in the online advertisingindustry: platforms now recommend advertisers to“pace” the rate at which they spend their budget andprovide budget-pacing services that are based on a varietyof heuristics.1

Budget pacing is particularly challenging owing to themany uncertainties that characterize the competitivelandscape in online ad markets. First, although adver-tisers can evaluate, just before each auction, the value of

3952

Page 2: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

the current opportunity (in terms of, e.g., likelihood ofpurchase), advertisers do not hold such informationabout future opportunities thatmay be realized through-out the campaign. Second, advertisers typically knowvery little about the extent of the competition they face inonline ad markets. They typically do not know thenumber of advertisers that will bid in each auction, orimportant characteristics of their competitors, such astheir budgets and their expectations regarding futureopportunities. In addition, there is uncertainty about thevarying levels of competitors’ strategic and technicalsophistication: whereas some firms devote considerableresources to develop complex algorithms that aredesigned to identify and respond to strategies of com-petitors in real time to maximize their campaign utility,other, less resourceful firms may adopt simple biddingstrategies that may be independent of idiosyncratic userinformation, and that may even seem arbitrary.

The main questions we address in this paper arethese: (i) How should budget-constrained advertiserscompete in repeated auctions under uncertainty? (ii)What type of performance can be guaranteed with andwithout typical assumptions on competitors’ behavior?(iii) Can a single class of strategies constitute an eq-uilibrium (when adopted by all advertisers) whileachieving the best performance that can be guaranteedoff equilibrium? These questions are of practical im-portance and draw attention from online advertisersand ad platforms.

1.1. Main ContributionThe main contribution of the paper lies in introducinga new family of practical and intuitive bidding strategiesthat dynamically adapt to uncertainties and competitionthroughout an advertising campaign only on the basis ofobserved expenditures, and in analyzing the performanceof these strategies off equilibrium and in equilibrium. Inmore detail, our contribution is along the following lines.

1.1.1. Formulation and Solution Concepts. We for-mulate the budget-constrained competition betweenadvertisers as a sequential game of incomplete in-formation, in which advertisers know neither their ownvaluation distribution nor the budgets and valuationdistributions of their competitors. To evaluate the per-formance of a given strategy without any assumptionson competitors’ behavior, we quantify the portion itcan guarantee of the performance of the best (feasible)dynamic sequence of bids one could have selected withthe benefit of hindsight. The proposed performancemetric, referred to as γ-competitiveness, extends that ofno-regret strategies (commonly used in unconstrainedsettings) to a more stringent benchmark that allows fordynamic sequences of actions subject to global con-straints. By showing that the maximal portion that canbe guaranteed is, in general, less than one, we establish

the impossibility of a “no-regret” notion relative to adynamic benchmark.

1.1.2. Adaptive Pacing Strategies. We introduce a newclass of practical bidding strategies that adapt thebidding behavior of advertisers throughout the cam-paign, solely on the basis of the sample path of ex-penditures they exhibit. We refer to this class as adaptivepacing strategies, because these strategies dynamicallyadjust the “pace” at which the advertiser spends itsbudget. This pace is controlled by a single parameter:a dual multiplier that determines the extent to whichthe advertiser bids below its true values. This multiplieris updated according to an intuitive primal–dual schemethat extends single-agent learning ideas for the purposeof balancing present and future opportunities in un-certain environments. In particular, these strategies aimat maximizing instantaneous payoffs while guarantee-ing that the advertiser depletes its budget “close” to theend of the campaign.

1.1.3. Performance and Stability. We analyze the per-formance of adaptive pacing strategies under differentassumptions on the competitive environment. On theone hand, when competitors’ bids are independent andidentically distributed over auctions (we refer to thissetting as stationary competition), the performance ofthis class of strategies converges, in a long-run averagesense, to the best performance attainable with the ben-efit of hindsight. On the other hand, when competitors’bids are arbitrary (we refer to this setting as arbitrarycompetition), we establish, through matching lowerand upper bounds on γ-competitiveness, the asymp-totic optimality of these strategies; that is, no otherstrategy can guarantee a larger portion of the best per-formance in hindsight. Together, these results establishperformance guarantees and asymptotic optimalityin both the “optimistic” and “pessimistic” scenarios.When all bidders follow adaptive pacing strategies,competing bids are nonstationary and endogenous. Weshow that in such a scenario the sequences of dualmultipliers (and the resulting payoffs) converge, andcharacterize a regime (that is well grounded in practice)under which these strategies constitute an approximateNash equilibrium: the benefit from unilaterally deviat-ing to any fully informed dynamic strategy diminishes tozero as the number of auctions and themarket size growlarge, even when advertisers know their value distri-bution as well as the value distributions and budgets oftheir competitors up front, and even when they canacquire real-time information on the past value realiza-tions, bids, and payoffs of their competitors.Notably, without using any prior knowledge of the

competitive setting (that is, whether it is stationary,arbitrary, or of simultaneous learning), these strate-gies asymptotically attain the best performance that is

Balseiro and Gur: Learning in Repeated Auctions with BudgetsManagement Science, 2019, vol. 65, no. 9, pp. 3952–3968, © 2019 INFORMS 3953

Page 3: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

achievablewhen advertisers have ex ante knowledge ofthe nature of the competition. Our results (summarizedin Table 1) establish a connection between performanceand market stability in our setting: advertisers canfollow bidding strategies that constitute an approxi-mate equilibrium and that, at the same time, asymp-totically ensure the best performance that can beguaranteed off equilibrium.

To assess the value that may be captured in practiceby adaptive pacing strategies, we show empirically,using data from a large online ad auction platform, thatthese strategies often achieve performance that is veryclose to the best performance that could have beenachieved in hindsight.

1.2. Related WorkOur study lies in the intersection of marketing, econo-mics, operations, and computer science. Although wefocus on the broad application area of online advertis-ing, our analysis includes methodological aspects oflearning, games of incomplete information, and auctionstheory.

1.2.1. Sequential Games of Incomplete Information.There is a rich literature stream on the conditions underwhich observation-based dynamics may or may notconverge to an equilibrium. For an extensive overviewof related work, which excludes budget considerationsandmostly focuses on discrete action sets and completeinformation, see the books of Fudenberg and Levine(1998) and Hart and Mas-Colell (2013). Although con-vergence to equilibrium has appealing features, it gen-erally does not imply any guarantees on performancealong the strategy’s decision path. One important notionthat has been used to evaluate performance along thedecision path in many sequential settings is Hannanconsistency (Hannan 1957), also referred to as universalconsistency and, equivalently, no-regret. A strategy isHannan consistent if, when competitors’ actions are ar-bitrary, it guarantees a payoff at least as high as anystatic action that could have been selected in hindsight.An important connection between Hannan consistencyand convergence to equilibrium is established by Hart

andMas-Colell (2000), who provide a no-regret strategythat, when adopted by all players, induces an empir-ical distribution of play that converges to a correlatedequilibrium.In the present paper we advance similar ideas by

establishing a connection between performance andequilibrium in a setting with global constraints that limitthe sequence of actions players may take. Because insuch a setting repeating any single action may be in-feasible as well as generate poor performance, we re-place the static benchmark that appears in Hannanconsistency with the best dynamic and feasible sequenceof actions. We characterize the maximal portion of thisdynamic benchmark that may be guaranteed, providea class of strategies that achieve this portion, anddemonstrate practical settings in which these strategiesconstitute an ε-Nash equilibrium among fully informeddynamic strategies. This equilibrium notion is partic-ularly ambitious under incomplete information be-cause it involves look-ahead considerations and is hardtomaintain even inmore information-abundant settings(see, e.g., Brafman and Tennenholtz 2004 and Ashlagiet al. 2012).

1.2.2. Learning and Equilibrium in Sequential Auctions.Our work relates to other papers that study learningin various settings of repeated auctions. Iyer et al.(2014) adopt a mean-field approximation to studyrepeated auctions in which bidders learn about theirown private value over time. Weed et al. (2016) considera sequential auction setting, where bidders do not knowthe associated intrinsic value, and adopt multiarmedbandit policies to analyze the underlying exploration–exploitation trade-off. Bayesian learning approachesare analyzed, for example byHon-Snir et al. (1998) in thecontext of sequential first-price auctions, as well as byHan et al. (2011) undermonitoring and entry costs. Thesestudies address settings significantly different fromours and, in particular, do not consider budget con-straints. In our setting, items are sold via repeated second-price auctions, and bidders observe current privatevalues when bidding (but are uncertain about futurevalues). Although truthful bidding is weakly dominantin a static second-price auction, a budget-constrainedbidder needs to learn and account for the (endogenous)option value of future auctions, which depends directlyon its budget, together with unknown factors, such asits value distribution as well as the budgets, valuedistributions, and strategic behavior of its competitors.

1.2.3. Repeated Auctions with Budgets. Balseiro et al.(2015) introduce a fluid mean-field approximation tostudy the outcome of the strategic interaction betweenbudget-constrained advertisers in a complete informa-tion setting. They show that stationary strategies thatshade bids using a constant multiplier can constitute

Table 1. Performance Characteristics of Adaptive PacingStrategies

Nature of competition

Stationary ArbitrarySimultaneously

learning

Long-run averageconvergence to thebest performancein hindsight

Long-run averageconvergence to thelargest attainableportion of the bestperformance inhindsight

Convergence inmultipliers and inperformance;approximateequilibrium inlarge markets

Balseiro and Gur: Learning in Repeated Auctions with Budgets3954 Management Science, 2019, vol. 65, no. 9, pp. 3952–3968, © 2019 INFORMS

Page 4: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

an approximate Nash equilibrium, but these strate-gies rely on the value distributions and the budgetconstraints of all the advertisers. The present paperfocuses on the practical challenge of bidding in theserepeated auctions and, in particular, when such broadinformation on the entire market is not available to thebidder, and when competitors do not necessary followan equilibrium strategy. Although we do not impose afluid mean-field approximation in the present paper,under such an approximation our proposed strategiescould be viewed as converging under incomplete in-formation to an equilibrium in fluid-based strategies.In a relatedwork, Conitzer et al. (2018) introduce a pacingequilibrium concept that also relies on complete in-formation. They provide evidence through numericalexperiments that the adaptive pacing strategies introducedhere would converge to an equilibrium in their settingas well. Convergence under realistic (incomplete) in-formation of the simple dynamics introduced here tothe different notions of equilibria mentioned above pro-vides some support to these equilibria as solutionconcepts. A few other budget management approacheshave been recently considered in the literature; see, forexample, Charles et al. (2013), Karande et al. (2013), andBalseiro et al. (2017).

Zhou et al. (2008) study budget-constrained biddingfor sponsored search in an adversarial setting and pro-vide an algorithm with a competitive ratio dependingon upper and lower bounds on the value-to-weightratio. Our results are not directly comparable to theirs.Considering a stationary setting, Badanidiyuru et al.(2013) study multiarmed bandit problems in which thedecision maker may be resource-constrained and dis-cuss applications to bidding in repeated auctions withbudgets whereby arms correspond to possible bids. Inour paper the information structure is different be-cause advertisers observe their values before bidding.Our paper also relates to the literature on contextualmultiarmed bandit problems with resource constraints,in which the context corresponds to the impression value(see, e.g., Badanidiyuru et al. 2014). The algorithm ofBadanidiyuru et al. (2014) can be applied to our settingby discretizing the action space, with performanceguarantees that are worse than the guarantees obtainedin the present paper. Borgs et al. (2007) study heuristicsfor bidding in repeated auctions in which advertiserssimultaneously update bids on the basis of past cam-paigns and establish convergence across campaigns. Bycontrast, we study strategies that converge throughouta single campaign. Finally, Jiang et al. (2014) analyzenumerically an approximation scheme that is similar toours, but from the perspective of a single agent that bidsin the presence of an exogenous and stationary market.To the best of our knowledge, the present paper is thefirst to propose a class of practical budget-pacing strat-egies with performance guarantees in stationary and

adversarial settings, and concrete notions of optimalityunder competition.

1.2.4. Stochastic and Convex Optimization. The bid-ding strategies we suggest are based on approximatinga Lagrangian dual objective by constructing and fol-lowing subgradient estimates in the dual space. In thisrespect our strategies relate to the class of mirror de-scent schemes introduced by Nemirovski and Yudin(1983); see also Beck and Teboulle (2003) and chapter 2of Shalev-Shwartz (2012), as well as the potential-basedgradient descent method in chapter 11 of Cesa-Bianchiand Lugosi (2006). More broadly, these strategies be-long to the class of stochastic approximation methodsthat have been widely studied and applied in a varietyof areas; for a comprehensive review see Benvenisteet al. (1990), Lai (2003), and Kushner and Yin (2003), aswell as Araman and Caldentey (2011) for more recentapplications in revenuemanagement. This strand of theliterature focuses on single-agent learning methods foradjusting to uncertain environments that are exogenousand stationary. Exceptions that consider the performanceof stochastic approximation schemes in nonstationaryenvironments include the formulations in section 3.2 ofKushner and Yin (2003) and chapter 4 of Benveniste et al.(1990), as well as Besbes et al. (2015). Notably, theseframeworks consider underlying environments thatchange exogenously, and decision makers that operateas monopolists. In other related works, Rosen (1965)demonstrates finding an equilibrium in concave gamesby applying a gradient-based method in a centralizedmanner with complete information on the payoff func-tions, and Nedic and Ozdaglar (2009) study the con-vergence of distributed gradient-basedmethods adoptedby cooperative agents.The present paper contributes to this literature by ex-

tending, in two key aspects, single-agent learning ideasin a setting of practical importance. First, our proposedstrategies use expenditure observations to constructsubgradient estimates period by period, each time fora different component of the dual objective we approxi-mate. This facilitates efficient learning throughout thecampaign, rather thanbetween campaigns. Second, theproposed strategies are analyzed in settings that in-clude both uncertainty and competition. In particular,when these strategies are adopted by all advertisers,the resulting environment is both endogenous andnonstationary.Our performance analysis in the presence of arbi-

trary competition relates to the literature initiated byZinkevich (2003) on online convex optimization in anadversarial framework, where underlying cost functionscan be selected at any point in time by an adversary,depending on the actions of the decision maker. (Thisconstitutes a more pessimistic environment than thetraditional stochastic setting, where a cost function is

Balseiro and Gur: Learning in Repeated Auctions with BudgetsManagement Science, 2019, vol. 65, no. 9, pp. 3952–3968, © 2019 INFORMS 3955

Page 5: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

picked a priori and held fixed.) For an overview ofrelated settings, see chapter 2 of Shalev-Shwartz (2012).

1.2.5. Adaptive Algorithms. The challenge of designingstrategies that adapt to unknown characteristics of theenvironment (in the sense of achieving ex post per-formance that is essentially as good as the performancethat can be achieved when advertisers have ex anteknowledge of the nature of the competition) dates backto studies in the statistics literature (see Tsybakov 2008and references therein) and has seen recent interest inmachine learning. Examples include Mirrokni et al.(2012), who seek to design an algorithm for an onlinebudgeted allocation problem that achieves an optimalcompetitive ratio in both adversarial and stochasticsettings; Seldin and Slivkins (2014), who present an al-gorithm that achieves (near) optimal performance inboth stochastic and adversarial multiarmed bandit re-gimes without prior knowledge of the nature of theenvironment; and Sani et al. (2014), who consider anonline convex optimization setting and derive algo-rithms that are rate optimal regardless of whether thetarget function is weakly or strongly convex.

2. Incomplete Information ModelWemodel a sequential game of incomplete informationwith budget constraints, where K risk-neutral hetero-geneous advertisers repeatedly bid to place ads throughsecond-price auctions. We note that many of our mod-elingassumptions canbegeneralizedandaremadeonlyto simplify exposition and analysis; some generaliza-tions are discussed in Section 2.1.

In each time period t � 1, . . . ,T there is an availablead slot that is auctioned. The auctioneer (i.e., theplatform) first shares with the advertisers some in-formation about a visiting user and then runs a second-price auction with a reserve price of zero to determinewhich ad to show to the user. The information pro-vided by the auctioneer heterogeneously affects thevalue advertisers perceive for the impression on thebasis of their targeting criteria. The values advertisersassign to the impression they bid for at time t aredenoted by the random vector vk,t

( )Kk�1 and are assumed

to be independently distributed across impressions andadvertisers, with a support over ∏K

k�1 0, v̄k[ ] ⊂ RK+. Wedenote the marginal cumulative distribution functionof vk,t by Fk and assume that Fk is absolutely continuouswith bounded density fk. Advertiser k has a budget Bk,which limits the total payments that can bemade by theadvertiser throughout the campaign. We denote byρk :�Bk/T the target expenditure (per impression) rateof advertiser k. We assume that 0<ρk ≤ v̄k for eachadvertiser k ∈ 1, . . . ,K{ }; otherwise the problem wouldbe trivial because the advertiser would never deplete itsbudget. Each advertiser k is therefore characterized bya type θk :� (Fk, ρk).

In each period t each advertiser privately observesa valuation vk,t and then posts a bid bk,t. Given thebidding profile (bk,t)Kk�1 ∈ RK+, we denote by dk,t thehighest competing bid faced by advertiser k:

dk,t :� maxi : i��k

bi,t{ }

.

We assume that advertisers have a quasilinear utilityfunction given by the difference between the sum of thevaluations generated by the impressions won and theexpenditures corresponding to the second-price rule.We denote by zk,t the expenditure of advertiser k attime t:

zk,t :� 1{dk,t ≤ bk,t}dk,t,and by uk,t :� 1{dk,t ≤ bk,t}(vk,t − dk,t) the correspondingnet utility.2 After the bidding takes place, the auc-tioneer allocates the ad to the highest bidder, and eachadvertiser k privately observes its own expenditure zk,tand net utility uk,t.

2.1. Information Structure and Admissible Budget-Feasible Bidding Strategies

We assume that at the beginning of the horizon adver-tisers have no information on their valuation distribu-tions, nor on their competitors’; that is, the distributionsFk : k � 1 . . . ,K{ } are unknown. Each advertiser knowsits own target expenditure rate ρk, as well as the lengthof the campaign T, but has no information on the bud-gets or the target expenditure rates of its competitors.In particular, each advertiser does not know the numberof competitors in themarket, their types, or its own type.We next formalize the class of (potentially random-

ized) admissible and budget-feasible bidding strate-gies. Let y be a random variable defined over a pro-bability space Y,=,Py

( ). We denote by *k,t the history

available at time t to advertiser k, defined by

*k,t :� σ vk,τ, bk,τ, zk,τ,uk,τ( )t−1

τ�1, vk,t, y( )

,

for any t ≥ 2, with *k,1 :� σ vk,1, y( )

. A bidding strategyfor advertiser k is a sequence of mappings β � (β1,β2, . . .), where the functions β1 : R+ × Y → R+ and βt :

R4(t−1)+1+ × Y → R+ for t � 2, . . . ,T map available his-

tories to bids bβk,t : t � 1, . . . ,T{ }

. A strategy is admissible,or nonanticipating, if it depends only on informationthat is available to the bidder; that is, if in each period tthe strategy βt is measurable with respect to the filtration*k,t. We say that β is budget feasible if it generates ex-penditures that are constrained by the available budget;that is,

∑Tt�1 1{dk,t ≤ bβk,t}dk,t ≤ ρkT for any realized vector

of highest competitors’ bids dk � (dk,t)Tt�1 ∈ RT+. We de-note by @k the class of admissible budget-feasiblestrategies for advertiser k with target expenditure rate

Balseiro and Gur: Learning in Repeated Auctions with Budgets3956 Management Science, 2019, vol. 65, no. 9, pp. 3952–3968, © 2019 INFORMS

Page 6: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

ρk, and by @ � @1 × · · · ×@K the product space ofstrategies for all advertisers.

2.2. Off-Equilibrium PerformanceTo evaluate the off-equilibrium performance of a candi-date strategy β ∈ @k without any assumptions on thecompetitors’ strategies, we define the sample-path per-formance given vectors of realized values vk � (vk,t)Tt�1 ∈[0, v̄k]T and realized competing bids dk as follows:

πβk(vk;dk) :�Eβ

∑Tt�1

1{dk,t ≤ bβk,t}(vk,t − dk,t)[ ]

,

where the expectation is taken with respect to any ran-domness embedded in the strategy.3

We quantify theminimal loss a strategy can guaranteerelative to any sequence of bids one could have madewith the benefit of hindsight. Given sequences of re-alized valuations vk and highest competing bids dk, wedenote by πH

k (vk;dk) the best performance advertiser kcould have achieved with the benefit of hindsight:

πHk (vk;dk) :� max

xk∈{0,1}T∑Tt�1

xk,t(vk,t − dk,t) (1)

s.t.∑Tt�1

xk,tdk,t ≤ Tρk ,

where the binary variable xk,t indicates whether theauction at time t is won by advertiser k. The solution of(1) is the best dynamic response (sequence of bids thatcould have been made) against a fixed environmentcharacterized by vk and dk. Notably, if vk and dk aresuch that the bidder is effectively unconstrained (that is,if

∑Tt�1 1{dk,t ≤ vk,t}dk,t ≤ ρkT), then πH

k is achieved bytruthful bidding (i.e., bk,t � vk,t); otherwise, πH

k is thesolution to a knapsack problem (for an overview ofknapsack problems see, e.g., Martello and Toth 1990).Although the identification of the best sequence of bidsin hindsight does not account for the potential com-petitive reaction to that sequence, this benchmark istractable and has appealing practical features. For ex-ample, the best performance in hindsight can be com-puted using only historical data and requires nobehavioral assumptions on competitors (see, e.g.,Talluri and van Ryzin 2004, chapter 11.3).

Nonanticipating strategies might not be able toachieve or approach πH

k . For some γ ∈ [1,∞), a biddingstrategy β ∈ @k is said to be asymptotic γ-competitive(Borodin and El-Yaniv 1998) if

lim supT→∞,Bk�ρkT

supvk∈ 0,v̄k[ ]Tdk∈RT+

1T

(πHk (vk;dk) − γπ

βk (vk;dk)

)≤ 0 . (2)

An asymptotic γ-competitive bidding strategy (asymp-totically) guarantees a portion of at least 1/γ of the

performance of any dynamic sequence of bids thatcould have been selected in hindsight.4

2.3. Equilibrium Solution ConceptGiven a strategy profile β � (βk)

Kk�1 ∈ @, we denote by

Πβk the total expected payoff for advertiser k:

Πβk :�Eβ

v

∑Tt�1

1{dβ−kk,t ≤ bβkk,t}(vk,t − dβ−kk,t )[ ]

,

where the expectation is taken with respect to anyrandomness embedded in the strategies β and therandom values of all the advertisers, and where wedenote the highest competing bid induced by compet-itors’ strategies β−k � βi

( )i��k by dβ−kk,t � maxi : i �� k

{bβii,t

}. We

say that a strategy profile β ∈ @ constitutes an ε-Nashequilibrium in dynamic strategies if each player’s in-centive to unilaterally deviate to another strategy is atmost ε; that is, if for all k:

1T

supβ̃∈@CI

k

Πβ̃,β−kk −Π

βk

⎛⎜⎜⎜⎜⎜⎜⎜⎝⎞⎟⎟⎟⎟⎟⎟⎟⎠ ≤ ε, (3)

where @CIk ⊇ @k denotes the class of strategies with

complete information on market primitives. Althoughamore precise description of this class will be providedin Section 5, we note that this class also includes strat-egies with access to complete information on the types(θi)Ki�1 and on the bids made in past periods by alladvertisers. Naturally, the solution concept in (3) takesinto account the strategic response of the competitiveenvironment to the actions of an advertiser.

2.4. Discussion of Model AssumptionsAlthough we focus on second-price auctions, our resultshold for all deterministic dominant-strategymechanismsthat are incentive compatible and individually rationalwith nonnegative transfers. This broad class includesmechanisms such as second-price auctions with anon-ymous reserve prices, second-price auctions with per-sonalized reserve prices, ranking mechanisms, andMyerson’s optimal mechanism. A question of theoreticaland practical interest is how to extend the adaptive pacingapproach that is suggested in this paper to nontruthfulmechanisms such as first-price auctions.Predicated on the common practice of running cam-

paigns with daily budgets, we focus on synchronouscampaigns that start and finish simultaneously. Thesynchronicity of the campaigns allows one to capturesome key market features, while providing the tracta-bility required to transparently highlight key problemcharacteristics. We note that the results in Section 3.3that are established under arbitrary competition areagnostic to this assumption and hold even when cam-paigns start and finish at different times. An interesting

Balseiro and Gur: Learning in Repeated Auctions with BudgetsManagement Science, 2019, vol. 65, no. 9, pp. 3952–3968, © 2019 INFORMS 3957

Page 7: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

research direction is to extend our convergence andequilibrium results to models with asynchronous cam-paigns, such as those in Iyer et al. (2014) and Balseiroet al. (2015).

Our model assumes that values are independentacross time and advertisers and absolutely continuous.The assumption that values are independent acrosstime is frequently used in the literature and is based ontypical browsing behavior of internet users that leads toweak intertemporal correlations. Our model can berelaxed to accommodate correlation across advertisers’values as long as certain technical conditions (such assmoothness of the dual objective function and strongmonotonicity of the expenditure function) hold. Al-though we provide sufficient conditions that rely onvalues being independent, these conditions can beshown to hold when values are imperfectly correlatedand when distributions have atoms at zero (capturingthe possibility of advertisers not participating in everyauction). We discuss the absolute continuity of valuesin Section 6.

In our model the number of impressions is knownto all advertisers. In practice advertisers may infer thelength of their campaign (in terms of total number ofimpressions) before its beginning, using user trafficstatistics commonly provided by publishers and ana-lytics companies. Nevertheless, our model can accom-modate a random number of impressions throughdummy arrivals that are valued at zero by the advertiser.

3. Off-EquilibriumPerformance Guarantees

In this section we study the performance that can beguaranteed by bidding strategies with and withoutassumptions on the primitives (budgets, valuations)and strategies that govern competitors’ bids. Weobtain a lower bound on the minimal loss one mustincur relative to the best dynamic response in hindsightunder arbitrary competition. We then introduce a classof adaptive pacing strategies that react to realizedvaluations and expenditures throughout the campaignand obtain an upper bound on the performance of thesestrategies under arbitrary and stationary competitiveenvironments, thereby establishing asymptotic opti-mality in both settings.

3.1. Lower Bound on the AchievableGuaranteed Performance

Before investigating the type of performance an ad-vertiser can aspire to, we first formalize the type ofperformance that cannot be achieved. The followingresult bounds the minimal loss one must incur relativeto the best dynamic (and feasible) sequence of bids thatcould have been selected in hindsight, under arbitrarycompetition.

Theorem 1 (Lower Bound on the Achievable GuaranteedPerformance). For any target expenditure rate ρk ∈ 0, v̄k( ]and number γ< v̄k/ρk there exists some constant C> 0 suchthat for any bidding strategy β ∈ @k:

lim supT→∞,Bk�ρkT

supvk∈ 0,v̄k[ ]Tdk∈RT+

1T

(πHk (vk;dk) − γπ

β

k (vk;dk))

≥ C .

Theorem 1 establishes that no admissible strategy canguarantee asymptotic γ-competitiveness for any γ<v̄k/ρk and therefore cannot guarantee a portion largerthan ρk/v̄k of the total net utility achieved by a dynamicresponse that is taken with the benefit of hindsight.Because an advertiser should never pay more than itsvalue, v̄k is the maximal possible expenditure and, thus,the ratio ρk/v̄k captures the relative “wealthiness” of theadvertiser. This implies that bidding strategies mayperformpoorly relative tobestperformance inhindsightwhen budgets are small, but better performance mightbe guaranteed when budgets are bigger (when ρk ≥ v̄kthe advertiser can bid truthfully to guarantee optimalperformance and 1-competitiveness). Thus, Theorem 1establishes the impossibility of ”no-regret” relative todynamic sequences of bids, by showing that, in general,the performance of any learning strategy cannot beguaranteed to approach πH

k .We next describe the main ideas in the proof of the

theorem. We first observe, by adapting Yao’s principle(Yao 1977) to our setting, that to bound the worst-caseloss of any (deterministic or not) strategy relative to thebest response in hindsight, it suffices to analyze theexpected loss of deterministic strategies relative to thatbenchmark, where the sequence of competing bids isdrawn from a certain distribution (see Lemma A.1 inthe online supplemental material). We next illustratethe idea behind the worst-case instance we construct.We assume that the advertiser knows in advance thatthe sequence of highest competing bids will be ran-domly selected from the set d1,d2

{ }, where

d1 �(dhigh, . . . , dhigh︸�����︷︷�����︸

τ auctions

, v̄k, . . . . . . . , v̄k︸�����︷︷�����︸T−τ auctions

),

d2 �(dhigh, . . . , dhigh︸�����︷︷�����︸

τ auctions

, dlow, . . . , dlow︸�����︷︷�����︸T−τ auctions

)

for some v̄k ≥ dhigh > dlow > 0, and that values will be v̄kthroughout the campaign. We establish that in this caseone may restrict analysis to strategies that deter-mine before the beginning of the campaign how manyauctions to win at different stages of the campaign (seeLemma A.2). This presents the advertiser with thefollowing trade-off: whereas early auctions introducea return per unit of budget that is certain but low, laterauctions introduce a return per unit of budget that may

Balseiro and Gur: Learning in Repeated Auctions with Budgets3958 Management Science, 2019, vol. 65, no. 9, pp. 3952–3968, © 2019 INFORMS

Page 8: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

be higher but may also decrease to zero. Because thenumber of auctions in the first stage, denoted by τ, maybe designed to grow with T, this trade-off may drivea loss that does not diminish to zero asymptotically.The proof of the theorem presents a more generalconstruction that follows the ideas illustrated above andtunes the structural parameters to maximize the worst-case loss that must be incurred by any strategy relativeto the best dynamic response in hindsight. Notably,because the above construction assumes that the ad-vertiser’s values are known and fixed to v̄k, Theorem 1implies that, in general, no strategy can achieve a com-petitive ratio lower than v̄k/ρk (and, therefore, di-minishing regret) even if advertisers know their ownvaluations.

3.2. Asymptotic Optimal Bidding StrategyWe introduce an adaptive pacing strategy that adjustsbids according to observations. In what follows wedenote by P[a,b](x) � min{max{x, a}, b} the Euclideanprojection operator on the interval [a, b].Adaptive Pacing Strategy (A). Input: a number εk > 0.

1. Select an initial multiplier µk,1 in 0, µ̄k

[ ], and set the

remaining budget to B̃k,1 � Bk � ρkT.2. For each t � 1, . . . ,T:

a. Observe the realization of the random valua-tion vk,t, and post a bid:

bAk,t � min{vk,t/(1 + µk,t), B̃k,t

}.

b. Observe the expenditure zk,t. Update the multi-plier by

µk,t+1 � P[0,µ̄k](µk,t − εk (ρk − zk,t))and the remaining budget by B̃k,t+1 � B̃k,t − zk,t.

The adaptive pacing strategy dynamically adjuststhe pace at which the advertiser depletes its budget byupdating a multiplier that determines the extent towhich the advertiser bids below its true values (shadesbids). The strategy consists of a sequential approxi-mation scheme that takes place in the Lagrangian dualspace and that is designed to approximate the bestsolution in hindsight defined in (1). The objective ofthe Lagrangian dual of (1) is

∑Tt�1 xk,t(vk,t − (1 + µ)dk,t) +

µρk, where µ is a nonnegative multiplier capturing theshadow price associated with the budget constraint. Fora fixed µ, the dual objective is maximized by winningall items with vk,t ≥ (1 + µ)dk,t, which is obtained bybidding bk,t � vk,t/(1 + µ), because advertiser k winswhenever bk,t ≥ dk,t. Therefore, one has

πHk (vk;dk) ≤ inf

µ≥0∑Tt�1

(vk,t − (1 + µ)dk,t)+ + µρk︸�������������︷︷�������������︸�:ψk,t(µ)

, (4)

where we denote by y+ � max(y, 0) the positive partof a number y ∈ R. The tightest upper bound can be

obtained by solving the minimization problem on theright-hand side of (4). Because this problem cannot besolved without prior information on all the values andcompeting bids throughout the campaign, the proposedbidding strategy approximates (4) by estimating a mul-tiplier µk,t and bidding bAk,t � vk,t/(1 + µk,t) in each round,as long as the remaining budget suffices.5

The dual approximation scheme consists of estimat-ing a direction of improvement and following thatdirection. More precisely, the sequence of multipliersfollows a subgradient descent scheme, using the noisypoint access each advertiser has to ∂−ψk,t(µk,t), namely,the left derivative of the t-period component of thedual objective in (4). In other words, in each period t �1, 2 . . ., one has

∂−ψk,t(µk,t) � ρk − dk,t1{vk,t ≥ (1 + µk,t)dk,t}� ρk − dk,t1{bAk,t ≥ dk,t} � ρk − zk,t.

In each period t the advertiser compares its expenditurezk,t to the target expenditure rate ρk that is “affordable”given the initial budget. Whenever the advertiser’sexpenditure exceeds the target expenditure rate, µk,t isincreased by εk(zk,t − ρk), implying that the shadowprice increases and that the associated budget con-straint becomes more binding. On the other hand, if theadvertiser’s expenditure is lower than the target ex-penditure rate (including periods in which the ex-penditure is zero) the multiplier µk,t is decreased byεk(ρk − zk,t), implying that the shadow price decreasesand that the associated budget constraint becomes lessbinding. Then, given the valuation in that period, theadvertiser shades its value using the current estimatedmultiplier.The multiplier µk,t essentially captures a “belief”

advertiser k has at time t regarding the shadow priceassociated with its budget constraint. Notably, the“correct” shadowprice, reflecting the “correct” value offuture opportunities, depends on unknown charac-teristics such as the value distribution of the advertiser,as well as the value distributions, budgets, and strat-egies of its competitors. Rather than aiming at esti-mating these unknown factors, the proposed strategyaims at learning the best response to these in terms ofthis shadow price.

Assumption 1 (Appropriate Selection of Step Size). Thenumber εk satisfies 0< εk ≤ 1/v̄k, lim

T→∞ εk � 0, and limT→∞Tεk � ∞.

Assumption 1 adapts a standard step size conditionthat is common in the stochastic approximation liter-ature (see, e.g., section 1.1 in Kushner and Yin 2003) tostationary step sizes and characterizes the range ofrates at which step sizes should converge to zero (as afunction of the horizon length) to allow for the conver-gence of the scheme, while guaranteeing sufficient

Balseiro and Gur: Learning in Repeated Auctions with BudgetsManagement Science, 2019, vol. 65, no. 9, pp. 3952–3968, © 2019 INFORMS 3959

Page 9: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

exploration. By doing so, this assumption prescribesa broad range for tuning the strategy’s step size ina manner that guarantees the asymptotic optimalityresults that follow.6 Even though Assumption 1 placessome restrictions on the step size selection, it stilldefines a broad class of step sizes; for example, εk �c/Tγ satisfies Assumption 1 for any constants 0< c ≤1/v̄k and 0<γ< 1.

3.3. Performance Under Arbitrary CompetitionThe following result characterizes the performance thatcan be guaranteed by an adaptive pacing strategyunder arbitrary competition.

Theorem 2 (Asymptotic Optimality Under ArbitraryCompetition). Let A be an adaptive pacing strategy witha step size εk satisfying Assumption 1 and µ̄k ≥ v̄k/ρk − 1.Then:

limsupT→∞,Bk�ρkT

supvk∈ 0,v̄k[ ]Tdk∈RT+

1T

(πHk (vk,dk)− v̄k

ρkπAk (vk,dk)

)�0 .

Moreover, when the step size εk is of order T−1/2, the con-vergence is at rate T−1/2.

Theorem 2 establishes that an adaptive pacingstrategy is asymptotic (v̄k/ρk)-competitive (Borodinand El-Yaniv 1998), guaranteeing at least a fractionρk/v̄k of the total net utility achieved by any dynamicresponse that could have been selected with thebenefit of hindsight. This establishes the asymptoticoptimality of the strategy, because by Theorem 1 noother admissible bidding strategy can guarantee a largerfraction of the best performance in hindsight. (Recallingthe worst-case instance used in Theorem 1, a largerfraction of the best performance in hindsight cannotbe guaranteed even when advertisers know their valu-ations up front.)

In the proof of the theorem, we first show that thetime at which the budget is depleted under theadaptive pacing strategy is “close” to T. We then ex-press the loss incurred by that strategy relative to thebest response in hindsight in terms of the value of lostauctions and bound the potential value of these auc-tions. Together, these analyses establish that there existsome positive constants C1,C2, and C3, independent ofT, such that for any dk ∈ RT+ and vk ∈ 0, v̄k[ ]T,

πHk (vk;dk) − v̄k

ρkπAk (vk,dk) ≤ C1 + C2

εk+ C3Tεk .

This allows one to evaluate the performance of thevarious step sizes and establishes asymptotical optimalitywhenever Assumption 1 holds (independently of theinitial multiplier µ1,k). In particular, selecting εk � cT−1/2with c � (C2/C3)1/2 leads to a convergence rate of T−1/2.

3.4. Performance Under Stationary CompetitionThe upper bound in Theorem 2 holds in every samplepath (because the adaptive pacing strategy is deter-ministic) and is extremely robust. For example, it holdsalso when campaigns can start and finish at differenttimes and under arbitrary sequences of valuations andcompeting bids, including sequences that are correlatedacross impressions or advertisers, as well as dynamicsequences that are based on knowledge about the bid-ding strategy and its realized path. However, theestablished performance guarantee could be viewedas conservative, or “pessimistic,” in the sense that it holdsunder any sequences of valuations and competing bids,including worst-case instances such as the one describedin the proof of Theorem 1. In particular, when the tar-get expenditure rate ρk is small relative to v̄k, theguaranteed performance is limited as well. In thissubsection we demonstrate that for any fixed targetexpenditure rate, the asymptotic optimality of anadaptive pacing strategy can also be carried over (withmuch better performance) to more “optimistic” settingsin which uncertainties are drawn independently froma stationary distribution. (See Iyer et al. 2014 and Balseiroet al. 2015 for discussions on the validity of assumingstationarity competition.)We assume that the pairs (vk,t, dk,t) are independently

drawn from some unknown stationary distribution.We denote by Ψk(µ):�∑T

t�1 Evk,t,dk,t (vk,t−(1+µ)dk,t)+[ ]+(

µρk) the expectation of the dual objective functiongiven in (4), with respect to values and competing bids.

Theorem 3 (Asymptotic Optimality Under StationaryCompetition). Suppose that in each period t the pair(vk,t, dk,t) is independently drawn from some stationarydistribution such that the dual function Ψk(µ) is thricedifferentiable with bounded derivatives and strongly con-vex with parameter λk > 0. Let A be an adaptive pacingstrategy with a step size εk satisfying Assumption 1, as wellas εk < 1/(2λk) and µ̄k ≥ v̄k/ρk − 1. Then,

lim supT→∞,Bk�ρkT

1TEvk ,dk πH

k (vk,dk) − πAk (vk,dk)

[ ] � 0 .

Moreover, when the step size εk is of order T−1/2, the con-vergence is at rate T−1/2.Theorem 3 establishes that when valuations and

competing bids are independent and identically drawnthroughout time, an adaptive pacing strategy is long-run average optimal, in the sense that for any fixedρk > 0, its average expected performance converges tothe average expected performance of the best sequenceof bids that could have been selected in hindsight.Together with Theorems 1 and 2, Theorem 3 estab-lishes the asymptotic optimality of the adaptive pac-ing strategy in both the “pessimistic” and “optimistic”

Balseiro and Gur: Learning in Repeated Auctions with Budgets3960 Management Science, 2019, vol. 65, no. 9, pp. 3952–3968, © 2019 INFORMS

Page 10: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

scenarios when advertisers have no prior knowledge ofwhether the nature of the competition is “stationary” or“arbitrary,” and that selecting εk ∼ T−1/2 guarantees aconvergence rate of T−1/2 in both frameworks. We leavethe characterization of rate optimality in these frame-works as an open problem.

Theorem 3 is based on the assumption that the dualfunction is thrice differentiable and strongly convex.The first condition is technical and is required to per-form Taylor series expansions, and the second condi-tion is a standard one guaranteeing that the multipliersemployed by an adaptive pacing strategy converge.Lemma C.1 in Online Supplemental Material Part Cshows that these conditions hold when values andcompeting bids are independent and absolutely con-tinuous with bounded densities and the valuationdensity is differentiable. The conditions in Theorem 3can also be shown to hold when competing bids are(imperfectly) correlated across auctions.

To prove the theorem we first upper bound theexpected performance in hindsight in terms of Ψ(µ*),where µ* minimizes the dual function. We then lowerbound the performance of the adaptive pacing strategyin terms of Ψ(µ*) by developing a second-order ex-pansion around µ* of the expected utility per auctionwhen the strategy uses a multiplier µk,t. This expansioninvolves the mean error E[µk,t − µ*] in the first-orderterm and the mean squared error E[(µk,t − µ*)2] in thesecond-order term. Using the update rule we controlthe cumulative sum of these errors to establish thatthere exist some positive constants C1, C2, and C3,independent of T, such that

Evk ,dk

[πHk (vk,dk) − πA

k (vk,dk)] ≤ C1 + C2

εk+ C3Tεk .

This establishes asymptotic optimality whenever thesequence of step sizes satisfies Assumption 1 and es-tablishes a convergence rate of T−1/2 under a step sizeselection of order T−1/2.

3.5. Empirical Proof of Concept Using AdAuction Data

To analyze the value that may be captured in practiceby an adaptive pacing strategy, we empirically evaluatethe portion this strategy recovers of the best perfor-mance in hindsight, using data from a large online adauction platform. The data set includes bids from 13,182sequential online ad auctions, in which a total of 2,923advertisers participated. The objective of the followinganalysis is to demonstrate the effectiveness of an adap-tive pacing strategy in a realistic yet simple setting. Inparticular, an optimal tuning of the strategy’s param-eters may depend on the available budget as well ascompetition characteristics. Obtaining the optimalcharacterization of these parameters is a challengingopen problem that we leave as future work.

3.5.1. Setup. For each advertiser k, we set the sequenceof valuations vk to be the sequence of recorded bidssubmitted by that advertiser (where v̄k is the highestrecorded bid of advertiser k) and the sequence dk to bethe sequence of highest bids submitted by other ad-vertisers.We randomly generated ρk ∈ [0, v̄k] accordingto a uniform distribution, and for each target expendi-ture rate we approximated πH

k (vk,dk) using a standardlinear programming relaxation. Note that whenever anadvertiser is effectively unconstrained, that is,

∑Tt�1 1{dk,t ≤

vk,t}dk,t ≤ρkT, the best performance in hindsight, πHk ,

can be implemented by truthful bidding. We thencomputed the performance of an adaptive pacing strat-egy πA

k and compared its performance with πHk . We

focused on the 10 bidders with the largest cumulativeexpenditure (together, these advertisers were re-sponsible for more than 98% of the overall expendi-ture), and for each of these advertisers we replicatedthe process 200 times. In all the instances we used aninitial multiplier µ1 � 0.5 and a step size ε�T−1/2 (thechoice of the initial multiplier had very limited impacton the performance).

3.5.2. Results and Discussion. The plot that appears atthe left-hand side of Figure 1 depicts the fractions of πH

k(as a function of ρk/v̄k) that were recovered by theadaptive pacing strategy. Notably, in all the instancesthe strategy significantly outperformed the payoffguarantees for an arbitrary competitive environment.Moreover, it approximately achieved πH

k , except whenthe target expenditure rate was particularly small. Theplot on the right-hand side of Figure 1 compares thefractions of πH

k that were recovered by the adaptivepacing strategy with those recovered by “truthful bid-ding,” a naı̈ve strategy that bids the advertiser’s valueuntil the budget is depleted (each dot corresponds tothe performance of one advertiser under one target ex-penditure rate). We plotted these quantities as a functionof Bk/B0

k , where Bk � ρkT is the available budget, andB0k �

∑Tt�1 1{dk,t ≤ vk,t}dk,t is the minimal budget under

which the advertiser is effectively unconstrained andtruthful bidding is optimal (the plot focuses on instanceswith Bk/B0

k ≤ 2). One may expect that whenever ρk < v̄kbut Bk/B0

k ≥ 1, any nonanticipating strategy wouldincur a loss (relative to truthful bidding) that is as-sociated with not having ex ante knowledge of beingeffectively unconstrained. However, in these cases theloss incurred by the adaptive pacing strategy is lessthan 1%. On the other hand, whenever Bk/B0

k < 1, theadvertiser is effectively constrained. On average, inthese cases the adaptive pacing strategy outperformstruthful bidding by 120%. We attribute the relativedeterioration in the empirical performance of adap-tive pacing for small values of ρk to the fact that, forfixed valuations and competing bids, a very small

Balseiro and Gur: Learning in Repeated Auctions with BudgetsManagement Science, 2019, vol. 65, no. 9, pp. 3952–3968, © 2019 INFORMS 3961

Page 11: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

budget suffices for winning only a few auctions (undersuch conditions one could expect the ability of learningpolicies to compete with the best sequence of bids inhindsight to be limited). We further note that one canlikely improve the empirical performance of the strategyby customizing the selection of the tuning parametersfor different advertisers. Overall, these results demon-strate the practical effectiveness of an adaptive pacingstrategy even under moderate budgets and problemhorizons, as well as realistic bidding behavior that maybe subject to correlated preferences.

4. Convergence of Simultaneous LearningWe turn to study the dynamics that emerge when alladvertisers follow adaptive pacing strategies simulta-neously. We establish that the sequences of multipliersgenerated by the strategies converge to a tractable profileof multipliers and that the payoffs converge to thepayoffs achieved under such a profile.

4.1. A Candidate Profile of Dual MultipliersUnder simultaneous adoption of adaptive pacing strat-egies, each advertiser follows the subgradient ∂−ψk,t ·(µk,t) � ρk − zk,t. Therefore, if the induced sequencesof multipliers converge, one may expect the limitingprofile to consist of multipliers under which each ad-vertiser’s expected expenditure equals its target expen-diture (whenever its budget is binding). Denote theexpected expenditure per auction of advertiser k whenadvertisers shade bids according to a profile of multi-pliersµ ∈ RK byGk(µ) :�Ev [1{(1 + µk)dk ≤ vk}dk],where

dk � maxi : i��k{vi/(1 + µi)

}and the expectation is taken

with respect to the values vk ∼ Fk. Following the above

intuition, we consider the vector µ*, defined by thecomplementary conditions:

µ∗k ≥ 0 ⊥ Gk(µ*) ≤ ρk, ∀k � 1, . . . ,K, (5)

where ⊥ indicates a complementarity condition be-tween the multiplier and the expenditure; that is, atleast one condition must hold with equality. Intui-tively, each advertiser shades bids if its expectedexpenditure is equal to its target rate. Conversely, if anadvertiser’s target rate exceeds its expected expenditure,then it bids truthfully by setting the multiplier equalto zero.We next provide sufficient conditions for the unique-

ness of the vector µ*. The vector functionG : RK → RK issaid to be λ-strongly monotone over a set8 ⊂ RK if (µ −µ′)T(G(µ′) −G(µ)) ≥ λ‖µ − µ′‖22 forallµ,µ′ ∈ 8.Wereferto λ as the monotonicity constant.

Assumption 2 (Stability).1. There exists a set8 � ∏K

k�1[0, µ̄k] and a monotonicityconstant λ> 0, independent of K, such that the expectedexpenditure function G(µ) is λ-strongly monotone over 8.

2. The target expenditure rate satisfies ρk ≥ v̄k/µ̄k forevery bidder k.

The first part of Assumption 2 requires the expen-diture function to be strongly monotone over a set offeasible multipliers. In the case of a single agent, thestrong monotonicity of the expenditure function isequivalent to the strong convexity of the dual objectivefunction (see Theorem 3). Therefore, the monotonicitycondition is a natural extension of the strong convexitycondition to a simultaneous learning setting. Themonotonicity condition is similar to other commonconditions that guarantee uniqueness and stability in

Figure 1. (Color online) Performance of an Adaptive Pacing Strategy with Parameters µ1 � 0.5 and ε � T−1/2, in Terms of theAttained Fractions of the Performance of the Best Sequence of Bids in Hindsight

Notes. (Left) Comparedwith the fractions that are guaranteed under arbitrary competition, as a function of ρ/v̄. (Right) Comparedwith truthfulbidding, as a function of B/B0.

Balseiro and Gur: Learning in Repeated Auctions with Budgets3962 Management Science, 2019, vol. 65, no. 9, pp. 3952–3968, © 2019 INFORMS

Page 12: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

concave (unconstrained) games; see, for example, Rosen(1965), as well as the discussion on convergence ofstochastic approximation methods in multidimensionalaction spaces in section 1.10, part 2 of Benveniste et al.(1990). To provide some intuition on this condition,consider a scenario in which the expected expenditureper auction of all advertisers is above their target ex-penditure rate (that is, Gk(µ)>ρk for all k). Because theexpenditure function Gk is nonincreasing in µk for all k,each advertiser can get closer to its target expenditurerate by unilaterally increasing the multiplier by a smallamount. This condition guarantees that if all advertiserssimultaneously increase their multipliers by a smallamount, then all advertisers get closer to their targetexpenditure rates.7 The second part of the assumptionis that budgets are not too small and guarantees thatµ* ∈ 8. This part suggests a simple rule of thumb forsetting a sufficiently large upper bound µ̄k. In OnlineSupplemental Material Part C (see Proposition C.3), weshow that Assumption 2 guarantees the existence ofa unique vector of multipliers µ*. As we further discussin Section 6, the uniqueness of µ* has practical benefitsthat may motivate a market designer to place restric-tions on the target expenditure rates of advertisers toguarantee that Assumption 2 holds.

4.2. Convergence of Dual MultipliersTo analyze the sample path of multipliers resultingfrom simultaneous adoption of adaptive pacing strat-egies, we consider some conditions on the step sizesselected by the different advertisers.

Assumption 3 (Joint Selection of Step Sizes). Let ε̄ �maxk∈ 1,...,K{ } εk, ε � mink∈ 1,...,K{ } εk, v̄ � maxk{v̄k}, and letλ be the monotonicity constant that appears in Assumption 2.The profile of step sizes (εk)Kk�1 satisfies the following:

i. ε̄ ≤ 1/v̄ and ε̄< 1/(2λ),ii. lim

T→∞ ε̄2/ε � 0,

iii. limT→∞Tε2/ε̄ � ∞.

Assumption 3 details conditions on the joint selectionsof step sizes (extending the individual conditions thatappear in Assumption 1), essentially requiring the stepsizes selected by the different advertisers to be “rea-sonably close” to one another. Nevertheless, the class ofstep sizes defined by Assumption 3 is quite generaland allows for flexibility in the individual step size se-lection. For example, step sizes εk � ckT−γk satisfyAssumption3 foranyconstants ck > 0and0<γk < 1suchthat maxk γk ≤ 2mink γk and 2maxk γk ≤ 1 +mink γk.Moreover, when step sizes are equal across advertisersand satisfy ε< 1/(2λ), Assumption 3 coincides withAssumption 1.

The following result states that, when Assumption 3holds, the multipliers selected by advertisers undersimultaneous adoption of adaptive pacing strategiesconverge to the vector µ*. In what follows we usethe standard norm notation ‖x‖p :�

(∑Kk�1 |xk |p

)1/pfor a

vector x ∈ RK.

Theorem 4 (Convergence of Dual Multipliers). Supposethat Assumption 2 holds, and let µ* be the profile multipliersdefined by (5). Assume that all advertisers follow adaptive pacingstrategies with step sizes that together satisfy Assumption 3.Then,

limT→∞,Bk�ρkT

1T

∑Tt�1

E ‖µt − µ*‖22[ ] � 0 .

Theorem 4 adapts standard stochastic approximationtechniques (see, e.g., Nemirovski et al. 2009) to ac-commodate the expenditure feedback form and thepossibility of different step sizes across agents. Inthe proof of the theorem we establish that underAssumption 2 there exist positive constants C1 and C2,independent of T, such that

E[‖µt−µ*‖221

{B̃k,t+1 ≥ v̄k ∀k

}]≤C1

ε̄

ε1−2λε( )t−1+C2

ε̄2

ε,

for any t ∈ 1, . . . ,T{ }, where the dependence of C1 andC2 on the number of advertisers is of a linear order. Thisimplies that insofar as the respective primal solutionsare feasible given the remaining budgets, the vector ofmultipliers µt converges in L2 to the vector µ*. We thenargue that the strategies do not deplete the budget tooearly and establish the result using the conditions inAssumption 3. We note that the fastest rate of con-vergence within this class of strategies is of order T−1/2and is achieved under the symmetric step-size selectionof the form ε ∼ T−1/2.Owing to primal feasibility constraints, the conver-

gence of multipliers does not guarantee convergencein performance. However, the former has stand-alonepractical importance in allowing themarket designer topredict the bidding behavior of advertisers. This im-plication is discussed in Section 6.

4.3. Convergence in PerformanceWe next study the long-run average performanceachieved when all advertisers follow adaptive pacingstrategies. Recall that Theorem 4 implies that the vectorof multipliers µt selected by advertisers under thesestrategies converges to the vector µ*. However, becausein every period the remaining budget may potentiallyconstrain bids, the convergence of multipliers to µ*does not guarantee performance that is close to the

Balseiro and Gur: Learning in Repeated Auctions with BudgetsManagement Science, 2019, vol. 65, no. 9, pp. 3952–3968, © 2019 INFORMS 3963

Page 13: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

performance achieved when using the fixed vector µ*throughout the campaign. Nevertheless, establishingconvergence in performance is a key step toward equi-librium analysis, and the performance that is obtainedusing the fixed vector µ* is a natural candidate for sucha convergence. Given a vector of multipliers µ, wedenote

Ψk(µ) :�T Ev vk − (1 + µk)dk( )+[ ] + µkρk

( ), (6)

where dk is defined as in Section 4.1. Recalling (4),Ψk(µ*) is the dual performance of advertiser k when ineach period t each advertiser i bids bi,t � vi,t/(1 + µ∗

i ).Balseiro et al. (2015) show that Ψk(µ*) coincides withthe expected performance achieved when all adver-tisers shade bids according to µ* and are allowed to bideven after budgets are depleted.

Theorem 5 (Convergence in Performance). Suppose thatAssumption 2 holds, and let µ* be the profile of multipliersdefined by (5). LetA � (Ak)Kk�1 be a profile of adaptive pacingstrategies, with step sizes that together satisfy Assumption 3.Then, for each k ∈ 1, . . . ,K{ },

lim supT→∞,Bk�ρkT

1T(Ψk(µ*) −ΠA

k) ≤ 0.

In the proofwe establish thatwhen all advertisers followadaptive pacing strategies, deviations from target ex-penditure rates are controlled in a manner such thatadvertisers very often deplete their budgets “close” tothe end of the campaign. Therefore the potential lossalong the sample path due to budget depletion is rel-atively small compared with the cumulative payoff. Weestablish that there exist some positive constants C1through C4, independent of T, such that

Ψk(µ*) −ΠAk ≤ C1 + C2

ε̄1/2

ε3/2+ C3

ε̄Tε1/2

+ C41ε. (7)

This allows for the analysis of various step size selectionsand establishes convergence in performance wheneverthe sequence of step sizes satisfies Assumption 3. Inparticular, (7) implies that a symmetric step-size selectionof order ε ∼ T−1/2 guarantees a long-run average con-vergence rate of order T−1/4. We conjecture that suchparametric selection further guarantees a convergencerate of T−1/2 and demonstrate this numerically in OnlineSupplemental Material Part D.

Illustrative Numerical Analysis of Convergence UnderSimultaneous Learning. In Online Supplemental Ma-terial Part D, we demonstrate the convergence estab-lished in Theorems 4 and 5 through a sequence ofnumerical experiments that simulate the sample pathof multipliers and performance of bidders that simul-taneously follow adaptive pacing strategies. Our results

demonstrate the asymptotic convergence of dual mul-tipliers and performance in a variety of instances andunder different selections of step sizes that satisfyAssumption 3. A step size selection of ε ∼ T−1/2 led tosuperior convergence rates of T−1/2, for bothmultipliersand performance.

5. Approximate Nash Equilibrium inDynamic Strategies

In this sectionwe study conditions underwhich adaptivepacing strategies constitute an ε-Nash equilibrium. First,we note that although a profile of adaptive pacingstrategies converges in multipliers and in performance,such a strategy profile does not necessarily constitutean approximate Nash equilibrium. We then detail oneextension of our formulation (motivated in the contextof ad auctions) whereby in each period each advertiserparticipates in one of multiple auctions that take placesimultaneously. We show that in such a regime adaptivepacing strategies constitute an ε-Nash equilibrium indynamic strategies.An advertiser can potentially gain by deviating from

a profile of adaptive pacing strategies if this deviation:(i) reduces competition in later auctions by causingcompetitors to deplete their budgets earlier; or (ii) re-duces competition before the budgets of the competitorsare depleted by inducing competitors to learn “wrong”multipliers. Under Assumption 3, the self-correctingnature of adaptive pacing strategies prevents competi-tors from depleting budgets too early, and thus thefirst avenue is not effective. However, when the same setof advertisers frequently interact, an advertiser cancause competitors to excessively shade bids by drivingthem to learn higher multipliers and, therefore, canreduce competition and increase profits throughout thecampaign.In practice, however, the number of advertisers bid-

ding in an online admarket is typically large and, owingto the high frequency of auctions as well as sophisticatedtargeting technologies, advertisers often participateonly in a fraction of all auctions that take place (Celiset al. 2014). As a result, each advertiser often competeswith different bidders in every auction (Balseiro et al.2014). This limits the impact a single advertiser can haveon the expenditure trajectory of other advertisers and,therefore, the benefit from unilaterally deviating tostrategies such as the ones described above can beexpected to be small. To demonstrate this intuition,we next detail one extension of our basic framework,whereby in each period multiple auctions take placesimultaneously and each advertiser is “matched” withone of these auctions according to its targeting cri-teria and the attributes associated with the auctionedimpressions. We establish that in markets with such

Balseiro and Gur: Learning in Repeated Auctions with Budgets3964 Management Science, 2019, vol. 65, no. 9, pp. 3952–3968, © 2019 INFORMS

Page 14: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

characteristics adaptive pacing strategies constitutean ε-Nash equilibrium.

5.1. Parallel Auctions ModelIn each time period t � 1, . . . ,T we now assume thatthere are M ad slots sold simultaneously in parallelauctions. This can correspond to multiple publisherssimultaneously auctioning inventory or a publisherwith multiple ad slots running one auction per slot.We assume that each advertiser participates in one ofthe M auctions independently at random. We denoteby mk,t ∈ 1, . . . ,M{ } the auction advertiser k partici-pates in at time t. In each period t the random variablemk,t is independently drawn from an exogenous distri-bution αk � {αk,m}Mm�1, where αk,m denotes the proba-bility that advertiser k participates in auction m. Thetype of advertiser k is therefore given by θk :� Fk, ρk,αk

( ).

The definition of dk,t, the highest competing bid facedby advertiser k, is adjusted to be

dk,t :� max maxi : i��k

{1{mi,t � mk,t

}bi,t

}, 0

{ }.

Notably, all the results we established thus far holdwhen advertisers compete in M parallel auctions asdiscussed above. We next establish that under this ex-tension adaptive pacing strategies constitute an ε-Nashequilibrium in dynamic strategies, in the sense that noadvertiser can benefit from unilaterally deviating toany other strategy (including ones with access to com-plete information) when the number of time periods andbidders is large. We note that by applying a similaranalysis, these strategies could also be shown to constitutean ε-Nash equilibrium in dynamic strategies underother frameworks that have been considered in theliterature, such as those in Iyer et al. (2014) and Balseiroet al. (2014), in which advertisers’ campaigns do notnecessarily start and end at the same time. Additionally,the results of this section generalize tomore sophisticatedmatching models, such as ones that allow the adver-tisers to participate in multiple slots per period (whenthe number of slots each advertiser participates in perperiod is small relative to M) and the distribution ofvalues to be auction dependent.

We denote by @CIk ⊇ @k the space of nonanticipating

and budget-feasible strategies with complete infor-mation. In particular, strategies in@CI

k may have accessto all types (θi)Ki�1, as well as past value realizations,bids, and expenditures of all competitors. To analyzeunilateral deviations from a profile of adaptive pacingstrategies we consider conditions on the likelihood thatdifferent advertisers compete in the same auction in agiven timeperiod. Let ak,i � P{mk,t � mi,t} � ∑M

m�1 αk,mαi,m

denote the probability that advertisers k and i competein the same auction in a given time period, and let

ak � (ak,i)i��k ∈ [0, 1]K−1. We refer to these as thematching probabilities of advertiser k.

Assumption 4 (Limited Interaction). The matching prob-abilities (ak)Kk�1 and profile of step sizes (εk)Kk�1 satisfy

i. limM,K→∞K1/2 max

k�1,...,K‖ak‖2 <∞,

ii. limM,K,T→∞ ε̄/

¯ε maxk�1,...,K

‖ak‖22 � 0.

The first condition on the matching probabilities inAssumption 4 is that each advertiser interacts witha limited number of different competitors per timeperiod, even when the total number of advertisers andauctions per period is large. The second condition tiesthe matching probabilities with the profile of step sizesselected by advertisers, essentially implying that ad-vertisers can interact with more competitors per timeperiod when the “learning rates” of the strategies aresimilar to one another. The class of matching proba-bilities and step sizes defined by Assumption 4 in-cludesmany practical settings. For example, when eachadvertiser participates in each auction with the sameprobability, one has ak‖ ‖2 ≈ K1/2/M because αi,m � 1/Mfor each advertiser i and auction m. When the expectednumber of bidders per auction κ :�K/M is fixed (im-plying that the number of parallel auctions is pro-portional to the number of players), we have ‖ak‖2 ≈κ/K1/2, and the first condition is satisfied. The secondcondition states that in such a case the differencebetween the learning rates of different εadvertisersshould be of order o(K/κ2).Theorem 6 (ε-Nash Equilibrium in Dynamic Strategies).Suppose that Assumptions 2 and 4 hold. Let A be a profile ofadaptive pacing strategies with step sizes that togethersatisfy Assumption 3. Then,

lim supT,K,M→∞ ,Bk�ρkT

supk∈ 1,...,K{ },β∈@CI

k

1T

Πβ,A−kk

(−ΠA

k

)≤ 0.

Theorem 6 establishes that adaptive pacing strategies(with step sizes and matching probabilities that to-gether satisfy Assumptions 3 and 4) constitute an ε-Nashequilibrium within the class of dynamic strategies withaccess to complete information on market primitives,and perfect information on all the events that took placein prior periods. In particular, the benefit from uni-laterally deviating from a profile of adaptive pacingstrategies diminishes to zero in large markets, even whenadvertisers know their value distribution up front (aswell as the value distributions and budgets of theircompetitors), and even when they can acquire real-timeinformation on the past value realizations, bids, andpayoffs of their competitors.

Balseiro and Gur: Learning in Repeated Auctions with BudgetsManagement Science, 2019, vol. 65, no. 9, pp. 3952–3968, © 2019 INFORMS 3965

Page 15: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

The key idea of the proof lies in bounding the benefitin terms of Ψk(µ*) from unilaterally deviating to anarbitrary strategy. Under Assumption 4 the impact ofadvertisers on one another is limited because eachadvertiser interacts with a limited number of differentcompetitors in each time period. Thus, competitorslearn the stable multipliers µ* regardless of the actionsof a deviating advertiser, and an advertiser cannotreduce competition by inducing competitors to learna “wrong” multiplier. We show that the benefit fromdeviating to any other strategy is small when multi-pliers are “close” to µ*. We bound the performance ofan arbitrary budget-feasible strategy using a Lagrang-ian relaxation in which we add the budget constraint tothe objective with µk

∗ as the Lagrange multiplier. Thisyields a bound on the performance of any dynamicstrategy, including strategies with access to completeand perfect information. Thus, we obtain that thereexist constants C1 through C5, independent of T, K, andM, such that for any profile of advertisers satisfyingAssumption 2 and any strategy β ∈ @CI

k :

Πβ,A−kk −Ψk(µ*) ≤ C1+C2 ak‖ ‖2K1/2 ε̄

1/2

ε3/2

+C3 ak‖ ‖2K1/2 ε̄Tε1/2

+ C41ε+ C5 ak‖ ‖22

ε̄Tε

for each advertiser k. This allows one to evaluate uni-lateral deviations under different step sizes and to es-tablish the result under the conditions in Assumptions 3and 4.

Theorem 6 shows that a profile of adaptive pacingstrategies constitutes an approximate equilibrium whenthere is a large number of players who do not interacttoo frequently. In somemarkets, however, a small groupof advertisers may interact with each other in manyauctions, thereby violating the assumptions of the theo-rem. Using a fluid model it is possible to numericallyanalyze the benefit from unilaterally deviating froma profile of adaptive pacing strategies in markets witha few advertisers. In numerical experimentswe observedthat the suboptimality of adaptive pacing strategiesdiminishes fast as the number of players increases. Inparticular, the benefit from deviating is typically smalleven in markets with few advertisers who interactfrequently.

6. Concluding Remarks6.1. Summary and ImplicationsIn this paper we introduced a family of adaptive pacingstrategies, in which advertisers adjust the pace at whichthey spend their budget according to their expenditures.We established that such strategies are asymptoticallyoptimalwhen the competitive environment is stationary,but alsowhen competitors’ bids are arbitrary.We further

demonstrated that in large markets, these strategiesconstitute an approximate Nash equilibrium in dynamicstrategies. Notably, adaptive pacing strategies maintainthese notions of optimality without requiring any priorknowledge of whether the nature of competition isstationary, arbitrary, or simultaneously learning. Thisimplies that given characteristics that are common inonline ad markets, an advertiser can essentially followan approximate equilibrium bidding strategy whileensuring the best performance that can be guaranteedoff equilibrium.Although we focus is on the advertisers’ perspective,

our results have implications on market design as well.Given access to the advertisers’ types, our convergenceresults provide practical tools for predicting the bid-ding behavior of advertisers (through the vector µ*)and their payoff (through the functionΨ). Because thesepredictions rely on the uniqueness of µ*, this motivatesthe market designer to place restrictions on target ex-penditure rates that satisfy Assumption 2. In addition,our equilibrium analysis indicates that platforms (suchas Facebook, Google, and Twitter) that provide budgetpacing services based on historical information on alladvertisers may also offer, under conditions that arecommon on online ad markets, “private” budget-pacingservices that do not use information on competing ad-vertisers, with practically no loss of optimality.

6.2. Future Directions and Open ProblemsWe next list several avenues for future research anddiscuss limitations and potential extensions. First, ourresults rely on distribution of values being absolutelycontinuous and do not easily extend to the case ofdiscrete values. In this case, multiple advertisers mightsubmit the same bid, and how ties are broken is impor-tant. Conitzer et al. (2018) argue that, on top of shadingbids, at equilibrium advertisers use mixed strategies inwhich they randomize over bids. We leave the analysisof discrete values as a future research direction.Our results establish the asymptotic optimality of

adaptive pacing strategies in both the “pessimistic”and “optimistic” scenarios when advertisers have noprior knowledge of whether the competitive nature isstationary or arbitrary, and that selecting εk ∼ T−1/2guarantees a convergence rate of T−1/2 in both frame-works. We conjecture that in either of these frame-works, this rate cannot be improved by any admissibleand budget-feasible strategy (random or not), evenwhen advertisers know up front the environment inwhich they are competing (i.e., stationary or arbitrary).Kleinberg (2005) provides some evidence that this rateis tight in the stationary setting, showing that no algo-rithm can achieve a rate better than B−1/2 in a variationof the secretary problem in which the decision makeris allowed to choose the best B secretaries. Althoughthis result assumes that values are drawn without

Balseiro and Gur: Learning in Repeated Auctions with Budgets3966 Management Science, 2019, vol. 65, no. 9, pp. 3952–3968, © 2019 INFORMS

Page 16: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

replacement from an unknown set of numbers (i.e., arandom permutation model), we believe that similarresults apply to a setting in which values are drawn in-dependently from an unknown distribution.

The update rule of the adaptive pacing strategy takesa simple additive form, but one may consider otherupdate rules. One approach is to use nonlinear updaterules such as exponentiated gradient descent, in whichmultipliers are updated multiplicatively. One may alsouse rules that depend on the entire history of observationsrather than only on the most recent period, for example,an update rule in which the gradient step at time t de-pends on the difference between the target expenditurerate and the average expenditure up to time t, rather thanonly on the expenditure in period t. Another approachis to use methods that approximate the Hessian of thedual objective function based on all observed gradients,such as the online Newton method (Hazan et al. 2007)or the adaptive subgradient method (Duchi et al. 2011).

AcknowledgmentsThe authors thank the associate editor and three reviewers fortheir feedback that helped to improve the paper and OmarBesbes, Negin Golrezaei, Vahab S. Mirrokni, Michael Ostrov-sky, and Gabriel Y. Weintraub for valuable comments.

Endnotes1A descriptive discussion of the budget pacing service Facebookoffers advertisers appears at https://developers.facebook.com/docs/marketing-api/pacing. See also recommendations for advertisers inonline blogs powered by Twitter Ads (https://dev.twitter.com/ads/campaigns/budget-pacing), Google’s ad exchange platform Double-Click (https://support.google.com/ds/answer/2682462), as well asExactDrive (http://exactdrive.com/news/how-budget-pacing-helps-your-online-advertising-campaign). All websites accessed October2017.2Our results hold for random or lexicographic tie-breaking rules. Tosimplify exposition we assume that ties are broken in favor of thedecision maker.3Although the excess budget remaining after period T is not in-cluded in this payoff, it can be accounted for without impacting ourresults.4Although in the definition of πβ

k (vk;dk), the vectors vk in 0, v̄k[ ]T anddk in RT+ are fixed in advance, we also allow the components of thesevectors to be selected dynamically according to the realized path of β;that is, vk,t and dk,t could be measurable with respect to the filtrationσ vk,τ, bk,τ, zk,τ, uk,τ

{ }t−1τ�1

( ). This allows nonoblivious or adaptive ad-

versaries and, in particular, valuations and competing bids that areaffected by the player’s strategy. For further details see chapter 7 ofCesa-Bianchi and Lugosi (2006).5The heuristic of bidding bk,t � vk,t/(1 + µH) until the budget is de-pleted, with µH optimal for Figure 4, can be shown to be asymp-totically optimal for the hindsight problem (1) as T grows large (see,e.g., Talluri and van Ryzin 1998).6We restrict the formal definition of the adaptive pacing strategy tostationary step sizes only to simplify and shorten analysis, and thestrategy can be adjusted to allow for time-varying step sizes througha more general form of Assumption 1. Nevertheless, we note that theasymptotic optimality notionswe consider are defined for the general

class of admissible budget-feasible strategies and achievedwithin thesubclass of strategies with stationary step sizes.7 In Online Supplemental Material Part C, we show that this part ofthe assumption is implied by the diagonal strict concavity conditiondefined in Rosen (1965), prove that it holds in symmetric settings,and demonstrate its validity in simple cases.

ReferencesAraman VF, Caldentey R (2011) Revenue management with incom-

plete demand information. Cochran JJ, Cox LA Jr, Keskinocak P,Kharoufeh JP, Smith JC, eds. Wiley Encyclopedia of Operations Re-search and Management Science (John Wiley & Sons, Hoboken, NJ).

Ashlagi I, Monderer D, Tennenholtz M (2012) Robust learningequilibrium. Working paper, Technion–Israel Institute of Tech-nology, Haifa, Israel.

Badanidiyuru A, Kleinberg R, Slivkins A (2013) Bandits with knap-sacks. Proc. 2013 IEEE 54th Annual Sympos. Foundations Comput.Sci. FOCS ’13 (IEEEComputer Society,Washington,DC), 207–216.

Badanidiyuru A, Langford J, Slivkins A (2014) Resourceful contextualbandits. Balcan MF, Feldman V, Szepesvri C, eds. Proc. MachineLearn. Res. 35:1109–1134.

Balseiro S, Kim A, Mahdian M, Mirrokni V (2017) Budget manage-ment strategies in repeated auctions. Proc. 26th Internat. Conf.World Wide Web (ACM, New York), 15–23.

Balseiro SR, Besbes O, Weintraub GY (2015) Repeated auctions withbudgets in ad exchanges: Approximations and design. Man-agement Sci. 61(4):864–884.

Balseiro SR, Feldman J, Mirrokni V, Muthukrishnan S (2014) Yieldoptimization of display advertising with ad exchange. Man-agement Sci. 60(12):2886–2907.

Beck A, Teboulle M (2003) Mirror descent and nonlinear projectedsubgradient methods for convex optimization. Oper. Res. Lett.31(3):167–175.

Benveniste A, Priouret P, Metivier M (1990) Adaptive Algorithms andStochastic Approximations (Springer, New York).

Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimi-zation. Oper. Res. 63(5):1227–1244.

Borgs C, Chayes J, Immorlica N, Jain K, Etesami O,MahdianM (2007)Dynamics of bid optimization in online advertisement auctions.Proc. 16th Internat. Conf. World Wide Web. WWW ’07 (ACM,New York), 531–540.

Borodin A, El-Yaniv R (1998) Online Computation and CompetitiveAnalysis (Cambridge University Press, Cambridge, UK).

Brafman RI, Tennenholtz M (2004) Efficient learning equilibrium.Artificial Intelligence 159(1–2):27–47.

Celis LE, Lewis G, Mobius M, Nazerzadeh H (2014) Buy-it-now ortake-a-chance: Price discrimination through randomized auc-tions. Management Sci. 60(12):2927–2948.

Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games(Cambridge University Press, Cambridge, UK).

Charles DX, Chakrabarty D, Chickering M, Devanur NR, Wang L(2013) Budget smoothing for internet ad auctions: A gametheoretic approach. ACM Conf. Electronic Commerce EC ’13(ACM, New York), 163–180.

Conitzer V, Kroer C, Sodomka E, Stier-Moses N (2018) Multiplicativepacing equilibria in auction markets. Proc. 14th Conf. WorkshopInternet Network Econom. (WINE '18) (Springer-Verlag, Cham,Switzerland).

Duchi J, Hazan E, Singer Y (2011) Adaptive subgradient methods foronline learning and stochastic optimization. J. Machine Learn. Res.12(July):2121–2159.

eMarketer (2016a) More than two-thirds of us digital display adspending is programmatic. Accessed July 10, 2017, https://www

Balseiro and Gur: Learning in Repeated Auctions with BudgetsManagement Science, 2019, vol. 65, no. 9, pp. 3952–3968, © 2019 INFORMS 3967

Page 17: Learning in Repeated Auctions with Budgets: Regret ... · tisers sequentially submit bids to place their ads at slots as these become available. Important examples include ad exchange

.emarketer.com/Article/More-Than-Two-Thirds-of-US-Digital-Di splay-Ad-Spending-Programmatic/1013789.

eMarketer (2016b) Us digital ad spending to surpass tv this year.Accessed July 10, 2017, https://www.emarketer.com/Article/US-Digital-Ad-Spending-Surpass-TV-thi s-Year/1014469.

Fudenberg D, Levine D (1998) The Theory of Learning in Games (MITPress, Cambridge, MA).

Han Z, Zheng R, Poor HV (2011) Repeated auctions with Bayesiannonparametric learning for spectrum access in cognitive radionetworks. IEEE Trans. Wireless Comm. 10(3):890–900.

Hannan J (1957) Approximation to Bayes risk in repeated plays.Dresher M, Tucker AW,Wolfe P, eds. Contributions to the Theory ofGames, vol. 3 (Princeton University Press, Princeton, NJ), 97–140.

Hart S, Mas-Colell A (2000) A simple adaptive procedure leading tocorrelated equilibria. Econometrica 68(5):1127–1150.

Hart S, Mas-Colell A (2013) Simple Adaptive Strategies, vol. 4 (WorldScientific Publishing Company, Singapore).

Hazan E, Agarwal A, Kale S (2007) Logarithmic regret algorithms foronline convex optimization. Machine Learn. 69(2/3):169–192.

Hon-Snir S, Monderer D, Sela A (1998) A learning approach toauctions. J. Econom. Theory 82(1):65–88.

Iyer K, Johari R, Sundararajan M (2014) Mean field equilibriaof dynamic auctions with learning. Management Sci. 60(12):2949–2970.

Jiang C, Beck CL, Srikant R (2014) Bidding with limited statisticalknowledge in online auctions. SIGMETRICS Performance Eval-uation Rev. 41(4):38–41.

Karande C, Mehta A, Srikant R (2013) Optimizing budget con-strained spend in search advertising. Proc. 6th ACM Internat.Conf. Web Search Data Mining (ACM, New York), 697–706.

Kleinberg R (2005) A multiple-choice secretary algorithm with ap-plications to online auctions. Proc. 16th Annual ACM-SIAMSymp. Discrete Algorithms SODA ’05 (Society for Industrial andApplied Mathematics, Philadelphia), 630–631.

Kushner HJ, Yin GG (2003) Stochastic Approximation and RecursiveAlgorithms and Applications (Springer, New York).

Lai T (2003) Stochastic approximation. Ann. Statist. 31(2):391–406.Martello S, Toth P (1990) Knapsack Problems: Algorithms and Computer

Implementations (John Wiley & Sons, New York).Mirrokni VS, Gharan SO, Zadimoghaddam M (2012) Simultaneous

approximations for adversarial and stochastic online budgeted

allocation.Proc. 23rdAnnual ACM-SIAMSympos. Discrete AlgorithmsSODA ’12 (Society for Industrial and Applied Mathematics,Philadelphia), 1690–1701.

Nedic A, Ozdaglar A (2009) Distributed subgradient methods formulti-agent optimization. IEEE Trans. Automatic Control 54(1):48–61.

Nemirovski A, Yudin D (1983) Problem Complexity and Method Effi-ciency in Optimization (John Wiley, New York).

Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochasticapproximation approach to stochastic programming. SIAM J. Optim.19(4):1574–1609.

Rosen JB (1965) Existence and uniqueness of equilibrium points forconcave n-person games. Econometrica 33(3):520–534.

Sani A, Neu G, Lazaric A (2014) Exploiting easy data in online op-timization. Ghahramani Z, Welling M, Cortes C, Lawrence ND,Weinberger KQ, eds. Proc. Neural Inform. Processing Systems 2014Conf. (Curran Associates, Inc., Red Hook, NY), 810–818.

Seldin Y, Slivkins A (2014) One practical algorithm for both stochasticand adversarial bandits.Proc.Machine Learn. Res. 32(2): 1287–1295.

Shalev-Shwartz S (2012) Online learning and online convex opti-mization. Foundations Trends Machine Learn. 4(2):107–194.

Talluri K, van Ryzin G (1998) An analysis of bid-price controls fornetwork revenuemanagement.Management Sci. 44(11):1577–1593.

Talluri KT, van Ryzin GJ (2004) The Theory and Practice of RevenueManagement, International Series in Operations Research andManagement Science, vol. 68 (Springer, New York).

Tsybakov AB (2008) Introduction to Nonparametric Estimation (Springer,New York).

Weed J, Perchet V, Rigollet P (2016) Online learning in repeatedauctions. Proc. Machine Learn. Res. 49:1562–1583.

Yao ACC (1977) Probabilistic computations: Toward a unifiedmeasure of complexity. Proc. Foundations Comput. Sci. (FOCS)18th IEEE Sympos. (Institute of Electrical and Electronics Engi-neers, Washington, DC), 222–227.

ZhouY, ChakrabartyD, Lukose R (2008)Budget Constrained Bidding inKeyword Auctions and Online Knapsack Problems (Springer, Berlin),566–576.

Zinkevich M (2003) Online convex programming and generalizedinfinitesimal gradient ascent. Fawcett T, Mishra N, eds. Proc.20th Internat. Conf. Machine Learn. (AAAI Press,Menlo Park, CA),928–936.

Balseiro and Gur: Learning in Repeated Auctions with Budgets3968 Management Science, 2019, vol. 65, no. 9, pp. 3952–3968, © 2019 INFORMS