
Encoding Defensive Driving as a Dynamic Nash Game

Chih-Yuan Chiu,∗ David Fridovich-Keil,∗ and Claire J. Tomlin

Abstract— Robots deployed in real-world environments should operate safely in a robust manner. In scenarios where an “ego” agent navigates in an environment with multiple other “non-ego” agents, two modes of safety are commonly proposed—adversarial robustness and probabilistic constraint satisfaction. However, while the former is generally computationally intractable and leads to overconservative solutions, the latter typically relies on strong distributional assumptions and ignores strategic coupling between agents.

To avoid these drawbacks, we present a novel formulation of robustness within the framework of general-sum dynamic game theory, modeled on defensive driving. More precisely, we prepend an adversarial phase to the ego agent’s cost function. That is, we prepend a time interval during which other agents are assumed to be temporarily distracted, in order to render the ego agent’s equilibrium trajectory robust against other agents’ potentially dangerous behavior during this time. We demonstrate the effectiveness of our new formulation in encoding safety via multiple traffic scenarios.

I. INTRODUCTION

Decision-making modules in autonomous systems must meet safety and robustness criteria before they are deployed in real-world settings with uncertain or unknown environments. Scenarios such as autonomous driving, in which an “ego” agent must interact with other, possibly non-cooperative “non-ego” agents, are of particular interest. These scenarios are naturally modeled using dynamic game theory, which describes the evolution of each agent’s state according to their minimization of a cost function. Each player’s cost function depends on the control strategies of that player, as well as the shared state of all the players.

To ensure safe and efficient operation of the ego agent in such multi-agent settings, existing methods formulate safety in the following two ways. Adversarial robustness methods, such as Hamilton-Jacobi-Isaacs (HJI) equation-based reachability theory, aim to generate trajectories that would ensure the safety of the ego agent despite worst-case behavior of all other agents [1–3]. Another commonly proposed methodology involves probabilistic constraint satisfaction [4, 5]. Here, algorithms attempt to bound the probability that the ego agent’s trajectory becomes unsafe. Unfortunately, each of these approaches carries significant drawbacks. HJI methods only apply to zero-sum game settings, and exact solution methods suffer from the so-called “curse of dimensionality,” with computational cost increasing exponentially in the dimension of the state [1]. Meanwhile, probabilistic constraint satisfaction encodes safety via distributional assumptions, but does not allow the ego player to anticipate more specific patterns of adversarial interactions with non-ego agents.

∗ The first two authors contributed equally to this work. C-Y Chiu and C Tomlin are with the Dept. of Electrical Engineering & Computer Sciences, UC Berkeley. D Fridovich-Keil is with the Dept. of Aeronautics & Astronautics, Stanford University. Correspondence to [email protected].

This research is supported by an NSF CAREER award, the Air Force Office of Scientific Research (AFOSR), NSF’s CPS FORCES and VeHICaL projects, the UC-Philippine-California Advanced Research Institute, the ONR MURI Embedded Humans, a DARPA Assured Autonomy grant, and the SRC CONIX Center.

Fig. 1: (Top) To encode robustness into its safety guarantees, the ego agent imagines that all other agents behave adversarially during an initial time frame before resuming “normal” cooperative behavior. (Bottom) In this example, an ego vehicle assumes that the non-ego, oncoming vehicle behaves adversarially for an initial period of time, which results in its swerving into the ego’s lane. After the adversarial time interval expires, the non-ego agent is assumed to return to its own lane, and resume cooperative behavior for the remainder of the time horizon.

This paper addresses these issues by presenting a third, novel formulation of robustness in multi-agent games, illustrated in Fig. 1. Here, the ego agent presupposes that, for an initial fixed window of time T_adv, other agents are temporarily “distracted” and thus behave erratically, but revert to normal operation afterwards for the rest of the time horizon T. The ego agent encodes this behavior in a dynamic game by modifying other agents’ cost structure, and solving the resulting game in a receding time horizon. Once computed, the ego agent begins to execute its equilibrium strategy, and after a small time has elapsed, the ego agent re-solves the game, accounting for updated initial conditions. More specifically, we encode this notion of “temporary distraction” in each non-ego agent’s cost function using time-varying running cost terms defined over two stages—an adversarial phase, during which the ego agent presumes that other agents are deliberately choosing control strategies to endanger the ego’s safety, followed by a cooperative phase, during which the non-ego agents are expected to resume “normal” cooperative behavior. This is modeled on the concept of defensive driving, wherein a driver on a busy road guards themselves against other drivers who, while momentarily distracted, may temporarily behave dangerously.
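To make the receding-horizon procedure concrete, the following minimal Python sketch shows the solve-execute-re-solve loop described above. The solver interface `solve_game`, the replanning period, and the forward-Euler rollout are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

# Minimal sketch of the receding-horizon loop described above. The
# interface `solve_game(x, horizon, t_adv)` is a hypothetical stand-in
# for an ILQGames-style feedback game solver, not the authors' API.
def receding_horizon_plan(x0, solve_game, dynamics, T=15.0, T_adv=2.5,
                          dt=0.1, replan_period=0.5, num_rounds=20):
    x = np.asarray(x0, dtype=float)
    executed = [x.copy()]
    n_exec = int(replan_period / dt)  # steps executed before each re-solve
    for _ in range(num_rounds):
        # Solve the two-phase game from the current state; the adversarial
        # window [0, T_adv) is re-anchored at the current time.
        strategies = solve_game(x, horizon=T, t_adv=T_adv)
        # Execute only a short prefix of the equilibrium strategies,
        # then re-solve with the updated initial condition.
        for k in range(n_exec):
            u = [gamma(k * dt, x) for gamma in strategies]
            x = x + dt * dynamics(k * dt, x, u)  # forward-Euler rollout
            executed.append(x.copy())
    return np.stack(executed)
```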

The rest of our paper is structured as follows. Sec. II presents related work on the use of HJI reachability theory and chance-constrained optimal control to encode adversarial and probabilistic robustness, respectively, as well as recent literature on iterative algorithms for solving dynamic games. Sec. III presents the mathematical formulation for the dynamic game that models the multi-agent interactions studied in our work. Sec. IV presents the main methodology in this paper, and formally introduces the concept of an adversarial time horizon. Sec. V demonstrates the spectrum of robustness which can be expressed in our formulation, compared with a purely cooperative game-theoretic approach, using multiple traffic scenarios. Sec. VI summarizes our contributions and discusses directions for future research.

II. RELATED WORK

This section reviews prior literature on several of the prevailing formulations of safety used in the design of multi-agent and uncertain systems—adversarial reachability, multi-agent forward reachability, and probabilistic constraint satisfaction. We compare and contrast these against the novel methodology in our work. We conclude this section with a brief summary of modern algorithms for iteratively solving dynamic games.

A. Adversarial Reachability

Adversarial reachability methods [1, 2, 6, 7] involve the construction of a zero-sum differential game between two agents, the ego agent and an adversary. The Nash equilibrium of this game satisfies a Hamilton-Jacobi-Isaacs (HJI) partial differential equation, which can be numerically solved via state space discretization. In this zero-sum framework, the ego agent assumes that the adversarial agent is constantly attempting to compromise the ego’s safety, and will thus compute and execute a control strategy that steers the ego’s trajectory away from any feasible trajectory of the non-ego agent.

Adversarial reachability appropriately describes many intricate dynamic interactions, such as capture-the-flag, reach-avoid, and pursuit-evasion-defense games [7–9]. However, it suffers from several significant limitations. First, this zero-sum formulation can only model dynamic interactions between two agents, or two groups of colluding agents, with opposing goals. This is inadequate for many motion-planning tasks, such as those involving traffic scenarios, which must account for the presence of an arbitrary number of agents, with possibly an arbitrary number of goals. Second, the adversarial nature of the zero-sum game leads to the construction of extremely conservative ego trajectories, since the ego must imagine the worst-case non-ego behaviors that can possibly transpire (Fig. 1). Our approach, on the other hand, avoids the first issue by considering a general-sum game applicable to N-player scenarios. Moreover, we avoid the second issue by modeling antagonistic non-ego behavior using the novel notion of an adversarial-to-cooperative time horizon, as shown in Fig. 1, rather than as a worst-case bounded disturbance. By modeling non-ego agents as first adversarial, then cooperative, we avoid overly conservative ego strategies corresponding to purely adversarial non-ego trajectories that are unlikely to materialize.

B. Multi-Agent Forward Reachability

Reachability-based formulations can also be used for safety-critical path planning in a non-game-theoretic manner. For instance, forward reachable sets (FRS) of the ego agent can be computed offline by numerically solving the Hamilton-Jacobi-Bellman equation, then used to aid online motion planning modules in the generation of obstacle-avoiding trajectories. This is the approach taken by the Reachability-based Trajectory Design for Dynamical environments (RTD-D) and RTD-Interval (RTD-I) algorithms presented in [10, 11], in which not-at-fault ego trajectories are generated by leveraging the offline-computed FRS of the ego agent and online obstacle motion predictions from an external module. Although the resulting trajectories avoid at-fault collisions, this framework does not allow the ego agent to account for the dynamic reactions of non-ego agents to its behavior. In contrast, our work explicitly models the dynamic obstacles as non-ego agents, within the framework of a dynamic feedback game.

Geometric prediction modules form another framework for using reachability-based methods in a non-game-theoretic setting. For instance, [12, 13] ensure constraint satisfaction by computing fail-safe ego trajectories which avoid an overapproximation of all dynamically feasible non-ego trajectories. This is posed as an optimal control problem (rather than a dynamic game), with avoidance of the set of all feasible non-ego trajectories serving as a state constraint. Although these approaches ensure that the ego will not collide with the non-ego agent, they do so at the cost of generating overly conservative maneuvers, particularly in situations when the non-ego agent may not be purely adversarial throughout the entire time horizon. By contrast, our formulation generates less conservative trajectories by assuming that non-ego agents display hostile behavior only during a fixed subset of the overall time horizon.

C. Probabilistic Constraint Satisfaction

Probabilistic constraint satisfaction is another commonly used framework for establishing safety guarantees in motion planning. These approaches bound the probability that an ego agent, operating in an unpredictable environment with stochastic disturbance, becomes unsafe [5]. In particular, risk-sensitive algorithms guard the ego agent from low-probability events that may result in highly dangerous outcomes. For example, [14] generates risk-sensitive trajectories by optimizing an exponential-quadratic cost term that amplifies the cost of low-probability, yet highly dangerous outcomes. Meanwhile, [4] associates individual constraint violations with different penalties, and optimizes their allocation over the horizon. Similarly, [15] uses a probabilistic framework to capture state uncertainties of an autonomous robotic agent, and uncertainties in the geometry or dynamics of obstacles in its environment. However, the control strategies generated by these methods merely account for the nonzero probability of unsafe outcomes occurring any time within the entire time horizon. By contrast, our work allows the ego to explicitly encode adversarial non-ego behavior inside a specific subset of the time horizon, when such behavior is most expected to occur.

D. Algorithms for Solving Dynamic Games

Several alternative methods exist in the literature for numerically solving the general-sum differential games considered in this work. For instance, one can derive a set of coupled Hamilton-Jacobi equations whose solutions yield local Nash equilibrium strategies [16, 17], which can then be computed numerically via state space discretization. However, these algorithms inevitably require computational cost and memory that scale exponentially with the state dimension (the “curse of dimensionality”), and are thus unsuitable for modeling the high-dimensional, multi-player interactions considered in our paper [18].

Another category of numerical methods for dynamic games involves Iterative Best Response (IBR) algorithms [19, 20]. These algorithms iterate through the players, repeatedly solving the optimal control problem of finding the best-response strategy of each player, assuming all other players’ strategies are currently fixed at the values in the previous iterate. Replacing the full dynamic game with a sequence of optimal control problems reduces computation time at each iteration; however, IBR algorithms can still be computationally inefficient overall.

Our work uses ILQGames [21], a recently developed iterative linear-quadratic algorithm, as our primary game solver. ILQGames iteratively solves linear-quadratic games and incurs computational complexity cubic in the number of players and linear in the time horizon. Although other game solvers, such as the Augmented Lagrangian GAME-theoretic Solver (ALGAMES) [22], exist, all known implementations of these methods are restricted to an open-loop [23, 24]—rather than feedback—information structure and have equivalent computational complexity.

III. PRELIMINARIES

Consider the N-player finite horizon general-sum differential game with deterministic, noise-free nonlinear system dynamics:

\dot{x} = f(t, x, u_{1:N}).    (1)

Here, x ∈ R^n is the state of the system, obtained by concatenating the dynamical quantities of interest of each player, t ∈ R denotes time, u_i ∈ R^{m_i} is the control input of player i, for each i ∈ {1, ..., N} := [N], and u_{1:N} := (u_1, ..., u_N) ∈ R^m, where m := \sum_{i=1}^N m_i. The dynamics map f : R × R^n × R^m → R^n is assumed to be continuous in t and continuously differentiable in x and u_i, for each i = 1, ..., N and each t ∈ [0, T]. Since we wish to ensure the safety of one particular player amidst their interactions with all other players, we refer to Player 1 as the ego agent, and the other players as non-ego agents. Each player’s objective is defined as the integral of a running cost g_i : [0, T] × R^n × R^m → R over the time horizon [0, T]:

J_i\big(u_{1:N}(\cdot)\big) = \int_0^T g_i\big(t, x(t), u_{1:N}(t)\big) \, dt,    (2)

for each i ∈ {1, ..., N}. The running costs g_i encode implicit dependence on the state trajectory x(·) : [0, T] → R^n and explicit dependence on the control signals u_i(·) : [0, T] → R^{m_i}.
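Since the game is solved in discrete time in practice (Sec. IV-B uses 0.1 s steps), the integral cost (2) is evaluated numerically along a sampled trajectory. A minimal sketch, with illustrative names (`running_cost`, `ts`, `xs`, `us`) that are assumptions for exposition:

```python
import numpy as np

def integrated_cost(running_cost, ts, xs, us):
    r"""Approximate J_i = \int_0^T g_i(t, x(t), u_{1:N}(t)) dt via the
    trapezoid rule on a sampled trajectory (ts, xs, us)."""
    g = np.array([running_cost(t, x, u) for t, x, u in zip(ts, xs, us)])
    return np.trapz(g, ts)
```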

To minimize its cost, each player selects a control strategy to employ over the time horizon [0, T], as described below. We assume that, at each time t ∈ [0, T], each player i observes the state x(t), but no other control input {u_j(t) | j ≠ i}, and uses this information to design their control, i.e.,

u_i(t) := γ_i(t, x(t)),

where γ_i : [0, T] × R^n → R^{m_i}, defined as Player i’s strategy, is assumed to be measurable. We define the strategy space of Player i, denoted Γ_i, as the collection of all of Player i’s possible strategies, and denote, with a slight abuse of notation, the overall cost J_i of each Player i by:

J_i(γ_1; ...; γ_N) := J_i\big(γ_1(·, x(·)), ..., γ_N(·, x(·))\big).

In practice, we shall solve for strategies γ_i that are time-varying, affine functions of x.
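As an illustration, such a time-varying affine feedback strategy can be represented as below; the gain/offset parameterization shown here is an assumption for exposition, not necessarily the solver's exact internal form.

```python
import numpy as np

def make_affine_strategy(Ks, alphas, x_refs, u_refs, dt=0.1):
    """gamma(t, x) = u_ref(k) - K(k) (x - x_ref(k)) - alpha(k), k ~ t/dt.

    Ks, alphas, x_refs, u_refs are per-timestep gains, offsets, and
    reference state/control trajectories (assumed representation).
    """
    def gamma(t, x):
        k = min(int(round(t / dt)), len(Ks) - 1)
        return u_refs[k] - Ks[k] @ (x - x_refs[k]) - alphas[k]
    return gamma
```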

We now define the Nash equilibrium of the above game.

Definition 1: (Nash equilibrium, [25, Chapter 6]) The strategy set (γ_1^⋆, ..., γ_N^⋆) is said to be a Nash equilibrium if no player is unilaterally incentivized to deviate from his or her strategy. Precisely, the following inequality must hold for each player i:

J_i^⋆ := J_i\big(γ_1^⋆, ..., γ_{i−1}^⋆, γ_i^⋆, γ_{i+1}^⋆, ..., γ_N^⋆\big) ≤ J_i\big(γ_1^⋆, ..., γ_{i−1}^⋆, γ_i, γ_{i+1}^⋆, ..., γ_N^⋆\big), ∀ γ_i ∈ Γ_i.    (3)

Computing a global Nash equilibrium is intractable for dynamic games with general dynamics and cost functions. As such, in this work, we concern ourselves with finding a local Nash equilibrium, which is defined similarly to (3), but with the inequalities only constrained to hold within a neighborhood of the strategy set (γ_1^⋆, ..., γ_N^⋆). Moreover, we impose additional constraints on the dynamical quantities of each player, to model appropriate behavior between autonomous agents in traffic scenarios. These constraints will translate into a set of state constraints, and will significantly affect the set of Nash equilibria of the game. As such, in this work, we search for a (similarly defined) generalized local Nash equilibrium.
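For intuition, the local Nash condition can be checked numerically on candidate strategies by testing small unilateral perturbations. The sketch below is purely illustrative (it is not part of the authors' method), and assumes each player's strategy is described by a parameter array:

```python
import numpy as np

def is_approx_local_nash(costs, thetas, eps=1e-3, trials=32, tol=1e-6):
    """costs[i](thetas) -> J_i, where thetas[i] parameterizes Player i's
    strategy. Returns False if any player can improve alone via a small
    random perturbation of its own parameters."""
    rng = np.random.default_rng(0)
    for i, cost_i in enumerate(costs):
        base = cost_i(thetas)
        for _ in range(trials):
            perturbed = [th.copy() for th in thetas]
            perturbed[i] = perturbed[i] + eps * rng.standard_normal(thetas[i].shape)
            if cost_i(perturbed) < base - tol:  # unilateral improvement found
                return False
    return True
```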


IV. METHODS

Our main contribution is a novel formulation of safety, best understood through the lens of defensive driving. In Sec. IV-A, we describe how, in the ego agent’s mind, the concept of defensive driving can be encoded into the running cost of each non-ego agent, i.e. g_i(x, u_{1:N}), for each i ∈ {2, ..., N}. To demonstrate this defensive driving framework in practice, we simulate realistic traffic scenarios; Sec. IV-B details the dynamics, costs, and constraints imposed on the various agents in these simulations. Finally, in Sec. IV-C, we summarize the ILQGames algorithm as the main feedback game solver used in this work.

A. Encoding Defensive Driving as a Running Cost

In our framework, the ego agent (Player 1) operates under the assumption that all other agents are momentarily distracted. To encode this “imagined” scenario, the ego agent imagines the overall time horizon [0, T] as divided into two sub-intervals, the adversarial interval [0, T_adv] and the cooperative interval [T_adv, T], with 0 < T_adv < T. During the adversarial interval, the ego agent imagines the other agents to be “momentarily distracted,” and desires to act defensively. This phenomenon is modeled using an adversarial running cost g_{adv,i} : R^n × R^m → R for each i ∈ {2, ..., N}. On the other hand, during the cooperative interval, the ego agent supposes that the other agents have reverted to their “normal” or “cooperative” manner, and thus proceeds to select control signals for the remainder of the time horizon in a less conservative manner. This behavior is captured using a cooperative running cost g_{coop,i} : R^n × R^m → R for each i ∈ {2, ..., N}. In other words, the running cost of each non-ego agent g_i can be defined piecewise as follows:

g_i(t, x, u_{1:N}) = \begin{cases} g_{adv,i}(x, u_{1:N}), & t ∈ [0, T_adv), \\ g_{coop,i}(x, u_{1:N}), & t ∈ [T_adv, T]. \end{cases}

In this scenario, the net integrated cost J_i, first defined in (2), can be written as follows:

J_i = \int_0^{T_adv} g_{adv,i}(x, u_{1:N}) \, dt + \int_{T_adv}^{T} g_{coop,i}(x, u_{1:N}) \, dt.    (4)

With increasing T_adv, the ego agent imagines an increasingly adversarial encounter and acts more and more defensively as a result. In practice, the user or system designer would select a suitable T_adv before operation, e.g., by choosing the largest T_adv such that the solution deviates sufficiently little from a nominal solution with T_adv = 0.
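A minimal sketch of this two-phase running cost for a non-ego agent, assuming `g_adv` and `g_coop` are given callables:

```python
def make_two_phase_cost(g_adv, g_coop, t_adv):
    """Running cost g_i(t, x, u): adversarial on [0, t_adv), cooperative
    on [t_adv, T], matching the piecewise definition above."""
    def g_i(t, x, u):
        return g_adv(x, u) if t < t_adv else g_coop(x, u)
    return g_i
```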

B. Simulation Setup

To test this construction, we simulate two traffic encounters in ILQGames [21] that involve significant interaction (see Sec. V), in which a responsible human driver would likely drive defensively. Our method attempts to capture the spectrum of this “defensive” behavior as T_adv, the duration of the adversarial time horizon, is varied. In each setting, each agent (in this case, each car) has augmented bicycle dynamics, i.e.:

\dot{p}_{x,i} = v_i \sin θ_i,    \dot{v}_i = a_i,
\dot{p}_{y,i} = v_i \cos θ_i,    \dot{φ}_i = ω_i,    (5)
\dot{θ}_i = (v_i / L_i) \tan φ_i,    \dot{a}_i = j_i,

where x = (p_{x,i}, p_{y,i}, θ_i, v_i, φ_i, a_i)_{i=1}^N encapsulates the position, heading, speed, front wheel angle, and acceleration of all vehicles, u_i = (ω_i, j_i) represents each vehicle’s front wheel rate and tangent jerk, and L_i is each vehicle’s inter-axle distance.
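A direct, illustrative implementation of the per-vehicle dynamics (5):

```python
import numpy as np

def bicycle_dynamics(state, control, L):
    """Per-vehicle dynamics (5). state = (px, py, theta, v, phi, a);
    control = (omega, j); L is the inter-axle distance."""
    px, py, theta, v, phi, a = state
    omega, j = control
    return np.array([
        v * np.sin(theta),      # px_dot
        v * np.cos(theta),      # py_dot
        (v / L) * np.tan(phi),  # theta_dot
        a,                      # v_dot
        omega,                  # phi_dot
        j,                      # a_dot
    ])
```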

We define g_{adv,i} and g_{coop,i} as weighted combinations of the following functions, with different behavior encouraged through the use of different weighting coefficients. We denote p_i = (p_{x,i}, p_{y,i}) for each agent position, d_{ℓ_i}(p_i), defined below, for the distance between an agent and the corresponding lane centerline ℓ_i in the (p_{x,i}, p_{y,i})-plane, and d_prox for a constant desired minimum proximity between agents:

lane center:  \big[ d_{ℓ_i}(p_i) := \min_{p_ℓ ∈ ℓ_i} ‖p_ℓ − p_i‖ \big]^2    (6)
ideal speed:  (v_i − v_{ref,i})^2    (7)
cooperative:  \mathbb{1}\{‖p_i − p_j‖ < d_prox\} (d_prox − ‖p_i − p_j‖)^2    (8)
adversarial:  ‖p_i − p_j‖^2    (9)
input:  u_i^\top R_{ii} u_i    (10)

Recall that, for non-ego agents, the “adversarial” cost is only present during the adversarial horizon [0, T_adv) and the “cooperative” cost is present thereafter, during the cooperative horizon [T_adv, T]. We also enforce the following inequality constraints, where d_lane denotes the lane half-width, and \underline{v}_i and \bar{v}_i denote speed limits:

proximity:  ‖p_i − p_j‖ > d_prox    (11)
lane:  |d_{ℓ_i}(p_i)| < d_lane    (12)
speed range:  \underline{v}_i < v_i < \bar{v}_i    (13)

Here, the “proximity” constraint is enforced for only the ego agent, to force the ego to bear responsibility for satisfying joint state constraints which encode his or her own safety (e.g. non-collision). In addition, all agents must satisfy individual constraints that encode reasonable conduct in traffic (e.g., staying within a range of speeds). All constraints are enforced over the entire time horizon [0, T]. For all tests, we use a time horizon T = 15 s and discretize time (following [21] and [25]) at 0.1 s intervals.
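Illustrative implementations of the running-cost terms (6)–(10) are sketched below; how these terms are weighted into g_{adv,i} and g_{coop,i} is scenario-specific and omitted here.

```python
import numpy as np

def lane_center_cost(p, lane_pts):
    """(6): squared distance to the sampled lane centerline `lane_pts`."""
    return np.min(np.linalg.norm(lane_pts - p, axis=1)) ** 2

def ideal_speed_cost(v, v_ref):
    """(7): quadratic penalty on deviation from the reference speed."""
    return (v - v_ref) ** 2

def cooperative_cost(p_i, p_j, d_prox):
    """(8): penalize only violations of the desired minimum proximity."""
    d = np.linalg.norm(p_i - p_j)
    return (d_prox - d) ** 2 if d < d_prox else 0.0

def adversarial_cost(p_i, p_j):
    """(9): minimized by closing the gap to the other agent."""
    return np.linalg.norm(p_i - p_j) ** 2

def input_cost(u, R):
    """(10): quadratic control-effort penalty u^T R u."""
    return u @ R @ u
```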

C. Implementation Details

The traffic simulations in this work are solved approximately to local feedback Nash equilibria in real time using ILQGames, a recently developed, open-source C++-based game solver algorithm introduced in [21]. ILQGames iteratively solves linear-quadratic games, obtained by linearizing dynamics and quadraticizing costs, and incurs computational complexity that is cubic in the number of players [21]. As discussed above, we must also account for both equality and inequality constraints on the game trajectory. While we note that [21] does not address constrained Nash games, here we incorporate constraints using augmented Lagrangian methods [26]. For a more detailed discussion of constraint-handling in feedback Nash games, please refer to [27]. Note that although other game solvers, such as ALGAMES [22] and Iterative Best Response algorithms [20], can likewise handle constraints, their applications are restricted to open-loop games. A thorough treatment of constraints in games can be found in [28].

Fig. 2: Oncoming example. The ego vehicle (right lane, heading upwards) and the oncoming vehicle (left lane, heading downwards) perform increasingly extreme maneuvers as T_adv increases, in this “oncoming” scenario. Dark blue, turquoise, and light green are used to represent the agents’ location for T_adv = 0, 2.5, 5 s, respectively. Panels show agent positions as time elapses. When T_adv = 0 s, the ego vehicle does not deviate significantly from its lane from t = 1 to 6 s because it anticipates that the non-ego vehicle will behave cooperatively throughout the entire time horizon by swerving to avoid a collision. However, when T_adv = 5 s, the ego vehicle actively swerves outward to avoid the non-ego agent. This minor course adjustment is sufficient to dissuade the oncoming vehicle from making a stronger effort to attempt a collision, since it would occur in the future, after T_adv.
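As a sketch of the constraint handling, the standard augmented-Lagrangian penalty for an inequality constraint c(x) ≥ 0 (e.g., the proximity constraint (11)) takes the following form [26]; the exact penalty scheduling used in our solver may differ.

```python
def al_penalty(c_val, lam, mu):
    """Standard augmented-Lagrangian penalty for an inequality constraint
    c(x) >= 0, with multiplier `lam` >= 0 and penalty weight `mu` > 0 [26].
    Added to the running cost wherever the constraint is imposed."""
    if c_val - lam / mu < 0.0:           # constraint active or violated
        return -lam * c_val + 0.5 * mu * c_val ** 2
    return -(lam ** 2) / (2.0 * mu)      # inactive: constant offset
```

Between solves, the multiplier would typically be updated as λ ← max(0, λ − μ c(x)), with μ increased if violations persist.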

V. RESULTS

We present simulation results for various traffic scenarios in which a responsible traffic participant would likely drive defensively. First, we consider a simple situation involving oncoming vehicles on a straight road, as a proof of concept. Then, we analyze a more complicated intersection example with a crosswalk. In both cases, the ILQGames algorithm solves the defensive driving game quickly, in under 1 s.

A. Oncoming Example

In this example, the ego car is traveling North on a straight road when it encounters another car traveling South. Since the road has a lane in each direction, “ideally” the ego vehicle would not deviate too far from its lane or speed. However, to drive more defensively, the ego vehicle should plan as though the oncoming Southbound car were to act noncooperatively. Our method encodes precisely this type of defensive planning. Fig. 2 shows the planned trajectories that emerge for increasing T_adv. As shown, the ego vehicle (bottom) imagines more aggressive maneuvers for itself and the oncoming car (top) as T_adv increases. Note, however, that these are merely imagined trajectories and that (a) the ego vehicle can always choose to follow this trajectory only for an initial period of time, and recompute its trajectory thereafter, and (b) the oncoming vehicle will make its own decisions and will not generally follow this “partially adversarial” trajectory. We solve each of these problems (with fixed T_adv) in under 0.5 s.

B. Three-Player Intersection Example

We introduce a more complicated scenario designed to model the behavior of two vehicles and a pedestrian at an intersection. As shown in Fig. 3, the ego vehicle is present in the intersection alongside a non-ego vehicle heading in the opposite direction, who wishes to make a left turn, and a pedestrian, who wishes to cross the road. To reach their goal locations, these three agents must cross paths in the intersection. When T_adv = 0 s, the ego vehicle continues straight along its lane because it anticipates that the non-ego vehicle will behave cooperatively throughout the entire time horizon. In particular, it anticipates that the non-ego vehicle will continue along its curved path at nominal speed, resulting in a collision-free trajectory. However, as with the oncoming example, the ego vehicle’s trajectory becomes increasingly more conservative as the adversarial time horizon increases in length. When T_adv = 1 s, the ego vehicle actively swerves rightwards to avoid the non-ego vehicle. This is because in this scenario, the oncoming vehicle is initially slower than the ego vehicle, and will thus approach the intersection at the same time as the ego vehicle. As before, each problem is solved in well under 0.75 s in single-threaded operation on a standard laptop, via the ILQGames algorithm [21]. This performance indicates real-time capabilities which will be explored in future work on hardware.

Fig. 3: Three-Player Intersection example. The ego vehicle (right lane, heading upwards) navigates an intersection while avoiding collision with an oncoming vehicle (left lane, heading downwards initially before making a left turn) and a pedestrian (horizontal path at the intersection, left to right). Dark blue, turquoise, and light green are used to represent the agents’ location at T_adv = 0, 0.5, 1 s, respectively. As with the oncoming example, these three agents perform increasingly extreme maneuvers as T_adv increases. In particular, when T_adv = 0 s, the ego vehicle anticipates that the non-ego will continue along its curved path at its nominal speed, allowing it to approach the intersection before the ego vehicle. Thus, the ego vehicle swerves leftwards, to avoid the non-ego agent as it makes its left turn and continues rightwards in the figure, resulting in a collision-free trajectory. (The pedestrian’s trajectories for T_adv = 0 s and T_adv = 0.5 s coincide.) However, when T_adv = 1 s, the ego vehicle accelerates and swerves rightwards to avoid the non-ego vehicle. As before, this suffices to dissuade the oncoming vehicle from performing more collision-seeking behavior, since it forces any potential collision to occur after T_adv. The pedestrian’s speed also noticeably decreases for the same reason.

VI. DISCUSSION

We have presented a novel formulation of robustness in motion planning for multi-agent problems. Inspired by defensive driving, our method explicitly models other agents as adversarial in only a limited, initial portion of the overall planning interval. Like adversarial methods in Hamilton-Jacobi-Isaacs (HJI) optimal control, our approach draws upon earlier work in differential game theory. However, instead of forcing the ego to avoid all feasible non-ego trajectories, we use a piecewise-defined game cost structure to endow the ego with the perspective that other agents are temporarily distracted. As such, our approach generates far less conservative behavior than purely adversarial methods. Simulation results illustrate that this novel formulation of safety can be used to solve these “defensive” problems in real time. We are eager to implement this method in hardware and test its performance in a receding time horizon with other (human) drivers.

VII. FUTURE WORK

Future work will examine more flexible formulations for encoding defensive behavior in games. For example, in many real-life scenarios, vehicle dynamics should be modeled with stochastic, partial observations, and should account for occlusions and dynamic disturbances. Also, the ego agent may wish to select the adversarial time horizon more flexibly.

Of particular interest are cases where the ego agent may choose to vary T_adv from one non-ego agent to another. For example, the ego may observe that some non-ego agents are behaving more adversarially than others, and respond accordingly by associating such players with higher values of T_adv, compared to other non-ego agents. In addition, the ego may wish to allocate different parts of the overall time horizon to be adversarial, rather than simply the first T_adv seconds. For example, choosing the adversarial time horizon to be the final T_adv seconds of the overall time horizon, rather than the first T_adv seconds, would transform the game from an adversarial-to-cooperative type to a cooperative-to-adversarial type. This would be useful in situations where the ego agent predicts that the surrounding non-ego agents are currently cooperative, but may become momentarily distracted in the near future. For example, such a formulation may cause the ego agent to gradually approach an intersection at which other agents might run a red light. A hypothetical encoding of this idea is sketched below.
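As a hypothetical extension of the two-phase cost in Sec. IV-A, a per-agent adversarial window could be encoded as follows:

```python
def make_windowed_cost(g_adv, g_coop, t_start, t_end):
    """Per-agent adversarial window [t_start, t_end): generalizes the
    two-phase cost of Sec. IV-A to adversarial intervals that need not
    begin at t = 0 and may differ across non-ego agents."""
    def g_i(t, x, u):
        return g_adv(x, u) if t_start <= t < t_end else g_coop(x, u)
    return g_i
```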

Future work will also examine the physical implementation of this framework on hardware, and test its performance in a receding horizon fashion in real-life traffic scenarios with other human drivers. We also anticipate using this novel notion of safety in game-theoretic applications outside of motion planning, such as in economics or public health applications, where it may be prudent to model certain agents as adversarial in certain time intervals.


REFERENCES

[1] Somil Bansal, Mo Chen, Sylvia Herbert, and Claire J. Tomlin. “Hamilton-Jacobi reachability: A brief overview and recent advances”. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE, 2017, pp. 2242–2253.

[2] Ian M. Mitchell, Alexandre M. Bayen, and Claire J. Tomlin. “A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games”. In: IEEE Transactions on Automatic Control 50.7 (2005), pp. 947–957.

[3] Kostas Margellos and John Lygeros. “Hamilton-Jacobi formulation for reach-avoid differential games”. In: IEEE Transactions on Automatic Control 56.8 (2011), pp. 1849–1861.

[4] Farbod Farshidian and J. Buchli. “Chance-Constrained Optimal Covariance Steering with Iterative Risk Allocation”. In: ArXiv abs/1512.07173 (2015).

[5] M. Ono, L. Blackmore, and B. C. Williams. “Chance constrained finite horizon optimal control with nonconvex constraints”. In: Proceedings of the 2010 American Control Conference. 2010, pp. 1145–1152.

[6] L. C. Evans and P. E. Souganidis. “Differential games and representation formulas for solutions of Hamilton-Jacobi-Isaacs equations”. In: Indiana University Mathematics Journal 33.5 (1984), pp. 773–797.

[7] Jaime F. Fisac and S. Shankar Sastry. “The pursuit-evasion-defense differential game in dynamic constrained environments”. In: 54th Conference on Decision and Control (CDC). IEEE, 2015, pp. 4549–4556.

[8] Rufus Isaacs. Games of Pursuit. Tech. rep. RAND Corporation, 1951.

[9] Rufus Isaacs. Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization. Courier Corporation, 1999.

[10] S. Vaskov, H. Larson, S. Kousik, M. Johnson-Roberson, and R. Vasudevan. “Not-at-Fault Driving in Traffic: A Reachability-Based Approach”. In: 2019 IEEE Intelligent Transportation Systems Conference (ITSC). 2019, pp. 2785–2790.

[11] Ming-Yuan Yu, R. Vasudevan, and Matthew Johnson-Roberson. “Risk Assessment and Planning with Bidirectional Reachability for Autonomous Driving”. In: 2020 IEEE International Conference on Robotics and Automation (ICRA). 2020, pp. 5363–5369.

[12] Markus Koschi, C. Pek, Mona Beikirch, and M. Althoff. “Set-Based Prediction of Pedestrians in Urban Environments Considering Formalized Traffic Rules”. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC). 2018, pp. 2704–2711.

[13] C. Pek and M. Althoff. “Computationally Efficient Fail-safe Trajectory Planning for Self-driving Vehicles Using Convex Optimization”. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC). 2018, pp. 1447–1454.

[14] Farbod Farshidian and J. Buchli. “Risk Sensitive, Nonlinear Optimal Control: Iterative Linear Exponential-Quadratic Optimal Control with Gaussian Noise”. In: ArXiv abs/1512.07173 (2015).

[15] Noel E. Du Toit and Joel W. Burdick. “Robot motion planning in dynamic, uncertain environments”. In: IEEE Transactions on Robotics 28.1 (2012), pp. 101–115.

[16] Alan Wilbor Starr and Yu-Chi Ho. “Nonzero-sum differential games”. In: Journal of Optimization Theory and Applications 3.3 (1969), pp. 184–206.

[17] Alan Wilbor Starr and Yu-Chi Ho. “Further properties of nonzero-sum differential games”. In: Journal of Optimization Theory and Applications 3.4 (1969), pp. 207–219.

[18] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, Two Volume Set. 2nd ed. Athena Scientific, 2001. ISBN: 1886529086.

[19] Jaime F. Fisac, Eli Bronstein, Elis Stefansson, Dorsa Sadigh, S. Shankar Sastry, and Anca D. Dragan. “Hierarchical Game-Theoretic Planning for Autonomous Vehicles”. In: arXiv preprint arXiv:1810.05766 (2018).

[20] Zijian Wang, Riccardo Spica, and Mac Schwager. “Game Theoretic Motion Planning for Multi-robot Racing”. In: Distributed Autonomous Robotic Systems. Springer, 2019, pp. 225–238.

[21] David Fridovich-Keil, Ellis Ratner, Anca D. Dragan, and Claire J. Tomlin. “Efficient Iterative Linear-Quadratic Approximations for Nonlinear Multi-Player General-Sum Differential Games”. In: arXiv preprint arXiv:1909.04694 (2019).

[22] Simon Le Cleac’h, Mac Schwager, and Zachary Manchester. “ALGAMES: A Fast Solver for Constrained Dynamic Games”. In: arXiv preprint arXiv:1910.09713 (2019).

[23] Bolei Di and Andrew Lamperski. “Newton’s Method and Differential Dynamic Programming for Unconstrained Nonlinear Dynamic Games”. In: 2019 IEEE 58th Conference on Decision and Control (CDC). IEEE, 2019, pp. 4073–4078.

[24] Bolei Di and Andrew Lamperski. “Local First-Order Algorithms for Constrained Nonlinear Dynamic Games”. In: 2020 American Control Conference (ACC). IEEE, 2020, pp. 5358–5363.

[25] Tamer Basar and Geert Jan Olsder. Dynamic Noncooperative Game Theory. SIAM, 1999.

[26] Jorge Nocedal and Stephen Wright. Numerical Optimization. Springer Science & Business Media, 2006.

[27] Forrest Laine, David Fridovich-Keil, Chih-Yuan Chiu, and Claire Tomlin. “The Computation of Approximate Generalized Feedback Nash Equilibria”. In: arXiv preprint arXiv:2101.02900 (2021).

[28] Jong-Shi Pang and Gesualdo Scutari. “Nonconvex Games with Side Constraints”. In: SIAM Journal on Optimization 21.4 (Dec. 2011), pp. 1491–1522. ISSN: 1052-6234. DOI: 10.1137/100811787. URL: https://doi.org/10.1137/100811787.