Cyber-Physical Security and Safety of Autonomous ... - Oulu

17
0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEE Transactions on Communications 1 Cyber-Physical Security and Safety of Autonomous Connected Vehicles: Optimal Control Meets Multi-Armed Bandit Learning Aidin Ferdowsi, Student Member, IEEE, Samad Ali, Student Member, IEEE, Walid Saad, Fellow, IEEE, and Narayan B. Mandayam, Fellow, IEEE Abstract—Autonomous connected vehicles (ACVs) rely on intra-vehicle sensors such as camera and radar as well as inter-vehicle communication to operate effectively which exposes them to cyber and physical attacks in which an adversary can manipulate sensor readings and physically control the ACVs. In this paper, a comprehensive control and learning framework is proposed to thwart cyber and physical attacks on ACV networks. First, an optimal safe controller for ACVs is derived to maximize the street traffic flow while minimizing the risk of accidents by optimizing ACV speed and inter-ACV spacing. It is proven that the proposed controller is robust to physical attacks which aim at making ACV systems unstable. Next, two data injection attack (DIA) detection approaches are proposed to address cyber attacks on sensors and their physical impact on the ACV system. The proposed approaches rely on leveraging the stochastic behavior of the sensor readings and on the use of a multi-armed bandit (MAB) algorithm. It is shown that, collectively, the proposed DIA detection approaches minimize the vulnerability of ACV sensors against cyber attacks while maximizing the ACV system’s physical robustness. Simulation results show that the proposed optimal safe controller outperforms the current state of the art controllers by maximizing the robustness of ACVs to physical attacks. The results also show that the proposed DIA detection approaches, compared to Kalman filtering, can improve the security of ACV sensors against cyber attacks and ultimately improve the physical robustness of an ACV system. Index Terms—Autonomous connected vehicles, cyber-physical security, data injection attack, optimal safe control, multi-armed bandit learning. I. I NTRODUCTION Intelligent transportation systems (ITS) will encompass au- tonomous connected vehicles (ACVs), roadside smart sensors (RSSs), vehicular communications, and even drones [1]–[4]. To operate autonomously in future ITS, ACVs must process a large volume of data collected via sensors and communication links. Maintaining reliability of this data is crucial for road safety and smooth traffic flow [5]–[8]. However, this reliance on communications and data processing renders ACVs highly susceptible to cyber-physical attacks. In particular, an attacker can possibly interject the ACV data processing stage, inject This research was supported by the U.S. National Science Foundation under Grants OAC-1541105 and CNS-1446621. Aidin Ferdowsi and Walid Saad are with Wireless@VT, Bradley De- partment of Electrical and Computer Engineering, Virginia Tech, Blacks- burg, VA, USA, {aidin, walids}@vt.edu. Samad Ali is with Centre for Wireless Communications (CWC), University of Oulu, Fin- land, [email protected]. Narayan B. Mandayam is with WIN- LAB, Dept. of ECE, Rutgers University, New Brunswick, NJ, USA, [email protected] faulty data, and ultimately induce accidents or compromise the road traffic flow [9]. As demonstrated in a real-world experiment on a Jeep Cherokee in [10], ACVs are largely vulnerable to cyber attacks that can control their critical systems, including braking and acceleration. Naturally, by taking control of ACVs, an adversary can not only impact the compromised ACV, but it can also reduce the flow of other vehicles and cause a non-optimal ITS operation. This, in turn, motivates a holistic study for joint cyber and physical impacts of attacks on ACV systems. A. Related Works Recently, a number of security solutions have been proposed for addressing intra-vehicle network and vehicular commu- nication cyber security problems [11]–[21]. In [11], the au- thors showed that long-range wireless attacks on the current security protocols of ACVs can disrupt their controller area network (CAN). Furthermore, the work in [12], proposed a data analytics approach for the intrusion detection problem by applying a hidden Markov model. In [13], the security vulnerabilities of current vehicular communication architec- tures are identified. The work in [14] proposed the use of multi-source filters to secure a vehicular network against data injection attacks (DIAs). Furthermore, the authors in [15] introduced a new framework to improve the trustworthiness of beacons by combining two physical measurements (angle of arrival and Doppler effect) from received wireless signals. In [16], the authors designed a multi-antenna technique for improving the physical layer security of vehicular millimeter- wave communications. Moreover, in [17], the authors proposed a collaborative control strategy for vehicular platooning to address spoofing and denial of service attacks. The work in [18] developed a deep learning algorithm for authenticating sensor signals. An overview of current research on advanced intra-vehicle networks and the smart components of ITS is presented in [19]. In [20], the authors provided a summary of wireless communications and security issues of the cyber- physical systems. Moreover, in [21], a secure beamforming algorithm is proposed for centralized radio access networks (CRANs) in vehicle-to-infrastructure (V2I) communications. In addition to cyber security in ITSs, physical safety and optimal control of ACVs have been studied in [22]–[28]. In [22], the authors identified the key vulnerabilities of a vehicle’s controller and secured them using intrusion detection

Transcript of Cyber-Physical Security and Safety of Autonomous ... - Oulu

Page 1: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

1

Cyber-Physical Security and Safety of AutonomousConnected Vehicles: Optimal Control Meets

Multi-Armed Bandit LearningAidin Ferdowsi, Student Member, IEEE, Samad Ali, Student Member, IEEE,

Walid Saad, Fellow, IEEE, and Narayan B. Mandayam, Fellow, IEEE

Abstract—Autonomous connected vehicles (ACVs) rely onintra-vehicle sensors such as camera and radar as well asinter-vehicle communication to operate effectively which exposesthem to cyber and physical attacks in which an adversary canmanipulate sensor readings and physically control the ACVs. Inthis paper, a comprehensive control and learning framework isproposed to thwart cyber and physical attacks on ACV networks.First, an optimal safe controller for ACVs is derived to maximizethe street traffic flow while minimizing the risk of accidents byoptimizing ACV speed and inter-ACV spacing. It is proven thatthe proposed controller is robust to physical attacks which aimat making ACV systems unstable. Next, two data injection attack(DIA) detection approaches are proposed to address cyber attackson sensors and their physical impact on the ACV system. Theproposed approaches rely on leveraging the stochastic behaviorof the sensor readings and on the use of a multi-armed bandit(MAB) algorithm. It is shown that, collectively, the proposedDIA detection approaches minimize the vulnerability of ACVsensors against cyber attacks while maximizing the ACV system’sphysical robustness. Simulation results show that the proposedoptimal safe controller outperforms the current state of the artcontrollers by maximizing the robustness of ACVs to physicalattacks. The results also show that the proposed DIA detectionapproaches, compared to Kalman filtering, can improve thesecurity of ACV sensors against cyber attacks and ultimatelyimprove the physical robustness of an ACV system.

Index Terms—Autonomous connected vehicles, cyber-physicalsecurity, data injection attack, optimal safe control, multi-armedbandit learning.

I. INTRODUCTION

Intelligent transportation systems (ITS) will encompass au-tonomous connected vehicles (ACVs), roadside smart sensors(RSSs), vehicular communications, and even drones [1]–[4].To operate autonomously in future ITS, ACVs must process alarge volume of data collected via sensors and communicationlinks. Maintaining reliability of this data is crucial for roadsafety and smooth traffic flow [5]–[8]. However, this relianceon communications and data processing renders ACVs highlysusceptible to cyber-physical attacks. In particular, an attackercan possibly interject the ACV data processing stage, inject

This research was supported by the U.S. National Science Foundation underGrants OAC-1541105 and CNS-1446621.

Aidin Ferdowsi and Walid Saad are with Wireless@VT, Bradley De-partment of Electrical and Computer Engineering, Virginia Tech, Blacks-burg, VA, USA, {aidin, walids}@vt.edu. Samad Ali is withCentre for Wireless Communications (CWC), University of Oulu, Fin-land, [email protected]. Narayan B. Mandayam is with WIN-LAB, Dept. of ECE, Rutgers University, New Brunswick, NJ, USA,[email protected]

faulty data, and ultimately induce accidents or compromisethe road traffic flow [9]. As demonstrated in a real-worldexperiment on a Jeep Cherokee in [10], ACVs are largelyvulnerable to cyber attacks that can control their criticalsystems, including braking and acceleration. Naturally, bytaking control of ACVs, an adversary can not only impact thecompromised ACV, but it can also reduce the flow of othervehicles and cause a non-optimal ITS operation. This, in turn,motivates a holistic study for joint cyber and physical impactsof attacks on ACV systems.

A. Related Works

Recently, a number of security solutions have been proposedfor addressing intra-vehicle network and vehicular commu-nication cyber security problems [11]–[21]. In [11], the au-thors showed that long-range wireless attacks on the currentsecurity protocols of ACVs can disrupt their controller areanetwork (CAN). Furthermore, the work in [12], proposed adata analytics approach for the intrusion detection problemby applying a hidden Markov model. In [13], the securityvulnerabilities of current vehicular communication architec-tures are identified. The work in [14] proposed the use ofmulti-source filters to secure a vehicular network against datainjection attacks (DIAs). Furthermore, the authors in [15]introduced a new framework to improve the trustworthinessof beacons by combining two physical measurements (angleof arrival and Doppler effect) from received wireless signals.In [16], the authors designed a multi-antenna technique forimproving the physical layer security of vehicular millimeter-wave communications. Moreover, in [17], the authors proposeda collaborative control strategy for vehicular platooning toaddress spoofing and denial of service attacks. The work in[18] developed a deep learning algorithm for authenticatingsensor signals. An overview of current research on advancedintra-vehicle networks and the smart components of ITS ispresented in [19]. In [20], the authors provided a summaryof wireless communications and security issues of the cyber-physical systems. Moreover, in [21], a secure beamformingalgorithm is proposed for centralized radio access networks(CRANs) in vehicle-to-infrastructure (V2I) communications.

In addition to cyber security in ITSs, physical safety andoptimal control of ACVs have been studied in [22]–[28].In [22], the authors identified the key vulnerabilities of avehicle’s controller and secured them using intrusion detection

Page 2: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

2

algorithms. The work in [23] analyzed the ACVs as cyber-physical systems and developed an optimal controller for theirmotion. The authors in [24] studied the safe operation of ACVnetworks in presence of an adversary that tries to estimate thedynamics of ACVs by its own observations. The authors in[25] proposed centralized and decentralized safe cruise controlapproaches for ACV platoons. A learning-based approach isproposed in [26] to control the velocity of ACVs. Furthermore,in [27], the authors have proposed a robust deep reinforcementlearning (RL) algorithm which mitigates cyber attacks on ACVsensors and maintains the safety of ACV system. In [28], theauthors studied the essence of secure and safe codesign forACV systems.

However, despite their importance, the architecture andsolutions in [11]–[28] do not take into account the interde-pendence between the cyber and physical layers of ACVswhile designing their security solutions. Moreover, the priorart in [11]–[28] does not provide solutions that can enhancethe robustness of ACV motion control to malicious attacks.Nevertheless, designing an optimal and safe ITS requiresrobustness to attacks on intra-vehicle sensors as well as inter-vehicle communication. In addition, these existing works donot properly model the attacker’s action space and goal (phys-ical disruption in ITS) while providing their security solutions.In this context, the cyber-physical interdependence of theattacker’s actions and goals will help providing better securitysolutions. Finally, the existing literature lacks a fundamentalanalysis of physical attacks as in the Jeep hijacking scenario[10] in which the attacker aims at disrupting the ITS operationby causing non-optimality in a compromised ACV’s speed.

B. Contributions

The main contribution of this paper is, thus, a compre-hensive study of joint cyber-physical security challenges andsolutions in ACV networks which can be summarized asfollows:

• To address both safety and optimality of an ACV system,first an optimal safe controller is proposed so as to max-imize the traffic flow and minimize the risk of accidentsby optimizing the speed and spacing of ACVs. To the bestof our knowledge, this work will be the first to analyzethe physical attack on a ACV network and to prove thatthe proposed controller can maximize the stability androbustness of ACV systems against physical attacks suchas in the Jeep hijacking scenario [10].

• To improve the cyber-physical security study of ACVsystems, next, two new DIA detection approaches areproposed to address cyber attacks on ACV sensors.

• The first proposed DIA detection approach finds theoptimal threshold level for sensors which have a prioriinformation about the ACV state. Using such threshold,the proposed approach is shown to efficiently detect DIAswhen compared to baseline scenarios that use only aKalman filter for the state estimation. This approach isparticularly suitable for scenarios in which the ACV hasa priori information about some of its state variables.

• To detect DIAs for scenarios in which ACV has noprior information about its state variables, a novel multi-armed bandit (MAB) algorithm is proposed to learn whichsensors are attacked. The proposed MAB algorithm usesthe so-called Mahalanobis distance between the sensordata and an a-posteriori prediction to calculate a regretvalue and optimize the ACV’s sensor fusion process byapplying an upper confidence bound (UCB) algorithm.

• The maximum physical disruption on the ACV operationcaused by a DIA is studied analytically. In addition, it isshown that the proposed DIA detection approaches cansuppress the effects of DIAs on the optimal velocity andspacing of the ACV networks.

Simulation results show that the proposed optimal safe con-troller has higher safety, optimality, and robustness againstphysical attacks compared to other state-of-the-art approaches.In addition, our results show that the proposed DIA detec-tion approaches yield an improved performance compared toKalman filtering in mitigating the cyber attacks. Therefore,the proposed solutions improve the stability of ACV networksagainst DIAs.

The rest of the paper is organized as follows. Section IIintroduces our system model while Section III derives theoptimal safe controller. Section IV proves that the proposedoptimal safe controller is robust against physical attacks.Section V proposes approaches to mitigate cyber attacks onACV while reducing the risk of accidents. Finally, simulationresults are shown in Section VI and conclusions are drawn inSection VII.

II. SYSTEM MODEL

A. ACV Physical Model

Consider an ACV, f , that follows a leading ACV, l andtries to maintain a spacing from ACV l as shown in Fig.1. Maintaining a spacing between ACVs is important tomaximize the traffic flow and minimize the risk of accidents[9]. Let v f be ACV f ’s speed in m/s. Then, ACV f ’s speeddeviation can be written as v f (t) = Ff (t)

m f, u f (t), where

Ff (t) is f ’s engine force in Newtons (N), m f is f ’s massin kilograms (kg), and u f (t) is f ’s physical controller inputin N/kg . Moreover, letting vl be l’s speed, the spacing d(t)between f and l can be written as d(t) = vl (t) − v f (t). Notethat this model can be easily generalized to multiple ACVs byrepeating the same set of equations for every pair of ACVs tocapture any ACV network as shown in Fig. 1. Due to discretetime sensor readings in ACVs, we convert the aforementionedcontinuous system model to a discrete one using a lineartransformation as follows [29]:

v f (t + 1) = v f (t) + Tu f (t), d(t + 1) = d(t) + Tvl (t) − Tv f (t),(1)

where T is the sampling period of the sensors in seconds. Themodel can be summarized as:

x(t + 1) = Ax(t) + Bu f (t) + Fvl (t), (2)

Page 3: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

3

Roadside

sensor

V2VV2I

Leading vehicle

with speedSpacing

sensors measuring

sensors measuring

sensors measuring Vulnerable to

cyber attacks

Vulnerable to

physical attacks

Under study

vehicle with

speed

Camera &

LiDAR

Radar

Fig. 1: Illustration of the considered ACV system model.

where

x(t) =[v f (t)d(t)

], A =

[1 0−T 1

], B =

[T0

], F =

[0T

].

(3)

In (2), we consider a constant velocity within each time period,however, the velocity varies at every time step. This is acommon model used in prior works such as [30] and [31].To validate the practicality of the proposed system model, weneed to show that u f (t) can control the speed and spacing off , i.e., u f can take state vector x(t0) to any desired state x(t1).The following remark shows that u f can control system (2).

Remark 1. If T > 0, then the system in (2) is controllable. Toillustrate the reason, we know that the system (2) is control-lable if the rank of controllability matrix C =

[B AB

]is

2 (number of state variables) [29]. Thus, we have:

C=

T0

[1 0−T 1

].

[T0

] =

T T0 −T2

. (4)

Therefore, for any T > 0 the columns of C are linearlyindependent which implies that the rank of C will be 2.

B. ACV cyber model

In order to navigate, as shown in Fig. 1, ACV f relieson n f , nl , and nd sensors which measure v f (t), d(t), andvl (t), respectively. For instance, multiple intra-vehicle inertialmeasurement units (IMUs) measure v f (t), multiple cameras,radars, and LiDAR can measure d(t), and roadside sensors andACV l measure vl (t) and transmit the measurements to ACV fusing vehicle-to-vehicle (V2V) and V2I communication links.Thus, we model the sensor readings as follows:

z f (t) = h f v f (t) + e f (t),zl (t) = hlvl (t) + el (t),zd (t) = hdd(t) + ed (t),

(5)

where z f (t), zl (t), and zd (t) are sensor vectors with n f ,nl , and nd elements which measure v f (t), vl (t), and d(t),respectively. Also, h f , hl , and hd are vectors with n f , nl ,and nd elements equal to 1. Moreover as assumed in [29]and [32], e f (t), el (t), and ed (t) are noise vectors that followa white Gaussian distribution with zero mean and variancevectors σ2

f =

[σ2

f1, . . . , σ2

fn f

]T, σ2

l=

[σ2l1, . . . , σ2

lnl

]T, and

σ2d=

[σ2d1, . . . , σ2

dnd

]T. Since the sensor readings in (5) are

noisy, we need to optimally estimate v f (t), vl (t) and d(t)from z f (t), zl (t), and zd (t) by minimizing the estimationerror. To find the optimal estimations v f (t), vl (t), and d(t)(the estimations of v f (t), vl (t), and d(t)), we use two types ofestimators. A static estimator to estimate the variables at theinitial state, t0, (the time step that f starts following l) and adynamic estimator to estimate v f (t) and d(t) using the stateequations in (1). We use a static estimator at the initial statebecause ACV f does not follow the speed of ACV l using(1) before t0 and, hence, a dynamic estimator cannot be useddue to lack of information about the dynamics of f before t0.Moreover, for vl (t), we always use a static estimator since wedo not have any information on the dynamics of vl (t).

For the static estimator, we define a least-square (LS) costfunction for each variable as follows:

Jf (v f (t)) =12

∞∑t=0

(z f (t) − h f v f (t)

)TR−1

f

(z f (t) − h f v f (t)

),

(6)

Jl (vd (t)) =12

∞∑t=0

(zl (t) − hl vl (t))T R−1l (zl (t) − hl vl (t)) ,

(7)

Jd (d(t)) =12

∞∑t=0

(zd (t) − hd d(t)

)TR−1d

(zd (t) − h f d(t)

),

(8)

where R f , Rl , and Rd are the measurement covariance ma-trices associated with sensors measure v f (t), vl (t), and d(t),respectively. Moreover, since the sensors independently mea-suring the three variables, they will not have any noisecovariance and R f , Rl , and Rd will be diagonal. Therefore,explicit solution for the optimal estimation can be derived as:

v f (t) =(hTf R

−1f h f

)−1hTf R

−1f z f (t) =

n f∑i=1

1σ2

fi∑n f

i=11σ2

fi︸ ︷︷ ︸wfi

z fi (t)

= v f (t) +n f∑i=1

w fi e fi (t), (9)

vl (t) =(hTl R

−1l hl

)−1hTl R

−1l zl (t) =

nl∑i=1

1σ2li∑nl

i=11σ2li︸ ︷︷ ︸

wli

zli (t)

= vl (t) +nl∑i=1

wli eli (t), (10)

d(t) =(hTdR

−1d hd

)−1hTdR

−1f zd (t) =

nd∑i=1

1σ2

di∑nd

i=11σ2

di︸ ︷︷ ︸wdi

zdi (t)

= d(t) +nd∑i=1

wdi edi (t). (11)

Page 4: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

4

For the dynamic estimator, we use a Kalman filter which usesthe state equation in the estimation process. To this end, wedefine an output equation as follows:

z(t) = Hx(t) + e(t), s.t. H =[

h f 0n f ×10nd×1 hd

], z(t) =

[z f (t)zd (t)

], e(t) =

[e f (t)ed (t)

].

(12)

Note that we cannot apply the dynamic estimator on vl (t)since we do not have a-priori information about the dynamicsof l. To dynamically estimate d(t) and v f (t), we use an apriori estimation derived from the state equations as well asa weighted residual of output error to correct the a prioriestimation as follows [33]:

x(t)︸︷︷︸estimation

= x (−) (t)︸ ︷︷ ︸a priori estimation

+K (t)

residual︷ ︸︸ ︷[z(t) − Hx (−) (t)

]︸ ︷︷ ︸correction

, (13)

where K (t) is the Kalman gain. By defining an a posteriorierror covariance matrix P(t) = E

[r (t)rT (t)

], where r (t) =

x(t) − x(t), we can find a K (t) to minimize trace [P(t)] =E

[rT (t)r (t)

]. The solution for such K (t) can be given by

[33]:

x (−) (t) = Ax(t − 1) + Bu f (t − 1) + F vl (t − 1), (14)

P(−) (t) , AP(t − 1)AT , (15)

K (t) = P(−) (t)HT[HP(−) (t)HT + R

]−1, (16)

x(t) = [I − K (t)H] x (−) (t) + K (t)z(t), (17)

P(t) = [I − K (t)H] P(−) (t). (18)

where R =

[R f 0n f ×nd

0nd×n f Rd

]is a block diagonal matrix. As

can be seen from (15), (16), and (18), the update processesfor P(t) and K (t) are independent of the states and controller.Thus, P(t) and K (t) will converge to constant matrices, P andK .

Note that, the presented Kalman filtering is the optimal stateestimation method for the case with no delays in vehicle-to-everything (V2X) link. We do not consider delays since ourmain focus is to address the cyber-physical security and controldesign for ACVs. Considering zero delays in the V2X link istypical in the literature related to ACV control such as [23]–[26]. We can still take into account delays by designing afeedback mechanism as discussed in [34] and [35], in order tooptimally estimate the state of a system by delayed V2X/V2Idata. In such a feedback mechanism, we can use the data fromthe intra-vehicle sensors to estimate the state and, then, we canupdate the estimation of the past states by using the delayedV2X data. Note that, in our model, V2X and V2I links are notthe only sources of information that each vehicle receives. TheACV also uses the readings from camera, LIDAR, and radarto estimate its state variables. Thus, in case of a delay in theV2X and V2I links, the ACV can still have state estimationusing other intra-vehicle sensors and later can use the delayedinformation from the V2X and V2I to improve its estimationas done in [34] and [35].

For the studied system, we now define the cyber-physicalsecurity problems for ACV systems that we will study next.

Follower CAV 𝑓 Leader CAV 𝑙

SpacingFollower’s speed Follower’s speed

a-priori information a-posterior information

Estimated state

variables:

Available

information:

Attack purposes:

Threshold-based detection MAB LearningAttack detection

algorithms:

Increase accident risk Reduce traffic flow

Attack

targets

Intra-vehicle sensors of 𝑓

V2X link

Adversarial control of 𝑙

Fig. 2: Illustration of the studied cyber-physical attacks andtheir solutions.

We will address three main problems: 1) What is the optimalsafe ACV controller for the system in (2) that minimizes therisk of accidents while maximizing the traffic flow on theroads? 2) Is the proposed optimal safe controller for ACVsystems robust and stable against physical attacks? and 3)How to securely fuse the sensor readings to mitigate DIAson ACVs and minimize the impact of such attacks on thecontrol of ACVs? Addressing these problems is particularlyimportant because the model in (2) identifies the microscopiccharacteristics of an ACV network and, thus, to achieve large-scale security and safety in ACV networks we must secureevery ACV against cyber-physical attacks.

Addressing these three problems requires a comprehen-sive study of the interdependencies between the cyber andphysical characteristics of ACV systems as shown in Fig. 2.Such interdependent cyber-physical study helps to find thevulnerabilities of the ACV systems against both cyber andphysical attacks. Thus, we can derive an optimal controller thatminimizes the risk of accidents and we can design cyber attackdetection approaches that not only take into account the cybercharacteristics of the ACV system, but also aims at minimizingthe likelihood of collisions in ACV networks. Unlike theworks in [11]–[19], we consider the physical characteristicsof ACV systems while developing DIA detection approaches.Moreover, the combined optimal safe ACV controller designhas not been studied previously in [22]–[28].

III. OPTIMAL SAFE ACV CONTROLLER

Our first task is thus to derive an optimal safe controllerfor ACV systems. To analyze ACV f ’s optimal control inputu f (t), we define an optimal safe spacing value as o(v f (t)) ,v2f

(t)2b f T

, where bf is ACV f ’s maximum braking deceleration.This value is defined so as to guarantee that if the leading ACVl stops suddenly, the following ACV f will stop completely

Page 5: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

5

before hitting ACV l as long as f starts braking immediatelyafter observing l’s braking process. This can be captured bythe following energy equivalence condition:

m f

2(v2

f (t) − 0)︸ ︷︷ ︸Kinetic Energy

= m f bf d(t)︸ ︷︷ ︸Potential Energy

⇒ o(v f (t)) =v2f (t)

2bf. (19)

Our goal here is to maintain an optimal safe spacing betweenACVs f and l. Thus, we define a physical regret R(t) as thesquare of difference between the optimal safe spacing and theactual spacing. This regret quantifies the safety and optimalityof ACV motion by preventing any collisions and minimizingthe spacing between the ACVs and can be written as follows:

R(t) =(o(v f (t + 1)) − d(t + 1)

)2= *

,

v2f (t + 1)

2bf− d(t + 1)+

-

2

= *,

(v f (t) + Tu f (t))2

2bf− d(t) − Tvl (t) + Tv f (t)+

-

2

(20)

Note that, o(v f (t+1) is the required spacing for ACV f to stopbefore hitting the leading ACV l when the leading ACV l stopssuddenly. Thus, any spacing less than o(v f (t + 1) will have arisk of accident and larger than o(v f (t + 1) will cause non-optimal traffic flow. In addition, each ACV only has access toestimation of v f (t), vl (t), and d(t). Thus, ACV f must designan input u f (t) to minimize an estimation of physical regretwhich is defined as follows:

R(t) = *,

(v f (t) + Tu f (t))2

2bf− d(t) − T vl (t) + T v f (t)+

-

2

. (21)

This problem is challenging to solve because vl (t) is anindependent parameter and ACV f cannot be sure about thefuture values of vl (t). To solve this problem, we consider twoscenarios: a) ACV f has no prediction about ACV l’s futurespeed values (One-step ahead controller) and b) ACV f hasa predictor which can predict ACV l’s future speed value forN time steps (N-step ahead controller)(such predictors haveattracted recent attention in the transportation literature, e.g.,see [36] and [37]). Next, we propose an optimal controller forthese two cases.

A. One-step Ahead Controller

To solve the one-step ahead controller problem, we considersome physical limitations on the speed and the control input.We prohibit the speed from being greater than the free-flowspeed of a road, v. Moreover, due to the physical capabilities ofthe vehicle and for maintaining passengers’ comfort, we musthave a limitation on the control input and speed deviation.Thus, the optimization problem of the ACV f can be writtenas follows:

u∗f (t) = arg minu f (t)

R(t) (22)

s.t. uminf ≤ u f (t) ≤ umax

f , (23)

0 ≤ v f (t + 1) ≤ v, (24)|u f (t) − u f (t − 1) | ≤ ∆u, (25)

where uminf < 0 and umax

f> 0 are the minimum and maximum

allowable control input and ∆u is the maximum allowablechange in the controller to yield a comfortable ride. Next, wederive the one-step ahead controller assuming that the ACVknows umin

f < 0, umaxf

> 0, ∆u, and v. These assumptions arereasonable since they are known for the manufacturer of theACVs who designs the optimal controller.

Theorem 1. The one-step ahead optimal controller is:

u∗f (t) = min{

max{u1(t),

1T

(√2bf

(d(t) + T vl (t) − T v f (t)

)− v f (t)

)}, u1(t)

}(26)

where u1(t) , max{−vf (t)

T , uminf, u f (t − 1) − ∆u

}and u1(t) ,

min{v−vf (t)

T , umaxf, u f (t − 1) + ∆u

}.

Proof. See Appendix A. �

Theorem 1 derives the optimal controller for the ACV fwhen it only optimizes its action for the next step withoutconsidering future actions. In the next subsection, we derivethe ACV f ’s optimal controller when it considers minimizingthe regret for N step ahead.

B. N-step Ahead Controller

To find the N-step ahead controller, first we define adiscount factor 0 ≤ γ ≤ 1 which specifies the level offuture physical regret for the decision taken at each time step.Thus, by defining the N-step ahead total discounted physicalregret R (t, N ) ,

∑t+N−1τ=t γτ−t R(τ), the controller optimization

problem can be written as follows:

u∗f (t) = [1, 0, . . . , 0︸ ︷︷ ︸N−1

]{

arg minuN

f(t)R (t, N )

}, (27)

where the conditions in (23), (24), and (25) hold true. More-over uN

f(t) = [u f (t), . . . , u f (t + N − 1)]T , N is the number of

future steps which is taken into account in finding the optimalcontroller, and u∗f (t) is the optimal controller at time step t.Next, we derive the N-step ahead optimal controller whileconsidering that the ACV f can predict the N future valuesof the velocity of the leading vehicle [38].

Theorem 2. The solution of N-step ahead controller isequivalent to the one-step ahead controller.

Proof. To solve the problem in (27), we use a so called indirectmethod. To this end, we start by defining an augmentedphysical regret using the following state equation:

R ′(t, N ) = R (t, N ) +t+N−1∑τ=t

λT (τ + 1) (28)

×

Ax(τ) + Bu f (τ) + Fvl (τ)︸ ︷︷ ︸

g(τ)

−x(τ + 1)

=

t+N−1∑τ=t

[γτ−t R(τ) + λT (τ + 1)

[g(τ) − x(τ + 1)

] ],

(29)

Page 6: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

6

where λ(τ) = [λ1(τ), λ2(τ)]T . Then, let Hamiltonian functionbe defined as H (τ) , γτ−t R(τ) + λT (τ + 1)g(τ). Thus, wecan write (29) as follows:

R ′(t, N ) = λT (t + N ) x(t + N )︸ ︷︷ ︸terminal time

+ H (t)︸︷︷︸initial time

+

t+N−1∑τ=t+1

[H (τ) − λT (τ) x(τ)

]

︸ ︷︷ ︸running time

. (30)

Thus, to find critical points (candidate minima) we must solve∇R ′(t, N ) = 0. First, we start by finding the differentialdR ′(t, N ) and then we identify the derivatives as follows:

dR ′(t, N ) = λT (t + N )d x(t + N ) +(∇x (t)H (t)

)T d x(t)

+

t+N−1∑τ=t+1

(∇x (τ)H (τ) − λ(τ)

)T d x(τ)

+

t+N−1∑τ=t

(∇u (τ)H (τ)

)T du(τ)

+

t+N∑τ=t+1

(∇λ (τ)H (τ − 1) − x(τ)

)T dλ(τ). (31)

Thus, to have dR ′(t) = 0 each of the terms in brackets mustbe equal to zero:

λ(t + N ) = 0, (32)

∇x (t)H (t) = 0,⇒ x(t) =[v f (t), d(t)

]T, (33)

λ(τ) = ∇x (τ)H (τ) = ∇x (τ)γt−τ R(τ) + ∇x (τ)g(τ)λ(t + 1),

∀τ = t + 1, . . . , t + N − 1, (34)

0 = ∇u (τ)H (τ) = ∇u (τ)γt−τ R(τ) + ∇u (τ)g(τ)λ(t + 1),

∀τ = t, . . . , t + N − 1, (35)∇λ (τ+1)H (τ) = x(τ + 1) ⇒ x(τ + 1) = g(τ),∀τ = t, . . . , t + N − 1. (36)

Now, using (34) we will have:

λ(τ) =

vf (τ)+Tu f (τ)b f

+ T−1

γt−τ

((v f (τ) + Tu f (τ))2

2bf− d(τ)

− T vl (τ) + T v f (τ))+ ATλ(τ + 1). (37)

Moreover, from (35) we will have:

0 = γt−τ *,

(v f (τ) + Tu f (τ))2

2bf− d(τ) − T vl (τ) + T v f (τ)+

T (v f (τ) + Tu f (τ))bf

+ BTλ(τ + 1). (38)

Since λ(t + N ) = 0, then from (38), we derive:

u f (t + N − 1) =1T

(2bf

(d(t + N − 1) + T vl (t + N − 1)

− T v f (t + N − 1)))1/2

− v f (t + N − 1)]. (39)

By substituting u f (t+N −1) in (37), we obtain λ(t+N −1) =0. This process can continue until τ = t where we obtain

λ(t+1) = 0 and u f (t) = 1T

√2bf

(d(t) + T vl (t) − T v f (t)

)−

v f (t)]. Moreover, considering the constraints (23), (24), and

(25), we will end up having the optimal controller as definedin Theorem 1. Thus, we prove that the N-step ahead optimalcontroller is equivalent to 1-step ahead optimal controller. �

From Theorem 2, we can observe that, if the ACV f min-imizes its immediate physical regret, it will also minimize itslong-term physical regret. This result shows that the proposedoptimal safe controller does not require any information fromfuture dynamics of ACV l as done in [36] and [37].

IV. PHYSICAL ATTACK ON ACV SYSTEMS

As derived in the previous section, the proposed optimalcontroller is a function of vl (t). This makes ACV f vulnerableagainst a physical attack on ACV l. Thus, we now analyzewhether an attacker can cause instable dynamics at ACV fby controlling l. Consider an adversary who takes the controlof ACV l and tries to cause instability in ACV f ’s speed,v f (t), and spacing d(t). Using our derived optimal controller,by ensuring u f (t) is not saturated (u1(t) ≤ u f (t) ≤ u1(t)), andconsidering the estimated values to be close to the real valueswe will have:

v f (t + 1) =√

2bf (d(t) + Tvl (t) − Tv f (t)︸ ︷︷ ︸g1 (x (t),vl (t))

,

d(t + 1) = d(t) + Tvl (t) − Tv f (t)︸ ︷︷ ︸g2 (x (t),vl (t))

.

⇒ x(t + 1) = g(x(t), vl (t)), (40)

where g =[g1(x(t), vl (t)), g2(x(t), vl (t))

]T . (40) is designedsuch that ACV f will always maintain an optimal safe spacingwith l. However, from (40) we can see that the behavior ofthe system is a function of vl . Thus, next, we will analyze thephysical attack scenario that is analogous to the Jeep hijackingcase in [10]. First, we present basic definitions for stabilityanalysis of ACV networks. Then, we prove that our proposedcontroller is stable against any physical attacks.

A. Equilibrium Point of the ACV System

Our first step here is to study the stability of our system.To analyze the stability of (40), first, we define some usefulconcepts.

Definition 1. [32] x(vl) is said to be an equilibrium pointfor the system in (40) and a constant input vl , if g( x(vl), vl) =x(vl).

An equilibrium indicates a point at which the states willnot change. From Definition 1 we can derive the equilibriumpoint for (40) by considering a constant input vl and solvingthe following set of equations:

v f (vl) =√

2bf (d(vl) + Tvl − T v f (vl),

d(vl) = d(vl) + Tvl − T v f (vl),(41)

Page 7: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

7

which results in: x(vl) =[vl,

vl2b f

]T. The derived value for

x(vl) shows that in order to reach an equilibrium, ACV fmust maintain the optimal safe spacing from ACV l and itsspeed must equal to vl . Thus, next, we show that our derivedoptimal controller is robust, i.e., under our controller if anadversary hijacks ACV l, it cannot cause instability for v f (t)and d(t).

B. Stability of the ACV System Against Physical Attacks

Definition 2. A system is called asymptotically stable aroundits equilibrium point if it satisfies the following two conditions[32]: 1) Given any vl > 0 and ε > 0, ∃δ1 > 0 such that if|x(ts) − x(vl) | < δ1, then |x(t) − x(vl) | < ε , ∀t > ts and 2)∃δ2 > 0 such that if |x(ts) − x(vl) | < δ2, then x(t) → x(vl)as t → ∞.

The first condition requires the state trajectory to be con-fined to an arbitrarily small “ball” centered at the equilibriumpoint and of radius ε, when released from an arbitrary initialcondition in a ball of sufficiently small (but positive) radius δ1.This is called stability in the Lyapunov sense [32], [39]. It ispossible to have Lyapunov stability without having asymptoticstability.

Next, we show how a Lyapunov function can help to analyzethe stability of system (40) [32]. From [32], we know that ifthere exists a Lyapunov function for system (40), then x(t) =x(vl) is a stable equilibrium point in the sense of Lyapunov. Inaddition, if L(g(x(t), vl)) − L(x(t)) < 0 then x(t) = x(vl) isan asymptotically stable equilibrium point. We can prove thatthe system in (40) is asymptotically stable for the equilibriumpoint x(vl), as follows.

Proposition 1. x(t) = x(vl) is a stable equilibrium point inthe Lyapunov sense.

Proof. Let L(x(t)) =(v2f

(t)2b f− d(t)

)2. Then we will have

L( x(vl)) = 0. Moreover, we can show that L(x(t)) ≥ 0, ∀x(t).Now, to check if L is a Lyapunov function we have:

L(g(x(t), vl)) − L(x(t)) =(

12bf

(√2bf (d(t) + Tvl − Tv f (t)

)2

− d(t) + Tvl − Tv f (t))2

− *,

v2f (t)

2bf− d(t)+

-

2

= − *,

v2f (t)

2bf− d(t)+

-

2

≤ 0. (42)

Thus, x(t) = x(vl) is a stable equilibrium point in the senseof Lyapunov. �

From Proposition 1, we can see that, as long as f followsl using our proposed controller, its speed and spacing froml will stay stable and will not be affected by the physicalattack on l. This shows that, not only our proposed controllermaximizes the safety and optimality in ITS roads, but also itis robust to physical attacks such as in the Jeep scenario [10].

However, as can be seen from Theorems 1 and 2, eventhough it is robust to physical attacks, the derived optimal

controller is largely dependent on the estimated values v f (t),vl (t), and d. Thus, an adversary can manipulate the sensordata to inject error in the estimation and ultimately increasethe ACV f ’s physical regret. Analyzing such attacks require acyber-physical study of the ACV system to derive approachesthat mitigate attacks on sensors and minimize the effect ofsuch attacks on the physical regret. Thus, next, we analyzethe cyber attack on the ACV system.

V. CYBER ATTACK ON ACV SYSTEMS

We now consider a cyber attacker that injects faulty data tosensor readings (cameras, LiDARs, radars, IMUs, and roadsidesensors) such that the attacked sensor vector can be written as:

z(t) ,[z f (t)zd (t)

]

zl (t)

,

z f (t)zd (t)zl (t)

+

a f (t) , [a f1, . . . , a fn f]T

ad (t) , [ad1, . . . , adnd]T

al (t) , [al1, . . . , alnl ]T

, (43)

where a f (t), ad (t), and al (t) are data injection vectors onsensors and z f (t), zd (t), and zl (t) are compromised sensorreadings from v f (t), d(t), and vl (t), respectively1. As wediscussed in the state estimation section, we have a-prioryinformation about sensors which measure v f (t) or d(t), butsuch information is lacking for sensors that collect data fromvl (t). Thus, next, we consider cyber attacks on sensors a) witha-priori information and b) without a-priori information.

A. Attack on Sensors With Prior Information

As discussed in Subsection II-B, we use Kalman filteringto estimate v f (t) and d(t). However, Kalman filtering is notrobust to DIAs [40]. Thus, we propose a filtering mechanismthat can limit the effect of the DIA on v f (t) or d(t). To thisend, we use the a priori estimation x (−) (t) at each time step tofind an a priori sensor reading z (−) (t) , Hx (−) (t). Then, theattack detection filter checks the absolute value of the resid-

ual |µ(t) | ,[|µ f1 (t) |, . . . , |µ fn f

(t) |, |µd1 (t) |, . . . , |µdnd(t) |

]T

, ���z(t) − z (−) (t)��� < η, where η is the threshold vector. Anysensor which violates this inequality will be considered as acompromised sensor and will not be involved in Kalman filterupdate procedure. To find an optimal value for the threshold η,we next characterize the stochastic behavior of residual µ(t)when the ACV is not under attack.

Theorem 3. The residual µ(t) follows a Gaussian distributionwith zero mean and covariance matrix as follows:

Cµ = HCrHT + R − RK

TQTHT − HQKR, (44)

and Cr is the solution of following discrete Ricatti equationCr = QACrA

TQT + Cρ, where Cρ = QFσ2lFTQT +

QKRKTQT , and Q =

[I +

[I − KH

]−1KH

]−1.

1Using the definition for observability [32], one can easily derive that themaximum number of compromised vf , d, and vl type sensors tolerated bythe ACV’s state estimation process are n f − 1, nd − 1, and nl − 1

Page 8: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

8

015

2

4

6

1010

8

0 10-3

10

10-3 0

12

-5 -1

200.05

0

-0.05

1

2

3

105

4

10-3

5

0

6

-2

Fig. 3: Analytical and simulation result for cumulativedensity function of ρ(t) and r (t).

10

21 3

1

243

2

54

3

5 66

77

88 1 2 3 4 5 6 7 8

-0.07

-0.06

-0.05

-0.04

-0.03

-0.02

-0.01

0

0.01

Fig. 4: Analytical and simulation result for the mean andvariance of µ(t).

Proof. See Appendix B. �

Theorem 3 derives the distribution of µ(t) which we willuse next to find an optimal value for the threshold level η.Fig. 3 and Fig. 4 show a comparison between simulation andanalytical results derived for the mean and covariance matrixof r (t), ρ(t), and µ(t). From Figs. 3 and 4 we can see that theanalytical results match the simulation results which validatesTheorem 3.

We can now find the probability with which µi (t) (el-ement i of µ(t)) remains below the threshold value ηi asPr (|µi | ≤ ηi) = Pr (−ηi ≤ µi ≤ ηi) = Φµi (ηi) − Φµi (−ηi),where Φµi (.) is the cumulative density function of µi whichfollows a Gaussian distribution with zero mean and vari-ance Cµ (i, i) (element in i-th row and i-th column of Cµ).Thus, we can derive the optimal value for η by defin-ing Pr ( |µi | ≤ ηi) for every sensor. For instance, choosingvalues ηi = Cµ (i, i), 2Cµ (i, i), or 3Cµ (i, i) will result inPr (|µi | ≤ ηi) = 0.68, 0.95, or 0.997. Even by defining thethreshold value, the attacker might stay stealthy in some casesif it controls the amount of injected data to the sensors. Next,we find a relationship between the maximum value of DIAand the probability of staying stealthy.

Proposition 2. The probability with which an attack vectora ,

[a f , ad

]Twill not trigger the i-th element of attack

detection filter, µi , (stealthy attack) will be given by:

pi ( a) , Pr (|µi | ≤ ηi | a) = Φµi (Ψ i a + ηi) − Φµi (Ψ i a − ηi),(45)

where Ψ i is the i-th row of Ψ =[I − H

[I − QA

]−1 QK].

Proof. Suppose that the attacker initiates attack after theKalman filter converges. Thus, from (14) and (17) we have:

x(t) =[I − KH

]x (−) (t) + K z(t)︸︷︷︸

[Hx(t)+e(t)]

+K

[a f

ad

]

︸ ︷︷ ︸a

=[I − KH

] [Ax(t − 1) + Bu f (t − 1)

+ F vl (t − 1)]+ KH [x(t) + r (t)] + K [e(t) + a] . (46)

Thus, by simplifications analogous to the proof of Theorem 3,we can find r (t) = QAr (t −1)−QK a+ ρ(t). The expectationof r (t) will be E {r (t)} = QAE {r (t − 1)} − QK E { a} . Sincethe attack vector is constant, a, and E {r (t)} = E {r (t − 1)} fort → ∞ we will have the steady state expected estimation erroras E {r (t)} , r = −

[I − QA

]−1 QK a. Thus, considering suchsteady state expected estimation error, r , we can derive themean of µ(t):

E {µ(t)} = E{z(t) − z (−) (t)

}= E {Hr (t) + a + e(t)}

=[I − H

[I − QA

]−1 QK]︸ ︷︷ ︸

Ψ

a.

Since ad is constant, then σµ will not change. Therefore, wecan find the probability of not triggering the attack detectionfilter for µi as in (45). �

Proposition 2 derives the probability of staying stealthy forany particular DIA vector. Next we will find how much ourdefined attack detection filter is robust against a particularDIA. To this end, first we define a stealthy attack probabilityvector as p( a) =

[p1( a), . . . , pn f +nd

( a)]T

, which is a vectorthat shows the probability of staying stealthy by initiating aon all of the v f and d type sensors. The attacker has to findthe optimal value of a which maximizes the physical regretwhile staying stealthy with probability below a defined value,p, i.e., attack vector must be chosen such that the physicalregret is maximized and every sensor i stays stealthy with theprobability below the i-th element of p.

Corollary 1. The attacker’s optimal attack vector a∗ whichmaximizes the steady state physical regret by attacking v f or dtype sensors is the solution of following optimization problem:

arg maxa

θ2 ,[rTΘ r + cT r

]2, (47)

s.t. r = −[I − QA

]−1 QK a, (48)

Θ ,

[ 12b f

00 0

], c =

√d

b f+ T−1

, (49)

p( a) � p, (50)

where d is the average spacing between the ACVs and � isan element-wise comparison.

Proof. From the definition of physical regret we have:

R(t)���u∗f

(t)=

*..,

(v f (t) + Tu∗f (t)

)2

2bf− d(t) − Tvl (t) + Tv f (t)

+//-

2

= θ2(t), (51)

Page 9: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

9

where

θ(t) ,r2f (t)

2bf+

r f (t)bf

√d(t) + T vl (t) − T v f (t)︸ ︷︷ ︸

'd

− rd (t) − Trl (t) + Tr f (t). (52)

Moreover, the steady state value of θ(t) can be derived byusing the steady state expected value of r as θ = rTΘ r + cT r .Also, the attacker must choose ad to satisfy (45). Thus, theattacker must solve the optimization problems in (47). �

Corollary 1 derives a robustness level for our proposedattack detection filter. It allows us to find the maximum regretwhich is caused by any stealthy attack vector. This allows usto design a secure cyber-physical ACV system since we takeinto account the physical regret of DIAs on sensors. However,the proposed attack detection filter works for sensors witha-priori information only. For ACV f ’s other sensors whichmeasure vl (t) we do not have a-priori information. Thus, wenext address this case using MAB.

B. Multi-armed Bandit Learning for Attack Detection in Sen-sors Without Prior Information

To detect the attack on vl type sensors, we cannot useany a priori estimation for vl (t) as done for v f and d typesensors because ACV f cannot have any information aboutevaluation of vl (t) since it does not know how ACV l isbeing controlled. Such lack of a priori information makesthe attack detection challenging. To overcome this challenge,we propose to use a MAB learning approach because suchapproach does not require a-priori information and finds theoptimal action (detecting attacked sensors) only by interactingwith the sensors and ACV’s dynamics. Before applying theMAB algorithm, we define an a posteriori estimation for vl (t):

v (+)l

(t) =d(t + 1) − d(t)

T+ v f (t). (53)

Here, we also define an a posteriori residual as ���µ(+)l

(t)��� ,[|µ(+)

l1(t) |, . . . , |µ(+)

lnl(t) |

]T,���hlv

(+)l

(t)− zl (t)���. Next, we analyze

the distribution of µ(+)l

(t) to use it to design an attack detectionfilter.

Theorem 4. µ(+)l

(t) follows a Gaussian distribution with zeromean and covariance matrix Cµl :

Cµl = ΥRlΥT + hlJ

[HACrA

THT − RKTQT ATHT

− HAQKR + R]JT hTl , (54)

where Υ =[−

s1T hlw

Tl− I

]and J =

[0 1

]K .

Proof. See Appendix C. �

Theorem 4 derives the covariance matrix for µ(+)l

(t) whichwe will use to detect anomaly in the vl (t) type sensors. To thisend, first, we define the squared Mahalanobis (SM) distance, ameasure which quantifies the distance between the a posterioriresidual of sensors in a subset L of vl type sensors, µ(+)

l(t,L),

and the distribution of a posteriori residuals which is derived

in Theorem 4. SM will help us find how much every subsetL deviates from its distribution.

Definition 3. The squared Mahalanobis distance betweensubset L’s a posteriori residual at time t and its distributionis defined as DL (t) = µ(+)

l

T(t,L)C−1

µl(L)µ(+)

l(t,L), where

DL (t) is the subset L’s SM and Cµl (L) is the covariancematrix associated with L.

Next, we derive the expected value and variance ofDL (t). We know from [41, (378)] that E {DL (t)} =E

{µ(+)l

T(t,L)C−1

µl(L)µ(+)

l(t,L)

}= Tr

(C−1µl

(L)Cµl (L))=

Tr (I ) = |L|, where |L| is the number of sensors in L. Inaddition, for a subset L of vl type sensors and an attack vectoral (t), we will have:

E {DL (t)} = E{µ(+)l

T(t,L)C−1

µl(L)µ(+)

l(t,L)

+ aTl (t)C−1µlal (t)

}

= |L| + E{aTl (t,L)C−1

µl(L)al (t,L)

}, (55)

where al (t,L) consists of only elements of al (t) whichare in L. Since Cµ (L) is positive semi definite thenC−1µ (L) is also positive semi definite. Thus, we will haveE

{aTl

(t,L)C−1µl

(L)al (t,L)}≥ 0. Therefore, if there exists

a subset L such that E {DL (t)} ≥ |L|, then there is at leastone sensor under attack in L. Hence, at each time step, toestimate vl (t), ACV f must choose a subset L which hasthe least divergence from |L|. To capture the security levelof a subset, we define a security divergence (SD) metric fora subset L as ςL (t) , E{DL (t)}

|L |≥ 1. Therefore, a subset L

with a lower ςL (t) value has a higher security level.Moreover, any subset L of vl type sensors have a cost based

on the estimation error induced to vl (t). Thus, we define thecost of L, as νL = E

{(vl (t) − vl (t,L))2

}, where vl (t,L) is

the estimation of vl using the subset L. Thus, to find νL wehave:

E{(vl (t) − vl (t,L))2

}= E

*.,

∑i∈L

wli eli+/-

2

= E

*..,

∑i∈L

1σ2li

eli∑i∈L

1σ2li

+//-

2

=

∑i∈L

1σ4li

E{el2i

}

(∑i∈L

1σ2li

)2

=

∑i∈L

1σ2li(∑

i∈L1σ2li

)2 =1∑

i∈L1σ2li

. (56)

Therefore, we have νL = 1∑i∈L

1σ2li

. Clearly, a subset L with a

higher∑

i∈L1σ2li

has a lower cost.

While DL represents a security measure for a subset L,νL can be considered as an estimation performance metric.Thus, at each time step, ACV f must choose a subset Lwhich has the minimum SD as well as the minimum cost.Thus, we have to find the secure high performance subsetL∗(t) which is a subset that is secure and has the lowest

Page 10: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

10

estimation cost. To this end, a minimization problem thatfinds the secure high performance subset can be defined asL∗(t) = arg minL ξL (t), where, ξL (t) = νLςL (t). AlthoughνL is known for f , ςL (t) might change due to an attack on vltype sensors. Thus, we propose an MAB learning algorithmwhich learns the secure high performance subset L∗(t) [42],[43]. Our goal here is to choose a safe subset of sensors out ofall available sensors by efficiently exploring different subsetsand effectively exploiting the optimal subset, thus, we canapply an MAB framework to solve our problem.

In an MAB problem, a decision maker (ACV f ), pulls anarm from a set of available arms (selects a subset L from vltype sensors). Each arm generates a cost after being played,based on a distribution that is not known to the decision maker.The aim of ACV f is to minimize a cumulative cyber regret.This regret is defined as the difference between the reward ofthe best possible arm at each step, and the generated rewardof the arm that is played [42]–[44].

Let ξ∗(t) = ξL∗ (t) (t) be the lowest cost that could beachieved at time t from vl type sensors. Thus, the cyber regretfrom t = t0 up to a time t = te is defined as:

Rc (t0, te) , E

te∑t=t0

ξL (t) −te∑t=t0

ξ∗(t), (57)

where the expectation is taken over the random choices of theL. Cyber regret implies choosing non-optimal and non-securesubset of sensors. Thus, having a high cyber regret implieschoosing a subset of sensors with higher error and possiblyhigher injected data which can lead to an estimation error forvl (t). Since our proposed optimal controller which minimizesthe physical regret is a function of estimated value, vl (t), thus,higher cyber regret results in a higher estimation error and canintroduce physical regret to ACV f . This clearly shows thecyber-physical aspects of ACV security.

One of the most recognized methods for the MAB problemis to use the concept of UCB [42]. In this method, the MABalgorithm at each time t chooses L such that:

I(t) = arg maxL

−νL|L|

1nL

nL∑i=1

DL (tL,i) +

√2 ln tnL

(58)

where nL is the number of times that arm L was played beforeand tL,i is the time step when L is selected for i-th time. Also,note that nL → ∞, 1

nL

∑nLi=1 DL (tL,i) → E {DL (t)} [42], [43].

The UCB algorithm has an expected cumulative regret of:

E{Rc (t0, te)

}= 8

∑L |ξL<ξ∗

ln(te − t0 + 1)E

[ξL

]− E

[ξ∗

]+

(1 +

π2

3

) ∑L |ξL<ξ∗

E[ξL

]− E

[ξ∗

]. (59)

We use UCB algorithm because it has a logarithmic cyberregret and has been shown that there exists no algorithm thatcan have a better cyber regret [43]. Thus, using the UCBalgorithm and the defined cyber regret we can address the DIAdetection problem for sensors which lack a priori information.

VI. SIMULATION RESULTS AND ANALYSIS

For our simulations, we assume that ACV l has a sinusoidalspeed pattern and ACV f ’s initial speed and spacing fromthe ACV l are v f (0) = 90 km/h and d(0) = 100 m. We setn f = nd = nl = 4, T = 0.1 s, bf = 2.5 m/s2, and umax

f=

−uminf = 0.25 N/kg, which are chosen based on practical ACV

characteristics [13].

A. Optimal Safe Controller

Fig. 5a shows an ACV which uses the proposed optimalcontroller. From Fig. 5a, we make three observations: 1)The physical regret converges to zero and the actual spacingconverges to the optimal safe spacing after approximately 20seconds which shows that the proposed controller works prop-erly, 2) The speed of ACV f , v f (t), exhibits, approximately,a 5 seconds delay compared to the ACV l’s speed vehiclevl (t). This is because ACV f should first observe ACV l andthen control its own speed, and 3) The estimated states matchwith the actual state values which shows that applied stateestimation processes have an error close to zero.

Fig. 5b shows a comparison between our proposed con-troller with five other controllers: 1) A classical car followingcontroller called Gipps model [27] and [31], 2) a recent carfollowing model called the intelligent driver model (IDM)[30], 3) an optimal time-energy controller that jointly mini-mizes the travel time and ACV energy consumption [23], 4) arobust controller that minimizes the ACV energy consumptionwhile preventing the ACV from going to undesired states, i.e.avoids collision [25], and 5) a learning-based controller thatpredicts the worst case velocity scenario for the leading vehicleto avoid collisions [26]. From Fig. 5b, we can observe threekey points: 1) Our proposed controller can follow the speedof ACV l better compared to IDM and robust controllers aswe can see that IDM and robust controllers converge to analmost constant speed while our proposed controller can trackvl (t), 2) The proposed controller converges to the optimalsafe spacing. However, the IDM and optimal time-energycontrollers have a positive offset from the optimal safe spacingand the Gipps and learning-based controller always have asmaller spacing than the optimal safe spacing. This showsthat the Gipps model does not take into consideration thesafety as we can see from Fig. 5b that the spacing can admitsmall values close to 0 meters which increases the risk ofcollision. On the other hand, the IDM, learning-based, andoptimal time-energy controllers do not consider the optimalityof traffic flow as they always yield a divergence from theoptimal spacing. Also, the robust controller deviates aroundthe optimal safe spacing and never converges to the optimalsafe spacing, and 3) The cumulative physical regret for ourproposed optimal safe controller outperforms all the othercontrollers thus demonstrating that the proposed controller canjointly yield optimality and safety in ACV systems.

B. Robustness Against Physical Attacks

Next, we simulate an attacker which takes control of ACVl and after 100 seconds suddenly drops the speed of ACV

Page 11: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

11

65

70

75

80

85

90

95

-0.3

-0.2

-0.1

00.1

0.2

0.3

60

80

100

120

140

0 20 40 60 80 100 120 140 160 180

05

00

10

00

15

00

(a) The proposed controller, physical regret, speed andspacing.

50

60

70

80

90

100

020

40

60

80

100

120

0 20 40 60 80 100 120 140 160 180 200

02

46

107

150 160 170 180 190 2000

2

410

4

(b) Comparison of the Gipps, IDM, optimum time-energy, robust,and learning-based, and the proposed controllers.

Fig. 5: The proposed optimal controller’s effect on the physical regret.

-20

020

40

60

80

100

020

40

60

80

100

120

140

0 20 40 60 80 100 120 140 160 180 200

01

23

107

150 160 170 180 190 2000

5

10

104

Fig. 6: A comparison of the Gipps, IDM, optimumtime-energy, robust, learning-based, and the proposed

controllers when the ACV system is physically attacked.

l from 75 km/h to 5 km/h. Fig. 6 shows the effect of thisattack on the ACV f ’s speed, spacing and physical regret.Fig. 6 shows that our proposed controller always maintains theoptimal safe spacing even in presence of an attack. In contrast,the Gipps, IDM, optimum time-energy, robust, and learning-based controllers always have an offset from the optimal safespacing. Hence, our proposed controller is more robust againstphysical attacks. Note that, although Gipps can track the speedfaster than our proposed method, however, it does not optimizethe spacing. In addition, we can see from Fig. 6 that ourproposed controller has a physical regret closed to zero while

the cumulative regret of the other controllers grow linearlysince they have a constant offset from the optimal safe spacingin this attack scenario.

C. Cyber Security of ACV Systems

To illustrate the effectiveness of our proposed attack de-tection approaches, in Fig. 7, we analyze the impact of DIAon physical regret and estimation errors. In Corollary 1, westudy the maximum steady state physical regret induced byDIA on the sensors. The optimization problem in Corollary1 finds a relationship between the maximum steady statephysical regret and the DIA attack detection threshold. Fig.7a, which is derived by solving the optimization problem inCorollary 1, shows the relationship between the stealthy attackprobability and maximum DIA when we apply our proposedattack detection filter on the sensors with a priori information.Fig 7a shows that as the stealthy attack probability increasesthe domain of the attack reduces. Moreover the DIA hashigher value for the sensors with higher noise variance. To staystealthy with higher probability, the attacker must reduce thedomain of its injected faulty data to the sensors. In addition,Figs. 7b, 7c, and 7d show how the physical regret and steadystate errors are affected by the probability of staying stealthy.From these figures, we can see that as the stealthy attackprobability increases the regret and the absolute value of steadystate errors decrease because the attacker’s maximum DIAdecreases.

Fig. 8 shows the estimation error after applying our pro-posed attack detection filter on sensors with apriori estimation.From Fig. 8, we can see that, while the attack on v f (t) typesensor does not affect significantly the estimation, an attackon a d(t) type sensor can cause an estimation error on d(t).In addition, Fig. 8 shows that the designed attack detection

Page 12: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

12

mechanism for sensors with a priori information mitigatesDIAs and keeps the estimation error close to zero.

Fig. 9 shows how the increase on the number of under attackvl type sensors affects the cyber regret. In this simulation, weuse the same settings as for Fig. 5. From Fig. 9, we can firstobserve that irrespective of the number of under attack sensors,the attack can be detected in under 4 seconds as the cyberregret goes to zero after 4 second which is acceptable for thisscenario as in 4 seconds ACV f travels for 80 meters whilethe optimal safe spacing is also 80 meters, thus, there will beno collision even if ACV l stops suddenly , while ACV f ’ssensors are attacked. Moreover, the growth rate of cyber regretfor the three attacks is approximately equal regardless of thenumber of attacked sensors. Furthermore, Fig. 9 shows that,as the number of attacked sensors increases, the cumulativecyber regret increases faster thus the attack can be detectedearlier. This shows that having access to more sensors is notessentially beneficial for the attacker and thus, the attackermust also optimize its set of attacked sensors to stay stealthyfor a longer time.

Fig. 10 shows how the domain of attack can affect the cyberregret. We can see from Fig. 10 that, as the domain of injecteddata to the vl type sensor increases, the MAB algorithm canchooses the optimal subset more efficiently. This means thatinjecting higher values to the sensor increases the chance ofdetectability and thus, reduces the impact of the attack onthe sensors. In addition, we can see that the cumulative cyberregret for the highest DIA (2.5 m/s data injection) is greaterthan other cases, at the beginning of the simulation. However,after almost 350 seconds, the cumulative cyber regret of thehighest DIA becomes the least compared to the other DIAs.This is due to the fact that a larger DIA leads to a higher cyberregret at the beginning of the attack which forces the MABalgorithm to detect it earlier than other cases.

Fig. 11 shows a scenario in which the attacker attacks onlythe third sensor when nl = 4. Note that, b4b3b2b1 shows thestate of the sensors such that for i = 1, . . . , 4 if bi = 1 thensensor i is under attack and otherwise, it is not. Obviously,when the attacker attacks the third sensor, the ACV mustonly rely on the other sensors. The b4b3b2b1 on the x-axisindicates the chosen subset by the MAB algorithm to usefor estimation. Fig. 11 shows the average number of timesover 1000 simulations that a subset is chosen using the MABalgorithm. We can see from Fig. 11 that using the MABalgorithm, percentage of the times that subset {1011} is chosenby MAB is higher than other cases, which means that theMAB algorithm can detect the attack on the vl type sensors.In addition, after subset {1011}, suboptimal subsets such as{0011}, {1001}, and {1010} are chosen more frequently thanthe other subsets.

Finally, to show that our proposed DIA detection approachescan reduce the physical regret significantly, in Fig. 12, weconsider that the attacker attacks only one sensor of eachtype with the lowest noise error variance (most valuablesensor) after 1000 seconds (after ensuring that Kalman filterconverged). Then, we compare the cumulative physical regretof the ACV system for a case in which the ACV only usesa Kalman filter for estimation and another case in which

the ACV applies both of our proposed DIA detection filters.Moreover, in this attack, we consider that the attacker increasesthe sensor readings by a factor of 50%. As can be seen fromFig. 12, the proposed attack detection filter has lower physicalregret compared to a simple Kalman filtering. We can see thatwhile the physical regret increases linearly for Kalman filter,our proposed DIA detection approach keeps the physical regretclose to zero. This shows that, our proposed DIA detectionapproach can successfully detect the attack, mitigate it, andkeep the ACV system safe and secure, however the Kalmanfilter fails to stay robust against the DIA.

VII. CONCLUSION

In this paper, we have comprehensively studied the cyber-physical security and safety of ACV systems. We have pro-posed an optimal safe controller that maximizes the traffic flowand minimizes the risk of accidents on the roads by optimizingthe speed and spacing of ACVs. In addition, the proposedoptimal safe controller maximizes the stability of ACV sys-tems and improves their robustness against physical attacks.Moreover, we have improved cyber-physical security of ACVsystems by proposing two novel DIA detection approaches.The first approach detects DIAs using a priori informationabout the sensors. In the second approach, we have proposedan MAB algorithm to learn secure subset of sensors when thereexists no a priori information about sensors. Simulation resultshave shown that the proposed optimal safe controller yieldsbetter stability and robustness compared to current state ofthe art controllers. In addition, the DIA detection approachesimprove the security of the sensors and robustness of the ACVsystem compared to Kalman filtering.

APPENDIX

A. Proof of Theorem 1

First, using (1) and (24) we can find that:

max{−v f (t)

T, umin

f , u f (t − 1) − ∆u}≤ u f (t)

≤ min{v − v f (t)

T, umax

f , u f (t − 1) + ∆u}, (60)

and since ACV f only has an estimation of v f (t), then wecan rewrite (60) as follows:

max{−v f (t)

T, umin

f , u f (t − 1) − ∆u}

︸ ︷︷ ︸u1 (t)

≤ u f (t)

≤ min{v − v f (t)

T, umax

f , u f (t − 1) + ∆u}

︸ ︷︷ ︸u1 (t)

. (61)

Now, we can define Lagrangian multipliers and apply theKarush Kuhn Tucker (KKT) conditions to solve the optimiza-tion problem in (22). Then, we will have:

g(u f (t), λ1, λ2) =( (v f (t) + Tu f (t))2

2bf− d(t) − T vl (t)

+ T v f (t))2+ λ1(u f (t) − u1(t)

Page 13: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

13

(a) Attack detection filterapplied on the cyber attacks on

v f (t) and d(t).

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

2

4

6

8

10

12

14

16

18

10-3

(b) Attack detection filterapplied on the cyber attacks on

v f (t) and d(t).

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

-2

-1.5

-1

-0.5

0

0.5

1

1.510

-8

(c) Attack detection filterapplied on the cyber attacks on

v f (t) and d(t).

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

(d) Attack detection filterapplied on the cyber attacks on

v f (t) and d(t).

Fig. 7: Attack detection filter applied on the cyber attacks on v f (t) and d(t).

Fig. 8: Attack detection filter applied on the cyber attacks on v f (t) and d(t).

0 50 100 150 200 250 300 350 400 450 500

0

5

10

15

20

25

0 50 100 150 200 250 300 350 400 450 500

0

20

40

60

80

100

120

140

160

0 5 10-5

0

5

10

15

20

25

30

Fig. 9: The cyber regret in MAB algorithm applied to vltype sensors.

+ λ2(u1(t) − u f (t)), (62)

with the first order necessary conditions given by: ∂g∂u f

=

0, ∂g∂λ1= 0, and ∂g

∂λ2= 0. When λ1 and λ2 are not active

(λ1 = 0 and λ2 = 0), we will have:

∂g

∂u f=

4T2bf

*,

(v f (t) + Tu f (t))2

2bf− d(t) − T vl (t) + T v f (t)+

(v f (t) + Tu f (t)

)= 0

0 50 100 150 200 250 300 350 400 450 500

0

10

20

30

40

50

60

70

80

90

100

0 50 100 150 200 250 300 350 400 450 500

0

500

1000

1500

2000

2500

3000

3500

4000

4500

5000

Fig. 10: The best subset selection in MAB algorithm appliedto vl type sensors.

⇒ u∗f (t) =1T

√2bf

(d(t) + T vl (t) − T v f (t)

)− v f (t)

)−

1Tv f (t). (63)

However, 1T

(−

√2bf

(d(t) + T vl (t) − T v f (t)

)− v f (t)

)and

− 1T v f (t) are not acceptable solutions since they result in

v f (t+1) ≤ 0. The next step is to show that the third solution in(63) is the global minimum. Thus, we apply the second-order

Page 14: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

14

Fig. 11: Attack detection with MAB algorithm applied to vltype sensors.

condition on R(t) as follows:

d2 R(t)du2

f(t)=

4T2bf

[(T (v f (t) + Tu f (t))2

bf

)+ T

((v f (t) + Tu f (t))2

2bf

− d(t) − T vl (t) + T v f (t))]

d2 R(t)du2

f(t)

�����u∗f

(t)=

4T2bf

*,

T (v f (t) + Tu∗f (t))2

bf

+-> 0. (64)

Thus, u∗f (t) = 1T

(√2bf

(d(t) + T vl (t) − T v f (t)

)− v f (t)

)is a

global maximizer of R(t). Now, to consider the constraints in(60), if we activate λ1 or λ2 the first order condition will resultin having u∗f (t) = u1(t) or u1(t). Therefore, the optimal 1-stepahead controller can be given by (26).

B. Proof of Theorem 3

We know that the Kalman gain converges to K . Thus, from(14) and (17) we can find the state estimation x(t) as a processwhich depends on estimation error r (t) as follows:

x(t) =[I − KH

]x (−) (t) + K z(t)︸︷︷︸

[Hx(t)+e(t)]

=[I − KH

] [Ax(t − 1) + Bu f (t − 1)

+ F vl (t − 1)]+ KH [x(t) + r (t)] + K e(t),

⇒[I − KH

]x(t) =

[I − KH

] [Ax(t − 1) + Bu f (t − 1)

+ F vl (t − 1)]+ K [Hr (t) + e(t)] ,

⇒ x(t) = Ax(t − 1) + Bu f (t − 1) + F vl (t − 1)

+[I − KH

]−1K [Hr (t) + e(t)] . (65)

Now, by subtracting (65) from the system model defined in(2), we will have:

r (t) = Ar (t − 1) + Frl (t) −[I − KH

]−1K [Hr (t) + e(t)] ,

r (t) =[I +

[I − KH

]−1KH

]−1

︸ ︷︷ ︸Q

Ar (t − 1)

+

[I +

[I − KH

]−1KH

]−1[Frl (t) − K e(t)

]

︸ ︷︷ ︸ρ(t)

Moreover, we can see that ρ(t) is a linear combination ofindependent white Gaussian process with zero mean. To findthe covariance matrix Cρ of ρ(t) we will have:

Cρ = E{ρ(t)ρT (t)

}= E

{ [QFrl (t) − QK e(t)

]

×[rTl (t)FTQT − eT (t)KT

QT] }

= E

{QFrl (t)rTl (t)FTQT − QK e(t)rTl (t)︸ ︷︷ ︸

0

FTQT

− QF rl (t)eT (t)︸ ︷︷ ︸0

KTQT + QK e(t)eT (t)KT

QT

}= QFσ2

l FTQT + QKRK

TQT . (66)

Note that the fact that e(t) and rl (t) are independent leads tohaving e(t)rT

l(t) = 0 . Next, we write the dynamic model for

the estimation error as follows:

r (t) = QAr (t − 1) + ρ(t) =[QA

] t−t0 r (tc)

+

t−t0−1∑k=0

[QA

] t−t0−k−1 ρ(tc + k), (67)

where t0 is the time step where the Kalman filter converges.Thus, for t � t0 and if QA is asymptotically stable then wewill have r (t) as a Gaussian process due to the central limittheorem. Now, to find the mean of r (t) we can write:

E {r (t)} = QAE{r (t − 1)

}+ E {ρ(t)} = QAE {r (t)} ,

⇒ E {r (t)} = 0. (68)

Next, since the mean is zero, then the covariance matrix ofr (t), Cr = E{r (t)rT (t)}. To find Cr , we have:

E{r (t)rT (t)

}= E

{ [QAr (t − 1) + ρ(t)

[r (t − 1)ATQT + ρT (t)

] }= E

{QAr (t − 1)rT (t − 1)ATQT

+ QA r (t − 1)ρT (t)︸ ︷︷ ︸0

+ ρ(t)rT (t − 1)︸ ︷︷ ︸0

ATQT

+ ρ(t)ρT (t)}.

Note that since r (t − 1) and ρ(t) are independent ρ(t)rT (t −1) = 0. In addition, we know that after convergence we willhave E

{r (t)rT (t)

}= E

{r (t − 1)rT (t − 1)

}= Cr then, we

will need to solve the following discrete Ricatti equation Cr =

QACrATQT + Cρ .

Page 15: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

15

950 1000 1050 1100

00

.05

0.1

0.1

50

.20

.25

0.3

0.3

50

.40

.45

(a) DIA on v f (t) type.

950 1000 1050 1100

020

40

60

80

100

120

140

160

180

(b) DIA on d(t) type.

950 1000 1050 1100

01

23

45

67

8

105

(c) DIA on vl (t) type.

Fig. 12: Comparison of the proposed DIA detection approaches with Kalman filtering.

Now, the final step is to find the covariance matrix of theresidual Cµ. Since after Kalman filter convergence we willhave x (−) (t) = x(t) then:

z(t) − z (−) (t) = Hx(t) + e(t) − Hx (−) (t) = Hx(t) + e(t)− Hx(t) = Hr (t) + e(t). (69)

Thus, we will have:

E{µ(t)µT (t)

}= E

{[Hr (t) + e(t)]

[rT (t)HT + eT (t)

] }

= E

{Hr (t)rT (t)HT + e(t)rT (t)HT

+ Hr (t)eT (t) + e(t)eT (t)}

= HCrHT + R + E

{e(t)rT (t)HT

+ Hr (t)eT (t)}. (70)

Now, we have:

E{e(t)rT (t)HT

}= E

{e(t)rT (t − 1)︸ ︷︷ ︸

0

ATQTHT

+ e(t)(rl (t)︸ ︷︷ ︸

0

FT − eT (t)KT )QTHT

}

= −RKTQTHT ,

E{Hr (t)eT (t)

}= E

{HQA r (t − 1)eT (t)︸ ︷︷ ︸

0

+ HQ(− K e(t) +Frl (t)

)eT (t)︸ ︷︷ ︸

0

}= −HQKR.

Thus, the covariance matrix can be given by (44).

C. Proof of Theorem 4

From (17), after Kalman filter convergence we obtain:

x(t + 1) =[I − KH

]x (−) (t) + K [Hx(t + 1) + e(t)]

=[I − KH

] [Ax(t) + Bu f (t) + F vl (t)

]

+K[H

[Ax(t) + Bu f (t) + Fvl (t)

]+ e(t)

]

=Ax(t) + Bu f (t) +[I − KH

]F vl (t)

+KHFvl (t) + K [HAr (t) + e(t)] . (71)

Due to the definition ofF =[

0 T]T

,[I − KH

]F and

KHF are vectors with two elements where the first elementsare zero. We define two parameters s1 and s2 as

[I − KH

]F =[

0s1

]and KHF =

[0s2

], and we will have s1 + s2 =

T . Then using (53) and (71) the a posteriori estimation willbe v (+)

l(t) = s1

T vl (t) +s2T vl (t) +

[0 1

]K︸ ︷︷ ︸

J

[HAr (t) + e(t)] .

Thus, we will have:

µ(+)l

(t) = hlv(+)l

(t) − hlvl (t) − el (t) = hl

[ s1T

(vl (t) − vl (t))]

− el (t) + hlJ [HAr (t) + e(t)]

= −hl

s1T

nl∑i=1

wli eli− el (t)︸ ︷︷ ︸

Υel (t)

+hlJ [HAr (t) + e(t)] .

(72)

First, we derive the mean of a posteriori residual:

E{µ(+)l

(t)}= −hl

s1T

nl∑i=1

wli E{eli

}− E {el (t)}

+ hlJ [HAE {r (t)} + E {e(t)}] = 0. (73)

Since the mean is zero the covariance matrix of µ(+)l

(t), Cµl ,can be derived as follows:

Cµl =E{µ(+)l

(t)µ(+)l

T(t)

}=E

{[Υ el (t) + hlJ [HAr (t) + e(t)]]

×[eTl (t)ΥT +

[rT (t)ATHT + eT (t)

]JT hTl

] }=E

{Υ el (t)eTl (t)ΥT +Υ el (t)

[rT (t)ATHT + eT (t)

]︸ ︷︷ ︸0

Page 16: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

16

+hlJ [HAr (t) + e(t)] eTl (t)︸ ︷︷ ︸0

ΥT + hlJ

[HAr (t)rT (t)ATHT

+e(t)rT (t)ATHT + HAr (t)eT (t) + e(t)eT (t)]JT hTl

}=ΥRlΥ

T + hlJ[HACrA

THT − RKTQT ATHT

−HAQKR + R]JT hTl . (74)

REFERENCES

[1] A. Ferdowsi, U. Challita, and W. Saad, “Deep learning for reliable mo-bile edge analytics in intelligent transportation systems,” IEEE VehicularTechnology Magazine, Accepted and to Appear, 2018.

[2] M. Mozaffari, W. Saad, M. Bennis, and M. Debbah, “Unmanned aerialvehicle with underlaid device-to-device communications: Performanceand tradeoffs,” IEEE Transactions on Wireless Communications, vol. 15,no. 6, pp. 3949–3963, June 2016.

[3] T. Zeng, O. Semiari, W. Saad, and M. Bennis, “Joint communicationand control for wireless autonomous vehicular platoon systems,” arXivpreprint arXiv:1804.05290, 2018.

[4] U. Challita, A. Ferdowsi, M. Chen, and W. Saad, “Artificial intelligencefor wireless connectivity and security of cellular-connected UAVs,” IEEEWireless Communications Magazine, Accepted and to Appear, 2018.

[5] M. Amoozadeh, A. Raghuramu, C. n. Chuah, D. Ghosal, H. M. Zhang,J. Rowe, and K. Levitt, “Security vulnerabilities of connected vehiclestreams and their impact on cooperative driving,” IEEE CommunicationsMagazine, vol. 53, no. 6, pp. 126–132, June 2015.

[6] I. Parvez, A. Rahmati, I. Guvenc, A. I. Sarwat, and H. Dai, “A surveyon low latency towards 5g: Ran, core network and caching solutions,”IEEE Communications Surveys Tutorials, pp. 1–1, May 2018.

[7] M. Husak, N. Neshenko, M. Safaei Pour, E. Bou-Harb, and P. Celeda,“Assessing internet-wide cyber situational awareness of critical sectors,”in Proceedings of the 13th International Conference on Availability,Reliability and Security. Hamburg, Germany: ACM, August 2018,pp. 29:1–29:6.

[8] A. Ferdowsi, W. Saad, B. Maham, and N. B. Mandayam, “A ColonelBlotto game for interdependence-aware cyber-physical systems securityin smart cities,” in Proceedings of the 2Nd International Workshopon Science of Smart City Operations and Platforms Engineering, ser.SCOPE ’17. New York, NY, USA: ACM, 2017, pp. 7–12.

[9] F. Kargl, P. Papadimitratos, L. Buttyan, M. Mter, E. Schoch, B. Wieder-sheim, T. V. Thong, G. Calandriello, A. Held, A. Kung, and J. P.Hubaux, “Secure vehicular communication systems: implementation,performance, and research challenges,” IEEE Communications Maga-zine, vol. 46, no. 11, pp. 110–118, November 2008.

[10] A. Greenberg. Hackers remotely kill a jeep on thehighway. [Online]. Available: https://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/

[11] S. Woo, H. J. Jo, and D. H. Lee, “A practical wireless attack onthe connected car and security protocol for in-vehicle can,” IEEETransactions on Intelligent Transportation Systems, vol. 16, no. 2, April2015.

[12] S. N. Narayanan, S. Mittal, and A. Joshi, “OBD securealert: Ananomaly detection system for vehicles,” in Proc. of IEEE InternationalConference on Smart Computing (SMARTCOMP), May 2016, pp. 1–6.

[13] G. Calandriello, P. Papadimitratos, J. P. Hubaux, and A. Lioy, “Onthe performance of secure vehicular communication systems,” IEEETransactions on Dependable and Secure Computing, vol. 8, no. 6, pp.898–912, Nov 2011.

[14] T. Kim, A. Studer, R. Dubey, X. Zhang, A. Perrig, F. Bai, B. Bellur,and A. Iyer, “Vanet alert endorsement using multi-source filters,” inProceedings of the Seventh ACM International Workshop on VehiculArInterNETworking, Chicago, IL, USA, September 2010, pp. 51–60.

[15] M. Sun, M. Li, and R. Gerdes, “A data trust framework for vanetsenabling false data detection and secure vehicle tracking,” in Proc. ofIEEE Conference on Communications and Network Security (CNS), LasVegas, NV, USA, Oct 2017, pp. 1–9.

[16] M. E. Eltayeb, J. Choi, T. Y. Al-Naffouri, and R. W. Heath, “Enhancingsecrecy with multiantenna transmission in millimeter wave vehicularcommunication systems,” IEEE Transactions on Vehicular Technology,vol. 66, no. 9, pp. 8139–8151, Sept 2017.

[17] A. Petrillo, A. Pescap, and S. Santini, “A collaborative approach forimproving the security of vehicular scenarios: The case of platooning,”Computer Communications, vol. 122, pp. 59 – 75, 2018.

[18] A. Ferdowsi and W. Saad, “Deep learning for signal authentication andsecurity in massive Internet of Things systems,” IEEE Transactions onCommunications, October 2018.

[19] S. Tuohy, M. Glavin, C. Hughes, E. Jones, M. Trivedi, and L. Kilmartin,“Intra-vehicle networks: A review,” IEEE Transactions on IntelligentTransportation Systems, vol. 16, no. 2, pp. 534–545, April 2015.

[20] A. Burg, A. Chattopadhyay, and K. Lam, “Wireless communication andsecurity issues for cyberphysical systems and the internet-of-things,”Proceedings of the IEEE, vol. 106, no. 1, pp. 38–60, Jan 2018.

[21] D. Xu, P. Ren, Q. Du, and L. Sun, “Hybrid secure beamformingand vehicle selection using hierarchical agglomerative clustering for c-ran-based vehicle-to-infrastructure communications in vehicular cyber-physical systems,” International Journal of Distributed Sensor Networks,vol. 12, no. 8, p. 1550147716662783, 2016.

[22] P. Kleberger, T. Olovsson, and E. Jonsson, “Security aspects of thein-vehicle network in the connected car,” in Proc. of IEEE IntelligentVehicles Symposium (IV), Baden-Baden, Germany, June 2011, pp. 528–533.

[23] J. M. Bradley and E. M. Atkins, “Optimization and control of cyber-physical vehicle systems,” Sensors, vol. 15, no. 9, pp. 23 020–23 049,September 2015.

[24] M. Xue, W. Wang, and S. Roy, “Security concepts for the dynamics ofautonomous vehicle networks,” Automatica, vol. 50, no. 3, pp. 852 –857, 2014.

[25] S. Sadraddini, S. Sivaranjani, V. Gupta, and C. Belta, “Provably safecruise control of vehicular platoons,” IEEE Control Systems Letters,vol. 1, no. 2, pp. 262–267, Oct 2017.

[26] S. Lefvre, A. Carvalho, and F. Borrelli, “A learning-based frameworkfor velocity control in autonomous driving,” IEEE Transactions onAutomation Science and Engineering, vol. 13, no. 1, pp. 32–42, Jan2016.

[27] A. Ferdowsi, U. Challita, W. Saad, and N. B. Mandayam, “Robust deepreinforcement learning for security and safety in autonomous vehiclesystems,” in Proc. of IEEE International Conference on IntelligentTransportation Systems, Maui, HI, November 2018.

[28] E. Schoitsch, C. Schmittner, Z. Ma, and T. Gruber, “The need for safetyand cyber-security co-engineering and standardization for highly auto-mated automotive vehicles,” in Advanced Microsystems for AutomotiveApplications 2015, T. Schulze, B. Muller, and G. Meyer, Eds. Cham:Springer International Publishing, 2016, pp. 251–261.

[29] R. L. Williams, D. A. Lawrence et al., Linear state-space controlsystems. John Wiley & Sons, 2007.

[30] A. Kesting, M. Treiber, and D. Helbing, “Enhanced intelligent drivermodel to access the impact of driving strategies on traffic capacity,”Philosophical Transactions of the Royal Society of London A: Math-ematical, Physical and Engineering Sciences, vol. 368, no. 1928, pp.4585–4605, 2010.

[31] P. G. Gipps, “A behavioural car-following model for computer simula-tion,” Transportation Research Part B: Methodological, vol. 15, no. 2,pp. 105–111, 1981.

[32] H. K. Khalil, Nonlinear systems. Upper Saddle River, NJ : Prentice-Hall, 2002.

[33] T. Kailath, Linear systems. Prentice-Hall Englewood Cliffs, NJ, 1980,vol. 156.

[34] M. Moayedi, Y. C. Soh, and Y. K. Foo, “Optimal kalman filtering withrandom sensor delays, packet dropouts and missing measurements,” in2009 American Control Conference, vol. St. Louis, MO, USA, June2009, pp. 3405–3410.

[35] S. J. Julier and J. K. Uhlmann, “Fusion of time delayed measurementswith uncertain time delays,” in Proceedings of the 2005, AmericanControl Conference, 2005., Portland, OR, USA, USA, June 2005.

[36] Y. Tian and L. Pan, “Predicting short-term traffic flow by long short-termmemory recurrent neural network,” in Proc. IEEE International Confer-ence on Smart City/SocialCom/SustainCom (SmartCity), Dec 2015, pp.153–158.

[37] X. Ma, Z. Tao, Y. Wang, H. Yu, and Y. Wang, “Long short-term memoryneural network for traffic speed prediction using remote microwavesensor data,” Transportation Research Part C: Emerging Technologies,vol. 54, pp. 187 – 197, 2015.

[38] S. Lefvre, C. Sun, R. Bajcsy, and C. Laugier, “Comparison of parametricand non-parametric approaches for vehicle speed prediction,” in 2014American Control Conference, Portland, OR, USA, June 2014, pp. 3494–3499.

Page 17: Cyber-Physical Security and Safety of Autonomous ... - Oulu

0090-6778 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCOMM.2019.2927570, IEEETransactions on Communications

17

[39] A. Taleb Zadeh Kasgari and W. Saad, “Stochastic optimization andcontrol framework for 5G network slicing with effective isolation,” inProc. 52nd Annual Conference on Information Sciences and Systems(CISS) (CISS 2018), Princeton, NJ, USA, Mar. 2018.

[40] Q. Yang, L. Chang, and W. Yu, “On false data injection attacks againstKalman filtering in power system dynamic state estimation,” Securityand Communication Networks, vol. 9, no. 9, pp. 833–849, August 2013.

[41] K. B. Petersen and M. S. Pedersen, The Matrix Cookbook. TechnicalUniversity of Denmark, 2012.

[42] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of themultiarmed bandit problem,” Machine Learning, vol. 47, no. 2, pp. 235–256, May 2002.

[43] R. Kleinberg, A. Niculescu-Mizil, and Y. Sharma, “Regret bounds forsleeping experts and bandits,” Machine Learning, vol. 80, no. 2, pp.245–272, Sep 2010.

[44] S. Ali, A. Ferdowsi, W. Saad, N. Rajatheva, and J. Haapola, “Sleepingmulti-armed bandit learning for fast uplink grant allocation in machinetype communications,” arXiv preprint arXiv:1810.12983, 2018.

Aidin Ferdowsi received his BS in Electrical Engi-neering from the University of Tehran, Iran and MSin Electrical Engineering from Virginia Tech. He iscurrently a PhD student at the Bradley departmentof Electrical and Computer Engineering at VirginiaTech. He is also a Wireless@VT Fellow. His re-search interests include machine learning, cyber-physical systems, smart cities, security, and gametheory.

Samad Ali Samad Ali received his B.S. in electricalengineering from the University of Tabriz, Iran, andhis M.S. in wireless communications engineeringfrom the University of Oulu, Finland. He is currentlya Ph.D. student at the Center for Wireless Communi-cations at the University of Oulu. His research inter-ests include wireless communications with a focuson machine type communications, data analytics inthe internet-of-things, and machine learning

Walid Saad (S’07, M’10, SM’15) received hisPh.D degree from the University of Oslo in 2010.Currently, he is an Associate Professor at the De-partment of Electrical and Computer Engineeringat Virginia Tech, where he leads the Network Sci-ence, Wireless, and Security (NetSciWiS) labora-tory, within the Wireless@VT research group. Hisresearch interests lie at the intersection of wirelessnetworks, game theory, machine learning, cyberse-curity, unmanned aerial vehicles, and cyber-physicalsystems. He is currently leading 15 externally funded

grants (including 13 NSF awards) on various areas that range from resourcemanagement in wireless cellular, Internet of Things, and millimeter wavenetworks to drone-based wireless communications, cyber-physical systemssecurity, big data analytics for smart cities, and resilience of critical infras-trcture.

Dr. Saad is the recipient of the NSF CAREER award in 2013, the AFOSRsummer faculty fellowship in 2014, and the Young Investigator Award fromthe Office of Naval Research (ONR) in 2015. He was the author/co-author ofsix conference best paper awards at WiOpt in 2009, ICIMP in 2010, IEEEWCNC in 2012, IEEE PIMRC in 2015, IEEE SmartGridComm in 2015, andEuCNC in 2017. He is the recipient of the 2015 Fred W. Ellersick Prizefrom the IEEE Communications Society. In 2015, Dr. Saad was named theStephen O. Lane Junior Faculty Fellow at Virginia Tech and, in 2017, he wasnamed College of Engineering Faculty Fellow. He currently serves as an editorfor the IEEE Transactions on Wireless Communications, IEEE Transactionson Communications, IEEE Transactions on Mobile Computing, and IEEETransactions on Information Forensics and Security.

Narayan B Mandayam (S’89-M’94-SM’99-F’09)received the B.Tech (Hons.) degree in 1989 fromthe Indian Institute of Technology, Kharagpur, andthe M.S. and Ph.D. degrees in 1991 and 1994 fromRice University, all in electrical engineering. Since1994 he has been at Rutgers University where he iscurrently a Distinguished Professor and Chair of theElectrical and Computer Engineering department.He also serves as Associate Director at WINLAB.He was a visiting faculty fellow in the Department ofElectrical Engineering, Princeton University, in 2002

and a visiting faculty at the Indian Institute of Science, Bangalore, India in2003. His recent interests also include privacy in IoT, resilient smart citiesas well as modeling and analysis of trustworthy knowledge creation on theinternet. Dr. Mandayam is a co-recipient of the 2015 IEEE CommunicationsSociety Advances in Communications Award for his seminal work on powercontrol and pricing, the 2014 IEEE Donald G. Fink Award for his IEEEProceedings paper titled Frontiers of Wireless and Mobile Communicationsand the 2009 Fred W. Ellersick Prize from the IEEE Communications Societyfor his work on dynamic spectrum access models and spectrum policy.He is also a recipient of the Peter D. Cherasia Faculty Scholar Awardfrom Rutgers University (2010), the National Science Foundation CAREERAward (1998) and the Institute Silver Medal from the Indian Institute ofTechnology (1989). He is a coauthor of the books: Principles of CognitiveRadio (Cambridge University Press, 2012) and Wireless Networks: MultiuserDetection in Cross-Layer Design (Springer, 2004). He has served as anEditor for the journals IEEE Communication Letters and IEEE Transactionson Wireless Communications. He has also served as a guest editor of theIEEE JSAC Special Issues on Adaptive, Spectrum Agile and Cognitive RadioNetworks (2007) and Game Theory in Communication Systems (2008). Heis a Fellow and Distinguished Lecturer of the IEEE.