Optimal Defense Against Jamming Attacks in Cognitive Radio Networks Using the Markov Decision Process Approach
Yongle Wu, Beibei Wang, and K. J. Ray Liu
Presenter: Wayne Hsiao
Advisor: Frank, Yeong-Sung Lin

Page 1

Optimal Defense Against Jamming Attacks in Cognitive Radio Networks Using the Markov Decision Process Approach

Yongle Wu, Beibei Wang, and K. J. Ray Liu

Presenter: Wayne Hsiao
Advisor: Frank, Yeong-Sung Lin

Page 2

Agenda
• Introduction
• Related Works
• System Model
• Optimal Strategy with Perfect Knowledge
  o Markov Models
  o Markov Decision Process
• Learning the Parameters
• Simulation Results

Page 4

Introduction
• Cognitive radio technology has been receiving growing attention
• In a cognitive radio network
  o Unlicensed users (secondary users)
  o Spectrum holders (primary users)
• Secondary users usually compete for limited spectrum resources
  o Game theory has been widely applied as a flexible and proper tool to model and analyze their behavior in the network

Page 5

Introduction
• Cognitive radio networks are vulnerable to malicious attacks
• Security countermeasures
  o Crucial to the successful deployment of cognitive radio networks
• We mainly focus on the jamming attack
  o One of the major threats to cognitive radio networks
  o Several malicious attackers intend to interrupt the communications of secondary users by injecting interference

Page 6

Introduction
• A secondary user could hop across multiple bands in order to reduce the probability of being jammed
  o Optimal defense strategy
  o Markov decision process (MDP)
• The optimal strategy strikes a balance between the cost associated with hopping and the damage caused by attackers

Page 7

Introduction
• In order to determine the optimal strategy, the secondary user needs to know some information
  o The number of attackers
• Maximum Likelihood Estimation (MLE)
  o A learning process in this paper, in which the secondary user estimates the useful parameters based on past observations

Page 9

Related Works
• The problem becomes more complicated in a cognitive radio network
  o Primary users’ access has to be taken into consideration
• We consider the scenario
  o A single-radio secondary user
  o Defense strategy is to hop across different bands

Page 11

System Model
• A secondary user opportunistically accesses one of the predefined M licensed bands
• Each licensed band is time-slotted
• The access pattern of primary users can be characterized by an ON-OFF model

Page 12

System Model
• Assume all bands share the same channel model and parameters
• But different bands are used by independent primary users

Page 13

System Model
• The secondary user has to detect the presence of the primary user at the beginning of each time slot

Page 14

System Model
• Communication gain R
  o When the primary user is absent in that band
• The cost associated with hopping is C
• We assume there are m (m ≥ 1) malicious single-radio attackers
• Attackers do not want to interfere with primary users
  o Because primary users’ usage of spectrum is enforced by their ownership of bands

Page 15

System Model
• On finding the secondary user
  o An attacker will immediately inject jamming power, which makes the secondary user fail to decode data packets
• We assume that the secondary user suffers a significant loss L when jammed
• When all the attackers coordinate to maximize the damage
  o They detect m channels in a time slot

Page 16

System Model
• The longer the secondary user stays in a band, the higher the risk of being exposed to attackers
• At the end of each time slot the secondary user decides
  o to stay
  o to hop
• The secondary user receives an immediate payoff U(n) in the nth time slot

Page 17

System Model
• 1(.) is an indicator function
  o Returning 1 when the statement in the parenthesis holds true
  o 0 otherwise
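
The payoff expression that this indicator function belongs to did not survive in the transcript. A form consistent with the quantities defined on the preceding slides (gain R, jamming loss L, hopping cost C) is sketched here as an assumption, not as a quotation of the paper:

```latex
U(n) = R \cdot \mathbf{1}(\text{successful transmission})
     - L \cdot \mathbf{1}(\text{jammed})
     - C \cdot \mathbf{1}(\text{hopping})
```

Each indicator is evaluated for the nth time slot, so at most one of the first two terms is nonzero in any slot.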

Page 18

System Model
• Average payoff Ū
  o The secondary user wants to maximize it
  o Malicious attackers want to minimize it
• The discount factor δ (0 < δ < 1) measures how much the secondary user values a future payoff over the current one
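
To make the role of δ concrete, here is a minimal Python sketch of a discounted payoff sum. The function name `discounted_payoff` is hypothetical, and starting the exponent at n = 1 is an assumption, since the slide's formula is not present in the transcript.

```python
def discounted_payoff(payoffs, delta):
    """Return sum over n of delta**n * U(n), with n starting at 1 (assumed)."""
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    # enumerate(..., start=1) pairs the first payoff with delta**1.
    return sum(delta ** n * u for n, u in enumerate(payoffs, start=1))


# Example: three slots with payoffs U(1), U(2), U(3) and delta = 0.95.
example_total = discounted_payoff([5, 5, -1], 0.95)
```

A δ close to 1 makes later slots count almost as much as the current one, which is what pushes the secondary user to weigh future jamming risk against the immediate hopping cost.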

Page 20

Optimal Strategy with Perfect Knowledge
• Attack strategy
  o Attackers coordinately tune their radios randomly to m undetected bands in each time slot
  o When either all bands have been sensed or the secondary user has been found and jammed
• The jamming game can be reduced to a Markov decision process
  o We first show how to model the scenario as an MDP
  o Then solve it using standard approaches

Page 21

Optimal Strategy with Perfect Knowledge
• At the end of the nth time slot
  o The secondary user observes the state of the current time slot S(n)
  o And chooses an action a(n)
• Whether to tune the radio to a new band or not, which takes effect at the beginning of the next time slot
• S(n) = P
  o The primary user occupied the band in the nth time slot
• S(n) = J
  o The secondary user was jammed in the nth time slot

Page 22

Optimal Strategy with Perfect Knowledge
• a(n) = h
  o The secondary user has to hop to a new band
• The secondary user has transmitted a packet successfully in the time slot
  o ‘to hop’ (a(n) = h)
  o ‘to stay’ (a(n) = s)
• S(n) = K
  o This is the Kth consecutive slot with successful transmission in the same band

Page 23

Optimal Strategy with Perfect Knowledge
• The immediate payoff depends on both the state and the action
• p(S’|S, h)
  o The transition probability from an old state S to a new state S’ when taking the action h
• p(S’|S, s)
  o The transition probability from an old state S to a new state S’ when taking the action s

Page 24

Optimal Strategy with Perfect Knowledge
• If the secondary user hops to a new band, transition probabilities do not depend on the old state
• The only possible new states are
  o P (the new band is occupied by the primary user)
  o J (transmission in the new band is detected by an attacker)
  o 1 (successful transmission begins in the new band)

Page 25

Optimal Strategy with Perfect Knowledge
• When the total number of bands M is large
  o M ≫ 1
• Assume that the probability of the primary user’s presence in the new band equals the steady-state probability of the ON-OFF model
  o Neglecting the case that the secondary user hops back to some band in a very short time

Page 26

Optimal Strategy with Perfect Knowledge
• The secondary user will be jammed with the probability m/M
  o Each attacker detects one band without overlapping
• Transition probabilities are
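
The formulas themselves are missing from the transcript. One form consistent with the bullets above is sketched below as an assumption: θ is a hypothetical symbol for the steady-state probability that the primary user occupies a band, and jamming is taken to be possible only when the primary user is absent.

```latex
p(P \mid S, h) = \theta, \qquad
p(J \mid S, h) = (1-\theta)\,\frac{m}{M}, \qquad
p(1 \mid S, h) = (1-\theta)\left(1-\frac{m}{M}\right)
```

The three probabilities sum to 1 and do not depend on the old state S, as stated for the hop action.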

Page 27

Optimal Strategy with Perfect Knowledge
• Note that s is not a feasible action when the state is in J or P
• At state K, only max(M−Km,0) bands have not been detected by attackers
  o But another m bands will be detected in the upcoming time slot
  o The probability of jamming conditioned on the absence of primary user
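
The conditional jamming probability described above can be sketched in Python as follows. The function name `jam_probability` is hypothetical; the formula m / (M − Km), capped at 1 once the remaining undetected bands number m or fewer, is inferred from the bullets rather than copied from the slide.

```python
def jam_probability(K, M, m):
    """Chance of being jammed in the next slot after K successful slots,
    conditioned on the primary user staying absent (inferred formula)."""
    if K < 1 or m < 1 or M < m:
        raise ValueError("expected K >= 1 and 1 <= m <= M")
    undetected = max(M - K * m, 0)   # bands the attackers have not sensed yet
    if undetected <= m:              # attackers cover everything that is left
        return 1.0
    return m / undetected


# Example with M = 60 bands and a hypothetical m = 4 attackers.
risk_after_one_slot = jam_probability(1, 60, 4)
```

The risk grows with K because the attackers' search space shrinks by m bands every slot while the secondary user stays put.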

Page 28

Optimal Strategy with Perfect Knowledge

• To sum up, transition probabilities associated with the action s are as follows: ∀K ∈ {1,2,3,...}
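
The expressions are not present in the transcript. A reconstruction consistent with the preceding slides is given here as an assumption: q is a hypothetical symbol for the per-slot probability that the primary user returns to a band that is currently idle, and f_J(K) is a hypothetical name for the conditional jamming probability.

```latex
f_J(K) =
\begin{cases}
  \dfrac{m}{M - Km}, & K < \dfrac{M}{m} - 1, \\[6pt]
  1, & \text{otherwise,}
\end{cases}
\qquad
\begin{aligned}
  p(P \mid K, s)   &= q, \\
  p(J \mid K, s)   &= (1-q)\, f_J(K), \\
  p(K+1 \mid K, s) &= (1-q)\,\bigl(1 - f_J(K)\bigr).
\end{aligned}
```

Under this form, p(K + 1 | K, s) vanishes once f_J(K) reaches 1, which is what allows the state space to be truncated.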

Page 30

Markov Decision Process
• If the secondary user stays in the same band for too long, he/she will eventually be found by an attacker
  o p(K + 1|K,s) = 0 if K > M/m − 1
• Therefore, we can limit the state S to a finite set , where

Page 31

Markov Decision Process
• An MDP consists of four important components
  o a finite set of states
  o a finite set of actions
  o transition probabilities
  o immediate payoffs
• The optimal defense strategy can be obtained by solving the MDP

Page 32

Markov Decision Process
• A policy is defined as a mapping from a state to an action
  o π : S(n) → a(n)
• A policy π specifies an action π(S) to take whenever the user is in state S
• Among all possible policies, the optimal policy is the one that maximizes the expected discounted payoff

Page 33

Markov Decision Process
• The value of a state S is defined as the highest expected payoff given the MDP starts from state S
• The optimal policy is the optimal defense strategy that the secondary user should adopt since it maximizes the expected payoff

Page 34

Markov Decision Process
• After a first move the remaining part of an optimal policy should still be optimal
• The first move should maximize the sum of immediate payoff and expected payoff conditioned on the current action
  o Bellman equation
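
The Bellman recursion can be solved numerically by value iteration, one of the standard approaches for MDPs. The Python sketch below is illustrative only. The transition probabilities and payoff are reconstructed from the slides rather than taken from the paper, and all names (`solve_jamming_mdp`, `p_return`, `p_busy`) are hypothetical. R, C, M, and δ follow the simulation settings quoted in the slides. The loss L = 20, the attacker count m = 4, and the reading of β = 0.01 as the probability that an idle band becomes occupied are assumptions.

```python
import math


def solve_jamming_mdp(M, m, R, L, C, delta, p_return, p_busy, tol=1e-9):
    """Value iteration for the hop/stay MDP; returns (values, policy) by state."""
    k_max = math.ceil(M / m)              # staying longer is never survivable
    states = ["P", "J"] + list(range(1, k_max + 1))

    def jam_prob(k):
        left = max(M - k * m, 0)          # bands not yet sensed by attackers
        return 1.0 if left <= m else m / left

    def transitions(state, action):
        if action == "h":                 # a hop forgets the old state
            return {"P": p_busy,
                    "J": (1 - p_busy) * m / M,
                    1: (1 - p_busy) * (1 - m / M)}
        f = jam_prob(state)
        out = {"P": p_return, "J": (1 - p_return) * f}
        if f < 1.0:
            out[state + 1] = (1 - p_return) * (1 - f)
        return out

    def payoff(state, action):
        gain = R if isinstance(state, int) else 0.0
        loss = L if state == "J" else 0.0
        return gain - loss - (C if action == "h" else 0.0)

    def actions(state):                   # staying is infeasible in P and J
        return ["h"] if state in ("P", "J") else ["h", "s"]

    values = {s: 0.0 for s in states}
    while True:
        new_values, policy = {}, {}
        for s in states:
            best = None
            for a in actions(s):
                q = payoff(s, a) + delta * sum(
                    p * values[t] for t, p in transitions(s, a).items())
                if best is None or q > best:
                    best, policy[s] = q, a
            new_values[s] = best
        gap = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if gap < tol:
            return values, policy


values, policy = solve_jamming_mdp(M=60, m=4, R=5, L=20, C=1, delta=0.95,
                                   p_return=0.01, p_busy=0.01 / 0.11)
# Smallest K at which hopping is preferred under these assumed parameters.
k_star = min(k for k in policy if isinstance(k, int) and policy[k] == "h")
```

Because δ < 1, each sweep is a contraction and the loop terminates. Reading the policy off the converged values gives the hop/stay decision per state, from which a threshold such as `k_star` can be extracted.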

Page 35

Markov Decision Process
• Critical state K∗ (K∗ ≤ )
• K∗ can be obtained from solving the MDP, and the optimal strategy becomes

Page 37

Learning the Parameters
• A learning scheme
  o Maximum Likelihood Estimation (MLE)
• The secondary user simply sets a value as an initial guess of the optimal critical state K∗
• And follows the strategy (10) with the estimate during the whole learning period

Page 38

Learning the Parameters
• This guess need not be accurate
• After the learning period, the secondary user updates the critical state K∗ accordingly
• F
  o The total number of transitions from S to S’ with the action h taken
• T
• T
• t

Page 39

Learning the Parameters
• The likelihood that such a sequence has occurred
  o A product over all feasible transition tuples
  o (S,a,S’) ∈ {P,J,1,2,3,...,K_L + 1} × {s,h} × {P,J,1,2,3,...,K_L + 1}
• Define
• The following proposition gives the MLE of the parameters β, γ, and ρ

Page 40

Learning the Parameters
• Proposition 1: Given , S ∈ and , S ∈ counted from history of transitions, the MLE of primary users’ parameters are

Page 41

Learning the Parameters
• The MLE of attackers’ parameters ρML is the unique root within an interval (0, 1/(K_L + 1)) of the following (K_L + 1)-order polynomial
• Proof

Page 42

Learning the Parameters
• With transition probabilities specified in (4) – (7)
• The likelihood of observed transitions (11) can be decoupled into a product of three terms Λ = ΛβΛγΛρ

Page 43

Learning the Parameters
• By differentiating ln Λβ, ln Λγ, ln Λρ and equating them to 0
  o Obtain the MLE (12), (13), and (14)
• To ensure that the likelihood is positive, ρ has to lie in the interval (0, 1/(K_L + 1))
  o The left-hand side of equation (14) decreases monotonically and approaches positive infinity as ρ goes to 0
  o The right-hand side increases monotonically and approaches positive infinity as ρ goes to 1/(K_L + 1)
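
Equation (14) is not reproduced in the transcript, so the Python sketch below does not solve that polynomial. As a swapped-in alternative, it maximizes an assumed log-likelihood for ρ = m/M by a plain grid search over the same interval (0, 1/(K_L + 1)). The jamming probabilities ρ after a hop and ρ/(1 − Kρ) after staying in state K are reconstructions, and the function names and count layout are hypothetical.

```python
import math


def log_likelihood_rho(rho, hop_counts, stay_counts):
    """Assumed log-likelihood of rho given jam/success counts.

    hop_counts  = (times jammed, times successful) right after a hop
    stay_counts = {K: (times jammed, times successful)} after staying in state K
    Only slots where the primary user stayed absent are counted.
    """
    n_jam, n_ok = hop_counts
    total = n_jam * math.log(rho) + n_ok * math.log(1 - rho)
    for k, (n_jam, n_ok) in stay_counts.items():
        total += n_jam * math.log(rho / (1 - k * rho))
        total += n_ok * math.log((1 - (k + 1) * rho) / (1 - k * rho))
    return total


def estimate_rho(hop_counts, stay_counts, k_learn, grid=20000):
    """Grid-search maximizer of the log-likelihood on (0, 1/(k_learn + 1))."""
    upper = 1.0 / (k_learn + 1)
    # Interior grid points only, so every logarithm has a positive argument.
    candidates = (upper * i / (grid + 1) for i in range(1, grid + 1))
    return max(candidates,
               key=lambda r: log_likelihood_rho(r, hop_counts, stay_counts))


# Hypothetical counts from a learning period with K_L = 2.
rho_hat = estimate_rho(hop_counts=(7, 93),
                       stay_counts={1: (6, 80), 2: (5, 70)},
                       k_learn=2)
```

A grid search avoids relying on the shape of the likelihood. A root finder applied to the paper's equation (14) would be the faithful implementation once that equation is available.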

Page 44

Learning the Parameters
• After the learning period, the secondary user rounds M · ρML to the nearest integer as an estimation of m
• Calculate the optimal strategy using the MDP approach described in the previous section
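
The rounding step can be sketched in Python as below. The function name `estimate_attackers` is hypothetical. Clamping the result to at least 1 is an added assumption that reflects the model's m ≥ 1, and halves are rounded up explicitly because Python's built-in `round` rounds halves to the nearest even integer.

```python
import math


def estimate_attackers(M, rho_ml):
    """Round M * rho_ml to the nearest integer, never returning less than 1."""
    if M < 1 or not 0 < rho_ml < 1:
        raise ValueError("expected M >= 1 and 0 < rho_ml < 1")
    return max(1, math.floor(M * rho_ml + 0.5))


# Example: M = 60 bands and a hypothetical estimate rho_ml = 0.0667.
m_hat = estimate_attackers(60, 0.0667)
```

The integer estimate then replaces the true m when the MDP is solved for the updated critical state.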

Page 46

Simulation Results
• Communication gain R = 5
• Hopping cost C = 1
• Total number of bands M = 60
• Discount factor δ = 0.95
• Primary users’ access pattern
  o β = 0.01, γ = 0.1

Page 47

Simulation Results
• When the threat from attackers is stronger, the secondary user should proactively hop more frequently
  o To avoid being jammed

Page 48

Simulation Results
  o Always hopping: the secondary user will hop every time slot
  o Staying whenever possible: the secondary user will always stay in the band unless the primary user reclaims the band or the band is jammed by attackers
