B4_Detetcion_Kadambe

7/29/2019 B4_Detetcion_Kadambe

1/106

Detection & Classification of RF signals and,

Physical and Network Layer Behavior of

Software Defined/Cognitive Radios

Dr. Shubha Kadambe

Advanced Technology Center

Rockwell Collins

400 Collins Rd., NE

Cedar Rapids, IA 52498

slkadamb@rockwellcollins.com

310-263-8455


2/106

Why?

Detection & classification of RF signals is front-end processing for:

Geolocation for networking radios

Interoperability, i.e., making two different radios talk to each other

Spectrum sensing and management

Detection and classification of physical and network layer behavior is needed for:

Security

Spectral management and dominance


3/106

Why?

Detection and classification is a classical problem that has many applications.

For RF signals in particular, the applications are:

Military: signal intelligence (SIGINT), electronic intelligence (ELINT), communication intelligence (COMINT) and electronic warfare

Commercial: software defined radio (SDR) / cognitive radio (CR)

In military applications it is a challenging task since:

New RF threats are introduced every day

Friendly forces should have spectral dominance in the presence of hostile signals

The spectrum of these signals may range from high frequency (HF) to the millimeter-wave band, and their format can vary from simple narrowband modulations to wideband schemes

Techniques need to operate in real time to make critical decisions quickly in electronic warfare & tactical operations


4/106

Why?

In software defined radio (SDR) / cognitive radio (CR):

Information is transmitted to reconfigure the SDR system.

These techniques can be used with an intelligent transceiver to increase efficiency by reducing this overhead.

They should be able to operate at very low SNR & in the presence of interferers.


5/106

Detection

5


6/106

Problem formulation

A general detection problem corresponds to choosing between two hypotheses:

$$H_0:\; y(t) = n(t), \qquad 0 \le t \le T$$

$$H_1:\; y(t) = s(t) + n(t), \qquad 0 \le t \le T$$

where $s(t)$ is the signal of interest and $n(t)$ is random noise.

Common forms of the signal are:

Completely known: $s(t) = m(t)$

Known except for a few parameters, such as:

$$s(t) = A(t)\cos(\omega_0 t + \theta_0)$$

where some combination of $\{A, \omega_0, \theta_0\}$ may be unknown or random

Purely stochastic: $s(t) = z(t)$

Common assumptions about the noise are:

Zero-mean white Gaussian

Zero-mean colored with a white Gaussian component

Purely colored with zero mean (generally not used)


7/106

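Detection: deterministic known signal

In general, a decision is made by:

Deriving a test statistic from the received signal $y(t)$, given the assumed statistics of the noise $n(t)$

Comparing it to a preset threshold

If $n(t)$ is white Gaussian noise with mean 0 and variance $\sigma_N^2$, then the pdfs under the two hypotheses can be shown to be:

$$p(y(t)\mid H_0) \propto \exp\left(-\frac{1}{2\sigma_N^2}\sum_{k=1}^{K} y_k^2\right)$$

$$p(y(t)\mid H_1) \propto \exp\left(-\frac{1}{2\sigma_N^2}\sum_{k=1}^{K} (y_k - s_k)^2\right)$$

where $y_k$ and $s_k$ are sampled versions of $y(t)$ and $s(t)$, and the normalizing constant is the same under both hypotheses.

The likelihood ratio is given by:

$$\Lambda(y) = \frac{p(y(t)\mid H_1)}{p(y(t)\mid H_0)}$$

By considering $\ln\Lambda$ and substituting for the pdfs we get:

$$\ln\Lambda(y) = \frac{1}{\sigma_N^2}\sum_{k=1}^{K} y_k s_k - \frac{1}{2\sigma_N^2}\sum_{k=1}^{K} s_k^2$$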


8/106

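Detection: deterministic known signal

Using the log-likelihood ratio from the previous slide, the detection rule is:

Choose $H_1$ if:

$$\frac{1}{\sigma_N^2}\sum_{k=1}^{K} y_k s_k > \ln\lambda_0 + \frac{1}{2\sigma_N^2}\sum_{k=1}^{K} s_k^2$$

where $\lambda_0$ is the threshold on the likelihood ratio; otherwise choose $H_0$.

For the continuous case it is:

Choose $H_1$ if:

$$\frac{1}{\sigma_N^2}\int_0^T y(t)\,s^{*}(t)\,dt > \ln\lambda_0 + \frac{1}{2\sigma_N^2}\int_0^T |s(t)|^2\,dt$$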


9/106

Detection

deterministic known signal

From the previous equation it can be seen that the processing needed is either:

Correlating the stored signal s(t) with the received signal y(t), or

Passing y(t) through a filter matched to s(t), which is the matched filter approach of the early radar literature

This is the optimum detector from the decision-theoretic point of view

Matched filtering provides the optimum solution but is not very realistic, since the signal is not completely known in practice

It provides a theoretical bound against which other detectors are compared
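The correlator form of this detector can be sketched in a few lines. This is only an illustration under stated assumptions: the noise is white Gaussian with known variance, the reference s is known exactly, and the function names (`correlator_statistic`, `correlator_threshold`, `detect_known_signal`) and the default threshold `lam0 = 1` are hypothetical choices, not taken from the slides.

```python
import numpy as np

def correlator_statistic(y, s, noise_var):
    """(1/sigma_N^2) * sum_k y_k * s_k: correlate received samples with the stored signal."""
    return float(np.dot(y, s)) / noise_var

def correlator_threshold(s, noise_var, lam0=1.0):
    """ln(lambda_0) + (1/(2*sigma_N^2)) * sum_k s_k^2."""
    return float(np.log(lam0) + np.dot(s, s) / (2.0 * noise_var))

def detect_known_signal(y, s, noise_var, lam0=1.0):
    """Choose H1 (True) when the correlator output exceeds the threshold."""
    return correlator_statistic(y, s, noise_var) > correlator_threshold(s, noise_var, lam0)

# Illustrative data: a sampled cosine as the known signal (assumed, not from the slides).
k = np.arange(256)
s = np.cos(2 * np.pi * 0.05 * k)
rng = np.random.default_rng(0)
y_signal_plus_noise = s + rng.normal(0.0, 1.0, k.size)
decision = detect_known_signal(y_signal_plus_noise, s, noise_var=1.0)
```

With `lam0 = 1` the rule reduces to comparing the correlation with half the signal energy, which is why a noise-free copy of s is always accepted and an all-zero input is always rejected.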

9


10/106

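Detection: unknown signal parameters

A generalization of the detection problem arises when the signal s(.) has unknown parameters.

In this case the hypotheses are:

$$H_0:\; y(t) = n(t), \qquad 0 \le t \le T$$

$$H_1:\; y(t) = s(t;\theta) + n(t), \qquad 0 \le t \le T$$

where $\theta$ is a vector of unknown parameters.

This leads to the Generalized Likelihood Ratio Test (GLRT), in which the unknown parameters are replaced by their maximum likelihood estimates. The GLRT statistic $\lambda_{GLRT}$ is built from the terms $\max_{\theta}\, y^T P(\theta)\, y$ and $y^T y$, each scaled by $\sigma_N^2$.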


11/106

GLRT detector block diagram

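[Block diagram: estimator-correlator]

Received signal y(t) -> segment the signal -> maximum likelihood signal parameter estimator -> $\max_{\theta}\, y^T P(\theta)\, y$, scaled by $\sigma_N^2$

Segmented signal -> correlator -> $y^T y$, scaled by $\sigma_N^2$

The two branches are combined (+) and compared with the threshold T: if the result exceeds T the signal is declared present (yes), otherwise it is declared absent (no).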


12/106

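GLRT: special case

Consider a signal with uniformly distributed random phase:

$$s(t;\theta) = A(t)\cos(\omega_0 t + \phi(t) + \theta), \quad 0 \le t \le T, \qquad p(\theta) = \frac{1}{2\pi}, \quad 0 \le \theta \le 2\pi$$

with $A(t)$, $\phi(t)$ slowly varying compared to $\cos(\omega_0 t)$.

Under this narrowband assumption it can be shown that:

$$\Lambda(y) = \exp\left(-\frac{E}{2\sigma_N^2}\right) I_0\!\left(\frac{1}{\sigma_N^2}\left[V_c^2 + V_s^2\right]^{1/2}\right)$$

where $I_0$ is the zeroth-order modified Bessel function of the first kind,

$$E = \int_0^T s^2(t;\theta)\,dt = \frac{1}{2}\int_0^T A^2(t)\,dt$$

$$V_c = \int_0^T y(t)\,A(t)\cos(\omega_0 t + \phi(t))\,dt$$

$$V_s = \int_0^T y(t)\,A(t)\sin(\omega_0 t + \phi(t))\,dt$$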


13/106

GLRT

special case

The main operation on the received signal is:

$$\left[V_c^2 + V_s^2\right]^{1/2}$$

which is the envelope of the output of a filter matched to the narrowband signal $A(t)\cos(\omega_0 t + \phi(t))$.

Note that the analysis of narrowband signals using a complex envelope was first suggested by Gabor.

Woodward used this for signal detection in radar and developed an ambiguity function to understand the resolution limits of radar.


14/106

Detection

time-frequency domain

Estimating signal parameters can be transferred to the time-frequency domain:

Feature selection is easier

The noise effect can be reduced

Several time-frequency distributions exist:

Wigner

Windowed spectrum

Gabor

Choi-Williams

Reduced interference distribution (RID)

We consider one that tries to reduce the cross-terms: the Cross-term Deleted Wigner Representation (CDWR)


15/106

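Definitions - 1

For a given signal x(t), the Gabor coefficients and the Wigner distribution (WD) are defined as:

$$G_x(m,n) = \int x(t)\,\gamma_{m,n}^{*}(t)\,dt$$

$$W_x(t,w) = \int x\!\left(t + \tfrac{\tau}{2}\right) x^{*}\!\left(t - \tfrac{\tau}{2}\right) e^{-jw\tau}\,d\tau$$

where $\gamma$ is an analysis window that is biorthogonal to the synthesis window $h(t)$, and $\gamma_{m,n}(t)$, $h_{m,n}(t)$ denote their time- and frequency-shifted versions.

Complementary Gabor coefficients $\hat{G}_x(m,n)$ can be obtained by reversing the roles of $h$ and $\gamma$.

Using these coefficients, x(t) can be expanded as:

$$x(t) = \sum_{m,n} G_x(m,n)\,h_{m,n}(t) = \sum_{m,n} \hat{G}_x(m,n)\,\gamma_{m,n}(t)$$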


16/106

Definitions - 2

Substituting the Gabor expansions of x(t) in the WD definition, after some algebraic simplification it can be shown that:

$$W_x(t,w) = \sum_{m,n}\sum_{p,q} G_x(m,n)\,\hat{G}_x^{*}(p,q)\; W_h\!\left(t - \frac{(n+q)T}{2},\; w - \frac{m+p}{2T}\right) e^{\,j\phi_{m,n,p,q}(t,w)}$$

where the phase $\phi_{m,n,p,q}(t,w)$ collects the terms $(m+p)(n-q)/2$, $(m-p)t/T$ and $(n-q)wT$.

The auto-WD terms are those with m = p, n = q. Retaining only the auto-terms gives the crossterm-deleted cross biorthogonal representation (XBIO).

When $G_x(m,n) = \hat{G}_x(m,n)$, this TFR becomes the Crossterm Deleted Wigner Representation (CDWR).


17/106

Definitions of XBIO & CDWR

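The XBIO of x(t) is:

$$XB_x(t,w) = \sum_{m,n} G_x(m,n)\,\hat{G}_x^{*}(m,n)\; W_h\!\left(t - nT,\; w - \frac{m}{T}\right)$$

Similarly, the CDWR of x(t) is:

$$CDWR_x(t,w) = \sum_{m,n} \left|G_x(m,n)\right|^{2}\; W_h\!\left(t - nT,\; w - \frac{m}{T}\right)$$

CDWR is a special case of XBIO.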


18/106

Example: The CDWR of a linear chirp

18


19/106

Detection

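Detectors (r(t) is the received signal and s(t) is the reference signal):

Matched filter:

$$\eta_{mf} = \int r(t)\,s^{*}(t)\,dt$$

Auto-CDWR:

$$\eta_{acdwr} = \iint CDWR_{rr}(t,f)\,CDWR_{ss}^{*}(t,f)\,dt\,df \approx \left|\int r(t)\,s^{*}(t)\,dt\right|^{2}$$

Cross-CDWR:

$$\eta_{xcdwr} = \iint CDWR_{rs}(t,f)\,CDWR_{ss}^{*}(t,f)\,dt\,df \approx A\int r(t)\,s^{*}(t)\,dt$$

where A is the energy of the signal.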


20/106

Detection -

Performance of detectors

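The performance measure is:

$$SNR = \frac{\left|E[\eta \mid H_1] - E[\eta \mid H_0]\right|}{\left\{\tfrac{1}{2}\left[\operatorname{var}(\eta \mid H_1) + \operatorname{var}(\eta \mid H_0)\right]\right\}^{1/2}}$$

where $\eta$ is the detector output.

Using this measure, $SNR_{mf}$, $SNR_{acdwr}$ and $SNR_{xcdwr}$ can each be expressed in terms of the signal energy A and the noise level N, and it can be shown that:

The performance of the detector based on the XCDWR is better than that of the ACDWR and is equivalent to that of the detector based on the matched filter.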


21/106

Block diagram of the CDWR based detector

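[Block diagram]

The received signal r(t) is segmented.

If the prototype reference signal has not yet been estimated: compute the CDWR of r(t); if it exceeds threshold T1, estimate the prototype signal $\hat{s}(t)$.

If the prototype reference signal has been estimated: compute the CDWR of r(t) and of $\hat{s}(t)$, then compute their cross-correlation; if it exceeds threshold T2 the signal is declared present, otherwise the signal is declared absent.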


22/106

Synthetic data details

The data consist of modulated Gaussian pulses.

The signal parameters are: amplitude, arrival time, spread of the pulses, modulation frequency, phase and the sparseness, i.e., the number of pulses within a frame of data.

They were randomly varied for different experiments.

The received signal was embedded in zero-mean white Gaussian noise with various noise variances, corresponding to SNRs from 0 dB to -12 dB.

The experiment was repeated 100 times.


23/106

Detector performance (synthetic signal)

23


24/106

Simulation setup

Synthetic Gaussian pulses with random arrival time, width, modulation frequency and density (of pulses, which may overlap) are generated.

White Gaussian noise at various SNRs (+3 to -6 dB) is added to the generated synthetic signal.

At each SNR, the detection experiment was iterated 10,000 times.

For each iteration, cross-correlation coefficients are computed in the case of the CDWR, whereas for the GLRT the signal parameters are estimated and $\lambda_{GLRT}$ is computed.

The signal is detected if the correlation coefficient or $\lambda_{GLRT}$ is above a certain threshold.

The threshold is set for a fixed probability of false alarm. The probability of detection is computed for different threshold values and ROC curves are generated. These curves are plotted in the next viewgraph.

From that figure, it can be seen that the CDWR based detector performs better than the GLRT at low SNRs (< 3 dB).


25/106

Detectors performance

ROC curves for XCDWR and GLRT for a noisy signal at different SNRs.

[Figure: probability of detection versus probability of false alarm, both from 0 to 1, with curves for 3 dB, 0 dB, -3 dB and -6 dB. Solid = GLRT; dotted = XCDWR with critical sampling; dashed = XCDWR with 2x oversampling.]


26/106

Detector performance (real acoustic signal)

Acoustic signal and correlation coefficients in the case of ACDWR and XCDWR

26


27/106

Why is the CDWR detector's performance better?

In the case of the CDWR, the prototype signal is estimated after projecting the received signal onto the time-frequency plane. The advantages of this are time-frequency localization and a reduced noise effect, and hence a better estimate.

The noise effect can be minimized by designing the analysis and synthesis window functions (which are used in the computation of the CDWR) under certain constraints, such as minimum energy.

However, in the case of the GLRT, the accuracy of the signal parameter estimates deteriorates as the noise level increases. Therefore, the $\hat{s}(t)$ of the CDWR is closer to s(t) than that of the GLRT.

Hence, at high SNR these two detectors perform almost equally; at low SNR, the CDWR detector performs better.


28/106

Detection of RF signals

Man-made signals can be considered cyclostationary random processes

They exhibit peaks in the spectrum

Popular techniques are:

radiometry based peak detection

spectral correlation detection

Cyclostationary feature detector

Channelized receiver


29/106

Radiometric detector

[Block diagram] Received signal -> bandpass filter -> squarer ( )^2 -> integrator -> hypothesis test -> detection decision

The radiometer detects energy in the bandwidth of the bandpass filter using non-coherent processing.

The resultant test statistic is compared to a threshold, which can be established using various detection criteria (Bayes, Neyman-Pearson, etc.) and which varies as a function of channel characteristics.

The signal of interest is declared "present" whenever the test statistic exceeds the threshold.
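The square, integrate and compare chain can be sketched as follows. This is an assumption-laden illustration, not the presenter's implementation: the input is taken to be real, already bandpass filtered, and corrupted by white Gaussian noise of known variance; the threshold uses a large-sample Gaussian approximation to the chi-square statistic for a chosen false-alarm probability (a Neyman-Pearson style criterion). All function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def radiometer_statistic(x):
    """Square the (already bandpass-filtered) samples and integrate (sum) them."""
    return float(np.sum(np.abs(np.asarray(x)) ** 2))

def gaussian_approx_threshold(noise_var, n, pfa):
    """Threshold for a target false-alarm probability.

    Under noise only, the statistic divided by noise_var is chi-square with n
    degrees of freedom (mean n, variance 2n); for large n it is approximated
    here by a Gaussian with the same mean and variance.
    """
    q_inv = NormalDist().inv_cdf(1.0 - pfa)
    return float(noise_var * (n + np.sqrt(2.0 * n) * q_inv))

def radiometer_detect(x, noise_var, pfa=0.01):
    """Declare the signal present when the energy exceeds the threshold."""
    return bool(radiometer_statistic(x) > gaussian_approx_threshold(noise_var, len(x), pfa))
```

Because the threshold scales with the noise variance, an error in the assumed noise level shifts the false-alarm rate directly, which is one reason the threshold has to track the channel characteristics.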


30/106

Cyclostationary

feature detector

Assumption: the zero-mean discrete-time signal x(n) exhibits wide-sense second-order cyclostationarity, i.e., its autocorrelation

$$R_{xx}[n,l] = E\{x(n)\,x^{*}(n+l)\}$$

is periodic in n for a fixed lag l = ±1, ±2, ...

Example: Orthogonal Frequency Division Multiplexing (OFDM) signal

The Fourier coefficient of $R_{xx}[n,l]$, the cyclic autocorrelation function, is:

$$R_{xx}^{\alpha_k}(l) = \lim_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} R_{xx}[n,l]\,e^{-j2\pi kn/N}$$

In practice it is estimated as:

$$\hat{R}_{xx}^{\alpha_k}(l) = \frac{1}{N}\sum_{n=0}^{N-1} x(n)\,x^{*}(n+l)\,e^{-j2\pi kn/N}$$

where $\alpha_k$ is the cyclic frequency of index k and is related to the symbol rate, modulation scheme and carrier frequency.
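The sample estimate of the cyclic autocorrelation can be written directly from its definition as a lag product followed by one DFT bin. The sketch below is illustrative only: the function name is hypothetical, the lag is restricted to non-negative values, and N is taken as the number of samples for which both x(n) and x(n + l) are available.

```python
import numpy as np

def cyclic_autocorr(x, lag, k):
    """Estimate (1/N) * sum_n x(n) * conj(x(n + lag)) * exp(-j*2*pi*k*n/N).

    N is the number of usable lag products, i.e. len(x) - lag.
    """
    if lag < 0:
        raise ValueError("this sketch only handles non-negative lags")
    x = np.asarray(x, dtype=complex)
    n_used = len(x) - lag
    n = np.arange(n_used)
    lag_product = x[:n_used] * np.conj(x[lag:])
    return np.sum(lag_product * np.exp(-2j * np.pi * k * n / n_used)) / n_used

# Illustrative input: a real sinusoid at 1/8 cycles per sample. Its lag-0 product
# contains a component at twice that frequency, i.e. index k = N/4 = 16 for N = 64.
tone = np.cos(2 * np.pi * np.arange(64) / 8)
at_cycle_frequency = abs(cyclic_autocorr(tone, 0, 16))   # 0.25
off_cycle_frequency = abs(cyclic_autocorr(tone, 0, 5))   # numerically zero
```

For noisy data the off-cycle values are small but not zero, so the magnitude still has to be compared against a statistically chosen threshold.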


31/106

Cyclostationary

feature detector

$\hat{R}_{xx}^{\alpha_k} \neq 0$ if $\alpha_k$ is a cyclic frequency, but it can be nonzero even when $\alpha_k$ is not a cyclic frequency because of estimation error, so a statistical test is needed for detection.

One such test is based on the GLRT.


32/106

Wideband detector

Need to

Detect wide and narrow band signals simultaneously

Detect multiple signals that are present simultaneously

Solutions

Channelized detector

Time-frequency representation / time-frequency atom based

detector

Channelized detector


33/106


Reliable detection and arbitration of UWB signals that coexist with NB signals is very difficult, if not impossible, with radiometric techniques.

Channelization techniques are one alternative to radiometric detection for addressing this problem.

Previous work has been reported on a multi-radiometer system for detecting impulse radio signals:

It uses a form of temporal channelization, i.e., the observed frame time (Tf) of the received UWB signal is divided equally into M segments.

Each segment of time data is then processed by a wideband radiometer using TR = Tf/M over bandwidth WRAD ~ WUWB (known bandwidth case).

The M radiometer outputs are then logically combined such that if any of the individual outputs is positive, the UWB signal is declared present (detection occurs).
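The temporal channelization idea (split one frame into M segments, run a radiometer on each, OR the decisions) can be illustrated as below. The function name and the single fixed threshold are hypothetical; the referenced system is not reproduced here.

```python
import numpy as np

def multi_radiometer_detect(frame, m, threshold):
    """Split one frame into m time segments, measure the energy of each,
    and declare a detection if any segment exceeds the threshold (logical OR)."""
    segments = np.array_split(np.asarray(frame, dtype=float), m)
    energies = np.array([np.sum(seg ** 2) for seg in segments])
    decisions = energies > threshold
    return bool(decisions.any()), decisions

# Illustrative frame: silence with one short, strong impulse (assumed data).
frame = np.zeros(100)
frame[55] = 10.0
present, per_segment = multi_radiometer_detect(frame, m=10, threshold=50.0)
```

Shorter segments keep a brief impulse from being averaged down by the surrounding silence, which is the motivation for dividing the frame before measuring energy.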

Ref: Communication Channel Assessment: Detection of Ultra Wideband Signals Using a Channelized Receiver, Brett D. Gronholz, Michael A. Temple, Robert F. Mills & Willie H. Mims, 2005 International Conference on Wireless Networks, Communications and Mobile Computing


34/106


Channel outputs are collectively processed to arrive at the desired conclusion.

The intent is to exploit the power of channelization and develop arbitration techniques to establish how many signals, and of what type, are present.

[Block diagram] BPF 1 -> A/D, BPF 2 -> A/D, ..., BPF M -> A/D, all feeding a digital processor

Block diagram of a channelized receiver using total bandwidth WTot spanned by M filters


35/106

Issues with Channelized detector

The fundamental receiver challenge is to determine:

the number of signals present, and

the spectral characteristics of each, i.e., center frequency, bandwidth, and/or other parameters of interest

Specifically, the first part of the fundamental challenge involves non-cooperative communication channel assessment, i.e., given the total channel bandwidth WTot, determine:

1) whether there is a signal present, and

2) what features the signal(s) have, i.e., NB, UWB, etc.


36/106

Time-frequency atom based wideband multiple

signal detector

[Block diagram, with the output of each stage in parentheses]

Signal -> average signal

-> compute time-frequency representation (time-frequency spectrum)

-> obtain time-frequency atoms (atoms)

-> compute spectral energy in each atom & mean spectral energy across the time dimension (energy distribution)

-> compute histogram & determine threshold (threshold)

-> detect all energy peaks above the threshold (# of local peaks & energy value)

-> estimate beginning & end of signals (# of detected signals and their estimated beginning and end)

Rockwell Collins Proprietary


37/106

How does it work?

The received signal is buffered for ten frames.

It is averaged over the ten frames to reduce the noise effect.

A two-dimensional time-frequency representation (spectrogram) is computed from the averaged signal.

Since it is a 2-D representation, multiple signals with different bandwidths and center frequencies, occurring at different times, each exhibit spectral energy around their own frequency band and time.



38/106

Spectrogram example



39/106

How does it work (cont.)?

The time-frequency space of the spectrogram is divided into smaller regions called time-frequency atoms.

The spectral energy in each of these atoms is computed.

Then it is averaged over time.

This results in a spectral energy distribution across frequencies.

An example of the 2D spectral energy in each time-frequency atom and of the mean spectral energy distribution are shown in the two following slides, respectively.



40/106

Example 2D spectral energy

The energy bands corresponding to each signal are much clearer in the 2D energy plot of the time-frequency atoms than in the spectrogram, which is why time-frequency atoms are considered.



41/106

Example mean spectral energy distribution


How does it work (cont.)?


42/106


Using the spectral energy distribution, a histogram is computed.

The threshold value is chosen to correspond to the bin into which the maximum number of values fall (the first bin in the example histogram).

This selection is made because most of the lower spectral energy values correspond to the background.

This helps in characterizing the background statistically and in fixing the threshold value adaptively, based on changes in the background noise level.

An example histogram is shown in the next slide.


43/106

Example histogram of spectral energy

distribution


How does it work (cont.)?


44/106


From the figure in the previous slide, it can be seen that bin 1 has the largest number of entries.

From the 2D spectral energy plot, it can be seen that very few time-frequency atoms have high energy (indicated by darker bands) and most of the time-frequency atoms have low energy values.

Hence, by choosing the threshold value that corresponds to bin 1 we eliminate the background noise.


How does it work (cont.)?


45/106


From the spectral energy distribution, local peaks and the associated center frequencies are located first.

If a local peak is above the chosen threshold, then a decision is made that a signal is present at that peak location.

The number of chosen local peaks indicates the number of signals present.

The locations of the peaks in frequency determine the center frequencies of those signals.


46/106

How does it work (cont.)?

For example, in the following figure, it can be noticed that seven local peaks are above the background level.

The peak locator also decides whether neighboring peaks are too close.

If they are, it ignores those peaks.

Hence, in the example below, it ignores the neighboring peaks that are present on either side of the peaks located at frequency indices 16 and 22, resulting in 4 peaks instead of 7.

These four peaks correspond to the four signals present.

The associated peak locations correspond to the center frequencies of those signals.


How does it work (cont.)?


47/106


Using the estimated center frequency and the associated spectral energy, the 2D energy distribution of the time-frequency atoms is searched along the time dimension to estimate the beginning and end of that signal in time.

Finally, the estimated number of signals, their center frequencies and their beginning and end times are output to the user.
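The whole chain (spectrogram, atoms, mean energy, histogram threshold, peak picking, time search) can be sketched with NumPy alone. Everything below is an assumption made for illustration: the function names, the FFT size, hop, atom size, number of histogram bins, the minimum peak separation, and the reuse of the background threshold for the time search are not taken from the slides, and the ten-frame averaging is assumed to have been done before `detect_signals` is called.

```python
import numpy as np

def spectrogram(x, nfft=64, hop=32):
    """Magnitude-squared short-time FFT, shape (nfft//2 + 1, n_frames)."""
    win = np.hanning(nfft)
    starts = range(0, len(x) - nfft + 1, hop)
    frames = np.array([x[i:i + nfft] * win for i in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).T ** 2

def atom_energies(spec, f_size=2, t_size=4):
    """Sum the spectral energy inside non-overlapping f_size x t_size atoms."""
    nf, nt = spec.shape[0] // f_size, spec.shape[1] // t_size
    trimmed = spec[:nf * f_size, :nt * t_size]
    return trimmed.reshape(nf, f_size, nt, t_size).sum(axis=(1, 3))

def histogram_threshold(mean_energy, bins=10):
    """Upper edge of the most populated histogram bin, taken as the background."""
    counts, edges = np.histogram(mean_energy, bins=bins)
    return float(edges[np.argmax(counts) + 1])

def pick_peaks(mean_energy, threshold, min_sep=2):
    """Local maxima above the threshold; among peaks closer than min_sep,
    only the strongest is kept."""
    e = mean_energy
    cand = [i for i in range(len(e))
            if e[i] > threshold
            and (i == 0 or e[i] >= e[i - 1])
            and (i == len(e) - 1 or e[i] > e[i + 1])]
    kept = []
    for i in sorted(cand, key=lambda idx: -e[idx]):
        if all(abs(i - j) >= min_sep for j in kept):
            kept.append(i)
    return sorted(kept)

def time_extent(atoms, f_idx, threshold):
    """First and last time-atom index where the row at f_idx exceeds the threshold."""
    active = np.flatnonzero(atoms[f_idx] > threshold)
    return (int(active[0]), int(active[-1])) if active.size else None

def detect_signals(x):
    """Return [(frequency-atom index, (first time atom, last time atom)), ...]."""
    atoms = atom_energies(spectrogram(x))
    mean_energy = atoms.mean(axis=1)          # average over the time dimension
    thr = histogram_threshold(mean_energy)
    return [(p, time_extent(atoms, p, thr)) for p in pick_peaks(mean_energy, thr)]

# Illustrative input: two tones at different frequencies and different times.
n = np.arange(4096)
x = np.zeros(4096)
x[512:2048] += np.cos(2 * np.pi * 0.125 * n[512:2048])
x[2048:3584] += np.cos(2 * np.pi * 0.375 * n[2048:3584])
found = detect_signals(x)
```

Each returned pair gives a center-frequency atom index and the first and last time atoms in which that row is above the background; converting indices to Hz and seconds depends on the sampling rate, which is left out of this sketch.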



48/106

Classification

48


49/106

Classification block diagram

Detected signals are pre-processed: FFT, spectrogram, T-F distribution

Features are extracted

Features are used in classification

Classification: model based or clustering based

Classifiers: open set or closed set

[Block diagram] Input signals -> pre-processing -> feature extraction -> feature classification -> signal classes


50/106

Classifiers

Model based:

Hidden Markov Model

Neural network

Probabilistic neural network (PNN)

Gaussian mixture model

Clustering:

K-means

Discriminant analysis:

Support Vector Machine

Gaussian Mixture Model (GMM) based


51/106

GMM based signal classifier architecture

[Block diagram]

Training: training data (I & Q samples of signals) -> cumulant-based feature extraction -> GMM parameter (mean vector & covariance matrix) estimation

Testing: test signal (I & Q samples) -> cumulant-based feature extraction -> Bayesian classifier -> class id

Rockwell proprietary


52/106

Classifier: block diagram of the computation of the feature vector

[Block diagram]

Signal branch: windowed noisy signal (I & Q) -> linear transformations 1 to L -> compute cumulants

Noise branch: windowed noise (I & Q) -> linear transformations 1 to L -> compute cumulants

The diagram also contains a frequency offset and bandwidth correction block and an additional processing block that outputs the features.


Gaussian Mixture Model (GMM) based


53/106


Classifier Architecture

[Block diagram repeated from slide 51]

Gaussian Mixture Model (GMM) Parameter


54/106

Estimation

Expectation Maximization (EM) algorithm

An algorithm that helps in estimating the parameters of a multivariate distribution given the data

Input: data (x), number of mixtures per model (C)

Output: C mean vectors ($\mu_i$), C covariance matrices ($\Sigma_i$)

Multivariate normal distribution:

$$p(x,\{\mu_i,\Sigma_i\}) = \frac{1}{\sqrt{(2\pi)^{F}\det(\Sigma_i)}}\exp\left(-\frac{1}{2}(x-\mu_i)^{T}\Sigma_i^{-1}(x-\mu_i)\right)$$

where F is the number of features (5-8 in our example)

Hence the likelihood (probability of the data given a classification) is:

$$p(x \mid h) = \sum_{i=1}^{C} p(x,\{\mu_i,\Sigma_i\})$$
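Evaluating the class likelihood from already estimated GMM parameters can be sketched as follows. The EM fitting step itself is not shown. The function names are hypothetical, and the mixing weights are an assumption: the slide writes a plain sum over components, while this sketch defaults to equal weights 1/C so that the mixture remains a normalized density.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density for an F-dimensional feature vector x."""
    x, mean, cov = (np.asarray(a, dtype=float) for a in (x, mean, cov))
    f = mean.size
    diff = x - mean
    norm = np.sqrt((2.0 * np.pi) ** f * np.linalg.det(cov))
    # solve() applies the inverse covariance without forming it explicitly
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)

def mixture_likelihood(x, means, covs, weights=None):
    """p(x | h) for one class h described by C Gaussian components."""
    c = len(means)
    if weights is None:
        weights = np.full(c, 1.0 / c)   # assumed equal weights, see lead-in
    return float(sum(w * gaussian_pdf(x, m, s)
                     for w, m, s in zip(weights, means, covs)))

# Illustrative two-feature example with two identical unit-covariance components.
origin_density = mixture_likelihood([0.0, 0.0],
                                    means=[[0.0, 0.0], [0.0, 0.0]],
                                    covs=[np.eye(2), np.eye(2)])
```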


Gaussian Mixture Model (GMM) based


55/106


Classifier Architecture

[Block diagram repeated from slide 51]


56/106

Bayesian Classifier

A Bayesian classifier is a statistical classifier that uses Bayes' rule to make the classification decision:

$$p(h \mid x) = \frac{p(x \mid h)\,p(h)}{p(x)}$$

where h is a class and x is the data.

Each class is equally likely, hence:

$$p(h) = \frac{1}{N}$$

where N is the number of classes.

p(x) can be approximated as:

$$p(x) = \sum_{i=1}^{N} p(x \mid h_i)\,p(h_i)$$

Pick the class with the largest $p(h \mid x)$.
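A minimal sketch of this decision rule, assuming the per-class likelihoods p(x | h_i) have already been computed (for example from a per-class mixture model). The function names are hypothetical; equal priors are the default, as on the slide.

```python
import numpy as np

def posteriors(likelihoods, priors=None):
    """Bayes' rule: p(h_i | x) = p(x | h_i) p(h_i) / sum_j p(x | h_j) p(h_j)."""
    lik = np.asarray(likelihoods, dtype=float)
    if priors is None:
        priors = np.full(lik.size, 1.0 / lik.size)   # each class equally likely
    joint = lik * np.asarray(priors, dtype=float)
    return joint / joint.sum()

def classify(likelihoods, priors=None):
    """Index of the class with the largest posterior probability."""
    return int(np.argmax(posteriors(likelihoods, priors)))

# Illustrative likelihood values for three classes (assumed numbers).
post = posteriors([0.2, 0.6, 0.2])
best = classify([0.2, 0.6, 0.2])   # 1
```

With equal priors the posterior ordering is the same as the likelihood ordering, so the classifier reduces to picking the class with the largest likelihood.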



57/106

Classifier

Unknown class

Unknown class detection

Uses a distance measure based on the Bhattacharyya distance

If a new signal is detected that the classifier is not trained for, it is classified as unknown

Unknown class processing

Features of unknown classes are passed to the Cognitive Engine

It learns about the new classes

It provides the information required to retrain the classifier for these new classes



58/106

Example: Learning unknown signals

[Figure: panels of 3-D feature-space plots illustrating the learning of unknown signals]


Classifier Performance For USRP Radio


59/106

Generated Data

USRP radio (uses GNU Radio software) data collection:

Data collected outdoors on the RC campus

5 GNU Radio waveforms considered: DBPSK, DQPSK, D8PSK, GMSK, Gaussian noise

Bit rate was 500 kbps for the data waveforms

Background noise collected for noise correction

Results are shown on the next slide:

High SNR case: uses only the background noise (SNR varied, but was at least 10 dB)

Low SNR case: AWGN added to the collected data so that the SNR is around -3 dB

DQPSK and D8PSK are very close and should be hard to distinguish

We have success in classifying them


Classification accuracy for USRP radio


60/106

generated data

10 dB SNR

          Noise  DBPSK  DQPSK  D8PSK  GMSK  Gaussian
Noise       100      0      0      0     0         0
DBPSK         0    100      0      0     0         0
DQPSK         0      0    100      0     0         0
D8PSK         0      0      0    100     0         0
GMSK          0      0      0      0   100         0
Gaussian      0      0      0      0     0       100

-3 dB SNR

          Noise  DBPSK  DQPSK  D8PSK  GMSK  Gaussian
Noise       100      0      0      0     0         0
DBPSK         0     95      0      5     0         0
DQPSK         0      0     87     13     0         0
D8PSK         0      0     24     76     0         0
GMSK          0      0      2      1    97         0
Gaussian      0      0      0      0     0       100



61/106

Real-World Data

Signals collected:

3 HDTV stations (2 VHF, 1 UHF)

2 FM radio stations (96.5 MHz, 106.1 MHz)

Push-to-talk signal (FM modulation)

Weather radio station (AM modulation)

CB radio (AM modulation)

Results are shown on the next slide:

Each class had its own noise

For omnipresent signals, noise was sampled in an adjacent band

Only non-noise classes are shown in the results

High SNR case: uses only the background noise (SNR varied, but was at least 10 dB)

Low SNR case: AWGN added to the data so that the input SNR is around -3 dB


Classification accuracy for Real-World


62/106

data

10 dB SNR

      TV1  TV2  TV3  FM1  FM2   WT   WX   CB
TV1   100    0    0    0    0    0    0    0
TV2     0  100    0    0    0    0    0    0
TV3     0    0  100    0    0    0    0    0
FM1     0    0    0  100    0    0    0    0
FM2     0    0    0    0  100    0    0    0
WT      0    0    0    0    0   93    7    0
WX      0    0    0    5    0    0   91    0
CB      0    0    0    0    0    0    0  100

-3 dB SNR

      TV1  TV2  TV3  FM1  FM2   WT   WX   CB
TV1   100    0    0    0    0    0    0    0
TV2     3   97    0    0    0    0    0    0
TV3     0    0  100    0    0    0    0    0
FM1     0    0    0   91    8    0    1    0
FM2     0    0    2    0   98    0    0    0
WT      0    3    0    0    0   87    9    1
WX      0    0    0   24    2    7   58    8
CB      0    0    0    0    0    1    3   96


Classifier performance on field data


63/106

Classifier performance on field data before learning

Training on synthetic signals of 5 known classes; testing on 11 real-world signals

Rows: true class. Columns: test result.

        MSK  GMSK  BPSK  QPSK  16QAM  Unknown
MSK      60     0     0     0      0        0
GMSK      0    60     0     0      0        0
BPSK      0     0    60     0      0        0
QPSK      0     0     0    60      0        0
16QAM     0     0     0     0     60        0
OOK       0     0     0     0      0       60
FM        0     2     0     0      0       58
2FSK      0     0     0     0      0       60
4FSK      0     0     0     0      0       60
AM        0     3     0     0      0       57
DSBSC     0     0     0     0      0       60


Classifier performance on field data Classifier performance after learning


64/106

Classifier performance after learning

Rows: true class. Columns: test result. C1 = Cluster 1 (AM), C2 = Cluster 2 (2FSK), C3 = Cluster 3 (OOK), C4 = Cluster 4 (4FSK).

        MSK  GMSK  BPSK  QPSK  16QAM   C1   C2   C3   C4  Unknown
MSK      60     0     0     0      0    0    0    0    0        0
GMSK      0    60     0     0      0    0    0    0    0        0
BPSK      0     0    60     0      0    0    0    0    0        0
QPSK      0     0     0    60      0    0    0    0    0        0
16QAM     0     0     0     0     60    0    0    0    0        0
OOK       0     0     0     0      0    0    0   60    0        0
FM        0     2     0     0      0    0    0   13   15       30
2FSK      0     0     0     0      0    0   59    0    0        1
4FSK      0     0     0     0      0    0    0    0   58        2
AM        0     3     0     0      0   50    0    0    0       10
DSBSC     0     0     0     0      0    0    0   10    0       50

Learning resulted in four clusters of size >= 50

We have shown:

1) Our classifier works on field data

2) We can train on synthesized data and then test on real-world data

3) We can learn unknown signals and define new classes




65/106

Modulation of RF signals - Radar

To uniquely represent different types of modulation of radar impulses and to classify them:

We have developed a multi-class classifier that is constructed by combining a set of binary support vector machines (SVMs).

We have derived a set of innovative features using both higher-order statistics and information measures such as Renyi entropy and relative entropy.


66/106

Features

Features used to represent signal information content include:

Renyi entropy

Energy ratio

Frequency change

Higher-order statistics: skewness

Relative entropy


67/106

Why these features?

In general, signals are distorted by the transmission channel and the receiver system:

Complete signal information is not available

Robust, unique and optimum features are needed that help in accurate representation

The selected features represent distorted signals uniquely and robustly:

Entropy is a measure of information content, so it uniquely represents the information content of a signal

Renyi entropy is a generalized version of Shannon entropy and is more robust

Relative entropy is a measure of relative information and indicates how the information changes relatively

Statistical features such as skewness are robust

Features such as the energy ratio and the frequency change uniquely represent signals


68/106

Feature 1: Renyi entropy

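Notation:

s(t): signal; $S(\omega)$: FFT of s(t)

e(t): envelope of s(t); $E(\omega)$: FFT of e(t)

Feature 1: Renyi entropy

$$F_1 = H_{\alpha}(FE(\omega))$$

where

$$FE(\omega) = FFT\left(e(t) - \bar{e}(t)\right)$$

with $\bar{e}$ the mean of the envelope, and

$$H_{\alpha}(x) = \frac{1}{1-\alpha}\log_2 \sum_i p_x^{\alpha}(i), \qquad 0 < \alpha < 1$$

and $p_x(i)$ is a probability.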
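A histogram-based Renyi entropy estimate can be sketched as follows. It is a sketch under assumptions: the bin count, the default order alpha = 0.5 and the function name are illustrative choices, and the input is whatever sequence of values the feature is computed on (for example the magnitudes of an envelope spectrum).

```python
import numpy as np

def renyi_entropy(values, alpha=0.5, bins=32):
    """Renyi entropy of order alpha, in bits, with probabilities taken from a histogram.

    H_alpha = log2(sum_i p_i**alpha) / (1 - alpha), for alpha != 1.
    """
    counts, _ = np.histogram(values, bins=bins)
    p = counts[counts > 0] / counts.sum()      # empty bins carry no probability
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))

# Values spread evenly over four bins give log2(4) = 2 bits for any order alpha.
uniform_over_four = renyi_entropy([0, 1, 2, 3] * 5, alpha=0.5, bins=4)
```

A spectrum concentrated in a few histogram bins gives a low entropy, while one spread across many bins gives a high entropy, which is what lets the feature separate pulse types.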


69/106

Feature 2: energy ratio (of the envelope e and the signal s)

$$F_2 = \frac{\sigma_e}{\sigma_s}$$

where

$$\sigma_x = E\left[(x - m_x)(x - m_x)^{*}\right] \quad\text{and}\quad m_x = E[x]$$

and $E[\cdot]$ is the expectation operator.


70/106

Feature 3: frequency change

Let $s_i(t)$, $i = 1, 2, \dots, n$, be the segments that make up $s(t)$; then

$$F_3 = \max(fs) - \min(fs)$$

where

$$fs = \{f_i : i = 1, 2, \dots, n\}$$

and

$$f_i = \operatorname{center}(S_i(\omega))$$


71/106

Feature 4: higher-order statistics - skewness

$$F_4 = \frac{1}{\sigma_{FE}^{3}}\,E\left[\left(FE(\omega) - m_{FE}\right)^{3}\right]$$

where

$$FE(\omega) = FFT\left(e(t) - \bar{e}(t)\right)$$
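The skewness itself is the third central moment divided by the cube of the standard deviation. A minimal sketch, assuming population (not sample-corrected) moments and a hypothetical function name:

```python
import numpy as np

def skewness(values):
    """Third central moment normalized by the cube of the standard deviation."""
    v = np.asarray(values, dtype=float)
    m = v.mean()
    sigma = v.std()                      # population standard deviation
    return float(np.mean((v - m) ** 3) / sigma ** 3)

symmetric = skewness([-1.0, 0.0, 1.0])          # 0 for symmetric data
right_tailed = skewness([0.0, 0.0, 0.0, 4.0])   # positive for a long right tail
```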



72/106

Feature 5: relative entropy

Let $e_1(t)$ and $e_2(t)$ be the upper and lower envelopes of $s(t)$; then

$$F_5 = D\left(FE_1(\omega), FE_2(\omega)\right)$$

where

$$FE_i(\omega) = FFT\left(e_i - \bar{e}_i\right)$$

and

$$D(x,y) = \sum_i p_x(i)\log\frac{p_x(i)}{p_y(i)} + \sum_j p_y(j)\log\frac{p_y(j)}{p_x(j)}$$
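The symmetric relative entropy D(x, y) can be estimated from two histograms on a common set of bins. In this sketch the function name, the bin count and the small constant `eps` (added to avoid taking the logarithm of zero in empty bins) are assumptions made for illustration.

```python
import numpy as np

def relative_entropy(x, y, bins=32, eps=1e-12):
    """Symmetric relative entropy: sum p_x log(p_x/p_y) + sum p_y log(p_y/p_x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    px, _ = np.histogram(x, bins=bins, range=(lo, hi))   # same bin edges for both
    py, _ = np.histogram(y, bins=bins, range=(lo, hi))
    px = px / px.sum() + eps
    py = py / py.sum() + eps
    px, py = px / px.sum(), py / py.sum()
    return float(np.sum(px * np.log(px / py)) + np.sum(py * np.log(py / px)))

# Two-bin example: p_x = (0.75, 0.25), p_y = (0.25, 0.75), so D = ln 3.
d_xy = relative_entropy([0, 0, 0, 1], [0, 1, 1, 1], bins=2)
```

Using the same bin edges for both inputs matters: with independently chosen edges the two probability vectors would not refer to the same events and the sum would be meaningless.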



73/106

Signals considered and their generation

Signals considered:

Analogue: AM with and without ripple; FM chirp with and without ripple

Digital: QPSK

Signal generation and description:

Synthesized using a realistic channel model and a receiver system

Analogue signals were generated using the above system

Digital QPSK signals were also generated using the above system

Signals are quite distorted: the full (or sometimes even half of the) signal spectrum is not available

Simulation details computation of


74/106

Simulation details: computation of features

For each signal type 400 pulses are used: 200 for training and 200 for testing

The ground truth of each signal pulse is known

The ripple frequency, pulse width, pulse rising and falling edges and additive noise are randomly varied

In the case of QPSK, the phase, pulse width and noise are randomly varied

Four representative simulated noisy pulses are plotted in the next slide


75/106

Simulated noisy pulses - example



76/106

Simulation details: computation of features - 2

The plot shows that the additive Gaussian noise corrupted the envelopes of the pulses such that the rippled and non-rippled pulses are not easy to distinguish.

Since most of our features are extracted from the envelopes of the pulses, we used a simple but computationally efficient peak-detecting technique.

For the spectral-based features, we used the windowed FFT.

For the computation of both the Renyi entropy and the relative entropy, the probability values are obtained from the histogram.

The next slide presents a plot of three features (relative entropy, frequency change and skewness) for pulses at 10 dB SNR.

From this plot it can be seen that these features form clusters.

Hence, our chosen novel features can represent different classes of modulated pulses (that are closely related) fairly accurately.

Cluster plot of signals features


77/106


Feature clusters of signals


78/106

Feature clusters of signals



79/106

Classifier - SVM

An SVM is a supervised statistical learning machine.

In its learning process, an SVM constructs an optimal hyperplane as its decision surface, using a small set of training data called support vectors, which are the data points closest to the decision surface and the most difficult to classify.

The optimization for computing the decision surface is achieved by the principle of structural risk minimization.

Since in many applications optimal decision surfaces could be non-linear, an SVM uses a set of non-linear transfer functions (inner-product kernels) to map the data from the input space into a high-dimensional feature space, such that a non-linear decision surface in the input space becomes a linear decision surface (optimal hyperplane) in the feature space.


80/106

Motivation

[Figure] H3 doesn't separate the 2 classes. H1 does, with a small margin, and H2 does with the maximum margin.


81/106

SVM - Construction

The procedure for constructing an SVM can be described as follows:

Let the data set $\{(x_i, d_i)\}_{i=1}^{N}$ be the training set and $K(x_i, x_j)$ be the inner-product kernel function.

The objective function for constructing an optimal decision surface is:

$$J(\alpha) = \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\,\alpha_j\,d_i\,d_j\,K(x_i, x_j)$$

subject to the constraints

$$(1)\;\; \sum_{i=1}^{N}\alpha_i d_i = 0 \qquad\text{and}\qquad (2)\;\; 0 \le \alpha_i \le C \;\;\text{for}\;\; i = 1, 2, 3, \dots, N$$


82/106

SVM - Construction

$C$ is a user-specified positive parameter called the cost of mistakes. The optimal parameter vector $\alpha^*$ is determined by maximizing $J(\alpha)$, i.e.,

$$\alpha^* = \arg\max_{\alpha} J(\alpha)$$

Then, the optimal decision surface can be written as:

$$f(x) = \sum_{i=1}^{N_s} \alpha_i^* d_i K(v_i, x) + b$$

where $v_i$ is a support vector, $N_s$ is the number of support vectors, and $b$ is the bias term.

The decision surface $f(x)$ is an optimal hyperplane in the feature space (the hidden space related to the input space by the kernel function $K(x, y)$).
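The decision surface described on this slide is a kernel-weighted sum over the support vectors plus a bias. The sketch below evaluates it for one input. The function name `decision_value`, the linear stand-in kernel, and the numeric values are assumptions chosen so the arithmetic is easy to follow; they are not taken from the slides.

```python
def decision_value(x, support_vectors, labels, alphas, bias, kernel):
    """f(x) = sum over support vectors of alpha_i * d_i * K(v_i, x), plus bias."""
    return sum(a * d * kernel(v, x)
               for a, d, v in zip(alphas, labels, support_vectors)) + bias

def linear_kernel(x, y):
    """Inner product; any inner-product kernel could be passed instead."""
    return sum(xi * yi for xi, yi in zip(x, y))

# Hypothetical 1-D model: support vectors at 0 (label +1) and 2 (label -1).
support_vectors = [(0.0,), (2.0,)]
labels = [1, -1]
alphas = [0.5, 0.5]
bias = 1.0

for x in [(0.0,), (1.0,), (2.0,)]:
    value = decision_value(x, support_vectors, labels, alphas, bias, linear_kernel)
    print(x, value, "class +1" if value > 0 else "class -1 or boundary")
```

The sign of the returned value gives the predicted class, and its magnitude indicates how far the input lies from the decision surface.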



83/106

SVM - Construction

The most common kernel functions used in practice are the polynomial and Gaussian functions:

$$K(x, y) = (x^T y + 1)^p$$

$$K(x, y) = \exp\left(-\frac{1}{2\sigma^2} \|x - y\|^2\right)$$
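The two kernels can be written directly from their standard definitions. This is a minimal sketch; the default degree `p=2` and width `sigma=1.0` are placeholder assumptions, since the slides do not state the values used in the experiments.

```python
import math

def polynomial_kernel(x, y, p=2):
    """Polynomial kernel (x^T y + 1)^p; the degree p is user-chosen."""
    return (sum(xi * yi for xi, yi in zip(x, y)) + 1.0) ** p

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)); sigma is user-chosen."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

x, y = (1.0, 2.0), (3.0, 4.0)
print(polynomial_kernel(x, y), gaussian_kernel(x, y))
```

The Gaussian kernel equals 1 when the two inputs coincide and decays toward 0 as they move apart, with `sigma` controlling how quickly.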



84/106

Finding the optimum hyperplane

Maximum-margin hyperplane and margins for an SVM trained with samples from two classes. Samples on the margin are called the support vectors.

SVM - Multiclass


85/106

As described, SVMs use an optimal hyperplane as a decision surface for the classification of input data.

Since a hyperplane can only separate two classes, SVMs were originally developed for binary classification problems.

The optimal design of SVMs for multi-class classification is still a research topic.

We have extended the binary approach to the multi-class classification problem by iteratively classifying each class against all of the other classes.



86/106

Multi-class Classifier

Our approach can be described as follows:

Let $\{c_i : i = 1, 2, 3, \ldots, N\}$ be $N$ classes of signals.

We construct N classifiers $\{f_i : i = 1, 2, 3, \ldots, N-1\}$, and each classifier is trained by the method of one-class-versus-the-rest; that is, the classifier $f_i$ is trained for $c_i$ versus the rest of the classes.

Then, in the signal classification phase, the classifiers perform according to the following decision rule:

$$x \in c_i \ \text{ if } \ f_i(x) = \max\{ f_k(x) : f_k(x) > 0;\ k = 1, 2, 3, \ldots, N-1 \},$$

where the function $f_k(x)$ provides the distance of $x$ to the decision surface.
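The decision rule above can be sketched as a search for the largest positive classifier output. The scoring functions here are toy stand-ins for trained one-class-versus-the-rest SVMs, and returning `None` when no output is positive is an assumption of this example; the slides do not say how that case is handled.

```python
def classify(x, classifiers):
    """Return the index of the classifier with the largest positive output.

    Each entry of `classifiers` maps an input to a signed distance from its
    decision surface. Returns None if no output is positive (assumed policy).
    """
    scores = [f(x) for f in classifiers]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] > 0 else None

# Hypothetical scorers: each is positive only near its own class center.
classifiers = [
    lambda x: 1.0 - abs(x - 0.0),
    lambda x: 1.0 - abs(x - 3.0),
    lambda x: 1.0 - abs(x - 6.0),
]

for x in [0.2, 2.8, 10.0]:
    print(x, classify(x, classifiers))
```

Comparing raw outputs across classifiers assumes they are on a comparable scale, which is a known limitation of one-versus-rest schemes.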

Multi-class Classifier


87/106

While classifying the features with our multi-class SVM-based classifier, we used both Gaussian and polynomial kernel functions.

We obtained slightly better classification performance with the polynomial kernel function; however, classification was computationally faster with the Gaussian kernel function.

Hence, we used the Gaussian kernel function in our experiments.

Simulation details - Classification


88/106

All four signal types were considered.

Four features (Renyi entropy, relative entropy, energy ratio, and frequency change) were used.

Classification results for pulses with 10 dB SNR are reported in the form of a confusion matrix in the next slide.

From this, it can be seen that these features represent the information content of a signal fairly accurately.



89/106

Classification results

200 pulses for training and 200 pulses for testing

Rows are true classes; columns are computed classes.

True class       FM ripple   FM non-ripple   AM ripple   No-ripple
FM ripple        0.91        0.08            0.02        0.0
FM non-ripple    0.0         1.0             0.0         0.0
AM ripple        0.0         0.0             0.95        0.05
No-ripple        0.0         0.0             0.04        0.96

Table 1: Classification results for SNR = 10 dB
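As a small worked example, the values of Table 1 can be summarized by the per-class accuracy (the diagonal), its mean, and the largest single confusion. The sketch assumes rows are true classes and columns are computed classes; the mean of the diagonal equals the overall accuracy only if every class has the same number of test pulses, which the slide does not state.

```python
labels = ["FM ripple", "FM non-ripple", "AM ripple", "No-ripple"]

# Rows: true class. Columns: computed class. Values copied from Table 1.
confusion = [
    [0.91, 0.08, 0.02, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.95, 0.05],
    [0.0, 0.0, 0.04, 0.96],
]

# Per-class accuracy is the diagonal entry of each row.
per_class = {labels[i]: confusion[i][i] for i in range(len(labels))}
mean_accuracy = sum(per_class.values()) / len(per_class)

# Largest off-diagonal entry: the most frequent confusion.
worst = max((confusion[i][j], labels[i], labels[j])
            for i in range(len(labels))
            for j in range(len(labels)) if i != j)

print(per_class)
print(round(mean_accuracy, 3))
print(worst)
```

The largest confusion is between FM ripple and FM non-ripple pulses, the two FM classes.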



90/106

Classification Performance vs. SNR

91/106

Mutual information measure for the selection of non-redundant features/SOI

1. Renyi entropy

2. Frequency change

3. Energy ratio

4. Skewness

5. Relative entropy

6. Kurtosis

7. Pulse bandwidth

8. Ripple frequency
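One way to read the slide title: two features that share high mutual information carry largely the same information, so one of them is redundant. The sketch below estimates mutual information between two already-discretized feature columns from their joint histogram. The function name, the discretization step, and the toy data are assumptions; the slides do not give the exact selection procedure.

```python
import math
from collections import Counter

def mutual_information(a, b):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))
    return sum((c / n) * math.log2((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
               for (x, y), c in count_ab.items())

# Hypothetical binned feature values for four pulses.
feature_1 = [0, 0, 1, 1]
feature_2 = [0, 0, 1, 1]   # identical to feature_1, hence fully redundant
feature_3 = [0, 1, 0, 1]   # statistically independent of feature_1 here

print(mutual_information(feature_1, feature_2))
print(mutual_information(feature_1, feature_3))
```

A selection scheme built on this would keep features whose mutual information with the already selected ones is low.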


92/106



93/106

Conclusions

Various techniques for detection and classification are discussed.

Particular emphasis is given to techniques applicable at low SNRs.

Robust detectors and classifiers that work at low SNRs are still a research topic.

Opportunities exist for both military and commercial applications.


94/106

Phy layer and network layer behavior learning


Why Phy layer behavior learning?


95/106

CR/SDRs are being used in both military and commercial applications.

Adversaries can use them to attack radios without much effort.

Most CR/SDR research has focused on Quality of Service (QoS).

How do these algorithms respond to the actions of malicious users?



96/106

Types of Attacks

Primary User Emulation (PUE)

Denial of Service (DoS)

Spectral Honeypot Attacks (SHA)



97/106

Primary User Emulation

Actively attempting to confuse a CR/SDR into thinking that your signal is a primary user signal.

Specifically designed to attack the signal classification component of a CR/SDR.

If the attacker is using a published standard (IEEE 802.11), it is impossible to discern malicious users without additional behavioral information.

Makes feature selection within classifiers a critical algorithm decision.



98/106

Denial of Service

The goal is to disrupt some service provided by the communication node.

CR/SDRs know how to move channels when they are being jammed.

What if you forced a DSA radio to continually move? It would never be able to initiate real packet transfer, thus achieving a DoS.



99/106

Spectral Honeypot Attack

Given a certain band in the spectrum, lure or force the CR/SDRs to that band for a malicious purpose:

Man-in-the-middle attack

Force degradation of secondary signal

Can use a PUE attack to force the radio into the band of your choice.



100/106

VT's Experimental Setup

Three DSA 2100 Radios built by Shared Spectrum Company.

One in base station mode.

Two in subscriber mode.

Vector Signal Generator

Tektronix RSA3408 Real-Time Spectrum Analyzer

USRP built by Ettus Research


101/106



102/106

VT's Experimental Results

Demonstrated a DoS attack causing ~82% performance degradation in DSA radios.

Can significantly degrade performance even with cheap COTS components like a USRP.

Demonstrated a honeypot attack using PUE. Using two different methods, forced the radio to the target band in 5.6 seconds and 3.7 seconds.

Why network layer behavior learning?


103/106

Interference can occur in a normally operated mobile wireless network due to hidden terminals.

This condition can be intentionally created by a stealthy adversary.

Programmable radios make it easy for attackers to emulate normal interference.

We need to distinguish malicious interference from normal interference and to understand the types of interference.

Types of malicious attacks

Jamming attack by

Selective packet blocking (e.g., ACK/control packets) by killing ACK/CTS

Blocking the preamble (synchronization) so that a radio cannot lock on to a signal

Byzantine: a node is compromised by the adversary to intentionally act inconsistently and throw off routing protocols

Spoofing: a device pretends to be an access point and thus obtains information about the identity of wireless devices

Environment alteration: purposely increasing the background noise level to alter the power control strategies of the devices

Why Network layer behavior learning?


104/106


Types of normal/benign attacks

Background congestion, distance and mobility

Can occur due to

A noisy radio environment because of a high ambient noise level

Too many nodes that are close to each other and are constantly transmitting

Hidden node

Two nodes are not within sensing range but still interfere with a third node

Can appear in different topologies

Cross protocol/technology

Devices using different protocols with overlapping frequency ranges

An Extension of PER-RSS Consistency Check


105/106

The entire signal space consists of three regions:

Interference-free: no hidden terminal

Normal interference: caused by legitimate hidden terminals

Intentional interference: malicious jamming

Thresholds are empirically chosen using the support vector machine technique.

PER: Packet Error Rate, RSS: Received Signal Strength

WINLAB

Challenges Remain


106/106

A smart reactive jammer can take advantage of the capture effect to throttle the victim's throughput while keeping a low PER.

[Figure: normalized throughput (%) versus transmission link distance (meters); legend: Random, Reactive]
