
Learning to Detect Open Carry and Concealed Object with 77 GHz Radar

Xiangyu Gao, Student Member, IEEE, Hui Liu, Fellow, IEEE, Sumit Roy, Fellow, IEEE, Guanbin Xing, Member, IEEE, Ali Alansari, Youchen Luo

Abstract—Detecting harmful carried objects plays a key role in intelligent surveillance systems and has widespread applications, for example, in airport security. In this paper, we focus on the relatively unexplored area of using low-cost 77 GHz mmWave radar for the carried object detection problem. The proposed system is capable of detecting, in real time, three classes of objects - laptop, phone, and knife - under open carry and concealed cases where objects are hidden with clothes or bags. This capability is achieved by initial signal processing for localization and generating range-azimuth-elevation image cubes, followed by a deep learning-based prediction network and a multi-shot post-processing module for detecting objects. Extensive experiments validating the system performance on detecting open carry and concealed objects are presented with a self-built radar-camera testbed and dataset. Additionally, the influence of different inputs, factors, and parameters on system performance is analyzed, providing an intuitive understanding of the system. This system would serve as the very first baseline for future work aiming to detect carried objects using 77 GHz radar.

Index Terms—carried object, object detection, deep learning, concealed, mmWave, FMCW, radar, public security.

I. INTRODUCTION

ABILITY to detect person-borne threat objects remains an ongoing and pressing requirement in many scenarios such as airports, schools, and military checkpoints. Indeed, the last decade has seen millimeter-wave (mmWave) security scanning portals emerge as the dominant technology for passenger body screening, and they have become a near-ubiquitous presence at major international airport hubs around the world [1]. However, the slow and complicated imaging process blocks higher passenger throughput rates [2]. For this purpose, a distributed screening system with weapon and other carried object classification capabilities that does not interfere with normal passenger flow would be ideal.

Various sensors have been used for carried object detection. Recently, surveillance cameras with the ability to automatically detect weapons and raise alarms to alert operators have been developed using state-of-the-art deep learning models [3], [4]. However, cameras cannot deal with object blocking or occlusion problems and also pose privacy concerns [5]. To address this, numerous technologies that utilize different parts of the electromagnetic spectrum have been considered for detecting open carry and concealed objects on persons [6], e.g., using

X. Gao, H. Liu, S. Roy, G. Xing, and A. Alansari are with the Department of Electrical and Computer Engineering, University of Washington, Seattle, WA, 98195, USA (email: [email protected], [email protected], [email protected], [email protected], [email protected]). Y. Luo is with the Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, 98195, USA (email: [email protected]).

Fig. 1. Example usage scenario of the carried object detection system: detecting a concealed knife and an open carry laptop on two pedestrians.

ultrasound [7], mmWave [8], Terahertz [9], infrared [10], fusion of visual RGB image and infrared [11], X-ray [12], etc.

The majority of existing weapon detection algorithms are based on screening (imaging) and contrast (bright spot) detection, under the assumption of different reflection coefficients between carried objects and the human body [13]. High-resolution imaging is usually achieved by scanning with a large antenna aperture (∼1 m) [9], [14] or the synthetic aperture radar technique [15], [16]. For contrast detection, prior works proposed to identify weapon contours or edges via local binary fitting [11], a two-level expectation-maximization method [14], a Gaussian mixture model [17], etc. Further, deep learning-based methods, e.g., Faster R-CNN [18] and YOLO2 [19], were adopted for model-free and accuracy-improved weapon detection on radar imaging.

For mmWave systems, two different imaging approaches have been developed: passive sensing [14], [20], [21] and active sensing [8], [18], [22]–[24]. Passive mmWave imaging is done with natural illumination or with an incoherent noise source as the illuminator; it is good for detecting concealed weapons worn under clothes but requires a long scanning time (a few minutes) [20]. Active mmWave techniques send a signal via a transmitter and then receive the signal reflected from the objects in front, e.g., frequency-modulated continuous-wave (FMCW) radar [22], [23], which offers higher scanning speed but decreased image quality due to reflection scattering issues [20].

The feasibility of identifying a potential shooter carrying a concealed weapon (rifle) has been investigated in [8], [24] with 77 GHz mmWave FMCW radar. This anomaly detection is based on micro-Doppler and range-Doppler signature analysis via deep learning. People with suspicious behaviors will be detected; this, however, might not work when someone excels at not showing any physical signs or posture of carrying an object, or simply stands still. Besides that, 77 GHz mmWave radars are starting to emerge and are becoming more frequently used in autonomous driving for environment imaging [2], [23], semantic object detection [5], and occupancy grid mapping [25]. This is mostly due to their low price, impressive range resolution (4 cm achievable with 4 GHz of available bandwidth), fine Doppler velocity measurement and discrimination ability [22], high-resolution angle capability via signal processing [2], and robust performance under harsh conditions such as weak light, fog, rain, and snow [23]. In the security industry, 77 GHz mmWave radars have also been widely applied to person re-identification [26] and fall activity detection [27] because of their limited privacy concerns and absence of health hazards.

In this paper, we apply a commercial 77 GHz mmWave radar to the carried object detection problem. Specifically, we narrow this problem down to detecting the existence of three main objects - laptop, phone, and knife - on a pedestrian subject to verify feasibility. To do this, we propose a deep learning-based carried object detection (COD) system that takes raw ADC data as input and outputs the predicted existence probabilities for the three classes of objects. The COD framework has three main modules: preprocessing, a single-shot prediction network, and a multi-shot decision. The preprocessing module is responsible for detecting targets from raw ADC radar data (I-Q samples demodulated at the receiver) and cropping small range-azimuth-elevation (RAE) cubes from the generated radar imaging based on the detection locations. Each cropped cube depicts the 3-dimensional image of a pedestrian carrying objects. Second, the single-shot prediction module is a pyramidal-feature-hierarchy convolutional neural network that takes a single cropped cube as input to predict the existence of the three classes of objects. Here, we use the combination of a preprocessing module and a prediction (or classification) network, instead of an SSD [28]-like end-to-end neural network for object detection, to reduce the network size or complexity and relieve the labeling workload. To further improve system performance, a multi-shot decision module was designed to track the cropped cubes and make a final decision based on the voting results of multiple in-track cubes.

For experimentation, a large dataset of radar raw data and camera images for a pedestrian subject with various open carry or concealed objects was collected using the self-built radar-camera testbed. In particular, significant effort was placed on collecting data for situations where cameras are largely ineffective, i.e., objects hidden or covered with clothes. The system performance is analyzed under different scenarios to determine the influence of different inputs, factors, and parameters on the method. The experimental results indicate that the 77 GHz radar-based COD system performs very well for openly carried objects, and also detects concealed objects in cases where camera-based detection does not work, such as low-light environments or substantial clothing or bag occlusion. This system would serve as the very first baseline for future work aiming to detect carried objects using 77 GHz radar.

In summary, the main novel contributions of this paper are three-fold:

• A new deep learning-based carried object detection system designed for 77 GHz mmWave radar. To the best of our knowledge, we are the first to apply the RAE imaging results of a commercial automotive radar to this problem in real-world scenes.

• Extensive experiments validating the system performance on detecting open carry and concealed objects with a self-built testbed and dataset.

• Analysis of the influence of different inputs, factors, and parameters on system performance, providing an intuitive explanation of the system.

The rest of this paper is organized as follows. The principle of FMCW MIMO radar is introduced in Section II. The proposed COD system framework is presented in Section III. The system implementation details (including testbed and dataset) and evaluation results are described in Section IV and Section V. We discuss and analyze the system performance in Section VI. Finally, Section VII concludes the paper and proposes future work.

Fig. 2. (a) FMCW signal; (b) FMCW radar system.

II. PRIMER

A. FMCW Radar

FMCW radar transmits a periodic wideband linear frequency-modulated (LFM, also called chirp) signal, as shown in Fig. 2(a). The transmitted (TX) signal is reflected from targets and received at the radar receiver. FMCW radars can detect targets' range and velocity from the received (RX) signal using the stretch or de-chirping [22] processing structure in Fig. 2(b). The mixer at the receiver multiplies the RX signal with the TX signal to produce an intermediate frequency (IF) signal. At the receiver end, the IF signal is passed through an anti-aliasing low-pass filter (LPF) and an analog-to-digital converter (ADC) for the subsequent digital signal processing.

Fig. 3. Overview of the proposed carried object detection system: preprocessing module, and the single-shot prediction network that consists of a backbone, feature concatenation, and three prediction heads.

Range estimation: Since the RX and the TX signal are both LFM signals with a constant frequency difference and time delay, the IF signal has a single tone corresponding to the target's distance. For example, the IF frequency for a target at range $r$ is given by

$$f_{\mathrm{IF}} = \frac{2r}{c} S,$$

the multiplication of the round-trip delay $2r/c$ and the chirp sweeping slope $S$, where $c$ denotes the speed of light. Thus, detecting the frequency of the IF signal determines the target range. A cost-efficient fast Fourier transform (FFT) is widely adopted here to estimate $f_{\mathrm{IF}}$; we name it the Range FFT.
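To make the Range FFT concrete, the following Python sketch (a minimal illustration under assumed parameters matching Table II, not the authors' processing code) synthesizes a noise-free IF tone for a hypothetical target at 5 m and recovers the range by peak-picking the FFT:

```python
import numpy as np

c = 3e8              # speed of light (m/s)
S = 79e12            # chirp slope (Hz/s), i.e., 79 MHz/us
fs = 8e6             # ADC sampling rate (samples/s)
Ns = 256             # samples per chirp

r_true = 5.0                             # hypothetical target range (m)
f_if = 2 * r_true / c * S                # IF tone from f_IF = 2rS/c
t = np.arange(Ns) / fs
iq = np.exp(1j * 2 * np.pi * f_if * t)   # idealized noise-free IF signal

spectrum = np.abs(np.fft.fft(iq, n=256))   # Range FFT
bin_idx = np.argmax(spectrum[:128])        # positive-frequency peak
f_est = bin_idx * fs / 256                 # frequency of the peak bin
r_est = f_est * c / (2 * S)                # invert f_IF = 2rS/c
print(f"estimated range: {r_est:.2f} m")   # ~5 m, within one range bin
```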

Doppler velocity estimation: Any radial motion $\Delta r$ relative to the radar between consecutive chirps will cause a frequency shift $\Delta f_{\mathrm{IF}} = \frac{2S\Delta r}{c}$ as well as a phase shift

$$\Delta\phi_v = 2\pi f_c \frac{2\Delta r}{c} = \frac{4\pi v T_c}{\lambda}$$

in the IF signal [22], [29], where $f_c$ is the carrier frequency, $v$ is the object velocity, $T_c$ is the chirp period, and $\lambda$ is the wavelength. Compared to the IF frequency shift, the phase shift is more sensitive to object movement [29]. Hence, by estimating the phase shift using an FFT across chirps (named the Velocity FFT), we can transform the estimated phase into Doppler velocity.
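A matching sketch for the Velocity FFT, again under assumed Table II parameters rather than the authors' code: the per-chirp phase progression of a hypothetical 0.9 m/s target is transformed to a Doppler bin and back to a velocity estimate.

```python
import numpy as np

c, fc = 3e8, 77e9
lam = c / fc                 # wavelength, ~3.9 mm
Tc = 540e-6                  # chirp period (Table II)
Nc = 50                      # chirps per frame
v_true = 0.9                 # hypothetical radial velocity (m/s)

# Each chirp adds delta_phi = 4*pi*v*Tc/lambda to the same range bin.
phases = 4 * np.pi * v_true * Tc / lam * np.arange(Nc)
samples = np.exp(1j * phases)

spec = np.fft.fftshift(np.fft.fft(samples, n=64))   # 64-point Velocity FFT
k = np.argmax(np.abs(spec)) - 32                    # signed Doppler bin
v_est = (k / 64) * lam / (2 * Tc)                   # bin -> velocity
print(f"estimated velocity: {v_est:.3f} m/s")       # ~0.9 m/s
# Note the unambiguous limit lam/(4*Tc) = 1.80 m/s, as in Section IV-A.
```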

Angular estimation: Angle estimation is conducted by processing the signal at a receiver array composed of multiple elements. The return from a target located in the far field at angle $\theta$ results in the steering vector as the uniform linear array output [30]:

$$a_{\mathrm{ULA}}(\theta) = \left[1,\; e^{-j2\pi d \sin\theta/\lambda},\; \cdots,\; e^{-j2\pi (N_{\mathrm{Rx}}-1) d \sin\theta/\lambda}\right]^T$$

where $d$ denotes the inter-element distance. The embedded phase shift $e^{-j2\pi d \sin\theta/\lambda}$ can be extracted by an FFT (named the Angle FFT) to resolve the arrival angle $\theta$ [22].
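The sketch below illustrates the Angle FFT on an ideal steering vector. The element count and half-wavelength spacing are assumptions for illustration; the 86-point FFT size matches the azimuth setting reported in Section IV-C.

```python
import numpy as np

N_fft = 86                   # zero-padded Angle FFT size (Section IV-C)
n_elem = 16                  # hypothetical number of azimuth elements
d_over_lam = 0.5             # assumed spacing d = lambda/2
theta_true = np.deg2rad(20)  # target angle of arrival

# a_ULA(theta): phase ramp e^{-j 2 pi d sin(theta)/lambda} across elements
a = np.exp(-1j * 2 * np.pi * d_over_lam * np.sin(theta_true)
           * np.arange(n_elem))

spec = np.fft.fftshift(np.fft.fft(a, n=N_fft))
k = np.argmax(np.abs(spec)) - N_fft // 2           # signed spatial-freq bin
theta_est = np.arcsin(-k / (N_fft * d_over_lam))   # invert the phase ramp
print(f"estimated angle: {np.rad2deg(theta_est):.1f} deg")   # ~20 deg
```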

B. MIMO & Virtual Array

The multiple-input and multiple-output (MIMO) technique is efficient at improving angular resolution by forming a virtual array and increasing the effective antenna aperture. This is achieved by sending orthogonal signals on multiple TX antennas, which enables the contribution of each TX signal to be extracted at each RX antenna. Hence, a physical TX array with $N_T$ elements and an RX array with $N_R$ elements will result in a virtual array with up to $N_T N_R$ unique (non-overlapping) virtual elements [31]. The virtual array is located at the spatial convolution of the TX and RX antennas, i.e., the convolution produces a set of virtual element locations that are the summed locations of each TX and RX pair. To reduce array cost (fewer physical antenna elements), non-uniform arrays spanning large apertures have been proposed, e.g., the minimum redundancy array (MRA) [23], [32].

When performing angular estimation on a MIMO virtual array, the motion-induced phase errors (i.e., for non-stationary targets) should be compensated on the virtual elements before performing the Angle FFT. The motion-induced phase difference has to be considered under the time-division multiplexing (TDM) scheme because of the switching time between transmitters. The phase error can be corrected by compensating half the estimated Doppler phase shift ($\Delta\phi_v/2$) obtained from the Velocity FFT results [33].
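The following sketch illustrates both ideas under assumed element positions (in half-wavelength units, not the actual TIDEP-01012 layout): the virtual array as the pairwise sum of TX and RX locations, and the $\Delta\phi_v/2$ correction for a 2-TX TDM scheme.

```python
import numpy as np

tx_pos = np.array([0, 4, 8])       # hypothetical N_T = 3 TX elements
rx_pos = np.array([0, 1, 2, 3])    # hypothetical N_R = 4 RX elements

# Each TX/RX pair contributes a virtual element at the summed location,
# giving up to N_T * N_R = 12 unique elements.
virtual_pos = (tx_pos[:, None] + rx_pos[None, :]).ravel()
print(np.sort(virtual_pos))        # [0 1 2 ... 11]: a filled 12-element ULA

def compensate_tdm(samples_tx2, dphi_v):
    """For a 2-TX TDM scheme, samples from the second TX are acquired half
    a chirp period later, so half the per-chirp Doppler phase shift
    (dphi_v / 2, from the Velocity FFT) is removed before the Angle FFT."""
    return samples_tx2 * np.exp(-1j * dphi_v / 2)
```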

III. SYSTEM OVERVIEW

The proposed carried object detection (COD) system has three main modules: preprocessing, a single-shot prediction network, and a multi-shot decision. The preprocessing module is responsible for detecting targets from raw ADC radar data and cropping small range-azimuth-elevation cubes from the generated radar imaging based on the detection locations. The single-shot prediction network takes a single cropped cube as input to predict the existence of three classes of objects: laptop, phone, and knife. To further improve system performance, the multi-shot decision module tracks the cropped cubes and makes the final decision based on multiple in-track cubes.

A. Preprocessing

Fig. 4. Signal preprocessing block diagram, where blue blocks represent operations and yellow blocks represent inputs or outputs.

Target detection: The CFAR-based target detection and localization algorithm operates on the raw radar I-Q samples, as shown in Fig. 4(a). First, the Range and Velocity FFTs are performed on the I-Q data in a frame to obtain the range-velocity (RV) map for initial target detection. The RV maps from all receivers are integrated non-coherently (i.e., the magnitude RV maps are summed) to increase the signal-to-noise ratio of the resulting RV map. After summing, the 1D cell-averaging CFAR [34] algorithm is applied along the Doppler dimension and range dimension separately to detect targets (peaks) and obtain their 2D localization (range, velocity). During the CFAR detection process, each cell or bin is evaluated for the presence or absence of a target using a threshold that adapts itself according to the noise power estimated within a sliding window.

Thereafter, peak grouping is performed on all CFAR detections by checking whether each detection has a greater amplitude than its neighboring detections. For example, if a detection #2 lies within the 3 × 3 range-velocity kernel centered at detection #1, detection #2 will be discarded if it has the smaller amplitude [2]. Peak grouping simply clusters very close peaks or detections. After peak grouping, we estimate azimuth angles for the remaining detections and obtain their final localization in range, velocity, and azimuth angle. This is done by calculating the Angle FFT for each detected target across the RV maps of all receivers (i.e., the virtual array formed by TDM-MIMO). Note that before the Angle FFT, we need to compensate the motion-induced phase error for TDM-MIMO using the estimated Doppler velocity.
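For reference, a minimal 1D cell-averaging CFAR along one profile might look as follows. The paper specifies only the false alarm probability (1e-4), so the training and guard window sizes here are assumptions.

```python
import numpy as np

def ca_cfar_1d(power, n_train=8, n_guard=2, pfa=1e-4):
    """Return indices where power exceeds the adaptive CA-CFAR threshold;
    the noise level is estimated from 2*n_train training cells around the
    cell under test, excluding n_guard guard cells on each side."""
    n = len(power)
    # Standard CA-CFAR scaling for the requested false alarm probability.
    alpha = 2 * n_train * (pfa ** (-1.0 / (2 * n_train)) - 1)
    hits = []
    for i in range(n_train + n_guard, n - n_train - n_guard):
        lead = power[i - n_guard - n_train : i - n_guard]
        lag = power[i + n_guard + 1 : i + n_guard + 1 + n_train]
        noise = (lead.sum() + lag.sum()) / (2 * n_train)
        if power[i] > alpha * noise:
            hits.append(i)
    return np.array(hits)

# Example: one strong cell embedded in exponential noise.
rng = np.random.default_rng(0)
profile = rng.exponential(1.0, 256)
profile[100] += 40.0
print(ca_cfar_1d(profile))   # expected to contain index 100
```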

Range-azimuth-elevation estimation: To get a 3D view of a pedestrian subject and their carried object, range-azimuth-elevation estimation is implemented for radar imaging. The imaging result for each frame is a 3D map with a range dimension, an azimuth angle dimension, and an elevation angle dimension. As range estimation has already been done in the target detection part, the remaining azimuth-elevation estimation continues from the Range FFT output. That is, we perform the Angle FFT for azimuth angle estimation and another FFT for elevation estimation. The first FFT is implemented across all horizontal elements of the 2D virtual array, while the second FFT is implemented across all vertical virtual elements. Details of this workflow are illustrated in Fig. 4.

Clustering and cropping: To reduce the size of the input to the network, we crop small cubes from the generated 3D range-azimuth-elevation map based on the locations of detections, and only input the cropped cubes to the following prediction network. To decrease the total number of cropped cubes, we cluster the detections by their localization before the cropping operation. We define a parameter ε = [εr, εv, εa] that specifies how close points should be to each other in the range, velocity, and azimuth angle dimensions to be considered part of a cluster: if the distance between two detections is lower than or equal to ε, these detections are considered neighbors. The center locations of the resulting clusters are taken as the new detections, and for each new detection we crop a cube centered at the detected range and azimuth angle and at zero elevation angle. The cube size is 24 × 24 × 10 in this paper.
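A minimal sketch of this clustering-and-cropping step is given below. The per-axis thresholds follow Section IV-C, but the DBSCAN-style grouping, the helper names, and the treatment of the map borders are assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EPS = np.array([10, 8, 8])   # [eps_r, eps_v, eps_a] in bins (Section IV-C)

def cluster_and_crop(detections, rae_map):
    """detections: (K, 3) array of (range, velocity, azimuth) bin indices;
    rae_map: 3D range-azimuth-elevation map. Returns 24x24x10 cubes."""
    # Scaling each axis by its epsilon makes a unit Chebyshev ball match
    # the per-dimension closeness rule described above.
    labels = DBSCAN(eps=1.0, min_samples=1,
                    metric="chebyshev").fit_predict(detections / EPS)
    cubes = []
    e0 = rae_map.shape[2] // 2             # zero-elevation index (assumed)
    for lbl in np.unique(labels):
        r, _, a = detections[labels == lbl].mean(axis=0).astype(int)
        # Crop centered at the cluster's range/azimuth and zero elevation.
        cubes.append(rae_map[r - 12:r + 12, a - 12:a + 12, e0 - 5:e0 + 5])
    return cubes
```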

B. Single-shot Prediction Network

Backbone: The backbone is a deep residual pyramid neural network that takes a single cropped range-azimuth-elevation cube as input for feature extraction. The backbone is modified from ResNet-50 [35] and has 49 3D convolutional layers. The convolutional layers mostly have 3 × 3 × 3 and 1 × 1 × 1 filters, and they are divided into 5 parts (conv1_x, conv2_x, conv3_x, conv4_x, and conv5_x), shown in Table I. We perform downsampling at the end of the last four parts directly by convolutional layers with a stride of 2.

The last four parts have 3, 4, 6, and 3 three-layer bottleneck blocks, respectively, performing the residual function via shortcut connections [35]. That is, for each residual block, we use a stack of 3 layers that are 1×1×1, 3×3×3, and 1×1×1 convolutions, where the 1×1×1 layers are responsible for reducing and then increasing (restoring) dimensions, leaving the 3×3×3 layer a bottleneck with smaller input and output dimensions. One residual block example is presented in Fig. 5(a). The shortcut connection between input and output is implemented by identity mapping if the dimensions are equal, or by a linear projection to match dimensions if they are not.

TABLE I
DETAILS OF BACKBONE LAYERS.

Layer name | Filters                                    | Output size
conv1_x    | 3×3×3                                      | 24×24×10
conv2_x    | [1×1×1, 64;  3×3×3, 64;  1×1×1, 256]  × 3  | 12×12×5
conv3_x    | [1×1×1, 128; 3×3×3, 128; 1×1×1, 512]  × 4  | 6×6×3
conv4_x    | [1×1×1, 256; 3×3×3, 256; 1×1×1, 1024] × 6  | 3×3×2
conv5_x    | [1×1×1, 512; 3×3×3, 512; 1×1×1, 2048] × 3  | 2×2×1
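To make the bottleneck structure concrete, here is a minimal PyTorch sketch of one 3D bottleneck block as described above and in Table I; this is a re-implementation from the text (BatchNorm placement is an assumption), not the authors' released code.

```python
import torch
import torch.nn as nn

class Bottleneck3D(nn.Module):
    """1x1x1 (reduce) -> 3x3x3 (bottleneck) -> 1x1x1 (restore), with an
    identity shortcut when shapes match or a linear projection otherwise."""
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm3d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv3d(mid_ch, mid_ch, 3, stride, 1, bias=False),
            nn.BatchNorm3d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv3d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm3d(out_ch),
        )
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv3d(in_ch, out_ch, 1, stride, bias=False),
                nn.BatchNorm3d(out_ch),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        return torch.relu(self.body(x) + self.shortcut(x))

# e.g., a first conv2_x-style block: 64 -> 256 channels on a 24x24x10 cube
block = Bottleneck3D(64, 64, 256)
y = block(torch.randn(1, 64, 24, 24, 10))   # -> (1, 256, 24, 24, 10)
```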

Fig. 5. (a) Example of a residual block. (b) FFN prediction head.

Feature concatenation: A deep convolutional network computes a feature hierarchy layer by layer, and with subsampling layers, the feature hierarchy has an inherent multi-scale, pyramidal shape [36]. This in-network feature hierarchy produces feature maps of different spatial resolutions and introduces large semantic gaps caused by different depths. We reuse the multi-scale feature maps from different layers computed in the forward pass, which come free of cost, as illustrated in Fig. 3. The multi-scale feature maps go through max-pooling layers to reduce the output size to 1×1×1 and extract the sharp patterns [35]. We then flatten and concatenate the multi-scale feature maps together to form the multi-scale features (1 × 3840 size). Besides, we pass the center location of the cropped cube (i.e., range and azimuth angle) to a 3-layer feed-forward network (FFN) to extract location features (1 × 64 size) and concatenate the location features and multi-scale features together as the final feature (1 × 3904 size).
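A sketch of this concatenation step is shown below: each scale's feature map is globally max-pooled to 1×1×1, flattened, and joined with a 64-dimensional location feature from a small FFN. The stated dimensions (3840 + 64 = 3904) follow the text; the hidden widths of the location FFN are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

loc_ffn = nn.Sequential(         # 3-layer FFN on (range, azimuth) location
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 64),
)

def concat_features(feature_maps, location):
    """feature_maps: list of (B, C_i, D, H, W) tensors from conv2_x..conv5_x
    whose channels sum to 3840; location: (B, 2). Returns (B, 3904)."""
    pooled = [F.adaptive_max_pool3d(f, 1).flatten(1) for f in feature_maps]
    return torch.cat(pooled + [loc_ffn(location)], dim=1)

# 256 + 512 + 1024 + 2048 = 3840 multi-scale dims, + 64 location dims = 3904
maps = [torch.randn(1, c, 4, 4, 2) for c in (256, 512, 1024, 2048)]
print(concat_features(maps, torch.randn(1, 2)).shape)   # (1, 3904)
```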

Prediction heads: The concatenated features are input to three identical prediction heads to infer whether a carried laptop, phone, or knife exists in the input cropped cube. Each prediction head is a 5-layer FFN that makes the existence prediction for one class. For example, if the third FFN is in charge of knives, it will output the probability p of a knife being carried. We can then simply check whether p > pthr to make the single-shot prediction, where pthr is a probability threshold. The FFN used here has 1 input layer, 3 hidden layers, and 1 output layer, as shown in Fig. 5(b). The last layer has 2 outputs $(o_1, o_2)$, which are transformed to the prediction probability $p$ using the softmax function:

$$p = \frac{\exp(o_1)}{\exp(o_1) + \exp(o_2)}$$

It is worth noting that we adopt 3 independent binary prediction heads instead of one 3-class prediction head. The reason is that a 3-class prediction head does not allow the coexistence of more than one object, which is unrealistic in many practical scenarios.

Loss function: The loss function for the proposed neural network is the weighted sum of the Focal Losses [37] from the three prediction heads. Focal Loss is adopted here to address class imbalance during training, i.e., for each binary prediction head, the number of samples of a certain object (e.g., laptop) is naturally less than the total number of non-object samples (e.g., phone and knife). Focal Loss applies a modulating term $(1-p)^{\alpha}$ to the cross-entropy loss in order to focus learning on hard negative examples [37].

$$\mathrm{Loss} = \mathrm{FL}_{\mathrm{laptop}} + \mathrm{FL}_{\mathrm{phone}} + \mathrm{FL}_{\mathrm{knife}}$$

$$\mathrm{FL}(p) = -w_1\, y\, (1-p)^{\alpha} \log p \;-\; w_2\, (1-y)\, p^{\alpha} \log(1-p)$$

where $y \in \{0, 1\}$ is the ground truth and $p$ is the predicted existence probability for a certain class of objects; $\alpha$ is a tunable focusing parameter, and $w_1$ and $w_2$ are weight-balance parameters.
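A direct transcription of this loss into PyTorch is sketched below; $\alpha = 2$ and the per-class weights follow Section IV-D (open carry setting), while the probability clamp is a numerical-stability assumption of ours.

```python
import torch

def focal_loss(p, y, w1, w2, alpha=2.0):
    """p: predicted existence probability in (0,1); y: 0/1 ground truth."""
    p = p.clamp(1e-6, 1 - 1e-6)                     # avoid log(0)
    pos = -w1 * y * (1 - p) ** alpha * torch.log(p)
    neg = -w2 * (1 - y) * p ** alpha * torch.log(1 - p)
    return (pos + neg).mean()

def cod_loss(p_laptop, p_phone, p_knife, y):
    """y: (B, 3) multi-label ground truth. Weights [1,1,20,1,1,1] are the
    fine-tuned open carry values reported in Section IV-D."""
    return (focal_loss(p_laptop, y[:, 0], w1=1, w2=1)
            + focal_loss(p_phone, y[:, 1], w1=20, w2=1)
            + focal_loss(p_knife, y[:, 2], w1=1, w2=1))
```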

C. Multi-shot Decision

We introduced the single-shot prediction network in Section III-B, with one cropped cube as input. To further improve detection performance by introducing different observation perspectives from multiple frames, a simple multi-shot decision scheme is proposed: track the cropped cubes and make a voting decision based on multiple in-track cubes.

Fig. 6. Explanation of the multi-shot decision policy for detecting a knife.

Tracking cropped cubes: A Kalman filter operating on subsequent frames is applied to obtain a reliable estimate of the true subject's state (i.e., its location). The association of the cropped cubes detected in the current time frame with the right user trajectories is performed using the Hungarian algorithm.

Decision policy: Assume we have N in-track cropped cubes that belong to the same pedestrian carrying an object, and we input them separately to the single-shot prediction network to obtain N independent prediction probabilities p1, p2, ..., pN. How should the final decision be made from them to achieve more coherent prediction results and therefore better performance? The proposed method is a simple voting policy that averages the probability over the N predictions and checks whether the average is greater than the probability threshold pthr. The threshold is tunable to satisfy the system requirements on false alarm and sensitivity:

$$\frac{1}{N}\sum_{i=1}^{N} p_i > p_{\mathrm{thr}}$$
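The voting rule itself is a one-liner; the sketch below uses N = 10 and pthr = 0.5 as in the paper's evaluation, with made-up track probabilities for illustration.

```python
import numpy as np

def multi_shot_decision(probs, p_thr=0.5):
    """probs: length-N array of single-shot probabilities for one track.
    Returns True if the averaged probability exceeds the threshold."""
    return probs.mean() > p_thr

track_probs = np.array([0.62, 0.48, 0.71, 0.55, 0.66,
                        0.43, 0.58, 0.69, 0.51, 0.60])
print(multi_shot_decision(track_probs))   # True: mean ~0.58 > 0.5
```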

IV. IMPLEMENTATION

A. Testbed

The experiment testbed (Fig. 7, left) was assembled with a TIDEP-01012 77 GHz mmWave radar [38] and binocular FLIR cameras (left and right) that provide the visualization of experiment scenarios. The binocular cameras and the radar are connected to the same laptop, which is used to timestamp the data and keep inter-sensor synchronization. The synchronization between the two cameras is achieved by joining them with an additional cable and using the same trigger clock. The radar data collection pipeline is implemented by combining MATLAB scripts and TI software development kits (SDKs), while the camera pipeline is implemented with Python scripts and the FLIR SDK.

The adopted mmWave radar is a 4-chip cascaded evaluation board with 12 TX antennas and 16 RX antennas (Fig. 7, right). With time-division multiplexing (TDM) on the TXs, it can form a large 2D-MIMO virtual array (Fig. 8) with 192 elements via the spatial convolution of all TXs and RXs, resulting in fine azimuth resolution (1.35°) and additional elevation resolution (19°).

Fig. 7. Experimental radar-camera testbed (left) and the TIDEP-01012 cascaded-chip mmWave radar (right).

Fig. 8. The formed 2D virtual array of the adopted radar with the TDM-MIMO configuration.

The configuration of this radar is presented in Table II. Based on those parameters and the calculation equations in [22], the capabilities of this radar are: range resolution $\frac{c}{2B} = 0.06$ m, maximum detectable range $\frac{f_s c}{2S} = 15$ m, Doppler velocity resolution $\frac{\lambda}{2 N_c T_c} = 0.072$ m/s, and maximum operating velocity $\frac{\lambda}{4 T_c} = 1.80$ m/s.

B. Data Collection and Dataset

Four main object groups were used during the data collection process: phones, laptops, knives (including butter knives and cutting knives), and others (e.g., keys). The data was collected in a building lobby and a laboratory room, with different device placement locations each time. After the calibration process was complete, the data collection process began. A single data collection run consisted of a subject holding one of the four object groups listed above and walking for 10 seconds in front of the testbed while either concealing or openly carrying the object. The testbed captured 300 frames of camera images and radar raw ADC data at 30 frames per second. To add variability to the data, the walking pattern was always randomized. Additionally, the locations where the objects were concealed, or the manner in which they were openly carried, also varied.

The data consisted of a single object being openly carried or a single object being concealed. The subjects performed the data runs with different clothing types - low, medium, and heavy - corresponding to the thickness of the clothing. For example, a t-shirt would be considered low, while a jacket on top of another layer would be heavy. A total of 196,500 frames were collected for a single subject with a single object; 99,300 of those were open carry and 97,200 were concealed. The detailed class distribution and location distribution of the collected dataset are described in Fig. 9.

TABLE II
CONFIGURATION FOR ADOPTED MMWAVE RADAR

Configuration                          | Value
Frequency (fc)                         | 77 GHz
Sweeping bandwidth (B)                 | 2.5 GHz
Sweep slope (S)                        | 79 MHz/µs
Sampling frequency (fs)                | 8 Msps
Number of chirps in one frame (Nc)     | 50
Number of samples of one chirp (Ns)    | 256
Duration of chirp¹ and frame (Tc, Tf)  | 540 µs, 1/30 s

¹ Tc equals the single-chirp interval times the number of TX antennas, i.e., Tc = 45 µs × 12 = 540 µs.

C. Preprocessing

The preprocessing was conducted using MATLAB R2020b on a computer with an Intel i7-7700K CPU to detect potential targets and crop the range-azimuth-elevation cubes from the generated 3D radar imaging. The hyper-parameters used during preprocessing are summarized here: probability of false alarm in CFAR (1 × 10−4), Range FFT points (256), Velocity FFT points (64), Angle FFT points for azimuth (86), Angle FFT points for elevation (16), clustering threshold ([εr, εv, εa] = [10, 8, 8]), cropped cube size along range, azimuth, and elevation (24, 24, 10), and cube amplitude normalization value (1 × 105). The cropped and normalized cubes are stored on the local PC for subsequent training and testing. Moreover, for the in-track data used in multi-shot prediction, the Kalman filter and Hungarian algorithm were used to track the cubes across multiple frames, and we saved the tracking association results to the local PC.


Fig. 9. Dataset distribution: (a) data distribution of the single open carry object scenario; (b) data distribution of the single concealed object scenario; (c) data distribution over the 8 experiment places.

D. Training

A two-step training strategy was used: a small subset of the training data is used first to train the network and obtain a pre-trained model, and the pre-trained model is then used as the initialization for a second training pass with the complete training set. The two-step training addresses a problem we found where the training loss easily gets stuck when starting with a large training set. The first training step starts with a learning rate of 4 × 10−4 and stops when the training accuracy approaches 90%, while the second training step starts with a smaller learning rate of 1 × 10−4. Besides, a batch size of 32, the SGD optimizer, and a learning rate decayed by half every 10 epochs are used in both steps.

We train and test the network on the open carry data and the concealed data separately. For the parameters of the loss function, we use a fixed focusing value α = 2 for both datasets but different balance weights [w1,laptop, w2,laptop, w1,phone, w2,phone, w1,knife, w2,knife]. For the open carry and concealed datasets, the fine-tuned weights are [1, 1, 20, 1, 1, 1] and [2, 1, 20, 1, 1, 1], respectively. In addition, we note that the training RAE cubes are generated from a random chirp of each frame, aiming to expand the training set without any cost, while for testing cubes we average the imaging results of all chirps within a frame to reduce noise and thus improve their signal-to-noise ratio.

V. EVALUATION

A. Metrics

Five metrics are adopted for evaluating the effectiveness of our system: precision, recall, false alarm (rate), missing (rate), and F1. They are defined below using true positive (TP), true negative (TN), false positive (FP), and false negative (FN) detections. Here, precision is the proportion of positive identifications that are actually correct, recall is the proportion of actual positives that are identified correctly, and F1 is a measure that combines precision and recall. The false alarm rate represents the probability of falsely detecting something when it does not exist, while the missing rate represents the probability of falsely ignoring something when it actually exists.

$$\mathrm{precision} = \frac{TP}{TP + FP}, \quad \mathrm{recall} = \frac{TP}{TP + FN}, \quad \mathrm{missing} = 1 - \mathrm{recall}$$

$$\mathrm{false\ alarm} = \frac{FP}{FP + TN}, \quad F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
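These metrics are straightforward to compute from raw counts. In the sketch below the counts are illustrative (chosen to roughly reproduce the open carry laptop row of Table III), not the paper's raw tallies.

```python
def metrics(tp, fp, tn, fn):
    """Compute the five evaluation metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "missing": 1 - recall,
        "false_alarm": fp / (fp + tn),
        "F1": 2 * precision * recall / (precision + recall),
    }

# Illustrative counts: precision ~0.615, recall 0.858, false alarm ~0.195.
print(metrics(tp=858, fp=538, tn=2222, fn=142))
```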

In our application, the repercussions of a missing error are much more severe than those of a false alarm error. For example, we aim to make fewer mistakes in detecting no gun when a concealed gun is present, rather than detecting some other object as a gun: erroneously flagging a regular person as a gun holder results only in a small check to keep people safe, while letting an actual gun holder go undetected might lead to catastrophic outcomes.

B. Single-shot Prediction Results

After training the single-shot prediction network of the COD system with the open carry dataset and the concealed dataset, the performance of the system is tested using a detection threshold pthr = 0.5 and the metrics stated above. With a single-frame cropped cube as input, this method is denoted COD-single, and we show the evaluation results in the first six columns of Table III. Overall, the COD-single method performs best at detecting an open carry laptop, with a false alarm rate of 19.48% and a missing rate of 14.2%. The system also performs well at detecting open carry phones and knives, with false alarm rates of 24.46% and 27.69% and missing rates of 25.93% and 26.04%, respectively. However, when it comes to concealed objects, all metrics show lower performance. When objects are concealed, the system on average misses them about 8% more often, meaning a concealed knife or gun is harder for the system to detect. In terms of precision, false alarm, and F1, the system performance as a whole declines by about 12-17% in the concealed case compared to the open carry case. That is probably because the concealed case introduces more variation in object position, perspective, and cover materials (e.g., clothes, bags).

C. Multi-shot Decision Results

TABLE III
EVALUATION RESULTS FOR OPEN CARRY AND CONCEALED OBJECTS USING THE COD-SINGLE AND COD-MULTI METHODS.

Object      | COD-single: precision, recall (missing), false alarm, F1 | COD-multi: precision, recall (missing), false alarm, F1
open carry:
(a) laptop  | 0.6146, 0.858 (0.142), 0.1948, 0.716     | 0.6637, 0.9132 (0.0868), 0.1681, 0.7684
(b) phone   | 0.5059, 0.7407 (0.2593), 0.2446, 0.5995  | 0.5618, 0.78 (0.22), 0.2065, 0.6511
(c) knife   | 0.7036, 0.7396 (0.2604), 0.2769, 0.7211  | 0.7645, 0.809 (0.191), 0.2217, 0.7861
(d) average | 0.6081, 0.7794 (0.2206), 0.2388, 0.6789  | 0.6633, 0.8341 (0.1659), 0.1988, 0.7352
concealed:
(a) laptop  | 0.4610, 0.773 (0.277), 0.3749, 0.5628    | 0.5305, 0.8114 (0.1886), 0.3191, 0.6411
(b) phone   | 0.3256, 0.6915 (0.3085), 0.3087, 0.442   | 0.3743, 0.7613 (0.2387), 0.2761, 0.5005
(c) knife   | 0.5918, 0.6591 (0.3409), 0.4462, 0.6235  | 0.6256, 0.7119 (0.2881), 0.4201, 0.6657
(d) average | 0.4595, 0.6912 (0.3088), 0.3766, 0.5428  | 0.5101, 0.7615 (0.2385), 0.3384, 0.6024

Fig. 10. Subject samples with detection probability results; note that we plot the bounding boxes manually based on the detection results and ignore those with probability less than 0.5. The first row is all open carry single-object cases: (a) subject carrying a phone in hand; (b) subject carrying a knife in hand; (c) subject carrying a laptop in front of the body; (d) subject carrying a laptop at their side; (e) subject carrying a phone at their side. The second row is all concealed single-object cases: (f) subject carrying a knife inside a pocket; (g) subject carrying a knife inside a backpack; (h) subject carrying a laptop in a backpack; (i) subject carrying a phone inside a pocket; (j) subject carrying a knife in a backpack.

With the trained single-shot prediction network, the performance of the multi-shot decision module is evaluated here using 10 in-track cubes corresponding to the same pedestrian subject. The system with the multi-shot decision is named COD-multi. The metrics are recalculated for this new system with the same threshold of 0.5, and the results are shown in the last five columns of Table III. From the results, adding the multi-shot decision module improves the performance of the system for both concealed and open carry objects. In particular, the missing and false alarm rates decreased by approximately 5-8%, while precision, recall, and F1 increased by roughly the same amount. Decreasing false alarms and missing, and improving precision, are important for establishing an effective system, and providing multi-frame cube input does just that. On average, for openly carried objects, precision increased by 6%, recall increased by 6%, and missing went down by 6%. Even when the object is concealed, the system's performance improved by about 7% across the board compared to COD-single.

D. Qualitative Results

Fig. 10 shows testing samples of the system actively working: multiple images of subjects holding or carrying objects together with the detection results of the system. The probability the system assigns to each detected object is printed onto the image, with different colors representing different objects. In the first row, all objects are openly carried, while all images in the second row show concealed objects. The figure offers a glimpse of the diversity of the collected data, in which objects were placed in different locations, such as a pocket or backpack, held to the side, or held in front, in addition to randomized walking patterns. Given those samples with diverse situations, our system adapts well and successfully detects the objects, outputting correct probabilities larger than 0.5. On the whole, the detection probabilities for concealed objects are slightly smaller than those for open carry objects, which agrees with the quantitative evaluation results above.

VI. DISCUSSION

A. Influence of Input Data

The selection of input data has a big impact on system performance. Here, besides the range-azimuth-elevation (RAE) cube input, two more input formats are considered: the range-azimuth (RA) cube and the RAE-Zoom cube.

• RA cube: RA simply removes the elevation dimension from the data, keeping the range and azimuth angle dimensions. By comparing the performance between RA cube and RAE cube input, a conclusion can be drawn about whether the additional elevation dimension is beneficial. To generate the RA cube, the Elevation FFT operation is removed from the preprocessing module in Fig. 4. Instead, we concatenate the Angle FFT processing results of the different vertical RXs directly along a new dimension (i.e., similar to RGB channels). Since the input dimension is reduced, the 3D convolution layers in the single-shot prediction network are accordingly replaced with 2D ones. This new model is trained from scratch following the method described in Section IV-D.

• RAE-Zoom cube: RAE-Zoom takes an extra step of adjusting the RAE input data by calibrating the input to a similar (cross-range) coverage plane. It manually handles the issue inherent to polar-coordinate radar imaging, where objects look small at long range and large at short range. We solve this with a zooming in (out) operation that projects the radar data onto a fixed cross-range plane and then interpolates. The generated data is called the RAE-Zoom cube, which is input to the original network for training and evaluation.

Fig. 11. Performance comparison for RAE, RAE-Zoom, and RA input. From left to right and top to bottom, four bar charts depict the precision, missing rate, false alarm rate, and F1 for open carry object detection using the COD-single model, respectively.

Fig. 12. Evaluation of detecting objects at different distances with the COD-single model. From left to right, three subfigures present the precision, missing rate, and false alarm rate changes against distance, respectively.

Using the metrics defined in Section V-A, the performance of the COD-single method was re-evaluated with RA cube input and RAE-Zoom cube input. Fig. 11 shows the results for open carry object detection using all three input types mentioned earlier. When comparing the results of RAE-Zoom to those of RAE input, the numbers are so similar that the zooming operation evidently has no major impact on the results. Omitting the zooming is in fact preferable, since it removes an extra computation step when determining the presence of danger, resulting in a faster system response. Comparing the results of RAE and RAE-Zoom to RA, the performance of the system is worse across all four metrics: on average, RA yields the lowest precision and F1 and the highest false alarm and missing rates. Based on these results, the additional elevation information in RAE and RAE-Zoom improves system performance across the board. Moreover, the marginal improvement provided by the zooming operation in RAE-Zoom is outweighed by the speed of operating without it, which indicates that the additional zooming is not worthwhile and that the proposed network can handle this variation inherently.


B. Influence of Distance and Occlusion

System performance at different distances: When detecting an object, distance plays an important role in determining the reflection amplitude of the object or subject, thus affecting the effectiveness of the system. To evaluate how well the system works at different distances, the objects are divided into four groups - 0-2 m, 2-4 m, 4-6 m, and 6-8 m - according to their measured ranges. Fig. 12 shows the precision, missing, and false alarm rates against distance for six classes of objects: concealed and open carry phones, knives, and laptops. In Fig. 12(a), the precision plot shows that, on average, the further away the object is, the lower the precision of the system. When an object is close to the radar it has a much larger reflection, which tends to make it easier for the system to detect; however, being too close can also lower the precision of the system. This means there might be an optimal distance at which the system is most precise; based on the plots in Fig. 12(a), the optimal distance for all objects collectively is between 2 and 6 meters.

Fig. 12(b) and 12(c) show the missing and false alarm rates with respect to the distance of the object. The plots reveal that the missing rate increases and the false alarm rate decreases at distances greater than 6 meters, except for open carry and concealed laptops. This is probably because the other objects are smaller than laptops, which drives the system to ignore them at longer ranges. To obtain a better trade-off between the false alarm and missing rates for long-distance objects, the detection threshold pthr can be adjusted accordingly.

TABLE IV
EVALUATION OF DETECTING A CONCEALED KNIFE WITH THE COD-SINGLE MODEL UNDER DIFFERENT OCCLUSION CONDITIONS.

Concealed knife | COD-single: recall (missing), F1
Training with all open carry data:
(a) in pocket | 0.5038 (0.4962), 0.6691
(b) in bag 1  | 0.3637 (0.6363), 0.5331
(c) in bag 2  | 0.39 (0.61), 0.5608
Training with all concealed data:
(a) in pocket | 0.6947 (0.3053), 0.8196
(b) in bag 1  | 0.6333 (0.3667), 0.7754
(c) in bag 2  | 0.7282 (0.2718), 0.8423

System performance under different occlusions: An important observation is that the evaluation results of the system can differ based on the concealment condition of the object. Table IV shows the evaluation results for two different experiments. In the first experiment, the training data consisted of all open carry objects, while in the second experiment the training data were all concealed objects. The testing data for both cases were concealed knives, further divided into three groups: knife in pocket, knife in bag 1, and knife in bag 2. Note that the precision and false alarm metrics are omitted here: since the only test object observed was a concealed knife, precision is always 1 and the false alarm rate is not meaningful. Besides, the F1 values here are greater than those in Table III because of the large precision entering the F1 calculation.

Fig. 13. Evaluation of detecting a concealed knife with the COD-multi model using varying-length input frames.

Comparing the two experiments, the system clearly performed much better when trained with concealed data, with recall increasing by about 20% or more for all three groups. This is because, in the presence of cover materials like a backpack, the radar receives extra reflections that may confuse a system trained only on open carry data. Therefore, training the system with concealed object data is important to maintain high detection performance. From Table IV, even when training with all concealed data, the recall differs depending on the manner in which the object was concealed; for example, the system performed better at detecting a knife in a pocket than in bag 1. This stresses the importance of diversifying the training dataset to include multiple different concealment methods to improve generalization and performance.

C. Performance Trade-off by Adjusting Parameter Values

Number of input frames N: The number of frames used in the multi-shot decision module directly correlates with the efficacy of the system. To show this, we evaluate the performance of detecting concealed knives with varying-length cropped-cube inputs and plot the results in Fig. 13. The figure illustrates that using more frames increases precision and F1, while decreasing the false alarm and missing rates, until the metrics saturate. However, using too many frames is detrimental to the objective of our system, which is to provide security in real time: increasing the number of frames in the detection process increases the time required to run the algorithm, delaying the result, and that delay could prove fatal in a worst-case scenario [39], [40]. Given the importance of real-time operation, the plot and Table V help select the ideal number of frames to improve the accuracy of the system while maintaining an acceptable operating time. In this case, the selected number of input frames is 10.

TABLE V
RUN TIME MEASUREMENT FOR VARYING-LENGTH INPUT

Input          | Run time per output
1-frame input  | 23.47 ms
10-frame input | 215.63 ms

Fig. 14. Performance trade-off between the false alarm and missing rates with a varying detection probability threshold.

Detection probability threshold pthr: This threshold is placed at the end of the system to identify a detection whose prediction probability is greater than the threshold. Varying the detection threshold causes a performance trade-off between the false alarm rate and the missing rate, as shown in Fig. 14, where two missing-versus-false alarm curves are plotted for concealed and open carry knives, respectively, by testing the COD-single model with different threshold values. From Fig. 14, it is easy to find a threshold that minimizes the missing rate at the price of an increased false alarm rate (and vice versa). Aiming for the best balance, we select the inflection point of both curves, which provides the lowest combination of false alarm and missing rates. For both curves, that occurs at a threshold of 0.5.

D. Strength and Limitation

The proposed 77 GHz mmWave radar-based COD system can detect three classes of objects - laptop, phone, and knife - under both open carry and concealed cases. Compared to camera-based security systems, it works in low-light and object blocking or occlusion scenarios and comes without privacy concerns. Compared to current security inspection techniques, e.g., X-ray and mmWave imaging, it relaxes the requirement of taking an image with a fixed posture or position and greatly reduces the processing time. Our system is capable of generating real-time object detection output, even when the subject pedestrian is moving, using proper signal preprocessing and deep learning. The improved time efficiency enables more flexibility and higher passenger throughput rates, but may also cause accuracy loss as a trade-off. Given the observation that the multi-shot decision module brings across-the-board performance improvement, it is promising to further explore methods of utilizing multiple frames in the future.

As for limitations, the proposed COD system requires the sensor and target to be in relatively close proximity (e.g., < 6 m) to maintain good detection capability, as do all other approaches. Besides, the experiments in this paper are limited to the single carried object situation, which should be extended to more complicated scenarios, e.g., a subject holding multiple objects, or multiple subjects holding different objects. Due to the difficulty of capturing data for such circumstances in the short term, this is left for future work.

VII. CONCLUSION & FUTURE WORK

In this paper, we focused on the relatively unexplored area of using low-cost 77 GHz mmWave radar for the carried object detection problem. The proposed COD system is capable of real-time detection of three classes of objects - laptop, phone, and knife - under open carry cases and concealed cases where objects are hidden with clothes or bags. This system would serve as the very first baseline for future work aiming to detect carried objects using 77 GHz radar. For future work, we plan to conduct more experiments and evaluations to test system efficiency and to continue exploring methods of utilizing multiple frames to improve performance.

REFERENCES

[1] D. A. Robertson, D. G. Macfarlane, and T. Bryllert, "220 GHz wideband 3D imaging radar for concealed object detection technology development and phenomenology studies," in Passive and Active Millimeter-Wave Imaging XIX, D. A. Wikner and A. R. Luukanen, Eds., vol. 9830, International Society for Optics and Photonics. SPIE, 2016, pp. 55–62.

[2] X. Gao, S. Roy, and G. Xing, "MIMO-SAR: A hierarchical high-resolution imaging algorithm for mmWave FMCW radar in autonomous driving," IEEE Transactions on Vehicular Technology, vol. 70, no. 8, pp. 7322–7334, 2021.

[3] M. T. Bhatti, M. G. Khan, M. Aslam, and M. J. Fiaz, "Weapon detection in real-time CCTV videos using deep learning," IEEE Access, vol. 9, pp. 34366–34382, 2021.

[4] Z. A. Ali, S. Narejo, B. Pandey, D. Esenarro Vargas, C. Rodriguez, and M. R. Anjum, "Weapon detection using YOLO V3 for smart surveillance system," 2021, pp. 193–198.

[5] X. Gao, G. Xing, S. Roy, and H. Liu, "RAMP-CNN: A novel neural network for enhanced automotive radar object recognition," IEEE Sensors Journal, vol. 21, no. 4, pp. 5119–5132, 2021.

[6] A. Agurto, Y. Li, G. Y. Tian, N. Bowring, and S. Lockwood, "A review of concealed weapon detection and research in perspective," in 2007 IEEE International Conference on Networking, Sensing and Control, 2007, pp. 443–448.

[7] N. Wilde, S. Niederhaus, H. Lam, and C. Lum, "Handheld ultrasonic concealed weapon detector," in Sensors, and Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Defense and Law Enforcement, E. M. Carapezza, Ed., vol. 4708, International Society for Optics and Photonics. SPIE, 2002, pp. 122–127.

[8] Z. Zhang, X. Di, Y. Xu, and J. Tian, "Concealed dangerous object detection based on a 77 GHz radar," in 2018 IEEE International Workshop on Electromagnetics: Applications and Student Innovation Competition (iWEM), 2018, pp. 1–2.

[9] K. B. Cooper, R. J. Dengler, N. Llombart, T. Bryllert, G. Chattopadhyay, E. Schlecht, J. Gill, C. Lee, A. Skalare, I. Mehdi, and P. H. Siegel, "Penetrating 3-D imaging at 4- and 25-m range using a submillimeter-wave radar," IEEE Transactions on Microwave Theory and Techniques, vol. 56, no. 12, pp. 2771–2778, 2008.

[10] P. Varshney, H.-M. Chen, L. Ramac, M. Uner, D. Ferris, and M. Alford, "Registration and fusion of infrared and millimeter wave images for concealed weapon detection," in Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348), vol. 3, 1999, pp. 532–536.

[11] P. S. K. Bandyopadhyay, B. Datta, and S. Roy, "Identifications of concealed weapon in a human body," 2012, available at https://arxiv.org/ftp/arxiv/papers/1210/1210.5653.pdf.

[12] D. M. M. Roomi and R. Rajashankari, "Detection of concealed weapons in X-ray images using fuzzy K-NN," International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), vol. 2, no. 2, pp. 187–196, Apr. 2012.


[13] L. Carrer and A. G. Yarovoy, "Concealed weapon detection using UWB 3-D radar imaging and automatic target recognition," in The 8th European Conference on Antennas and Propagation (EuCAP 2014), 2014, pp. 2786–2790.

[14] S. Yeom, D.-S. Lee, J.-Y. Son, and S.-H. Kim, "Concealed object detection using passive millimeter wave imaging," in 2010 4th International Universal Communication Symposium, 2010, pp. 383–386.

[15] J. Liu, K. Zhang, Z. Sun, Q. Wu, W. He, and H. Wang, "Concealed object detection and recognition system based on millimeter wave FMCW radar," Applied Sciences, vol. 11, no. 19, 2021.

[16] X. Zhuge and A. G. Yarovoy, "A sparse aperture MIMO-SAR-based UWB imaging system for concealed weapon detection," IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 1, pp. 509–518, 2011.

[17] X. Wang, S. Gou, X. Wang, Y. Zhao, and L. Zhang, "Patch-based Gaussian mixture model for concealed object detection in millimeter-wave images," in TENCON 2018 - 2018 IEEE Region 10 Conference, 2018, pp. 2522–2527.

[18] T. Liu, Y. Zhao, Y. Wei, Y. Zhao, and S. Wei, "Concealed object detection for activate millimeter wave image," IEEE Transactions on Industrial Electronics, vol. 66, no. 12, pp. 9909–9917, 2019.

[19] C. Wang, J. Shi, Z. Zhou, L. Li, Y. Zhou, and X. Yang, "Concealed object detection for millimeter-wave images with normalized accumulation map," IEEE Sensors Journal, vol. 21, no. 5, pp. 6468–6475, 2021.

[20] H. Essen, H. Fuchs, M. Hagelen, S. Stanko, D. Notel, S. Erukulla, J. Huck, M. Schlechtweg, and A. Tessmann, "Concealed weapon detection with active and passive millimeterwave sensors, two approaches," 2006.

[21] O. Martínez, L. Ferraz, X. Binefa, I. Gomez, and C. Dorronsoro, "Concealed object detection and segmentation over millimetric waves images," in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, 2010, pp. 31–37.

[22] X. Gao, G. Xing, S. Roy, and H. Liu, "Experiments with mmWave automotive radar test-bed," in 2019 53rd Asilomar Conference on Signals, Systems, and Computers, 2019, pp. 1–6.

[23] X. Gao, S. Roy, G. Xing, and S. Jin, "Perception through 2D-MIMO FMCW automotive radar under adverse weather," in 2021 IEEE International Conference on Autonomous Systems (ICAS), 2021, pp. 1–5.

[24] Y. Li, Z. Peng, R. Pal, and C. Li, "Potential active shooter detection based on radar micro-Doppler and range-Doppler analysis using artificial neural network," IEEE Sensors Journal, vol. 19, no. 3, pp. 1052–1063, 2019.

[25] X. Gao, S. Ding, K. Vanas, H. R. Dasari, H. Soderlund, J. Yuan, and X. Zhang, "Deformable radar polygon: A lightweight occupancy representation for short-range collision avoidance," 2021.

[26] Y. Cheng and Y. Liu, "Person reidentification based on automotive radar point clouds," IEEE Transactions on Geoscience and Remote Sensing, pp. 1–13, 2021.

[27] B. Y. Su, K. C. Ho, M. J. Rantz, and M. Skubic, "Doppler radar fall activity detection using the wavelet transform," IEEE Transactions on Biomedical Engineering, vol. 62, no. 3, pp. 865–875, 2015.

[28] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," Lecture Notes in Computer Science, pp. 21–37, 2016.

[29] C. Iovescu and S. Rao, White paper: The Fundamentals of Millimeter Wave Sensors. Texas Instruments, 2017, no. SPYY005, available at https://www.ti.com/lit/wp/spyy005a/spyy005a.pdf.

[30] H. Krim and M. Viberg, "Two decades of array signal processing research: the parametric approach," IEEE Signal Processing Magazine, vol. 13, no. 4, pp. 67–94, 1996.

[31] W. Wang, "Virtual antenna array analysis for MIMO synthetic aperture radars," International Journal of Antennas and Propagation, vol. 2012, pp. 1–10, 2012.

[32] A. Moffet, "Minimum-redundancy linear arrays," IEEE Transactions on Antennas and Propagation, vol. 16, no. 2, pp. 172–175, 1968.

[33] J. Bechter, F. Roos, and C. Waldschmidt, "Compensation of motion-induced phase errors in TDM MIMO radars," IEEE Microwave and Wireless Components Letters, vol. 27, no. 12, pp. 1164–1166, 2017.

[34] M. A. Richards, Ed., Principles of Modern Radar: Basic Principles, ser. Radar, Sonar and Navigation. Institution of Engineering and Technology, 2010, available at https://digital-library.theiet.org/content/books/ra/sbra021e.

[35] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.

[36] T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 936–944.

[37] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, "Focal loss for dense object detection," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Oct. 2017.

[38] Texas Instruments, White paper: Imaging Radar Using Cascaded mmWave Sensor Reference Design. Texas Instruments, 2019, no. TIDUEN5A, available at https://www.ti.com/lit/ug/tiduen5a/tiduen5a.pdf.

[39] M. Kowalski, "Real-time concealed object detection and recognition in passive imaging at 250 GHz," Appl. Opt., vol. 58, no. 12, pp. 3134–3140, Apr. 2019.

[40] L. Pang, H. Liu, Y. Chen, and J. Miao, "Real-time concealed object detection from passive millimeter wave images based on the YOLOv3 algorithm," Sensors, vol. 20, no. 6, 2020.