Evaluation of Non-Cooperative Sensors for Sense and Avoid in UAV Systems
Pablo Arroyo Izquierdo
Thesis to obtain the Master of Science Degree in
Aerospace Engineering
Supervisors: Prof. Afzal Suleman, Prof. Humayun Kabir
Examination Committee
Chairperson: Prof. Fernando José Parracho Lau
Supervisor: Prof. Afzal Suleman
Member of the Committee: Prof. Francisco Fernandes Castro Rego
September 2019
A million dreams for the world we’re gonna make
Acknowledgements
First of all, I would like to thank Professor Suleman for allowing me to be in Victoria conducting my thesis, as well as the rest of the team I've shared these six months with. A special word to my research partners António and Luís, and to Kabir, who was always asking the right questions. But this thesis is only possible thanks to the people who have helped me out during all these years.
Thanks to all the teachers in school and high school who encouraged me to become the person I am today.
Thanks to those classmates who became real friends. Special thanks to V, S, L and M.
Thanks to all the people I've met with whom I've shared special moments. You know who you are.
Thanks to the unforgettable experiences I've lived, but also thanks to the mistakes I've made.
Thanks to the cinema, thanks to the music and, of course, thanks to the musical movies.
And thanks to my family, especially my parents, who will be there no matter how far away I am.
Abstract
The number of civil and military applications for Unmanned Aerial Vehicles (UAVs) has been increasing in recent years, including firefighting, search-and-rescue missions and package delivery, among others. Due to their ability to perform a large variety of important tasks with higher manoeuvrability, longer endurance and less risk to human lives, small UAVs are particularly suitable.

To carry out these tasks, however, it is mandatory to guarantee safe performance and correct integration into non-segregated airspace. Integrating unmanned aircraft into civil airspace requires the development and certification of systems for sensing and avoiding (SAA) other aircraft. In particular, non-cooperative Collision Detection and Resolution (CD&R) for UAVs is considered one of the major challenges to be addressed.

The new project Enhanced Guidance, Navigation and Control for Autonomous Air Systems based on Deep Learning and Artificial Intelligence, started by Boeing at the Centre for Aerospace Research (CfAR), requires the definition of the SAA system and a rigorous analysis of it before the system can be developed and eventually certified for operational use.

This thesis focuses on evaluating the capabilities of non-cooperative sensors for a SAA system, reviewing the different sensors available and the data fusion techniques used to merge the information provided by the different sources. Finally, algorithms for visual camera image processing using machine and deep learning techniques are developed and compared, with the aim of providing an effective obstacle detection capability.

Keywords: Sense and Avoid; Non-cooperative sensors; Data Fusion; Machine Learning; Deep Learning
Resumo
Recent years have seen a significant increase in the use of Unmanned Aerial Vehicles (UAVs) for civil and military purposes; firefighting, search and rescue, and package delivery are some examples. Due to their ability to perform a large variety of tasks in an agile way, with considerable endurance and low risk to humans, small UAVs in particular are a reliable choice.

However, to carry out such tasks it is necessary to guarantee safe and correct integration into non-segregated airspace. Introducing unmanned aircraft into civil airspace requires the development and certification of systems for sensing and avoiding (SAA) other aircraft. In particular, one of the major challenges to be addressed is non-cooperative Collision Detection and Resolution (CD&R) for UAVs.

The new project Enhanced Guidance, Navigation and Control for Autonomous Air Systems based on Deep Learning and Artificial Intelligence, started by Boeing in partnership with the Centre for Aerospace Research (CfAR), requires a definition of the SAA system and a rigorous analysis of it before it can be developed and eventually certified for use.

This thesis focuses on evaluating the capabilities of non-cooperative sensors for the SAA system, describing the different sensors currently available and the data fusion techniques used to combine the information coming from the different sources. Finally, different algorithms for processing images from visual cameras, developed using machine and deep learning techniques, are presented and compared, the main objective being to develop the capability to detect obstacles effectively.

Keywords: Sense and Avoid; Non-cooperative sensors; Data fusion; Deep Learning; Machine Learning
Contents
List of Tables xi
List of Figures xiii
Glossary xvii
1 Introduction 1
1.1 Context and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Unmanned Aircraft Systems and Sense and Avoid . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Sense and Avoid System Definition 7
2.1 Sense And Avoid Taxonomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3.2 Cooperative sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.3 Non-cooperative sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Detecting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 Avoiding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.6 System Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3 Non-cooperative Sensors 21
3.1 Why non-cooperative sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 Radars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3 LIDARs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.4 Acoustic sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.5 EO cameras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.6 Sensor Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4 Data Fusion 33
4.1 Why data fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3 Data Association . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3.1 K-Means . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3.2 Probabilistic Data Association . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.3 Joint Probabilistic Data Association . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.4 Distributed Joint Probabilistic Data Association . . . . . . . . . . . . . . . . . . . . 37
4.3.5 Multiple Hypothesis Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3.6 Distributed Multiple Hypothesis Test . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3.7 Graphical Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4 State Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4.1 Maximum Likelihood and Maximum Posterior . . . . . . . . . . . . . . . . . . . . . 40
4.4.2 Kalman Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4.3 Distributed Kalman Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4.4 Particle Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4.5 Distributed Particle Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4.6 Covariance Consistency Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.5 Fusion Decision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.5.1 Bayesian Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.5.2 Dempster-Shafer Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.5.3 Abductive Reasoning and Fuzzy Logics . . . . . . . . . . . . . . . . . . . . . . . . 45
4.5.4 Semantic Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.6 Other solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5 Machine and Deep Learning for obstacle detection using visual electro-optical sensors 49
5.1 Artificial Intelligence, Machine Learning and Deep Learning . . . . . . . . . . . . . . . . . 49
5.2 Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2.1 Regions with CNN features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.2.2 Fast R-CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.2.3 Faster R-CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.3 Support Vector Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.4 Implementation of Faster R-CNN for aircraft detection . . . . . . . . . . . . . . . . . . . . 55
5.5 Implementation of SVM and edge detection for obstacle detection above the horizon . . . 60
5.5.1 Sky segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.5.2 Obstacle detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6 Simulation environment 67
6.1 ROS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.1.1 ROS Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.1.2 ROS Packages, ROS Topics and rosrun . . . . . . . . . . . . . . . . . . . . . . . 69
6.2 Gazebo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.3 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
6.3.1 World . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6.3.2 UAV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6.3.3 Data collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
7 Conclusion 75
7.1 Achievements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Bibliography 77
A Sensor models 83
A.1 Radar models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
A.2 LIDAR models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
A.3 Camera models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
List of Tables
2.1 Range and FOR requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Single-sensor vs. Multi-sensor SAA architectures . . . . . . . . . . . . . . . . . . . . . . . 15
3.1 Sensor Comparison. ++: Ideally suited. +: Good performance. 0: Possible but drawbacks
to expect. -: Only possible with large additional effort. –: Impossible. n.a.: Not applicable 31
5.1 Machine learning vs. deep learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2 Comparison of both implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.1 Camera characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
List of Figures
1.1 Airspace Classes [4] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1 Basic Taxonomy of SAA systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Different scenarios depending on the location of the SAA components . . . . . . . . . . . 9
2.3 Cylindrical collision volume and collision avoidance/safe separation thresholds [7] . . . . . 12
2.4 Architecture of a tracking system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 Sensor fusion configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6 System Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1 Microphone polar plots [10] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Process for object detection in EO cameras . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 Example of basic edge detection algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1 Data fusion schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Example fuzzy membership function [47] . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 Schema of a camara-based enhancement of the radar measurement . . . . . . . . . . . . 47
5.1 Artificial intelligence, machine learning and deep learning [49] . . . . . . . . . . . . . . . . 50
5.2 Convolutional Neural Networks [51] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.3 Regions with CNN features [52] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.4 Fast R-CNN [53] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.5 Comparison of R-CNN and fast R-CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.6 Faster R-CNN and RPN [54] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.7 SVM classification (adapted from [55]) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.8 Relevant elements and selected elements . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.8 Accuracies and prediction times for different networks (I) . . . . . . . . . . . . . . . . . . . 58
5.9 Accuracies and prediction times for different networks (II) . . . . . . . . . . . . . . . . . . 59
5.10 Sky and ground features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.11 Results and detection time for different images . . . . . . . . . . . . . . . . . . . . . . . . 63
5.12 Results and detection time for close obstacles . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.13 Results and detection time for far away obstacles . . . . . . . . . . . . . . . . . . . . . . . 65
5.14 Other obstacle detection with the SVM + edge implementation . . . . . . . . . . . . . . . 66
6.1 Example of a ROS graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.2 Prototyping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.3 Quadrotor carrying an optical camera and a LIDAR and their information . . . . . . . . . . 71
6.4 Simulation environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6.5 Simulated UAV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.6 Algorithms run over images collected from the simulation environment . . . . . . . . . . . 73
A.1 True View Radar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
A.2 Demorad Radar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
A.3 Chemring Radar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
A.4 360o Sense and Avoid Radar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
A.5 M8 LIDAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
A.6 Ultra Puck LIDAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
A.7 Blackfly S USB3 camera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
A.8 Oryx 10GigE camera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Glossary
ADS-B Automatic Dependent Surveillance-Broadcast.
AI Artificial Intelligence.
ATC Air Traffic Control.
CCD Charge-Coupled Device.
CfAR Centre for Aerospace Research.
CMOS Complementary Metal-Oxide-Semiconductor.
CNN Convolutional Neural Network.
DoD US Department of Defense.
EASA European Aviation Safety Agency.
EKF Extended Kalman Filter.
EO Electro-optical.
FAA Federal Aviation Administration.
FOR Field of Regard.
FOV Field of View.
GCS Ground Control Station.
GPS Global Positioning System.
ICAO International Civil Aviation Organization.
IFOV Instantaneous Field of View.
IFR Instrument Flight Rules.
IR Infrared.
JPDA Joint Probabilistic Data Association.
KF Kalman Filter.
LIDAR Light Detection and Ranging.
MAP Maximum Posterior.
MHT Multiple Hypothesis Test.
ML Maximum Likelihood.
PDA Probabilistic Data Association.
PF Particle Filter.
PRF Pulse Repetition Frequency.
R-CNN Regions with CNN features.
ROS Robot Operating System.
RPN Region Proposal Network.
SAA Sense and Avoid.
SNR Signal-to-noise Ratio.
SVM Support Vector Machine.
TCAS Traffic Alert and Collision Avoidance System.
UA Unmanned Aircraft.
UAS Unmanned Aircraft System.
UAV Unmanned Aerial Vehicle.
UKF Unscented Kalman Filter.
VFR Visual Flight Rules.
Chapter 1
Introduction
1.1 Context and Motivation
Unmanned Aircraft Systems (UAS) constitute a field that leverages leading technological advances and challenges some of the current regulatory and operational practices of major aviation authorities around the world. For that reason, interest in UAS has grown over the last years and many applications have appeared on the scene, not only military but also civil and commercial.
Most military UAS are used for surveillance, intelligence, reconnaissance and strikes. The trend in this field is to replace dangerous manned missions with unmanned ones, covering a significant part of warfare activity.
On the civil side, the development of these systems has led to a world of new opportunities covering a huge market. The Federal Aviation Administration (FAA) estimated that there would be 30,000 drones operating in national skies by 2020 [1]. Some of the commercial applications for UAS are categorized into five groups and described in Angelov [2] as follows:
• Earth science applications, including remote environmental research, atmospheric monitoring
and pollution assessment, weather forecast or geological surveys.
• Emergency applications, including firefighting, search and rescue, tsunami/flood watch, nuclear
radiation monitoring and catastrophe situation awareness or humanitarian aid delivery.
• Communications applications, like telecommunication relay services, cell phone transmissions
or broadband communications.
• Monitoring applications, including homeland security, crop and harvest monitoring, fire detection,
infrastructure monitoring (like pipelines, oil/gas lines, etc.) or terrain mapping.
• Commercial applications, in the strict sense of the word, including aerial photography, precision
agriculture chemical spraying or transportation of goods and post.
Such a large number of applications has led many companies and research teams to focus on this matter: on what makes UAS different from manned air vehicles and, in order to perform those tasks, on what will be required to allow them to operate and be integrated with other users into the civil airspace. One of the most important requirements for enabling these operations is to provide the UAS with a Sense and Avoid (SAA) capability that replaces the manned-aircraft capability to see and avoid. Because such systems, which have to deal not only with other aircraft but also with the environment around them (terrain obstacles, buildings, trees, wires, birds, etc.), are typically very complex, and because a high level of safety must be maintained, rigorous analysis is required before they can be certified for operational use, and a great effort over the past decade has gone into the definition of aviation requirements for SAA.
The Centre for Aerospace Research (CfAR), in collaboration with Boeing and the National Research Council of Canada, is starting a research effort in SAA in which system definition, data fusion and artificial intelligence are key topics and problems that need to be addressed.
1.2 Unmanned Aircraft Systems and Sense and Avoid
To introduce the reader to the subject, it is worth exploring what manned aviation and UAS have in common and how they differ. Some of the ideas in this section were proposed in Angelov [2] and Fasano et al. [3].
There are different ways to refer to a pilotless aircraft, such as Unmanned Aerial Vehicle (UAV) or, popularly, drone. But the most reputable organizations, like the International Civil Aviation Organization (ICAO), EUROCONTROL, the European Aviation Safety Agency (EASA) and the FAA, as well as the US Department of Defense (DoD), adopted UAS as the correct and most complete term. The reason is that a UAS is more than just an aircraft. A common configuration includes not only the flying air vehicle with the systems onboard (i.e., the unmanned aircraft (UA)), but also the ground control station (GCS), which a pilot can use to operate the aircraft, and the command link between the two.
The fact that the human pilot is not in the aircraft introduces the first problem: the pilot cannot see the surrounding environment to understand what is happening around the aircraft. That is one of the reasons to introduce a SAA system. If the UAS includes a GCS, then a link between the pilot in the GCS and the aircraft itself is needed. To manage the command and control issues, the inclusion of an onboard system for autonomously controlling the operation has to ensure predictable performance and safety when the link fails. Therefore, it has to be a part of the SAA system. One might think that, to address the lack of seeing and avoiding, a first-person view with onboard cameras transmitting to the ground station would be enough, but data link limitations and a constrained field of view (FOV), combined with the limited ability to discern a target in the video picture, make it a very poor solution. Thus, the SAA methods will need to sense and avoid other aircraft and obstacles, either onboard the UA or with the ground station in the loop.
To classify a SAA system properly, it must be taken into account that UAS may have very different missions. Just like manned aircraft, UAS can range from large aircraft (Global Hawk, Reaper) through midsize aircraft (ScanEagle) to small aircraft (such as those affordable to the general public).
Large UA are considered to be those that fly among manned aircraft in controlled airspace. As defined by ICAO, countries must provide Air Traffic Control (ATC) services to aircraft flying in this airspace, and a certain degree of self-separation must be ensured by the manned aircraft. These airspaces include en-route high-altitude airspace (Class A), terminal airspaces (Classes B, C and D) and lower-altitude transition airspace (Class E), as represented in Figure 1.1.
Figure 1.1: Airspace Classes [4]
Small UA have also reached commercial activities and research endeavors, not only recreation and entertainment, and in most cases a certified aircraft and pilot are required to operate safely. The tasks for these purposes are usually carried out in uncontrolled airspace, including low-altitude Class G airspace.
It can be noted that there will be several differences in a SAA system and architecture depending on the UA size, since the environment around them (manned aircraft and other obstacles) and the behaviour of the possible intruders (whether or not they follow flight rules) will be different. On the one hand, a large UA operating in controlled airspace has to deal with manned aircraft as the main obstacle, needs to interact at the levels of separation for those airspaces and has to operate under the appropriate flight rules (instrument flight rules (IFR) or visual flight rules (VFR)); therefore the avoidance maneuvers have to be the expected ones (the right-of-way rule, for example). On the other hand, the operating environment for small UA is much different. The missions of these types of UAS often require operating closer to obstacles than current aviation separation standards permit. New separation definitions and UAS airworthiness requirements are therefore needed to allow them to operate safely and predictably in these rapidly evolving environments.
1.3 Objectives
The final goal of the research carried out by the different members of the team at CfAR is to design a fully effective SAA system for autonomous UAVs, able to conduct all the operations onboard while meeting the demands of that kind of operation. Two constraints appear here and will be discussed throughout this work: the algorithms have to consume few resources and little computational time while ensuring good results.
The project Enhanced Guidance, Navigation and Control for Autonomous Air Systems based on Deep Learning and Artificial Intelligence was being started as this thesis was being written. Thus, as in all new projects, one of the goals of this thesis is the definition of the problem. A proper architecture has to be defined, the sensors to use have to be selected and the challenges of multi-sensor data fusion have to be described, taking into account the characteristics of the system. Therefore, the first part of the thesis moves in those directions, the goal being to gain a better understanding of all these areas so that future work can be developed in the directions that will be described.
The second part of the work focuses on the development of an effective detection function for the imagery provided by visual cameras. Machine and deep learning are keywords within the frame of the project at CfAR, so another goal of the thesis is to test both techniques and address the advantages and limitations of each one.
Eventually, all this work, as well as the work of other members of the team, will be merged; for that reason it is necessary to find a platform well suited to testing different kinds of algorithms, not only to process information from different sensors but also to test path planning algorithms that will use this information to compute a path able to avoid the obstacles sensed. Thus, the last goal of the thesis is to address this problem and to choose a suitable platform for all these operations.
1.4 Thesis Outline
The structure of the thesis is organized as follows:
• Chapter 2 introduces the basics of a SAA system and ends with a proposed architecture for the
work developed at CfAR.
• Chapter 3 describes non-cooperative sensors, addressing the advantages and limitations of each one depending on the mission, in order to inform the choice of which to use.
• Chapter 4 describes the different steps and data fusion techniques used to fuse the information from different sensors, and the characteristics of each technique, since they have to meet the requirements of the system.
• Chapter 5 reports the implementation of candidate machine learning and deep learning techniques for developing an obstacle detector based on visual cameras, comparing the results.
• Chapter 6 introduces the reader to the simulation environment where all the work developed within
the frame of the project conducted at CfAR will be tested.
• Chapter 7 presents a conclusion for the work and provides suggestions for future work.
Chapter 2
Sense and Avoid System Definition
In this chapter, the basic concepts as well as the functionalities of SAA systems are discussed.
In Section 2.1, the steps and taxonomy of SAA systems are presented. In Section 2.2, different
SAA architectures are given. Sections 2.3, 2.4 and 2.5 present a brief, conceptual description of sensing, detecting and avoiding in SAA, respectively. Finally, an overview of the proposed system definition and architecture is presented in Section 2.6.
The main sources of information (concepts and ideas) about SAA used to prepare this chapter are Angelov [2] and Fasano et al. [3].
2.1 Sense And Avoid Taxonomy
The core of SAA systems consists mainly of three steps:
1. Sense. Surveilling the environment around the aircraft.
2. Detect. Analyzing the information provided by the sensing methods to determine whether there is any obstacle around the aircraft and whether it is a threat.
3. Avoid. Choosing a suitable action for the aircraft to avoid the threat.
Sensors can be cooperative or non-cooperative. Detection can range from alerting the pilot to estimating the motion and trajectory of the obstacle. The avoidance strategy can range from a basic action (e.g., descend) to complex actions that consider environmental factors. Both basic and complex actions can be remotely operated or achieved by autonomous methods.
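As a minimal, hypothetical sketch of how these three steps chain together (the `Track` structure, the function names and the placeholder logic are assumptions of this illustration, not part of any particular SAA implementation), the loop can be written as:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Track:
    position: tuple   # estimated intruder position (x, y, z), metres
    velocity: tuple   # estimated intruder velocity, m/s

def sense() -> List[Track]:
    """Sense: placeholder that would query the cooperative and/or
    non-cooperative sensors and return the obstacles currently observed."""
    return []  # no sensors attached in this sketch

def detect(tracks: List[Track]) -> Optional[Track]:
    """Detect: decide whether any sensed obstacle is a threat.  The threat
    logic is left abstract here; the first track, if any, is returned."""
    return tracks[0] if tracks else None

def avoid(threat: Track) -> str:
    """Avoid: choose an action, from a basic maneuver (e.g., descend) to a
    complex one that considers environmental factors."""
    return "descend"

def saa_step() -> Optional[str]:
    """One iteration of the sense-detect-avoid loop: returns the chosen
    maneuver, or None when no threat is detected."""
    threat = detect(sense())
    return avoid(threat) if threat is not None else None
```

In a real system each placeholder would be replaced by the corresponding sensing, tracking and avoidance modules discussed in the following chapters; the value of the skeleton is only to make the data flow between the three steps explicit.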
Different approaches (architectures and technical solutions) can be taken to accomplish these tasks; Figure 2.1 shows several of them.
Figure 2.1: Basic Taxonomy of SAA systems
According to the agreements reached at the Sense and Avoid Workshops, where US FAA and Defense Agency experts discussed a number of fundamental issues [5], an effective SAA system needs to provide two common services: a self-separation service, which would act before a collision avoidance maneuver is needed, i.e., ensuring that the aircraft remain safely separated from each other, and a collision avoidance service, which protects a small collision zone and is usually achieved by an aggressive maneuver.
To achieve these services, the following list of sub-functions is required as described in Angelov [2]:
1. Detect any type of hazards, such as traffic, terrain or weather. At this step, it is merely an indication
that something is there.
2. Track the motion of the detected object. This requires gaining sufficient confidence that the detec-
tion is valid and making a determination of its position and trajectory.
3. Evaluate each tracked object, to decide if its track may be predicted with sufficient confidence and
to test the track against some criteria that would determine whether a SAA maneuver is needed
or not. The confidence test would consider the uncertainty of the position and trajectory. The
uncertainty could be great when a track has started, and again whenever a new maneuver has
first detected. A series of measurements may be required to narrow the uncertainty about the
new or changed trajectory. Also, when a turn is perceived, there is uncertainty about how great a
heading change will result in.
4. Prioritize the tracked objects based on their track parameters and the tests performed during
the evaluation step. In some implementations, this may help to deal with limited SAA system
capacity, while in others prioritization might be combined with the evaluation or declaration steps.
Prioritization can consider criteria for the declaration decision that may vary with the type of
hazard or the context of the encounter (e.g., within a controlled traffic pattern).
5. Declare that the paths of the own aircraft and the tracked object, and the available avoidance
time, have reached a decision point that does indeed require maneuvering to begin. Separate
declarations would be needed for self-separation and collision avoidance maneuvers.
6. Determine the specific maneuver, based on the particular geometry of the encounter, the
maneuver capabilities and preferences of the own aircraft, and all relevant constraints (e.g., airspace
rules or the other aircraft's maneuver).
7. Command the own aircraft to perform the chosen maneuver. Depending upon the implementation
of the SAA, this might require communicating the commanded maneuver to the aircraft or, if
the maneuver determination was performed onboard, merely internal communication among the
aircraft's sub-systems.
8. Execute the commanded maneuver.
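The sub-functions above form a processing loop. The following sketch illustrates how the evaluate, prioritize and declare steps could be chained; all names, thresholds and data shapes are illustrative assumptions, not taken from the text.

```python
from dataclasses import dataclass

# Hypothetical skeleton of the eight sub-functions listed above; all
# names, thresholds and data shapes are illustrative, not from the text.

@dataclass
class Track:
    range_m: float       # estimated range to the tracked object
    closing_mps: float   # closing speed (positive when converging)
    confidence: float    # confidence that the detection is valid

def evaluate(t: Track, min_conf: float = 0.8) -> bool:
    # step 3: keep only tracks predicted with sufficient confidence
    return t.confidence >= min_conf and t.closing_mps > 0

def declare(t: Track, lookahead_s: float = 30.0) -> bool:
    # step 5: declare when the time to go falls below a decision point
    return t.range_m / t.closing_mps < lookahead_s

def saa_cycle(tracks):
    threats = [t for t in tracks if evaluate(t)]            # 3. evaluate
    threats.sort(key=lambda t: t.range_m / t.closing_mps)   # 4. prioritize
    # steps 5-8 (declare, determine, command, execute) would follow here
    return [t for t in threats if declare(t)]

tracks = [Track(1500, 60, 0.9), Track(400, 50, 0.95), Track(900, 40, 0.3)]
urgent = saa_cycle(tracks)  # the 400 m intruder comes out first
```

Note how the low-confidence track is dropped at the evaluation step before any declaration is attempted, mirroring the ordering of the list above.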
2.2 Architectures
A preliminary classification of the architectures can be done depending on the location of the information
sources (e.g. sensors) and processing and decision making centers [2] [3].
(a) Onboard SAA (b) Ground-Based SAA (c) Hybrid SAA
Figure 2.2: Different scenarios depending on the location of the SAA components
As Figure 2.2 shows, these elements can be located onboard the aircraft or on the ground. Besides,
a hybrid architecture can be used.
• Onboard SAA (2.2(a)): Sensing, processing the information and evaluating it take place on the UA,
through an automated onboard processing capability. This type of system is the most challenging
due to the size, weight and power of the devices needed to do everything onboard. For instance,
the sensors have to be light enough not to compromise the UA payload but, at the same time, they
have to be able to meet the requirements for an effective SAA system. The onboard CPU also has
to be powerful enough to process all the information and to provide a solution to avoid the obstacle.
One of the goals in the SAA field is to introduce Artificial Intelligence (AI) algorithms to compute 4D path
planning, generating trajectories able to meet the avoidance requirements. Work is being done on
optimizing this kind of algorithm so that the processors onboard the UA are able to compute it.
All in all, this is the goal to be accomplished by future systems.
• Ground-Based SAA (2.2(b)): the three tasks are carried out on the ground. For example, a radar
senses the environment and sends this information to the GCS. After a decision is taken, it is sent
back to the UA to perform the proper maneuver. This type of configuration may be the simplest
one, since it is compatible with all kinds of UASs and can work with few or even no modifications
for different UAS designs. However, there is a serious limitation in terms of the area in which the
UA can fly, since the sensor is on the ground and can only sense an area around itself.
• Hybrid SAA (2.2(c)): While sensing takes place on the UA, detection and tracking of the target
may take place either on the aircraft or in the GCS. The information is evaluated on the ground and
a maneuver is selected and then transmitted to the UA. This can be a trade-off between the two
other configurations.
2.3 Sensing
A surveillance system can be implemented in different ways to accomplish its function of detecting
possible hazards. Depending on the choices made, the system will have different capabilities in terms of
coverage volume, types of measurements, accuracies, update rates and probabilities of false detection.
In this section, a background on sensing is given. First of all, it is advantageous to define some
of the terms and parameters that characterize a sensor.
• Range: distance measured from the sensor within which some good probability of detection of
targets may be expected.
• Field of View (FOV): solid angle through which a detector is sensitive to electromagnetic radiation.
• Field of Regard (FOR): total area that a sensing system can perceive by pointing the sensor. For
a stationary sensor, FOR and FOV coincide.
• Update rate: frequency of the sensor measurements, i.e., the inverse of the interval between
them. A good sensor will detect the target at every interval, and its effective update rate will be high.
• Accuracy: uncertainty of the sensor measurements. It is often given in a single dimension, thus
the evaluation must combine accuracy values for different dimensions in order to create a volume
of uncertainty.
• Integrity: probability that a measurement has the accuracy specified.
It is worth mentioning that there is a strong relationship between these terms. An integrated approach
when designing the sensing system is therefore necessary.
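Since accuracy is often given per dimension, the parameters above must be combined into a volume of uncertainty. A minimal sketch of that combination, assuming a small-angle approximation and example numbers that are not from the text:

```python
import math

# Illustrative combination of per-axis accuracies into a volume of
# uncertainty, as mentioned above. All numbers are example values.

def uncertainty_volume(range_m, range_acc_m, az_acc_deg, el_acc_deg):
    """Approximate uncertainty box at a given range: the range accuracy
    acts along the line of sight, while the angular accuracies map to
    cross-range distances (small-angle approximation)."""
    cross_az = range_m * math.radians(az_acc_deg)
    cross_el = range_m * math.radians(el_acc_deg)
    return range_acc_m * cross_az * cross_el  # m^3

# A 1 deg angular accuracy already means ~17 m of cross-range error at 1 km:
cross_range_error = 1000.0 * math.radians(1.0)
```

This illustrates why range and accuracy cannot be traded independently: the same angular accuracy degrades linearly in cross-range as the detection range grows.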
2.3.1 Requirements
A SAA system needs to fulfill some requirements in order to ensure the safe performance of the UA
in the airspace while providing obstacle detection and tracking. In Table 2.1, the proposed range and
FOR requirements for a SAA system are listed. These requirements, in the absence of specific technical
airworthiness requirements, are derived from the current regulations applicable to the manned aircraft
capability to see and avoid [6].
Detection range requirements [NM]

Altitude   Manned (nominal pilot)   UAS (autonomous)   UAS (line-of-sight)   UAS (beyond line-of-sight)
Low        2.6                      1.1                1.8                   1.9
Medium     4.2                      1.8                2.9                   3.1
High       5.7                      2.8                4.1                   4.3

FOR requirements [°]

Azimuth    ±110
Elevation  ±15

Table 2.1: Range and FOR requirements
The basic requirement for sensing range is a distance that gives enough time to perform an
avoidance maneuver keeping a minimum separation between the UA and the intruder once it is
detected. In fact, detection by itself is not enough to evaluate a possible threat: the collision detection
and avoidance maneuvers can only be performed when a confirmed track is present, which occurs at
the so-called declaration range. The distinction between detection and declaration is thus important,
since an early obstacle detection is needed to provide enough time for executing the avoidance
manoeuvres.
A 500 ft separation is considered the minimum value to prevent a near mid-air collision. This
introduces the concept of a cylindrical collision volume, with a horizontal radius of 500 ft and a vertical
range of 200 ft, as can be seen in Figure 2.3.
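The cylindrical collision volume above reduces to a simple geometric test. A minimal sketch follows; treating the 200 ft vertical range as a half-height is an assumption of this sketch, not a statement from the text.

```python
import math

# Minimal sketch of the cylindrical collision-volume test described
# above (horizontal radius 500 ft, vertical range 200 ft). Treating the
# 200 ft vertical range as a half-height is an assumption of this sketch.

R_HORIZ_FT = 500.0
H_VERT_FT = 200.0

def inside_collision_volume(dx_ft, dy_ft, dz_ft):
    """dx, dy: horizontal offsets of the intruder from the UA;
    dz: vertical offset; all in feet."""
    horizontal_hit = math.hypot(dx_ft, dy_ft) <= R_HORIZ_FT
    vertical_hit = abs(dz_ft) <= H_VERT_FT
    return horizontal_hit and vertical_hit
```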
Figure 2.3: Cylindrical collision volume and collision avoidance/safe separation thresholds [7]
A head-on collision is then the worst case to evaluate the detection range.
The requirements for the FOV usually apply to non-cooperative architectures, since cooperative
sensors collect information from the whole space around the UA. The FOV has to be equivalent or
superior to that of a pilot in the cockpit in order to achieve the equivalent level of safety of a manned
aircraft. For a non-gimballed installation, FOV and FOR coincide and the angles needed are those shown
in Table 2.1.
Other essential criteria for designing an effective SAA system are described in Ramasamy et al. [6] as
follows:
• Common FOV/FOR for visual and thermal cameras.
• Accurate and precise intruder detection (static and dynamic), recognition and trajectory prediction
(dynamic).
• Effective fusion schemes for multi-sensor data augmentation, especially by tight coupling.
• Identification of the primary means of cooperative and non-cooperative SAA system for integrity
requirements.
2.3.2 Cooperative sensors
Cooperative technologies are those that receive radio signals from another aircraft's onboard equipment.
Cooperative sensors are the most accurate and provide a satisfactory means of sensing
appropriately-equipped aircraft, but they are not able to detect intruder aircraft not equipped with the
proper avionics. Cooperative refers to the source of the sense information identifying the locations of
other aircraft and specifically refers to transmissions. These kinds of sensors, as described in Howell et al.
[8], have better accuracy and integrity of aircraft position data. This improvement in accuracy and
integrity leads to simpler conflict detection and collision avoidance algorithms, which in turn reduces the
complexity and cost of development and certification. Cooperative sensors also offer significant
advantages in reduced size, weight, and power requirements for the UA. A UAS can sense where the intruder
aircraft is and take decisions about how to avoid it. The main sensors used in this category are the
Traffic Alert and Collision Avoidance System (TCAS) and the Automatic Dependent Surveillance Broadcast
(ADS-B).
To respond to ground-based secondary radar interrogations, a large number of aircraft carry a
transponder; indeed, in airspace classes A, B and C aircraft are required to be equipped with one.
This technology has been exploited for the manned aircraft TCAS, which operates independently of
the ground-based secondary radar and provides position information of a potentially conflicting aircraft.
However, a UAS has different operating characteristics and the TCAS would have to be significantly
adapted.
The ADS-B utilizes a navigation source like the Global Positioning System (GPS) (thus depending on
its accuracy and reliability information) and broadcasts the position, velocity and other data without the
need of being interrogated. Other aircraft can use the received information to calculate the possibility
of a collision.
All in all, a UAS could use this type of sensor to keep a safe separation from cooperative aircraft
and use non-cooperative sensors for the rest of the aircraft and other obstacles.
2.3.3 Non-cooperative sensors
In airspace where non-cooperative traffic is allowed, other technologies are needed to detect traffic.
Non-cooperative sensors detect intruders or other obstacles in the UA FOR when cooperative systems
are unable to do so. The process is performed by scanning that region and determining whether a sensor
measure, like the level of energy detected in a specific bandwidth, can be associated with an object that
represents a collision threat. It replicates the pilot's capability of using onboard resources and their
senses.
Some examples of non-cooperative sensors are electro-optical (EO), thermal and infrared (IR) systems,
Light Detection and Ranging (LIDAR), radar and acoustic sensors. While optical and acoustic sensors are
best for angular measurement, radar and laser are best for ranging. EO and IR systems are particularly
attractive for UA due to their power requirements and payload sizes, smaller than those of radar systems,
and LIDARs are used for detecting, warning and avoiding obstacles in low-level flight. Their angular
resolution and accuracy characteristics, as well as their detection performance in a wide range of incidence
angles and weather conditions, provide a good solution for obstacle detection and avoidance. Thus,
multi-sensor architectures could be better suited for developing a SAA system, although their
implementation could be harder. The sensor fusion concept will be discussed in depth later in Chapter 4.
These sensors, in turn, can be classified into two groups: active or passive. Passive sensors use
the energy from other sources, like heat emission or sunlight, to make the measurement, while active
sensors provide their own source of energy. EO cameras that exploit sunlight or thermal emissions in the
thermal IR bandwidth (coming from the aircraft engine, for example) and acoustic sensors are passive
sensors. Radars and LIDARs that create their own pulse are active sensors. For passive sensors,
the signal travels only one way instead of two, as in the case of active sensors, and no
intermediate reflection is needed, which has repercussions on the path loss. However, active sensors
tend to be more accurate and reliable, although their demand for resources, such as power, space and
weight, is greater.
More information about non-cooperative sensors will be given in Chapter 3.
2.4 Detecting
Once an object is sensed, a function is needed to detect whether it represents a possible threat.
It is divided into two sub-functions: tracking and conflict detection. For static obstacles, such as
ground obstacles, the process is easier: a conflict happens when the distance between the UA and the
obstacle breaks the minimum defined separation criterion. In the first step, the system needs to associate
successive measurements with a specific target. If the target is dynamic, such as an intruder aircraft,
the system needs to track it over time. A measurement will be associated with a track if its position
agrees with the expected position, within some margin of error that must take into account not only the
estimation uncertainties of the sensing systems (noise) but also feasible and unexpected maneuvers
that the target could perform. The expected position is calculated as the previous position plus the
estimated velocity times the update interval, usually varying between 1 and 5 seconds; this means that a
velocity vector should be developed. In case some updates are missed, the tracking function should be
capable of maintaining the track for a certain time by projecting it ahead to an expected position,
otherwise the track would not be correct. Nevertheless, the uncertainty grows in this case.
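The prediction-and-gating step described above can be sketched as follows; the gate size and its growth with missed updates are illustrative assumptions, not values from the text.

```python
import math

# Minimal sketch of the prediction-and-gating step described above.
# The gate size and its growth with missed updates are illustrative.

def predict(pos, vel, dt):
    # expected position = previous position + estimated velocity * interval
    return tuple(p + v * dt for p, v in zip(pos, vel))

def associate(measurement, track_pos, track_vel, dt, gate=50.0, missed=0):
    """Associate a measurement with a track if it falls inside a gate
    around the predicted position; the gate widens with each missed
    update to reflect the growing uncertainty."""
    expected = predict(track_pos, track_vel, dt)
    radius = gate * (1 + missed)
    dist = math.dist(measurement, expected)
    return dist <= radius

# Target at the origin moving 40 m/s east, updates every 2 s:
ok = associate((82.0, 5.0), (0.0, 0.0), (40.0, 0.0), dt=2.0)
```

Widening the gate after missed updates is the coasting behavior mentioned above: the track is kept alive, at the cost of a larger association volume.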
The general tracking architecture shown in Figure 2.4 can be applied to different SAA systems and
operating scenarios [3].
Figure 2.4: Architecture of a tracking system
If the tracking system combines measurements from different technologies, sensor fusion
becomes an important concept. In fact, there are several advantages in using a multi-sensor architecture
in terms of the reliability of the system. They are listed in Table 2.2.
Architecture    Advantages                                  Issues

Single-sensor   Simpler to implement;                       Harder design trade-offs to fulfill
                reduced impact of misalignment              sensing requirements with only one sensor
                for non-cooperative sensors

Multi-sensor    Improved performance (accuracy,             Complexity of implementation;
                integrity, robustness);                     additional risks of duplicated tracks;
                increased sensing range;                    impact of residual misalignment
                reduction of computational weight;          for non-cooperative sensors
                potential reduction of false detection;
                relaxed sensing requirements for
                single sensor in new designs

Table 2.2: Single-sensor vs. Multi-sensor SAA architectures
The combination of multiple sensors raises the question of how to combine and process data from all
the sensors, taking advantage of their combination. Data fusion, which has been studied for many years,
especially for robotics applications, is the answer to this question: it allows the combination of
sensors to be exploited, increasing the overall sensing performance.
Data fusion can be performed at different stages of the processing: on the raw sensor data
(Figure 2.5(c)), or after tracking is done (Figure 2.5(b)). When data fusion is performed on the raw
sensor data, the detection and tracking stages operate on the combined data from the sensors. If the
data fusion is done after tracking, the process is somewhat different: data from a single sensor is used
to identify and track an intruder, and the data fusion algorithms then combine the track files generated
from each of the sensors to provide more reliable tracking information.
(a) Single-sensor configuration
(b) Multi-sensor configuration with sensor-level tracking and high-level data fusion
(c) Multi-sensor configuration with centralized tracking and data fusion
Figure 2.5: Sensor fusion configurations
Other literature, such as Castanedo [9], gives further parameters to classify data fusion algorithms.
The relation between the sensors (input data) used might be complementary (data provided by one
sensor is not provided by the other), redundant (same data from different sensors) or cooperative
(sensors with different characteristics analyzing the same data). The data fusion process is performed
according to an architecture that might be classified as centralized, decentralized or distributed.
In [9], the algorithms are presented in three different categories: data association, state estimation
and decision fusion.
• Data association. To determine from a set of measurements which ones correspond to each
target.
• State Estimation. From a set of redundant observations for one target, the goal is to find the set
of parameters that provides the best fit to the observed data.
• Decision Fusion. These techniques aim to make a high-level inference about the events and
activities that are produced from the detected targets.
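The state estimation category above can be illustrated with a minimal example: fusing redundant observations of one target by inverse-variance weighting, a simple least-squares answer to "the parameters that best fit the observed data". The sensor values are examples only.

```python
# Illustrative state-estimation step: fusing redundant observations of
# one target by inverse-variance weighting, a simple least-squares
# answer to "the parameters that best fit the observed data". The
# sensor values below are examples only.

def fuse(observations):
    """observations: list of (value, variance) pairs from different
    sensors measuring the same quantity for the same target."""
    weights = [1.0 / var for _, var in observations]
    value = sum(w * v for (v, _), w in zip(observations, weights)) / sum(weights)
    variance = 1.0 / sum(weights)   # fused variance is below either input
    return value, variance

# An accurate radar range dominates a noisier camera-derived range:
fused_range, fused_var = fuse([(1000.0, 25.0), (1060.0, 400.0)])
```

The fused variance is smaller than that of the best individual sensor, which is precisely the performance gain listed for multi-sensor architectures in Table 2.2.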
Another important aspect concerning data fusion is the computational effort needed to accomplish it.
There has to be a trade-off between algorithmic performance and computational overhead. Even though
there has been a great improvement in the computational power of onboard devices, this remains a
major concern, especially given what is expected for the Enhanced Guidance, Navigation and Control for
Autonomous Air Systems based on Deep Learning and Artificial Intelligence project: a fully air-based
SAA system, meaning that not only the data fusion but all the other stages of SAA operations will be
supported by onboard computers.
Once the sensor fusion is performed and the system has a valid tracked target, a conflict detection
function is needed to distinguish threatening from non-threatening traffic or other hazards. Conflict
detection logic is based on τ, which approximates the time to the closest point of approach as follows,

τ = −r / ṙ    (2.1)

where r and ṙ are the relative range and range rate between the UA and the obstacle, respectively. Note
that ṙ will be negative when the UA and the threatening obstacle converge. The time is typically
compared with a maximum look-ahead time, so that intruders with a large time to the closest point of
approach can be discarded and considered as non-immediate threats. When the trajectory is not leading
to a collision, the system has to check that a certain volume of airspace is not penetrated. Uncertainty
and unforeseen maneuvers should be taken into account to add some margin when establishing that
volume.
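The τ-based conflict test can be sketched directly from Equation 2.1; the look-ahead threshold below is an illustrative value, not one from the text.

```python
# Sketch of the tau-based conflict test of Equation 2.1; the
# look-ahead threshold is an illustrative value.

def tau(r, r_dot):
    """Approximate time to the closest point of approach, tau = -r / r_dot,
    where r is the relative range and r_dot the range rate (negative
    when the UA and the obstacle converge)."""
    if r_dot >= 0:                  # diverging: no closing geometry
        return float("inf")
    return -r / r_dot

def is_immediate_threat(r, r_dot, lookahead_s=40.0):
    # intruders with a large time to go are discarded as non-immediate
    return tau(r, r_dot) <= lookahead_s

# Intruder 2000 m away closing at 60 m/s: tau is about 33 s
time_to_go = tau(2000.0, -60.0)
```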
2.5 Avoiding
After sensing the environment and detecting a collision risk, the UAS must determine an effective
avoidance maneuver; it must plan a new path while avoiding the detected obstacles. This maneuver must
take into account the aircraft capabilities in accelerating laterally, vertically or changing speed, and the
ultimate climb or descent rates or bank angle to be achieved; constraints deriving from airspace rules;
compatibility with avoidance maneuvers to be performed by an intruder aircraft (expected to follow
right-of-way rules); other detected hazards (terrain or other intruder aircraft); errors such as those in
position measurement (the maneuver would seek the sum of the desired separation plus an error margin);
and latencies involved in deciding, communicating and executing the maneuver.
For an autonomous SAA, before selecting a maneuver to be performed by the UA, an algorithm has
to evaluate different alternatives along several dimensions of environment and safety. Factors for deciding
on an avoidance maneuver must be built into the avoidance and path planning algorithm. Researchers
have explored possible decision-making strategies that include when to start the maneuver (while the
UA is still well clear or at the last possible moment) and the deviation that should be taken (the minimum
to optimize flight, or more to provide a safety margin). The algorithm also needs to use the information
available to provide an avoidance solution that fits the given time frame. Note the difference between
an avoidance maneuver when the target is suddenly sensed in an imminent collision and when it is
sensed well in advance. Because of that, the key is to design a range of different avoidance maneuvers
to ensure safety and to be able to face different scenarios.
2.6 System Definition
After all the review presented in this chapter, the proposed system architecture for the Enhanced Guid-
ance, Navigation and Control for Autonomous Air Systems based on Deep Learning and Artificial In-
telligence project developed at CfAR is represented in Figure 2.6. The architecture is designed for a
quadrotor UAV.
Figure 2.6: System Definition
The process starts with an environment scan, conducted by a primary set of non-cooperative
sensors; supplementary environment information can also be acquired by a supplementary set of
non-cooperative sensors. This guarantees a fast scan with the primary set of sensors, while the
supplementary ones add redundancy and robustness to the system, especially in situations where the
primary set might not be able to operate. Chapter 3 will discuss the characteristics of different sensors.
The data collected from the sensors is then analyzed and merged in the Sensor Data Management
and Data Fusion subsystem. In this way, the information provided by each sensor can be combined,
increasing the fidelity and robustness of the SAA system. This block is crucial to overcome the limitations
of each non-cooperative sensor by using fusion techniques that will be described in Chapter 4.
The data is sent to the Obstacle Detection and Tracking subsystem where different tasks are
conducted. The tasks include the mapping of each obstacle and prioritizing the ones to avoid. After
tracking an intruder, some estimation of its trajectory is done and a list of priorities is set. Based on
this list, avoidance actions are taken and sent to the Path Planning framework. Here, a trajectory is
generated and optimized. This trajectory is fed with information from the Navigation Sensors, which are
composed of a GPS to indicate the position of the UAV, an Inertial Measurement Unit (IMU) to determine
the attitude and acceleration of the aircraft, an altimeter to calculate the height and a magnetometer to
determine the direction of the UAV using the Earth’s magnetic field as a reference.
The trajectory and attitude information is transmitted to the Flight Management System, which, with
complementary information (e.g. battery state, sensor health state), sends a detailed flight plan to the
Autopilot, which will control the actuators, and thus the UAV kinematics and dynamics, with the aid of
the Flight Control System.
An initial mapping of the environment is an advantageous technique. A preliminary estimation of
static objects can be conducted with a LIDAR and a Simultaneous Localization and Mapping (SLAM)
algorithm. This map is then periodically updated when new information is received. The actions refer-
enced above are conducted in a loop to allow quick changes in the trajectory and emergency avoidance
maneuvers. Concurrently, information about the UAV status is being transmitted, using telemetry, to a
ground station. This station is then capable of overriding the autopilot commands in case, for example,
the team wants to abort the mission and land safely.
Chapter 3
Non-cooperative Sensors
In this chapter, the different options for non-cooperative sensors are described.
After justifying why this kind of sensor is chosen in Section 3.1, an overview of them is given.
Section 3.2 is focused on radars, Section 3.3 on LIDARs, Section 3.4 on acoustic sensors and Section
3.5 on EO cameras. Finally, Section 3.6 provides an overall view through a comparison of the main
sensors described.
The main source of the concepts and ideas for the development of this chapter is Fasano et al. [3].
3.1 Why non-cooperative sensors
As explained in Sections 2.3.2 and 2.3.3, there are two main types of sensors, cooperative
and non-cooperative. Any SAA application needs non-cooperative sensors, since the intruder may not
be a well-equipped aircraft and, in that case, a cooperative sensor will not detect it.
The application studied in this work would initially be performed by a small UAV at lower altitudes,
where non-cooperative traffic is allowed and where ground obstacles can also be important to take
into account; for that reason, introducing non-cooperative sensors in the architecture is more important
than ever.
Cooperative sensors could still be implemented in these circumstances as an alternative source
of information to significantly increase the accuracy of identifying cooperative aircraft. However, from
a simulation and experimental point of view, the implementation of these kinds of sensors in the SAA
architecture would increase the complexity of the problem, since testing them would require more than
one aircraft flying at the same time and equipped with the same type of cooperative sensor.
The implementation of an effective SAA architecture with only non-cooperative sensors is one of the
main topics in today's research in the field of UASs and SAA and, for this reason and the ones mentioned
above, this work also focuses on it.
3.2 Radars
One of the solutions for sensing the environment is the radar. A simple emitter can be a dipole
connected to an oscillator with very good phase and frequency stability: each time the electrons in
this dipole change the sign of their linear momentum, an electromagnetic wave is emitted in the form of
a photon.
A typical radar configuration is a monopulse radar, a radar with a single antenna that emits
electromagnetic waves modulated by pulses at a certain frequency, or a continuous wave configuration,
with two antennas, one transmitting and one receiving. A monopulse radar switches the antenna from the
transmitter to the receiver after emitting the pulse, employing a diplexer. When the initial pulse hits
an object, depending on the reflective properties of its surface, a wave is backscattered to the radar
antenna. If it has enough energy, the presence of the object is detected by the radar. The relative range
is measured through the travel time (round trip) of the wave. On the other hand, for a continuous wave
radar, Doppler processing provides the measurement of the range rate. But for a SAA system this type
of configuration, which requires two antennas, is much more demanding in terms of size and weight and
is therefore usually discarded, since SAA applications require compact configurations to be installed
onboard the UA. Therefore, to measure the range, Equation 3.1 is used,
R = c (trx − ttx) / 2    (3.1)

where c is the speed of light, trx is the time when the pulse is received back at the antenna and ttx is
the time when the pulse was transmitted.
Some other parameters to take into account are described in the following lines. One of them is
the pulse repetition frequency (PRF). False objects can be detected if this parameter is not selected
properly: if a pulse is transmitted before the previous one is received, the value of trx will correspond
to the previous pulse and not to the actual one. For that reason, the PRF needs to be smaller than the
reciprocal of the time needed for a pulse to travel twice the maximum range assigned to the region that
forms the radar detection coverage area:
PRF ≤ c / (2 Rmax)    (3.2)
Common values for PRF range from 3 kHz for Rmax = 50 km to 30 kHz for Rmax = 5 km.
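The bound of Equation 3.2 reproduces the values quoted above, as the following numerical check shows.

```python
# Numerical check of Equation 3.2: the upper bound on PRF set by the
# maximum unambiguous range, reproducing the values quoted in the text.

C = 3.0e8  # speed of light, m/s

def max_prf(r_max_m):
    # the next pulse may only be transmitted once the previous one
    # could have completed the round trip to R_max and back
    return C / (2.0 * r_max_m)

prf_50km = max_prf(50e3)  # 3 kHz for R_max = 50 km
prf_5km = max_prf(5e3)    # 30 kHz for R_max = 5 km
```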
The detection of an object is possible if the power of the pulse received by the antenna is considerably
greater than the noise associated with the receiver. This is measured through the signal-to-noise ratio
(SNR), and the concept described can be expressed with Equation 3.3,

S/N ≥ εth    (3.3)
where εth is a proper threshold. False alarms (noise misclassified as an object) and missed detections
(an object misclassified as noise) are a function of this threshold, which is usually mapped by the
manufacturers. The threshold can be adapted according to the application, for example by reducing it when
the distance from the ground is high enough. A certain amount of false alarms is usually preferred over
the same amount of missed detections, since the former can be more easily removed. This can be done
even with a single scan by exploiting an altitude threshold (removing all detected objects below a certain
altitude, known as ground clutter), in which case the radar processing unit needs to know the aircraft
altitude and attitude, or a velocity threshold (removing stationary objects that are likely to be on the
ground using the Doppler signature), a more critical process due to the danger of also deleting balloons
or hovering helicopters. Values of εth are expressed in the decibel scale. The minimum level is 6 dB,
which corresponds to εth = 3.98. The higher the value, the higher the quality of detection.
The SNR can be computed combining the following expressions. On the one hand,

S = (Ptx G^2 λc^2 σ) / ((4π)^3 R^4 L)    (3.4)

where
• Ptx is the power emitted by the transmitter.
• G is the gain of the antenna; for a parabolic one:

G = (π d / λc)^2 ηA    (3.5)

where d is the diameter of the dish and ηA is the aperture efficiency, dependent on the quality of
the antenna manufacturing (typical values between 0.55 and 0.70).
• λc is the wavelength associated with the carrier.
• σ is called the radar cross-section and expresses the reflectivity of the object; it depends on the
type of object, its material, its relative pose with respect to the own aircraft and the radar λc. The
typical limit for detection applied for SAA is σ = 1 m^2.
• R is the range.
• L is the transmission loss, mainly caused by atmospheric scattering and absorption. These types
of effects have a higher impact at high frequencies.
On the other hand,
N = k T B    (3.6)

where k is the Boltzmann constant (1.38 × 10^-23 J/K), T is the temperature (typically assumed T = 290 K)
and B is the radar bandwidth, measured in Hz.
Combining Equations 3.4 and 3.6, the SNR is expressed as follows:

S/N = (Ptx G^2 λc^2 σ) / (k T B (4π)^3 R^4 L)    (3.7)
Equation 3.7 can be used to estimate the maximum distance at which an object can be detected. In
the extreme case, S/N = εth and σ is equal to the minimum radar cross-section to be detected. Then, by
substituting these terms in the SNR equation 3.7, that range can be obtained as follows:

Rdet = [(Ptx G^2 λc^2 σmin) / (k T B (4π)^3 εth L)]^(1/4)    (3.8)
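Equation 3.8 can be evaluated numerically; the parameter values in the sketch below are purely illustrative, not taken from the text or from any specific radar.

```python
import math

# Numerical sketch of the detection-range estimate of Equation 3.8.
# Every parameter value below is illustrative, not taken from the text.

K_BOLTZMANN = 1.38e-23  # J/K

def detection_range(p_tx, gain, wavelength, sigma_min, temp, bandwidth,
                    snr_threshold, loss):
    num = p_tx * gain**2 * wavelength**2 * sigma_min
    den = (K_BOLTZMANN * temp * bandwidth
           * (4.0 * math.pi)**3 * snr_threshold * loss)
    return (num / den) ** 0.25      # fourth root, as in Equation 3.8

# Ka-band example (fc = 35 GHz, so wavelength ~ 8.6 mm), sigma_min = 1 m^2:
r_det = detection_range(p_tx=10.0, gain=1000.0, wavelength=3.0e8 / 35e9,
                        sigma_min=1.0, temp=290.0, bandwidth=1e6,
                        snr_threshold=3.98, loss=1.0)  # ~2.2 km here
```

The fourth root explains why detection range responds weakly to extra transmitter power: sixteen times more power only doubles the range.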
Best performance is provided when a large wavelength and a large pulse width are assigned. A
trade-off has to be made for small UA, since larger wavelengths need larger antennas. Note that this has
a direct influence on the radar carrier frequency fc. Larger aircraft with antennas of 1 m or more in
diameter can use X-band radars (fc = 10 GHz), while small aircraft need to use Ka-band or W-band radars
(fc = 35 GHz or 94 GHz, respectively) and they need signal amplification to have enough emitted power.
To cover the required FOR, the radar head can be rotated mechanically or electronically. Mechanical
scanning is done with a motorized gimbal structure that allows full rotational control over the radar head,
with all that this entails: high complexity of the layout, expensive maintenance, slow rotations
and installation difficulties due to the moving parts and their size. Electronic scanning implies
coupling several dipoles on a single strip. With the proper phase difference assigned to each dipole,
the resulting interference pattern produces a controlled rotation of the main lobe. Beam steering
can then be realized at high speed without the need for moving parts, but this technique is expensive and
requires state-of-the-art technologies. Also, it can only reach an angular span of 50° with a single
electronic antenna, due to the deformation of the lobes that can increase the risk of receiving a signal
from side lobes, so the problem becomes more complicated for larger angular scan widths.
All in all, radars are a good solution for non-cooperative sensing and some of their advantages are:
• They can estimate all terms of the track state.
• The reliability of detection and initial detection distance can be regulated by selecting the level of
transmitter power.
• They can deal with bad weather conditions, since the degradation is well modeled and can be
well compensated for by increasing the emitted power or by choosing a proper carrier frequency.

Nevertheless, they are quite demanding in terms of onboard resources, cost, size, weight and required
electric power, and thus may not be suitable for small UAV.
3.3 LIDARs
LIDARs are sensors that follow the same principles as radars but, instead of the microwave energy
emitted by antennas, they use the light emitted by a laser. A laser is a device that uses a
quantum-mechanical effect, induced or stimulated emission, to generate a beam of light that is coherent
both spatially and temporally. Spatial coherence corresponds to the ability of a beam to remain small
in size when transmitted through vacuum over long distances, which also makes it a collimated source,
and temporal coherence relates to the ability to concentrate the emission in a very narrow spectral range.
A typical laser consists of three basic operating elements. The first is a resonant optical cavity, in which
light can circulate, usually consisting of a pair of mirrors: one has high reflectance (close to 100%) and
the other, known as the output coupler, has a lower reflectance and lets the laser radiation out of the
cavity. Inside this resonant cavity there is an active medium with optical gain, which can be solid, liquid
or gaseous (usually a gas in a partially ionized plasma state). This medium is in charge of amplifying the
light; it is where the excitation processes occur, it can be made of many different materials, and it
determines to a great extent the properties of the laser light: wavelength, continuous or pulsed emission,
power, etc. Typical gain media are He-Ne or Nd:YAG. Finally, to amplify light, the active medium needs
a certain amount of energy, commonly called pumping, generally supplied as a beam of light (optical
pumping) or an electric current (electrical pumping).
In most lasers, lasing starts with the stimulated emission that amplifies the photons spontaneously
emitted in the gain medium. Stimulated emission produces light that matches the input signal in
wavelength, phase and polarization. This, combined with the filtering effect of the optical resonator,
gives laser light its characteristic coherence and can give it uniform polarization and monochromaticity,
i.e., all emitted photons have almost the same wavelength or frequency. After being emitted from the
source, the light hits a mechanism that supports the scanning of the FOR. This mechanism can be a
movable mirror or prism that controls the orientation of the beam boresight axis (the axis of maximum
gain). An important issue to take into account is that lasers for LIDARs must be eye-safe, since the
beam could enter through the cockpit of a manned aircraft and hit the pilot's eye. Lasers around
1000 nm are preferred because their radiation is less harmful to the human eye.
Having described the system, it is useful to introduce the concept of resolution, which is also relevant
for radars and other sensors. The resolution of a sensor is the smallest change it can detect in the
quantity it is measuring, i.e., the minimum relative distance that permits it to discriminate between two
objects in the beam. Resolution can be defined for all terms included in the track state: range, range
rate, angles and angular rates, for example. Thus, all objects closer together than the resolution are
considered as one. The domains of the track state can then be divided into equal subdomains called
cells, equal to or larger than the resolution; these cells will contain no more than a single object. The
angular resolution, which describes the ability of the sensor to distinguish small details of an object, is
given by Equation 3.9,
∆θ = 1.22 λc / d    (3.9)
where ∆θ is the angular resolution (radians), λc is the wavelength of the light and d is a characteristic
aperture size. The factor 1.22 is derived from a calculation of the position of the first dark circular ring
surrounding the central Airy disc of the diffraction pattern. For typical applications, Nd:YAG lasers with
a peak wavelength of 1064 nm (infrared) are adopted. With d = 5·10⁻³ m, Equation 3.9 gives
∆θ = 1.5·10⁻² °, which means that the footprint of the beam covers a circular area with a diameter of less than 5 m
at a distance of 10 km from the emitter. The huge number of cells required would then lead to a very
slow scanning process. Some strategies, such as not transmitting pulses for all cells, can be adopted,
but they entail a risk of detecting the object with a considerable delay with respect to the theoretical
initial distance at which it could be detected. Another problem is that, due to the small wavelength, the
beam is very susceptible to atmospheric attenuation effects, which can reduce the maximum detection
range to less than 1.5 NM. Due to these constraints, LIDARs as a primary sensor can only provide a
feasible solution for SAA in short-range operations, such as those executed by small fixed-wing UAV
or multirotors, carried out in airspaces where the velocity is very limited. Another solution is to use
them as secondary sensors, in conjunction with EO cameras, to determine the range and range rate
of close objects.
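The resolution and footprint numbers above follow directly from Equation 3.9; a short sketch, assuming the Nd:YAG wavelength and the 5 mm characteristic size used in the example:

```python
import math

# Diffraction-limited angular resolution (Equation 3.9) and the
# resulting beam footprint at 10 km, using the values from the example.
wavelength = 1064e-9   # m, Nd:YAG peak wavelength
d = 5e-3               # m, characteristic aperture size

dtheta = 1.22 * wavelength / d        # angular resolution [rad]
footprint = dtheta * 10e3             # beam diameter at 10 km [m]

print(f"Angular resolution: {math.degrees(dtheta):.4f} deg")
print(f"Footprint diameter at 10 km: {footprint:.1f} m")
```

This reproduces ∆θ ≈ 1.5·10⁻² ° and a footprint diameter of less than 5 m at 10 km.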
All in all, LIDARs are compact enough to be suitable for small UAV operations but only if they fly in
airspace classes such as Class G (see Figure 1.1).
3.4 Acoustic sensors
Acoustic sensors are usually microphones that measure the level of acoustic pressure emitted by an
object, such as the engine of an intruder aircraft. The measurement is done by sensing the pressure
difference across the two faces of a thin diaphragm.
Microphones can be classified according to their polar patterns. On the one hand, an omnidirectional
microphone is usually a pressure-operated microphone: the diaphragm, which picks up sound vibrations
in the air, is completely open and exposed to the atmosphere on one side and completely closed on the
other side, which contains a sealed volume at a constant reference pressure. The sound vibration either
pushes the diaphragm against the fixed air pressure on the other side, or reduces the pressure on the
front of the diaphragm, allowing the pressure behind it to push it out. Omnidirectional means that it is
sensitive to pressures incoming from any direction. On the other hand, a bidirectional microphone is
usually a pressure-gradient microphone: it has both sides of the diaphragm fully open to the atmosphere,
and it compares the pressure of the sound wave on one side with the pressure of that same wave after it
has travelled around to the other side. The pressure difference between the front and the back of the
diaphragm depends on the angle of incidence of the sound wave: if the sound arrives from the side of
the diaphragm, the pressure is the same at the front and the rear, so the diaphragm does not move and
there is no output; if the sound arrives from a direction normal to the plane of the diaphragm, the
displacement is maximum. Bidirectional means that it is sensitive to pressures incoming from the
directions determined by the main lobes in the polar plot.
The polar plot determines the gain of the microphone in a certain direction.
Figure 3.1: Microphone polar plots [10]
Figure 3.1 shows, as said before, that an omnidirectional microphone has the same gain in all directions,
whereas the bidirectional one has two main lobes separated by 180°: it is sensitive to sounds incoming
from azimuths 0° and 180° and deaf to sounds from azimuths 90° and 270°. Both types of microphones
can be combined to form other polar patterns, such as cardioid, hypercardioid or shotgun. A cardioid
microphone has poor directivity but a wide angular spread of the main lobe, with maximum gain at 0°
and a null at 180°. Hypercardioid and shotgun are configurations where the mix leans towards the
bidirectional microphone, in order to increase the directivity.
To determine the bearing of the intruder, a SAA system would need a directional microphone. Also, a
proper installation strategy must be considered, both to reduce the impact of disturbance sources such
as the UA itself or other unwanted sources, and to protect the microphones against dirt and anything
else that could degrade them.
To estimate the range within which the acoustic sensor can detect an intruder, Equation 3.10 is used,

Rdet = 10^((Lw − 11 − DI) / 20)    (3.10)
where Lw is the sound power level (dB) and DI is the directivity index in the decibels domain given
by:
DI = 10 log10(Q) (3.11)
The directivity factor Q is the equivalent of the gain G in the radar equations; for an isotropic
(omnidirectional) source Q = 1, since Q measures the ratio with respect to this kind of source, and
therefore DI = 0. If a jet engine is considered an isotropic source with Lw = 90 dB, then Rdet ≈ 9000 m.
A further reduction can be derived by applying parameters such as atmospheric absorption, a linear
function of R, and attenuation due to wind and turbulence, which is very complex to calculate and, as a
rule of thumb, ranges from 0 dB in calm weather to 15 dB.
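Equations 3.10 and 3.11 can be checked in a few lines; a sketch for the jet engine example above (Lw = 90 dB, isotropic source):

```python
import math

def acoustic_detection_range(lw_db, q=1.0):
    """Detection range from Equation 3.10.

    lw_db: sound power level [dB]; q: directivity factor (q = 1 for an
    isotropic source, so DI = 0 dB per Equation 3.11).
    """
    di_db = 10 * math.log10(q)
    return 10 ** ((lw_db - 11 - di_db) / 20)

print(f"Rdet = {acoustic_detection_range(90):.0f} m")  # close to 9000 m
```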
There are also many noise sources that are not a threat. In addition, they can disturb the transmission
of the intruder's sound and they are very difficult to predict. Even the speed of sound depends on the
thermodynamic properties of the medium that transmits the wave, which leads to echoes, distortions
and nonlinearities and, in turn, to false alarms or missed detections. Thus, threat recognition is
performed by estimating the spectral signature of the received sound and correlating it with the spectral
signature of an aircraft engine, but problems in the transmitted sound can persist anyway.
For all these reasons, acoustic sensors are strongly dependent on the type of engine and environ-
mental conditions and they are not a good option as a primary source of reference for a SAA system.
However, they can be a low-cost option as an auxiliary source.
3.5 EO cameras
EO systems use a combination of electronics and optics to generate, detect and/or measure radiation
in the optical spectrum, generally at visible and IR wavelengths. An array of detectors, called pixels,
placed in the focal plane of a suitable lens system, is capable of acquiring the visible light intensity over
a large solid angle, whose extent is the FOV. The angular dimension of the FOV in a certain direction is
given by Equation 3.12,
∆θ = 2 arctan(d / (2f))    (3.12)
where d is the size of the detector array in that direction and f is the focal length of the lens system.
Each pixel also has its own angular width, called the instantaneous field of view (IFOV), given by
Equation 3.13,

∆α = ∆θ / n    (3.13)

where n is the number of detectors distributed along that direction. Since the detectors form a
two-dimensional array, each sensor has a vertical and a horizontal FOV and IFOV.
For visible-wavelength sensor arrays, two technologies are available: charge-coupled devices (CCD)
and active pixel sensors in complementary metal-oxide-semiconductor (CMOS) technology. CCDs have
a better SNR but are more difficult to integrate with current processing electronics. CMOS technology is
the opposite and, in addition, allows local amplification of poorly lit areas, because the light intensity
level can be read while the sensor is still accumulating charge. Another point related to the SNR is that
the Bayer filter, placed in front of the array to provide colour information for each pixel, reduces the SNR
of a sensor; therefore, for these applications, panchromatic cameras are usually preferred to colour ones.
Regarding IR sensors, they can be:
• Near IR sensors, λ ∈ [750, 1400] nm, used to enhance vision in poorly lit conditions, benefiting from
the energy emitted by artificial illumination sources. Their detectors are CCDs without coating filters,
so they collect light and noise over a wide frequency range.
• Thermal IR sensors, medium-wave λ ∈ [3000, 8000] nm and long-wave λ ∈ [8000, 15000] nm IR.
Essentially, they detect the heat emission from the engines and the exhaust plume because the
blackbody radiation curves peak at these wavelengths. In particular, medium-wave IR allows bright
images of hot objects with compact cameras. However, forced cooling is needed to keep the noise at a
reasonable level.
Visible and near IR CCD and CMOS arrays can have up to 5000 pixels per line; thus, with a FOV of
50°, Equation 3.13 gives an angular pixel size down to 0.01°. Medium-wave IR cameras, with up to
640 pixels per line, give an angular pixel size of 0.08°, almost one order of magnitude worse. That,
together with the worse noise level, makes medium-wave IR cameras an unsuitable option for SAA,
even though they could support night-time operations.
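Equations 3.12 and 3.13 can be checked numerically. A short sketch reproducing the angular pixel sizes quoted above (the FOV relation is written here with the array size d in the numerator, consistent with the definitions of d and f):

```python
import math

def fov_deg(d, f):
    """Field of view (Equation 3.12): d = array size [m], f = focal length [m]."""
    return math.degrees(2 * math.atan(d / (2 * f)))

def ifov_deg(fov, n):
    """Per-pixel angular width (Equation 3.13): n = detectors along that direction."""
    return fov / n

# Angular pixel size for the two cases discussed above (50 deg FOV)
print(f"Visible/NIR (5000 px/line): {ifov_deg(50.0, 5000):.3f} deg")
print(f"MWIR (640 px/line):         {ifov_deg(50.0, 640):.3f} deg")
```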
Figure 3.2 shows the process for image acquisition and detection.
Figure 3.2: Process for object detection in EO cameras
Although the readout (converting the two-dimensional signal in the array into a one-dimensional stream)
and the analog-to-digital (A/D) conversion are performed by electronics embedded in the detector, the
image processing is different depending on the application. Even within the same image, it differs for
SAA applications, because the regions located above and below the horizon line have different
properties. For instance, the distribution of intensities above the horizon tends to be much more uniform
than that of the pixels below the horizon; for that reason, a higher SNR is needed below the horizon to
detect the same object under similar visibility and illumination conditions. Some classical approaches
for the image processing part are:
the image processing part are:
• Basic edge detection. This method is intended to identify points in a digital image where the
brightness changes abruptly or, more formally, has discontinuities. Points where the brightness
changes sharply are typically organized into a set of curved line segments called edges. Since an
object of interest tends to be brighter or darker than its background, the edge detection process will
likely highlight its border pixels. Nevertheless, it can generate a high number of false alarms if a
thresholding strategy is not adopted. A simple algorithm of this technique, running in Matlab, and its
result are shown in Figure 3.3.
Figure 3.3: Example of basic edge detection algorithm
• Template matching. This technique compares the image acquired by the sensor with a database of
expected images of the object of interest: if there is a correlation, the object can be detected. The
objects must therefore have small variability in shape for this method to perform well, because its
performance tends to degrade as the number of objects in the database grows.
• Morphological filtering. It applies specific morphological operators in an appropriate sequence to
highlight and separate the candidate objects to be detected. The image is binarized by selecting a
proper thresholding strategy, and the clusters that have the expected size and pixel distribution are
classified as objects.
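The thresholded edge detection described in the first bullet can be sketched without any image processing library. A minimal illustrative example (Sobel gradient magnitude plus a threshold, applied to a synthetic image of a dark square on a bright background):

```python
import numpy as np

def edge_map(img, threshold):
    """Basic edge detection: Sobel gradient magnitude with thresholding.

    img: 2-D float array of grayscale intensities. The threshold
    suppresses weak responses that would otherwise become false alarms.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # horizontal gradient
    ky = kx.T                                                   # vertical gradient
    # 'valid' correlation via sliding windows (no external dependencies)
    win = np.lib.stride_tricks.sliding_window_view(img, (3, 3))
    gx = np.einsum('ijkl,kl->ij', win, kx)
    gy = np.einsum('ijkl,kl->ij', win, ky)
    return np.hypot(gx, gy) > threshold

# Dark square (intensity 50) on a bright background (intensity 200):
# only the border of the square should exceed the threshold.
img = np.full((20, 20), 200.0)
img[5:15, 5:15] = 50.0
edges = edge_map(img, threshold=100.0)
print(edges.sum(), "edge pixels found")
```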
Regarding the performance of EO cameras, Equation 3.14 gives the angular span γ used to estimate
the elevation with respect to the sensor axis and the azimuth,

γ = arctan(q d / (n f))    (3.14)
where q is the number of pixels between the object and the pixel aligned with the boresight axis, and
d/n is the size of a single detector in the array. For the elevation, the direction of the line used in the
array is vertical; for the azimuth, horizontal. In general, a detected object extends over several pixels,
but its center can be computed with subpixel accuracy by averaging the positions of its pixels weighted
by the measured intensities. Regarding the detection range, two parameters must be taken into account.
One is radiometric detection, referred to in Equation 3.3, which needs to be checked for a single pixel:
if the measured level is of the order of the noise level, no object can be detected. Nevertheless, in
practical applications, the level of sunlight reflected by most aircraft, or the heat emitted by an aircraft
engine, is always much greater than the sensor noise; consequently, radiometric detection is not what
limits the detection range. The other parameter is geometric detection, which states that an object can
be detected if it is distributed over an area of pixels larger than a minimum; a practical approach
requires the object to extend over 4 or 5 pixels. The detection distance can then be computed as follows:
Rdet = l f / (q m)    (3.15)
where l is a characteristic size (such as the wingspan), q is the minimum number of pixels needed for
detection and m is the pixel size. Applying Equation 3.15, an aircraft with a wingspan of 9 m can be
detected by a 1-megapixel EO camera with a focal length of 2 mm and a pixel size of 5 µm at a distance
Rdet = 720 m, using a 5-pixel criterion.
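Equation 3.15 and the numeric example above can be verified in a few lines:

```python
def eo_detection_range(l, f, q, m):
    """Geometric detection range (Equation 3.15).

    l: target characteristic size [m]; f: focal length [m];
    q: minimum number of pixels for detection; m: pixel size [m].
    """
    return l * f / (q * m)

# 9 m wingspan, 2 mm focal length, 5 um pixels, 5-pixel criterion
print(f"{eo_detection_range(9.0, 2e-3, 5, 5e-6):.0f} m")  # ~720 m
```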
In conclusion, EO cameras are a good primary source of information for small UAV flying in Class G
airspace under proper weather conditions, and they can be combined with other sources, such as
radars or LIDARs, to increase the overall angular resolution and the data rate.
3.6 Sensor Comparison
It can be concluded that the best sensors to be used in a SAA application are radar, LIDAR and EO
cameras. For that reason, an overview of their characteristics and limitations is summarized in Table
3.1.
                                Short Range  Long Range         Video   3D      Far IR
                                Radar        Radar       LIDAR  Camera  Camera  Camera
Range Measurement < 2 m         0            0           0      -       ++      -
Range Measurement 2 - 30 m      +            ++          ++     -       0       -
Range Measurement 30 - 150 m    n.a.         ++          +      -       -       -
Angle Measurement > 10 deg      +            +           ++     ++      +       ++
Angle Measurement > 30 deg      -            -           ++     ++      +       ++
Angular Resolution              0            0           ++     ++      +       ++
Velocity Information            ++           ++          --     --      --      --
Operation in Rain               ++           +           0      0       0       0
Operation in Fog or Snow        ++           ++          -      -       -       0
Operation if Dirt on Sensor     ++           ++          0      --      --      --
Night Vision                    n.a.         n.a.        n.a.   -       0       ++

Table 3.1: Sensor comparison. ++: ideally suited. +: good performance. 0: possible but drawbacks to
expect. -: only possible with large additional effort. --: impossible. n.a.: not applicable.
So, in terms of implementation, the sensors have the following characteristics.
Camera
• Highest resolution
• Long range
• Good angular resolution
• Clear sky / day time solution
• No distance or speed detection
• Low cost solution
LIDAR
• High resolution
• Short range
• Good weather solution
• Angular and distance information
• Expensive equipment
Radar
• Low resolution or single point
• Long range
• All weather solution
• Distance and speed information (rough angular information)
• Expensive equipment
As an example, a configuration with these three sensors could provide information about obstacles as
follows.
• Camera. Image (video frames) processed by an algorithm in order to detect obstacles in clear sky
providing relative elevation and azimuth.
• LIDAR. Point cloud data processed using an algorithm to detect objects (reflective clusters) pro-
viding the size and distance of the object as well as angular position. Two sets of data can be
compared to calculate the speed based on aircraft information.
• Radar. The detection of the object will provide distance and speed (Doppler processing needed)
as well as angular position in the case of array radar.
Some cameras, radar and LIDAR models available in the market that could suit the application are
shown in Appendix A.
Chapter 4
Data Fusion
The goal of this chapter is to present the different options (architectures and algorithms) for data fusion,
their advantages and their limitations.
In Section 4.1, the idea of the data fusion is introduced. Section 4.2 presents the main architectures
for a data fusion system. Sections 4.3, 4.4 and 4.5 describe the steps in a data fusion process (data
association, state estimation and fusion decision, respectively) and different algorithms used in each
one. Finally, in Section 4.6, a way to take advantage of a multi-sensor configuration is provided.
The main sources of the concepts and ideas for the development of this chapter are Castanedo [9]
and Fasano et al. [11].
4.1 Why data fusion
Data fusion is the process of combining information from several different sources to provide a robust
and complete description of an environment or process of interest.
Data fusion is especially important in any application where large amounts of data must be combined,
merged and distilled to obtain information of adequate quality and integrity on which decisions can be
made. It also matters in applications where automated data fusion processes allow measurements and
essential information to be combined into knowledge of sufficient richness and integrity to formulate
and execute decisions autonomously. Data fusion finds application in many military systems, in civil
surveillance and monitoring, in process control and in information systems, and data fusion methods
are particularly important in the drive towards autonomous systems in all of these applications.
Thus, data fusion becomes a key element when dealing with different sensors, different sources of
information with different characteristics. For a configuration with a radar, LIDAR and visual EO camera,
a schema such as the one in Figure 4.1 is proposed.
Figure 4.1: Data fusion schema
In order to prepare the raw data to feed the data fusion processor, to adapt each sensor interface to a
standard interface (IP, serial, etc.) and to prevent processing power from being spent on data translation,
a block is needed between each sensor and the data fusion unit itself. These blocks can perform
primary processing and reduce the complexity of the data fusion unit.
The data fusion will have to deal with delays and signal dropouts and with the different accuracies and
sensor rates (for instance, a typical airborne radar has an azimuth and elevation accuracy of 0.2° to 2°
and a range accuracy of 10 to 200 feet, operating at 0.2 to 5 Hz, while EO sensors have an azimuth
and elevation accuracy of 0.1° to 0.5° and operate at 20 Hz [12]). Besides, it has to take into account
the different coordinate frames (including the GPS uncertainties when transforming the tracks from one
frame to another) and perform the track association.
Different algorithms to perform data fusion are presented in the next sections, each with different
characteristics. For SAA applications, several things should be taken into account. The first is the
computational cost, since the resources onboard the UAV are limited. The second is the nonlinear
nature of the application. And the third is the information stored in the object's state, which is especially
important to build the decision model. The choice of algorithms must therefore take these conditions
into account. A suitable option could be to use K-means for data association, an extended Kalman filter
for state estimation and fuzzy logic for the decision. Nevertheless, a review of different algorithms is
presented to give the reader a better understanding.
4.2 Classification
The multidisciplinary nature of data fusion makes it difficult to come up with a clear and strict
classification. Depending on the application, data fusion techniques can be divided (1) according to the
relations between the input data sources [13], (2) according to the input/output data types and their
nature [14], (3) following the abstraction level of the employed data [15], (4) based on the different data
fusion levels defined by the Joint Directors of Laboratories and (5) depending on the architecture type.
Nevertheless, for a SAA application, the classification used in the literature is the last one: by
architecture type, as previously introduced in Section 2.4. It concerns where the data fusion process
will be performed and, following this criterion, the classification is as follows:
• Distributed architecture or sensor-level fusion. Measurements of each source node are pro-
cessed independently before the information is sent to the fusion node. Therefore, each node
provides an estimate of the object’s state based only on its local views, and this information is the
input to the fusion process, which provides a fused global view. In this configuration, the computa-
tional power has to be available at the sensor level. In addition, from the point of view of estimating
the kinematic state of the target, the sensor-level tracks, and not the measurements, are fused.
Track to track fusion is a complex task and requires a certain amount of computational expense
[16]. Sensor-level fusion reduces the exchange of data between nodes, paying the cost of a higher
computational load.
• Centralized architecture or central-level fusion. The fusion node resides in the central proces-
sor that receives information from all input sources. Therefore, all fusion processes run on a central
processor that uses the raw measurements provided by the sources, which are combined to obtain a
single set of tracks. If it is assumed that data alignment and data association are performed correctly
and that the time required to transfer the data is not significant, then the centralized scheme is
theoretically optimal, but that assumption does not hold for real systems. Besides, the large amount of
bandwidth required to send raw data over the network is another disadvantage of the centralized
approach; this becomes a bottleneck when this type of architecture is used to fuse data in visual sensor
networks. Moreover, delays in information transfer between the different sources are variable and affect
the results of a centralized scheme to a greater extent than those of other schemes. But the major
drawback of central-level fusion is that, if a sensor measurement is degraded, it affects the entire
estimation process. Globally, the computational load is lower than in a sensor-level architecture;
however, generally more data have to be exchanged between network nodes.

These architectures were already represented in Figures 2.5(b) and 2.5(c), respectively.
One can also think of hybrid fusion configurations, which try to combine the advantages of the
centralized and distributed architectures by allowing both raw sensor data and sensor-level tracks to be
combined in the fusion processor; the disadvantages are increased processing complexity and possibly
increased data transmission rates. In practice, there is no single optimal architecture, and the most
appropriate one should be selected based on requirements, demand, existing networks, data
availability, node processing capabilities and the organization of the data fusion system.
4.3 Data Association
When talking about data fusion, the following problems could arise in the reader's mind:

• Each sensor's observations are received at the fusion node at discrete time intervals.

• A sensor might not provide observations at a specific interval.

• Observations could be noisy.

• The observation generated by a specific target in each time interval is not known a priori.
Data association techniques aim to determine which measurements in a set correspond to each target.
Data association is often carried out before the estimation of the state of the detected targets, and it is
a key step because the estimation or classification will behave incorrectly if the data association phase
does not work consistently. This process can appear at all of the fusion levels but, for a SAA system, it
is in the sensor processing that it takes on special importance because, roughly speaking, it determines
the number of obstacles that a sensor is sensing.
The following algorithms give a good performance for the data association task:
• K-means
• Probabilistic Data Association (PDA)
• Joint Probabilistic Data Association (JPDA)
• Multiple Hypothesis Test (MHT)
• Distributed Joint Probabilistic Data Association
• Distributed Multiple Hypothesis Test
• Graphical Models
4.3.1 K-Means
K-means is an algorithm that divides the values of a dataset into K different clusters according to how
similar the values are. It is an iterative algorithm that, given the dataset and the desired number of
clusters K as input, obtains the centroid of each cluster by

1. randomly initializing the centroid of each cluster,

2. assigning each data point to the nearest cluster centroid and

3. moving each cluster center to the centroid of its assigned points.
If the algorithm has not converged, it returns to step 2. It is a well-known algorithm, relatively easy to
implement, and it can find a feasible (though not optimal) solution in a short time. Nevertheless, the
major drawback for a SAA application, where the environment is constantly changing, is that it needs to
know the number of clusters beforehand, and this number must be right for the algorithm to provide
good results. Some options to overcome this are to run the algorithm several times with different values
of K until an adequate result is obtained (choosing the solution with the least variance is one way to find
the most suitable K) or to obtain the number of clusters with another technique (e.g. a
nearest-neighbour algorithm in a non-cluttered environment).
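The three steps above can be sketched in a few lines. A minimal NumPy implementation, purely illustrative (a real SAA tracker would also need a strategy for choosing K, as discussed):

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Minimal K-means: returns (centroids, labels) for (N, D) points.

    k must be chosen beforehand, which is the main limitation
    discussed above for SAA use.
    """
    rng = np.random.default_rng(seed)
    # step 1: randomly pick k data points as initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # step 2: assign each point to the nearest centroid
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # step 3: move each centroid to the mean of its cluster
        new = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

# Two well-separated groups of detections are recovered as two clusters
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
centroids, labels = kmeans(pts, k=2)
print(np.round(centroids, 2))
```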
4.3.2 Probabilistic Data Association
This algorithm was proposed in Bar-Shalom and Tse [17] to deal with a cluttered environment. It
assigns an association probability to each hypothesis arising from a valid measurement, i.e. an
observation that falls inside the validation gate of a target at a given time instant. The final target state
estimate is computed as a weighted sum of the estimated states under all the hypotheses. However, it
has three main disadvantages:

• Poor performance when the targets are close to each other or cross paths.

• PDA cannot deal properly with multiple targets because the false alarm model does not work well.

• It will lose a target that makes abrupt changes in its movement pattern, which is the case in UAS
SAA applications.
4.3.3 Joint Probabilistic Data Association
This algorithm is a suboptimal version of the PDA able to deal with multiple targets. The difference lies
in the fact that the association probabilities are computed using all the observations and targets, i.e. it
considers several hypotheses together and combines them. The main restrictions of JPDA are that a
measurement can only come from one target and that, at a given time instant, two measurements
cannot originate from the same target. Note that the sum of the probabilities assigned to one target
must be 1. Therefore, for a known number of targets, it evaluates the different measurement-to-target
association options (for the most recent set of measurements) and combines them into the
corresponding state estimation.

Regarding the advantages and disadvantages, on the one hand, it obtains better results than other
algorithms (such as the MHT, explained below) in situations with a high density of false measurements
but, on the other hand, it is computationally expensive, since the number of hypotheses increases
exponentially with the number of targets.
4.3.4 Distributed Joint Probabilistic Data Association
This distributed version of the JPDA algorithm was proposed by Chang et al. [18]. However, the
equations used to estimate the state of the target using several sensors assume that communication
takes place after every observation. An approximation can be made in the case of sporadic
communication and a substantial amount of noise, but the algorithm remains a theoretical model and
needs further development to overcome its limitations in practical applications.
4.3.5 Multiple Hypothesis Test
Using only two consecutive observations to make the association can lead to errors. The idea
behind the MHT algorithm is therefore to use more observations in order to lower the probability of an
erroneous association. To do so, MHT enumerates all possible hypotheses and maintains new ones in each iteration. It
was created to track multiple targets in cluttered environments, and thus it also serves as an estimation technique.
Bayes' rule or Bayesian networks are used to calculate the hypotheses. The algorithm by Reid
[19] is considered the standard MHT.
Each iteration of this algorithm starts with a set of correspondence hypotheses. Each hypothesis
is a collection of disjoint tracks, and the prediction of each target at the next instant is calculated for
each hypothesis. The predictions are then compared with the new observations using a distance metric.
The set of associations established in each hypothesis (based on distance) introduces new hypotheses
in the next iteration, which represent new sets of targets based on the current observations. Each new
measurement can come from a new target in the field of view, from a target being tracked, or from noise in the
measurement process. It is also possible that a measurement is not assigned to any target, because the
target has disappeared or because it was not possible to obtain a measurement of the target at that time. MHT
can also detect a new track while maintaining the hypothesis tree structure. The probability of a true
track is given by the Bayes decision model, as mentioned before. The algorithm considers all possibilities,
including track maintenance, track initialization and track removal, in an integrated framework. MHT
computes the probability that an object exists after generating a set of measurements using a comprehensive
approach, and it does not assume a fixed number of targets. The major challenge
is the effective management of the hypotheses.
MHT outperforms algorithms such as JPDA at lower densities of false positives. Nevertheless,
the computational cost of MHT increases exponentially as the number of tracks, measurements
or false positives grows. For this reason, the practical implementation of this algorithm
is limited, because the cost is exponential in both time and memory. Several approaches have been presented
to reduce the computational cost. Streit and Luginbuhl [20] presented a probabilistic MHT
algorithm in which associations are treated as statistically independent random variables and
exhaustive enumeration is avoided; for that, the number of
targets and measurements is assumed to be known. Cox and Hingorani [21] presented an efficient implementation of
MHT where the best set of k hypotheses is determined in polynomial time to track the points of interest.
It was the first version used to perform tracking in visual environments.
Another limitation is that MHT employs only one characteristic (typically position) to perform the tracking.
Liggins et al. [22] used a Bayesian combination to overcome this and exploit multiple characteristics.
4.3.6 Distributed Multiple Hypothesis Test
The MHT algorithm also has a distributed version, as JPDA does. However, the computational cost of this
algorithm is even higher than that of MHT, so its implementation in practical applications is even more
difficult.
4.3.7 Graphical Models
An explicit representation of the full joint distribution is intractable: it is computationally expensive to manipulate,
cognitively impossible for human experts to specify and statistically unfeasible to
learn from data, due to the enormous number of parameters to estimate. A graph, where the nodes are
the variables, the edges the relationships between variables and the plates the replication of a substructure
(with the appropriate indexing of the relevant variables), represents the joint probability distribution
among the variables, i.e. all their probabilistic dependencies. These dependencies reduce the number of parameters
to estimate. The main graphical models are Bayesian networks (directed graphs) and Markov random
fields (undirected graphs). In a directed graph, nodes can have parents and children, while in an undirected graph
the nodes are just neighbours. On one hand, directed graphs are useful to express causal relationships
between the variables. For instance, in a Bayesian network each node X has a conditional probability
distribution P(X|Pa_X), where Pa_X denotes the set of parents of the node X, and the topology of the
network also specifies local conditional independencies (X ⊥ NonDescendants_X | Pa_X). On the other
hand, undirected graphs are better suited to express soft constraints between the variables.
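The parameter saving provided by the factorization P(X|Pa_X) can be illustrated with a minimal two-node Bayesian network. All probability values below are illustrative assumptions:

```python
# Tiny Bayesian network A -> B: the joint factorizes as P(A) * P(B|A),
# so only 1 + 2 free parameters are stored instead of 3 for the full joint.
P_A = {True: 0.3, False: 0.7}
P_B_given_A = {True: {True: 0.9, False: 0.1},
               False: {True: 0.2, False: 0.8}}

def joint(a, b):
    """Joint probability P(A=a, B=b) from the factorized representation."""
    return P_A[a] * P_B_given_A[a][b]

# The factorized joint still sums to 1 over all assignments
total = sum(joint(a, b) for a in (True, False) for b in (True, False))
print(round(total, 10))  # 1.0
```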
The complexity of this kind of method is reasonable and lower than that of the MHT family of algorithms,
but special attention must be paid to correlated variables when building the graphical model.
4.4 State Estimation
From a set of redundant observations of one target, the goal is to find the set of parameters that provides
the best fit to the observed data. It is also the task of these algorithms to determine the state of the target
(such as its position), normally under movement in SAA applications, given the different observations
from each sensor. Thus, these techniques can fall under tracking techniques. The state estimation phase
is a common stage in data fusion algorithms because the observations of a target can come from different
sensors, and the ultimate goal is to obtain a global target state from them. The estimation
problem involves finding the state vector values (e.g., position, velocity and size) that fit the observed data
as closely as possible. In general, these observations are corrupted by errors and by noise propagated in
the measurement process, so this has to be taken into account by the algorithms performing this task.
State estimation methods could be divided into two groups:
• Linear dynamics, when the equations of the object state and the measurements are linear, the
noise follows a normal distribution and the environment is not cluttered. The estimation problem
has a standard solution and the optimal theoretical solution is based on the Kalman filter.
• Non-linear dynamics, when the estimation problem becomes difficult. There is no analytical
solution in the general case. This is the case of the UAS SAA problem, and some variations
of the Kalman filter, such as the extended Kalman filter, can deal with the non-linearities.
Most state estimation methods are based on control theory and use probability laws to calculate a
vector state from a vector measurement or a stream of vector measurements. The next methods are
used for state estimation:
• Maximum Likelihood (ML) and Maximum a Posteriori (MAP)
• Kalman Filter (KF), Extended Kalman Filter (EKF) and Unscented Kalman Filter (UKF)
• Particle Filter (PF)
• Distributed Kalman Filter
• Distributed Particle Filter
• Covariance Consistency Methods (Covariance Intersection and Covariance Union)
4.4.1 Maximum Likelihood and Maximum a Posteriori
ML is an estimation method based on probabilistic theory. Let x be the state being estimated, a fixed but
unknown point in the parameter space, and z = (z(1), ..., z(k)) the sequence of k previous observations
of x. The likelihood function λ(x) is defined as the probability density function of the sequence of
observations z given the true value of the state x.
λ(x) = p(z|x) (4.1)
ML finds the value of x that maximizes the likelihood function.
x̂(k) = argmax_x p(z|x) (4.2)
The hat on top of a variable usually denotes an estimate. This notation is widely used in this
kind of algorithm.
Probabilistic estimation methods are appropriate when the state variable follows an unknown probability
distribution. The main disadvantage of ML is that it requires knowing the empirical model of
the sensor to provide the prior distribution and to compute the likelihood function. It also systematically
underestimates the variance of the distribution, which can lead to a bias problem, although the bias of
the ML solution becomes less significant as the number N of data points increases, and the estimated
variance converges to the true variance of the distribution that generated the data as N → ∞.
MAP is used when x is the output of a random variable with a known probability density function p(x).
MAP finds the value of x that maximizes the posterior probability distribution.
x̂(k) = argmax_x p(x|z) (4.3)
Both methods are equivalent when there is no a priori information about x, i.e. when there are only
observations.
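The systematic underestimation of the variance by ML described above can be illustrated numerically. A minimal sketch, assuming Gaussian observations of a scalar state:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(loc=5.0, scale=2.0, size=10_000)  # observations of state x

# For Gaussian noise, the ML estimate of the state is the sample mean,
# and the ML variance (dividing by N) is biased low for finite N.
x_ml = z.mean()
var_ml = ((z - x_ml) ** 2).mean()   # ML estimate: divides by N
var_unbiased = z.var(ddof=1)        # unbiased estimate: divides by N - 1

print(round(x_ml, 2))         # close to the true mean 5.0
print(var_ml < var_unbiased)  # True: ML systematically underestimates
```

As N grows, the gap between the two variance estimates (a factor of (N − 1)/N) vanishes, matching the N → ∞ statement above.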
4.4.2 Kalman Filter
The KF is the most well-known and popular estimation technique. Proposed by Kalman [23], it has applications
in many different fields of engineering. If the system can be described by a linear
model and the error can be modeled as Gaussian noise, the recursive KF obtains statistically optimal
estimates [24]. But in a UAS SAA application the system is not linear, and
other methods are required to address nonlinear dynamic models and nonlinear measurements. This is where
the EKF, an approach for implementing nonlinear recursive filters [25] widely
used for fusing data in robotics, and the UKF come in.
The differences between the filters are shown in the following table of equations (4.4), with the prediction step first and the update step second.

KF:
  Predict:  x = Fx + Bu
            P = FPF^T + Q
  Update:   y = z − Hx
            S = HPH^T + R
            K = PH^T S^{-1}
            x = x + Ky
            P = (I − KH)P

EKF (F and H are Jacobians of the nonlinear models):
  F = ∂f(x, u)/∂x |_{x_t, u_t}
  Predict:  x = f(x, u)
            P = FPF^T + Q
  H = ∂h(x)/∂x |_{x_t}
  Update:   y = z − h(x)
            S = HPH^T + R
            K = PH^T S^{-1}
            x = x + Ky
            P = (I − KH)P

UKF (Y and Z are the transformed sigma points χ; w_m, w_c the sigma-point weights):
  Predict:  Y = f(χ)
            x = Σ w_m Y
            P = Σ w_c (Y − x)(Y − x)^T + Q
  Update:   Z = h(Y)
            µ_z = Σ w_m Z
            y = z − µ_z
            P_z = Σ w_c (Z − µ_z)(Z − µ_z)^T + R
            K = [Σ w_c (Y − x)(Z − µ_z)^T] P_z^{-1}
            x = x + Ky
            P = P − K P_z K^T
(4.4)
The EKF is used in much of the literature, such as Fasano et al. [26], to perform this step of the data fusion in
SAA applications (its differences from the linear KF are the Jacobian terms in the table of equations 4.4). Nevertheless,
computing the Jacobians is extremely expensive, and attempts to reduce this cost, such as linearization,
can lead to errors in the filter and instability. On the other hand, the UKF has neither the linearization
step nor the associated errors of the EKF [27]. According to Labbe [28], the UKF is more accurate
and typically more stable in general, though not specifically for this application.
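The KF column of the table above can be sketched directly in NumPy. A minimal sketch, omitting the control input Bu; the 1-D constant-position model at the bottom is an illustrative assumption:

```python
import numpy as np

def kf_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of the linear Kalman filter,
    following the KF column of the table of equations 4.4 (no control input)."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = z - H @ x                   # innovation
    S = H @ P @ H.T + R             # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# 1-D constant-position target observed directly through noisy measurements
F = H = np.eye(1)
Q, R = np.eye(1) * 1e-4, np.eye(1) * 1.0
x, P = np.zeros(1), np.eye(1) * 100.0   # vague initial estimate
for z in [4.9, 5.2, 5.0, 4.8, 5.1]:
    x, P = kf_step(x, P, np.array([z]), F, H, Q, R)
print(round(float(x[0]), 1))  # converges towards 5.0 as P shrinks
```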
4.4.3 Distributed Kalman Filter
Described in [22], this variation, used in distributed architectures, has two main requirements. The algorithm
requires the synchronization of the clocks of each source [29], typically achieved with protocols
that employ a shared global clock, such as the network time protocol; otherwise, the estimate would be
quite inaccurate [30]. In addition, if the estimates are consistent
and the cross-covariance is known and determined exactly (or the estimates are uncorrelated), then
the distributed KF can be used [31].
4.4.4 Particle Filter
Particle filters are recursive implementations of sequential Monte Carlo methods [32]. This method
builds the posterior density function using several random samples, the so-called particles. Particles
are propagated over time through a combination of sampling and resampling steps. In each iteration,
the resampling step discards some particles, increasing the relevance of the regions most likely
to follow. In the filtering process, several particles of the same state variable are used, each of them
with an associated weight indicating the quality of the particle. The estimate is the result of the
weighted sum of all particles.
Therefore, there are two main steps in the PF algorithm: prediction and update. In the first,
each particle is modified according to the existing model and the noise at a time instant; in the second,
the weight of each particle is re-evaluated using the latest sensor observation, and the particles with
lower weights are removed. In more depth, the algorithm comprises the following tasks:
1. Initialization of the particles
2. Prediction step
3. Evaluate the particle weight
4. Select the particles with higher weights while removing those with lower weights and update the
state with the new particles
5. Propagate the result for the next time instant
This algorithm is more flexible than the KF and it can deal with non-linearities, as the EKF does. It
can also handle non-Gaussian densities in the dynamic model and in the noise error. Nevertheless, its
drawback is the large number of particles needed to obtain a small variance in the estimator. In
addition, it is difficult to establish the optimal number of particles beforehand, which has an impact on
the computational time, since more particles mean a higher cost. There has been some
research [33] on using a dynamic number of particles instead of a fixed number as in earlier versions of
the algorithm.
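The five steps above can be sketched as a bootstrap particle filter. A minimal sketch for a 1-D random-walk state; all models and parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter(z_seq, n=2000, sigma_q=0.5, sigma_r=1.0):
    """Minimal bootstrap particle filter following steps 1-5 above,
    for a 1-D random-walk state with Gaussian process/measurement noise."""
    particles = rng.normal(0.0, 10.0, n)              # 1. initialization
    estimates = []
    for z in z_seq:
        particles += rng.normal(0.0, sigma_q, n)      # 2. prediction step
        w = np.exp(-0.5 * ((z - particles) / sigma_r) ** 2)  # 3. weights
        w /= w.sum()
        idx = rng.choice(n, size=n, p=w)              # 4. resample by weight
        particles = particles[idx]
        estimates.append(particles.mean())            # 5. propagate estimate
    return estimates

est = particle_filter([5.0, 5.1, 4.9, 5.0])
print(round(est[-1], 1))  # close to 5.0
```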
4.4.5 Distributed Particle Filter
In the context of sensor fusion, this algorithm tries to handle out-of-sequence measurements by regenerating
the probability density function at the time instant of those measurements [34]. This usually incurs
a high computational cost and requires a lot of space to store previous particles. Orton and Marrs [35]
proposed storing the information of the particles at each time instant to save the cost of recalculating that
information. The approach is close to optimal and, as the delay increases, the effect on the result remains minimal [36]. The
large amount of space needed to store the state of the particles at each time instant remains a problem.
4.4.6 Covariance Consistency Methods
These methods are used in distributed networks. In contrast to the distributed KF, knowing the cross-covariances
is not a requirement. They were proposed by Uhlmann [31] and are general and fault-tolerant frameworks
for maintaining consistent mean and covariance estimates. There are two main methods: covariance
intersection and covariance union.
Covariance Intersection
This method was conceived to handle the cross-covariance X, which must be known exactly for good
performance of the KF, in a computationally efficient way. Given two estimates to combine, (a, A)
and (b, B) (in (mean, covariance) form), a joint covariance M is defined as

M ≥ [ A    X
      X^T  B ]   (4.5)
The covariance intersection algorithm computes a joint covariance matrix M that can be used in
the KF equations to provide the best fused estimate (c, C). The purpose of this method is to
provide a fused estimate with a lower associated uncertainty by generating the covariance matrix M.
But what does it do differently from the KF? In the case of two estimates with equal covariance A = B,
the KF would assume statistical independence and provide a fused estimate with covariance
C = (1/2)A. Covariance intersection does not assume this independence, remaining consistent even if the
estimates are completely correlated, and provides a fused covariance C = A. In case A ≤ B,
covariance intersection takes no information from the estimate (b, B) and the fused estimate
is (a, A).
The use of the covariance intersection algorithm guarantees consistency and non-divergence. Consistency
is guaranteed because any consistent joint covariance is enough to produce a fused estimate,
while minimizing a measure such as the determinant in each fusion operation
provides a non-divergence criterion, since the size of the estimated covariance cannot increase
under this criterion. Nevertheless, the algorithm does not work properly when the measurements
to be fused are inconsistent (different estimates with high accuracy and small variance but a large
difference from the states of the others).
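One common form of covariance intersection combines the two estimates through a convex weight ω applied to the inverse covariances; ω is chosen to minimize a measure of C such as its determinant. A minimal sketch, where the coarse scan over ω is an illustrative simplification of that minimization:

```python
import numpy as np

def covariance_intersection(a, A, b, B):
    """Fuse (a, A) and (b, B) without knowing their cross-covariance:
    C^-1 = w*A^-1 + (1-w)*B^-1, with w chosen by a coarse scan
    minimizing det(C)."""
    Ai, Bi = np.linalg.inv(A), np.linalg.inv(B)
    best = None
    for w in np.linspace(0.01, 0.99, 99):
        C = np.linalg.inv(w * Ai + (1 - w) * Bi)
        c = C @ (w * Ai @ a + (1 - w) * Bi @ b)
        if best is None or np.linalg.det(C) < best[2]:
            best = (c, C, np.linalg.det(C))
    return best[0], best[1]

# Two identical estimates with equal covariance: unlike the KF,
# CI does not halve the covariance, it keeps C = A
a, A = np.array([1.0, 0.0]), np.eye(2)
b, B = np.array([1.0, 0.0]), np.eye(2)
c, C = covariance_intersection(a, A, b, B)
print(np.allclose(C, A, atol=1e-6))  # True
```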
Covariance Union
Covariance union is proposed to solve the problem of inconsistent measurements. This problem
arises when the difference between the estimated means is greater than the provided covariances can account for.
Inconsistent inputs can be detected when the Mahalanobis distance [37] between them, given in
Equation 4.6, is larger than a given threshold.
Md = (a− b)T (A+B)−1(a− b) (4.6)
A high Mahalanobis distance can indicate that the estimates are inconsistent, but the threshold needs
to be established by the user or learned automatically.
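The consistency check of Equation 4.6 can be sketched as follows; the function name and the threshold value are illustrative assumptions:

```python
import numpy as np

def inconsistent(a, A, b, B, threshold=9.0):
    """Flag two estimates (a, A) and (b, B) as inconsistent when the
    Mahalanobis distance of Equation 4.6 exceeds a user-chosen threshold."""
    d = a - b
    md = d @ np.linalg.inv(A + B) @ d
    return md > threshold

A = B = np.eye(2)
print(inconsistent(np.zeros(2), A, np.array([0.5, 0.5]), B))  # False
print(inconsistent(np.zeros(2), A, np.array([6.0, 0.0]), B))  # True
```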
The covariance union algorithm works as follows. Given two estimates, it is known that one of them
is correct and the other is not, but not which one. For the KF to provide a consistent solution, the
observation needs to be updated with an estimate consistent with both. Since
there is no way to determine which estimate is correct, it is necessary to provide a new estimate
(u, U) that is consistent with both, satisfying the following properties

U ≥ A + (u − a)(u − a)^T
U ≥ B + (u − b)(u − b)^T    (4.7)

where some measure of U is minimized. One strategy is to assign the value of one input to the mean
of the new estimate, e.g. u = a, and then to choose the value of U that makes the
estimate consistent with the case in which the other input b is the correct one. It is also possible
to assign to u a value between the two means a and b in order to decrease the value
of U. Convex optimization algorithms must be employed to solve the inequalities in Equations 4.7 in
order to obtain U, for instance the iterative method described in [38].
In the end, a fused value u is obtained with the smallest covariance possible that makes the two inputs
consistent. If the obtained covariance is much larger than the initial ones, this reveals uncertainty
between the initial estimates. Note that this method is also easily extensible to more inputs.
4.5 Fusion Decision
These techniques aim to make a high-level inference about the events and activities that are produced
from the detected targets. These techniques use the knowledge of the perceived situation, provided
by many sources and in different ways depending on the data fusion application. It is the fusion itself.
The fusion process requires reasoning while taking into account the uncertainties and constraints of the
system. Algorithms used in this field to perform the decision-making are:
• Bayesian Methods
• Dempster-Shafer Inference
• Abductive Reasoning and Fuzzy Logics
• Semantic Methods
4.5.1 Bayesian Methods
The Bayesian inference is based on Bayes' rule

P(Y|X) = P(X|Y)P(Y) / P(X)    (4.8)
where the posterior probability P(Y|X) represents the belief in the hypothesis Y given the information
X. It is a way to combine data according to the rules of probability theory, where uncertainties are
represented by conditional probability terms describing the beliefs on the data (from 0 to 1, where 0
represents a total lack of belief and 1 total belief). The major drawback of Bayesian inference is that the
probabilities P(X) and P(X|Y) must be known or estimated (using Bayesian programming for the conditional
probabilities [39], for instance). In the Handbook of Multisensor Data Fusion [40], the following
problems of Bayesian inference are described:
• Difficulty in establishing the value of a priori probabilities.
• Complexity when there are multiple potential hypotheses and a substantial number of condition-dependent
events.
• Hypotheses must be mutually exclusive.
• Difficulty in describing the uncertainty of the decisions.
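Bayes' rule can be applied, for example, to update the belief that a target is present given a sensor detection. All probability values below are illustrative assumptions:

```python
# Bayes' rule for a detection problem: Y = "target present",
# X = "sensor reports a detection". All numbers are illustrative.
p_Y = 0.01                # prior probability of a target being present
p_X_given_Y = 0.95        # sensor detection rate
p_X_given_notY = 0.05     # false alarm rate

# Total probability of a detection report, then the posterior belief
p_X = p_X_given_Y * p_Y + p_X_given_notY * (1 - p_Y)
p_Y_given_X = p_X_given_Y * p_Y / p_X

print(round(p_Y_given_X, 3))  # 0.161
```

Note how the difficulty of establishing the a priori probability shows up directly: the posterior is dominated by the small prior p_Y.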
4.5.2 Dempster-Shafer Inference
The Dempster-Shafer theory provides a formalism to represent incomplete knowledge and combine
evidence, and it allows uncertainty to be represented explicitly [41]. The mathematics of this method
can be found in both Dempster [42] and Shafer [43].
With this method, a priori probabilities are not required, because probability masses are assigned at the instant the
information is provided, unlike in the Bayesian method. Some uses can be found in
[44] and its extension [45], where information is fused in context-aware environments by dynamically modifying
the weights associated with the sensor measurements, allowing the fusion mechanism to be calibrated
according to the most recent sensor measurements.
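The combination of evidence can be sketched with Dempster's rule of combination over a small frame of discernment; the sensor masses below are illustrative assumptions:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: combine two basic belief assignments whose focal
    elements are frozensets over the frame of discernment, renormalizing
    by the conflicting mass."""
    combined, conflict = {}, 0.0
    for (A, wa), (B, wb) in product(m1.items(), m2.items()):
        inter = A & B
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    return {A: w / (1.0 - conflict) for A, w in combined.items()}

# Two sensors assigning mass over the frame {aircraft, bird}; note that
# part of each mass goes to the whole frame, i.e. explicit uncertainty
frame = frozenset({"aircraft", "bird"})
m1 = {frozenset({"aircraft"}): 0.7, frame: 0.3}
m2 = {frozenset({"aircraft"}): 0.6, frame: 0.4}
m = dempster_combine(m1, m2)
print(round(m[frozenset({"aircraft"})], 2))  # 0.88
```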
4.5.3 Abductive Reasoning and Fuzzy Logics
The abduction method attempts to find the best explanation for an observed event. It is thus more
a reasoning pattern than a data fusion technique, and for that reason it needs to be complemented with a
different inference method. This is where fuzzy logic comes in. Fuzzy logic provides an ideal tool for
inexact reasoning, particularly in rule-based systems, and it has had notable success in practical
applications. It has been used in sensor fusion applications, e.g. for LIDAR and camera fusion in Zhao
et al. [46], where the following advantages of fuzzy logic are highlighted:
• It is built on top of the knowledge and experience of experts, thus it can employ not only results
from the sensors but also a priori knowledge.
• It can model nonlinear functions of arbitrary complexity.
• It can tolerate imprecise results of two sensors.
• It is a flexible fusion framework, i.e., more sensors can be easily integrated into the system in the
future.
It works as follows. Given a universal set X with elements x and a subset A whose elements x have
some specific property, a membership function µ assigns a value between 0 and 1 indicating the
degree of membership of every x in the subset A. Mathematically,

X = {x}
A = {x | x has some specific property}
A: µ_A(x) → [0, 1]    (4.9)
The fuzzy membership function needs to be quantified in some way. Figure 4.2, extracted from [47],
shows an example where X is the set of all aircraft and A is the set of fast aircraft.
Figure 4.2: Example fuzzy membership function [47]
In the context of fuzzy logic, the operation AND is implemented as a minimum, OR as a maximum and
NOT as a complement.

A ∩ B: µ_{A∩B}(x) = min[µ_A(x), µ_B(x)]
A ∪ B: µ_{A∪B}(x) = max[µ_A(x), µ_B(x)]
Ā:     µ_Ā(x) = 1 − µ_A(x)    (4.10)
Also, the properties associated with binary logic hold (commutativity, associativity, idempotence,
distributivity, De Morgan's laws and absorption). The only exception is the law of the excluded middle,
which is no longer true.

A ∪ Ā ≠ X
A ∩ Ā ≠ ∅    (4.11)
Thus, these definitions and laws provide a systematic way to reason about inexact values.
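The membership function and operators above can be sketched as follows; the "fast aircraft" thresholds are illustrative assumptions, and the last line shows the failure of the law of the excluded middle:

```python
def mu_fast(speed):
    """Illustrative membership function for 'fast aircraft':
    0 below 300 kt, 1 above 600 kt, linear in between."""
    return min(1.0, max(0.0, (speed - 300.0) / 300.0))

def f_and(a, b): return min(a, b)   # fuzzy AND as minimum
def f_or(a, b):  return max(a, b)   # fuzzy OR as maximum
def f_not(a):    return 1.0 - a     # fuzzy NOT as complement

mu = mu_fast(450.0)
print(mu)                   # 0.5: partially a member of 'fast aircraft'
print(f_or(mu, f_not(mu)))  # 0.5, not 1: A united with its complement != X
```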
4.5.4 Semantic Methods
Semantic data models are based on relationships between stored symbols and the real world.
In a semantic scheme, the outputs of the nodes that process the raw data consist only of semantic
information. Two phases are needed: an offline one, which incorporates the most appropriate knowledge into
semantic information, and a real-time one, in which relevant attributes are fused to provide a
semantic interpretation of the sensor data. Another way to see it is that the data obtained from the environment
by the sensors is translated into a formal language, which is compared with similar languages
stored in a database.
Therefore, the nodes do not need to transmit raw data but a formal language structure, which saves
transmission cost. Nevertheless, the set of behaviours must be stored in advance,
which might be difficult in some scenarios.
4.6 Other solutions
Another way to use the information of two or more sensors to enhance the performance of a single one is
presented in the schema of Figure 4.3.
Figure 4.3: Schema of a camera-based enhancement of the radar measurement
A radar is proposed as the primary source of information. Compared to LIDAR, the radar has the advantage
of a longer detection range; nevertheless, it has a low resolution, i.e. in a given scenario the output
pixel is very large. The secondary source is a camera. A camera is also a long-range solution with high
resolution, but it is more likely to produce a false detection in a cluttered environment. Combining
both sensors as in Figure 4.3 helps to overcome the limitations of each sensor.
To avoid false detections from the camera, its information is only employed when a measurement
from the radar is received. However, this radar measurement will not be as accurate as
desirable, so the camera helps to enhance it. Instead of running the camera obstacle detection algorithm
over the whole scene, it is run only in a region of interest created around the position measured
by the radar, considerably reducing the chances of a false alarm. An accurate position of the
target is therefore obtained by the camera.
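The region-of-interest gating described above can be sketched as follows; the function name, the uncertainty value and the assumption that the radar measurement has already been projected into image coordinates are illustrative:

```python
def roi_from_radar(radar_xy, uncertainty_px, img_w, img_h):
    """Build a region of interest around a radar measurement projected
    into image coordinates; the camera detector then runs only inside
    this rectangle instead of over the whole scene."""
    x, y = radar_xy
    half = uncertainty_px
    left, top = max(0, x - half), max(0, y - half)
    right, bottom = min(img_w, x + half), min(img_h, y + half)
    return left, top, right, bottom

# Radar reports a target near pixel (620, 200) with +/- 80 px uncertainty
print(roi_from_radar((620, 200), 80, 1280, 720))  # (540, 120, 700, 280)
```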
This approach has already been used in papers such as [48], which validates that it works.
Chapter 5
Machine and Deep Learning for
obstacle detection using visual
electro-optical sensors
Looking at the sensors used in SAA systems, one quickly identifies that almost all architectures
include a camera in their configuration, either as a secondary source of information or as the primary
source in novel systems. Thus, this chapter describes how to proceed in the field of image
processing for obstacle detection in a SAA application using machine and deep learning techniques.
The application is centred on evaluating whether there is an obstacle in the clear sky, above the horizon.
Section 5.1 introduces artificial intelligence to the reader, while Section 5.2 and Section 5.3 explain
the basics of the techniques used, Convolutional Neural Networks and Support Vector Machines
respectively. Section 5.4 discusses the implementation of a Faster Region-based Convolutional Neural
Network for aircraft detection, while Section 5.5 discusses the implementation of the Support Vector
Machine that, together with an edge detection algorithm, leads to the obstacle detection. Finally,
Section 5.6 shows the results of the implementations.
5.1 Artificial Intelligence, Machine Learning and Deep Learning
Artificial intelligence (AI) was born with the idea of building complex systems with the characteristics
of human intelligence. The idea of the first researchers in the field of AI was to build a general AI in which
machines and human beings have the same cognitive capacities; this is very ambitious and has not been reached yet.
There is also another branch, narrow AI, where current AI research lies. This is where its
use through algorithms and guided learning, with machine learning and deep learning,
comes in. Nowadays, the world is experiencing a tremendous advance in the use of machine learning and,
in recent years, specifically deep learning, both included in the context of AI, as mentioned.
Figure 5.1: Artificial intelligence, machine learning and deep learning [49]
Machine learning, in its most basic use, is the practice of using algorithms to parse data, learn from
it and then make a prediction or suggestion about something. The machine is trained
using a large amount of data, allowing the algorithms to be perfected, instead of hand-coding software
routines with a specific set of instructions to accomplish a particular task. Regarding image processing,
computer vision has been one of the best application areas for machine learning for many years,
where a well-trained machine can classify an image according to some specific features.
Following the evolution of machine learning over the last decade, a specific machine learning technique
known as deep learning has spread more strongly. By definition, deep learning is a subset of the
field of machine learning, built on the idea of learning from examples. In deep learning,
instead of teaching a computer a huge list of rules to solve a problem, it is given a model that can
evaluate examples and a small collection of instructions to modify the model when errors occur. Over
time, these models are expected to solve the problem extremely accurately, thanks to the
system's ability to extract patterns. Although there are different techniques to implement deep learning,
one of the most common is to simulate a system of neural networks. Loosely speaking, these
are inspired by the biological functioning of the human brain, composed of interconnections between
neurons. In this simplified case, the neural network is composed of different layers, connections and
a direction in which data is propagated, each layer with a specific analysis task. The idea is to
provide enough data to the neuron layers so that they can recognize patterns, classify and categorize
them. In the image processing field, an image can be taken as the input to the first layer. There, it is
divided into thousands of pieces that each neuron analyzes separately: the colour,
the shape, etc. Each layer is an expert in one characteristic and assigns it a weight. Each neuron
weighs its input as a correct or incorrect result relative to its task; finally, the last layer of neurons
collects that information and the output is determined by the sum of these weights.
Table 5.1 presents some of the differences between machine learning and deep learning.
|                      | Machine Learning                                   | Deep Learning                                      |
|----------------------|----------------------------------------------------|----------------------------------------------------|
| Data dependencies    | Excellent performance on a small/medium dataset    | Excellent performance on a big dataset             |
| Hardware dependencies| Works on a low-end machine                         | Requires a powerful machine, with GPU              |
| Feature engineering  | Need to understand the features that represent the data | No need to understand the best features that represent the data |
| Interpretability     | Some algorithms easy (logistic, decision tree), others almost impossible (SVM, XGBoost) | Difficult to impossible |
| Training dataset     | Small                                              | Large                                              |
| Choose features      | Yes                                                | No                                                 |
| Number of algorithms | Many                                               | Few                                                |
| Training time        | Short                                              | Long                                               |

Table 5.1: Machine learning vs. deep learning
There are two problems when processing an image in this context: classification and detection. Classification
algorithms assign a class to the input image based on its properties. Object detection algorithms
not only classify the input image but also draw bounding boxes around the objects of interest to locate
them within the image.
In all this context, and considering the research being developed at CfAR, it is very interesting to use
machine learning or deep learning to process images captured by a camera because, in contrast with
classical programming, there is no need to establish all the clauses and thresholds to classify the regions
of the image. Instead, the trained intelligent algorithm can detect the class from its knowledge, learning
the thresholds that lead to the classification, and it is also able to predict cases that the
programmer could have overlooked. Thus, two techniques that can help with this task are explained in
the next sections: convolutional neural networks (CNN) and support vector machines (SVM).
5.2 Convolutional Neural Networks
Within computer vision, deep learning approaches are among the most popular ways to analyze visual
imagery. The CNN, or ConvNet, is a class of deep neural networks that has been winning the ImageNet Large
Scale Visual Recognition Challenge since 2012 [50]. At a high level, a CNN works as follows: the image is
passed to the network, where it is sent through several convolution and pooling layers and,
finally, the output is obtained in the form of the object's class.
Figure 5.2: Convolutional Neural Networks [51]
There are different algorithms that use these networks to detect objects; they are
explained in the next sections. The remarkable thing about these techniques is that they perform the
feature extraction, the classification and the object detection all in one, so
sending the image (and the pre-trained data) to the algorithm gives the result directly.
5.2.1 Regions with CNN features
When processing an image where the number of objects of interest is unknown, one solution would be to
take different regions of interest and use a CNN to classify the presence of the object within each region.
But these objects can have different aspect ratios and different spatial locations, so the number of
regions needed would make this computationally unapproachable. Regions with CNN features (R-CNN)
was proposed by Girshick et al. [52] to address this problem.
R-CNN combines proposals of rectangular regions with characteristics of CNN. It reduces the problem
to finding 2000 regions in the image that might contain an object. These regions are called region
proposals and they are selected by (1) generating an initial sub-segmentation that produces candidate
regions, (2) using a greedy algorithm to recursively combine similar regions into larger ones and (3)
using the generated regions to produce the final candidate region proposals. Each region is reshaped
and fed to a CNN that acts as a feature extractor; these features are then fed into an SVM that divides
the regions into different classes. Finally, a bounding box regression is used to predict the bounding
boxes for each identified region.
Figure 5.3: Regions with CNN features [52]
Nevertheless, it still takes a lot of time to train the network, since it has to classify 2000 region
proposals per training image. Also, detection takes around 47 seconds [52] per test image, which
makes it unsuitable for a real-time application. Finally, selective search is not a learned
algorithm, thus it can propose bad candidate regions.
5.2.2 Fast R-CNN
The same author proposed in [53] a solution to overcome the speed limitations of the R-CNN. Instead
of feeding the region proposals to the CNN, the fast R-CNN feeds the entire input image to the network to
generate a convolutional feature map. It is there that the region proposals are identified. Then, they
go through a region of interest pooling layer that reshapes them into a fixed size in order to pass the
regions to a fully connected network that classifies them and returns the bounding boxes, using softmax
and linear regression layers simultaneously.
Figure 5.4: Fast R-CNN [53]
Fast R-CNN is significantly faster than R-CNN: the ConvNet is run once per image instead of once per
each of the 2000 region proposals. But the definition of the region proposals using selective search is
still a bottleneck, slowing down the process when compared to not using region proposals, as shown in
Figure 5.5 (values extracted from [53]).
Figure 5.5: Comparison of R-CNN and fast R-CNN
5.2.3 Faster R-CNN
Ren et al. [54] developed the faster R-CNN, in which, instead of running a selective search algorithm on
the feature map to identify the region proposals, a separate network is used to predict them: the
region proposal network (RPN). With this approach, the algorithm is one order of magnitude faster [54].
(a) Faster R-CNN (b) RPN
Figure 5.6: Faster R-CNN and RPN [54]
Figure 5.6(a) describes how the algorithm works, while Figure 5.6(b) describes how the RPN part works.
Faster R-CNN takes the feature maps from the CNN and passes them to the RPN where, using a sliding
window over these feature maps, it generates k anchor boxes (fixed-size bounding boxes that are placed
throughout the image and have different shapes and sizes) for each window. For each anchor box, the RPN
predicts the probability that the anchor contains an object (without considering any class) and a
bounding box regressor is used to adjust the anchors to better fit the object. Then, the region of
interest pooling layer extracts fixed-size feature maps for each anchor, which are sent to a fully
connected layer with softmax and linear regression layers, as in fast R-CNN, finally yielding the
classification of the object and the predicted bounding boxes for the identified objects.
5.3 Support Vector Machine
SVM is a well-known supervised machine learning algorithm, typically used in binary classification
problems (in fact, it is used by the R-CNN to perform the classification part). When running this
algorithm on an image, it can classify each pixel into the categories defined during training;
however, extra code is needed to perform the detection. It plots each data item as a point in an
n-dimensional space (n being the number of features that define a class). The classification is
performed by finding the hyperplane that differentiates the two classes: data points falling on either
side of the hyperplane are attributed to different classes. Among all possible hyperplanes, the
algorithm finds the one with the maximum margin (maximum distance to the data points of both classes),
which also provides some confidence when classifying future data points.
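To illustrate the maximum-margin idea, the following Python sketch trains a linear SVM on a toy 2-D dataset by sub-gradient descent on the regularized hinge loss; it is only a minimal stand-in for illustration, not the SVM implementation used in this work:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise the hinge loss + L2 penalty; labels y must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:     # point inside the margin: push it out
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:                          # correctly classified: only shrink w
                w -= lr * lam * w
    return w, b

# Two linearly separable clusters
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
print(np.sign(X @ w + b))   # all four points on the correct side
```

The points for which the margin constraint is active are precisely the support vectors described above.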
Figure 5.7: SVM classification (adapted from [55])
The support vectors are the points closest to the hyperplane; they help to build the SVM by influencing
its position and orientation and by maximizing the margin of the classifier (the support vectors are
the points highlighted in Figure 5.7).
The hyperplane depends on the number of features: for two features the hyperplane is a line, for three
it is a two-dimensional plane, and so on. In addition, for a large dataset the resulting separating
hyperplane may no longer be linear, and classification becomes more demanding, since it is not as easy
to evaluate on which side of the hyperplane a point lies. For that reason, a trade-off between
accuracy and computational time has to be made.
5.4 Implementation of Faster R-CNN for aircraft detection
Among the algorithms that use CNN for image classification, the faster R-CNN is, as indicated by
its name, not only the fastest but also the most accurate, and thus the most likely to produce good
results in an onboard system. For those reasons, it is the one used here. The implementation is done in
Matlab. The goal of implementing this technique is to detect aircraft in the sensor image.
For that, the algorithm has to be trained for the object class that it has to detect, so that, in the
end, it is able to detect that class in the image. The training is done by means of the Matlab function
trainFasterRCNNObjectDetector, which has three main inputs: trainingData, network and options.
To build the trainingData, it is necessary to find a good dataset of images with which to train the
network. After trying two different datasets with poor results, a third one was found. This dataset has
been used in the PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning) VOC (Visual
Object Classes) project [56], which ran challenges to evaluate performance on object class
recognition from 2005 to 2012. In the 2012 edition, the dataset contains 17125 images from 20 different
classes, aeroplane being one of them. Each image has a corresponding XML file where, among other
information, the classes contained in that image and the regions of the image where they are located
are listed. This is important when training a CNN: it is necessary to know in which region of the
training image the class object is located, since there may be other objects in the image that are not
of interest. These regions are called bounding boxes. Therefore, to train a detector for aircraft, an
algorithm has been created that searches every file for the aeroplane class and extracts the bounding
box information of the object. Finally, a table with two columns is created, where the first column
contains paths and file names to the images and the second column contains the bounding boxes related
to the corresponding image. Each bounding box must be in the format [x y width height], which specifies
the upper-left corner location and size of the object in the corresponding image. The table variable
name defines the object class name. In total, there were 960 images with the class aeroplane.
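The extraction step can be sketched as follows; this is an illustrative Python equivalent of the Matlab routine described above, assuming the standard PASCAL VOC annotation layout (xmin/ymin/xmax/ymax per object), converted here to the [x y width height] format:

```python
import xml.etree.ElementTree as ET

def aeroplane_boxes(xml_text):
    """Return [x, y, width, height] boxes for every 'aeroplane' object
    found in a PASCAL VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        if obj.findtext("name") != "aeroplane":
            continue
        bb = obj.find("bndbox")
        xmin = int(bb.findtext("xmin")); ymin = int(bb.findtext("ymin"))
        xmax = int(bb.findtext("xmax")); ymax = int(bb.findtext("ymax"))
        boxes.append([xmin, ymin, xmax - xmin, ymax - ymin])
    return boxes

sample = """<annotation>
  <object><name>aeroplane</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object><name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""
print(aeroplane_boxes(sample))   # [[48, 240, 147, 131]]
```

Note how objects of other classes (here a person) are skipped, exactly as the thesis's filter does.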
The faster R-CNN object detection network is composed of a feature extraction network followed
by two sub-networks. The first sub-network, following the feature extraction network, is an RPN trained
to generate object proposals (object or background). The second sub-network is trained to predict the
actual class of each proposal. Matlab provides several pre-trained deep neural networks to use for
feature extraction. It has been decided to use ResNet-18, which offers a good trade-off between
accuracy and computational time. The images sent to the detector need to have a minimum resolution of
224x224 pixels. Results in terms of accuracy and prediction time for the different networks are shown
later in this section.
Regarding the options, they have been selected after several trainings, trying to optimize both
accuracy and computational time during classification and detection. Notably, the solver uses the
stochastic gradient descent with momentum optimizer and the initial learning rate is set to 10^-3 (if
the learning rate is too low, training takes a long time, but if it is too high, training might reach a
suboptimal result or diverge). The maximum number of epochs, an epoch being a full pass of the training
algorithm over the entire training set, is set to 30 in order to make the detector robust enough
without making it too slow to train (it has been verified that doubling this value to 60 produces only
a very small difference in the results). The mini-batch size, a subset of the training set used to
evaluate the gradient of the loss function and update the weights, is set to 1, which in turn allows
for different image sizes in the training dataset, preventing them from being batched together for
processing.
Before running the training, 99 of the dataset images have been reserved for testing the detector's
performance. After training, a detector file is generated which, together with the image in which the
detection is to be performed, is passed to the Matlab function detect; this provides as output the
boxes in which the class of interest (in this case, aircraft) is located and a confidence value for
each detection. The detection threshold has been set, based on experimentation, to 0,2, i.e., less
confidence is required to report a detected obstacle, since a false positive is better than a false
negative in this application. The testing images have been used to determine the accuracy and
computational time of each detector.
To evaluate the accuracy, the precision/recall curve highlights how precise a detector is at varying
levels of recall. Precision (or positive predictive value) is the ratio of true positives tp (results
identified as positive that actually are positive) to the sum of true positives and false positives fp
(results identified as positive that are not). Recall (or sensitivity) is the ratio of true positives
to the sum of true positives and false negatives fn (positives identified as negative). Ideally, the
precision would be 1 at all recall levels.
Precision = tp / (tp + fp) = |(relevant elements) ∩ (selected elements)| / |selected elements|
Recall = tp / (tp + fn) = |(relevant elements) ∩ (selected elements)| / |relevant elements|    (5.1)
where the relevant elements are all the black points on the green side (not only those inside the
circle) in Figure 5.8 and the selected elements are the elements within the circle. A threshold of 0,5
has been chosen for the percentage of overlapping area between the ground truth data and the detection
that is considered a success.
Figure 5.8: Relevant elements and selected elements
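These definitions can be checked with a short Python sketch (illustrative only) that counts true positives, false positives and false negatives from predicted and ground-truth labels:

```python
def precision_recall(predicted, truth):
    """predicted/truth: iterables of 0/1 labels for the same samples."""
    tp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 3 reported detections, 2 of them correct; 4 real objects in total
p, r = precision_recall([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 1, 0])
print(p, r)   # precision = 2/3, recall = 0.5
```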
To evaluate the prediction time, the detector is run several times on each one of the test images and
the mean is then computed. The results were obtained on a computer with an AMD Ryzen 5 2600X six-core
(3.60 GHz) CPU, 16 GB of RAM and an NVIDIA Quadro P2000 GPU. Note that the times would be reduced if
the implementation were run in a compiled language like C.
Figures 5.8 and 5.9 show the results where the precision for each network is represented in the first
column and the prediction time in the second.
(a) GoogleNet Accuracy (b) GoogleNet Prediction time
(c) ResNet-18 Accuracy (d) ResNet-18 Prediction time
(e) MobileNet-v2 Accuracy (f) MobileNet-v2 Prediction time
Figure 5.8: Accuracies and prediction times for different networks (I)
(g) ResNet-50 Accuracy (h) ResNet-50 Prediction time
(i) ResNet-101 Accuracy (j) ResNet-101 Prediction time
(k) Inception-v3 Accuracy (l) Inception-v3 Prediction time
Figure 5.9: Accuracies and prediction times for different networks (II)
The ResNet-18 network has the best trade-off between accuracy and time and for that reason it is the
one selected for the implementation, since its prediction time allows for a less time-consuming
implementation: its accuracy is 6,51 % lower than that achieved with the most accurate network, but it
takes only 35,28 % of its time.
Finally, the implementation of this method returns an easily interpretable array of size
1 + 4 · numberObstacles, where the first element is the number of obstacles and each following group of
four elements contains the X coordinate and Y coordinate of the top-left corner of a bounding box and
the height and width of that bounding box, respectively, until the total number of obstacles is
reached.
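As an illustration of this layout, a small Python sketch (a hypothetical helper, not part of the thesis code) can unpack such an array back into individual bounding boxes:

```python
def unpack_detections(arr):
    """Unpack [n, x1, y1, h1, w1, x2, y2, h2, w2, ...] into a list of
    (x, y, height, width) tuples, following the array layout above."""
    n = int(arr[0])
    return [tuple(arr[1 + 4 * i: 1 + 4 * (i + 1)]) for i in range(n)]

out = [2, 10, 20, 50, 80, 200, 40, 30, 60]   # two detected aircraft
print(unpack_detections(out))   # [(10, 20, 50, 80), (200, 40, 30, 60)]
```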
5.5 Implementation of SVM and edge detection for obstacle detection above the horizon
An edge detection algorithm is a very fast way to detect objects; nevertheless, it can suffer a lot
from ground clutter and thus produce many false detections. That is where the SVM comes in: the machine
learning technique is used to classify ground and sky, and then the edge detection algorithm works only
on the sky part of the image. Therefore, there are two main steps in the implementation, which has been
done through a function whose input is the video frame and whose output is an array containing the
number of obstacles, defined as spheres, and thus their centroids and radii.
5.5.1 Sky segmentation
To separate sky and ground, an SVM is trained to classify these two classes and then the line that
best separates them, i.e. the horizon line, is found. The idea of using an SVM to detect the horizon
line was introduced in [57] but for other purposes; thus, it has been adapted for the current
application.
To perform the training, a dataset of 202 images of each class was created. Nevertheless, with
such an amount of images, the file created after training was too large, which made the classification
computationally expensive. Thus, the results that will be shown are built with a dataset of 29 images
of each class, which also provides good performance on most of the tested images and reduces the
computational time by one order of magnitude.
In machine learning, it is the programmer who establishes the features that will lead to the
classification; selecting them is therefore the first step of the training algorithm. The features
selected are:
• Hue: value from 0 to 1 that corresponds to the color’s position on a color wheel. As hue increases
from 0 to 1, the color transitions from red to orange, yellow, green, cyan, blue, magenta, and finally
back to red.
• Saturation: amount of hue or departure from neutral. 0 indicates a neutral shade, whereas 1
indicates maximum saturation.
• Maximum value among the red, green, and blue components of a specific color.
• Range value (maximum minus minimum value) of the 3 by 3 neighborhood around the corresponding
pixel in the input image.
• Standard deviation of the 3 by 3 neighborhood around the corresponding input pixel.
• Entropy value of the 9 by 9 neighborhood around the corresponding pixel in the input image.
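These six features can be sketched in Python for a single pixel (an illustrative stand-in for the Matlab feature extraction; the 16-bin histogram used for the entropy and the grayscale conversion by channel averaging are assumptions):

```python
import numpy as np

def pixel_features(rgb_patch9):
    """Compute the six features for the centre pixel of a 9x9 RGB patch
    (values in [0, 1]): hue, saturation, max RGB value, 3x3 range,
    3x3 standard deviation and 9x9 entropy."""
    r, g, b = rgb_patch9[4, 4]
    v = max(r, g, b); mn = min(r, g, b); d = v - mn
    if d == 0:                                 # neutral shade: hue undefined
        h = 0.0
    elif v == r:
        h = ((g - b) / d % 6) / 6
    elif v == g:
        h = ((b - r) / d + 2) / 6
    else:
        h = ((r - g) / d + 4) / 6
    s = 0.0 if v == 0 else d / v
    gray9 = rgb_patch9.mean(axis=2)
    core = gray9[3:6, 3:6]                     # 3x3 neighbourhood of the centre
    rng = core.max() - core.min()
    std = core.std()
    hist, _ = np.histogram(gray9, bins=16, range=(0, 1))
    p = hist[hist > 0] / gray9.size
    entropy = -(p * np.log2(p)).sum()
    return np.array([h, s, v, rng, std, entropy])

patch = np.zeros((9, 9, 3)); patch[:, :, 2] = 0.8   # uniform blue "sky" patch
f = pixel_features(patch)
print(f)   # hue ~0.667 (blue), saturation 1, value 0.8, zero texture features
```

A uniform blue patch yields high hue/saturation and zero range, deviation and entropy, which is exactly what makes sky pixels separable from textured ground.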
The feature extraction is done for both sky and ground images, and two large matrices, where each
column represents one feature, are built. These matrices are then sent to the Matlab function fitcsvm,
which performs the training. The Gaussian (radial basis function) kernel worked best for this problem,
and sequential minimal optimization is used. The function returns a struct with the information needed
to classify future pictures; this struct is loaded in the first part of the implementation.
Results of the training are shown in Figure 5.10.
(a) Sky (b) Ground
Figure 5.10: Sky and ground features
In the first part of the implementation, from a video frame, a function in charge of performing the sky
segmentation returns the slope m, the intercept b and the R2 of the horizon line y = mx + b. Inside
this function, the image is first scaled down to save memory and computational time. The same features
used in the training are extracted and, through the Matlab function predict, which uses the trained
data, the segmentation is performed. Every pixel is classified either as 1 (ground) or as 0 (sky) and
mapped back to the original picture representation, obtaining a binary image of the environment. The
image is smoothed with some filters and sent to the Matlab function edge using the Prewitt edge
detection algorithm, which calculates the gradient of the image intensity at each point, giving the
direction of the largest possible increase from light to dark and the rate of change in that direction.
The result shows how abruptly or smoothly the image changes at each point, and therefore how likely
that part of the image is to represent an edge, as well as the most likely orientation of that edge.
Thus, the horizon line is extracted. A mask is applied to remove border pixels. After that, a function
fits a first-order polynomial to the extracted horizon line, also estimating the accuracy of the fit,
i.e., it returns the slope, the intercept and the R2. If there is no horizon in the frame, the R2 will
be NaN, which is used in later steps to determine whether it is necessary to remove the ground from the
picture.
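The line-fitting step can be sketched in Python with numpy (an illustrative stand-in for the Matlab code; the sample pixel coordinates are made up):

```python
import numpy as np

def fit_horizon(xs, ys):
    """Fit y = m*x + b to the extracted horizon-line pixels and return
    (m, b, r2). r2 is NaN when there are not enough points (no horizon)."""
    if len(xs) < 2:
        return float("nan"), float("nan"), float("nan")
    m, b = np.polyfit(xs, ys, 1)
    pred = m * np.asarray(xs) + b
    ss_res = float(((np.asarray(ys) - pred) ** 2).sum())
    ss_tot = float(((np.asarray(ys) - np.mean(ys)) ** 2).sum())
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return m, b, r2

m, b, r2 = fit_horizon([0, 100, 200, 300], [100, 110, 120, 130])
print(round(m, 4), round(b, 1), round(r2, 3))   # 0.1 100.0 1.0
```

When the edge pixels do not form a line (no horizon visible), the low R2 (or NaN) triggers the skip of the ground-removal step.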
5.5.2 Obstacle detection
The obstacle detection is performed by detecting the obstacles' edges through the Matlab function edge,
which results in a binary image where the edge pixels of obstacle candidates have the value 1 and the
rest 0. The objects are filled and then, if there is ground in the image (determined through the R2, as
explained), it is removed by a function that uses the output of the sky segmentation function.
Therefore, the part of the image below the horizon line is not considered for the detection.
After that, a filter based on the area of the pixel groups (ones surrounded by zeros) is applied with a
given threshold to remove isolated pixels, thus reducing the probability of false detection.
With the filtered image, the different groups of pixels found in the image determine the number of
obstacles. For each group, the mean of the coordinates it occupies in the image determines the centroid
of that obstacle, and the length of the major axis determines the diameter. This data is collected in
an easily interpretable array of size 1 + 3 · numberObstacles, where the first element is the number of
obstacles and each following group of three elements contains the X coordinate, Y coordinate and radius
of an obstacle, respectively, until the total number of obstacles is reached.
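The grouping and measurement steps can be sketched in Python (an illustrative 4-connected labeling, not the Matlab implementation; the radius here is taken as half the larger axis-aligned extent of each group):

```python
import numpy as np
from collections import deque

def detect_obstacles(binary):
    """Label 4-connected pixel groups in a binary image and return the
    flat array [n, x1, y1, r1, x2, y2, r2, ...] described above, where
    (x, y) is the group centroid and r approximates half its major axis."""
    seen = np.zeros_like(binary, dtype=bool)
    out = [0]
    h, w = binary.shape
    for i in range(h):
        for j in range(w):
            if binary[i, j] and not seen[i, j]:
                group, q = [], deque([(i, j)])
                seen[i, j] = True
                while q:                      # breadth-first flood fill
                    y, x = q.popleft()
                    group.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] \
                                and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                ys, xs = zip(*group)
                radius = max(max(ys) - min(ys), max(xs) - min(xs)) / 2
                out += [sum(xs) / len(xs), sum(ys) / len(ys), radius]
                out[0] += 1
    return out

img = np.zeros((10, 10), dtype=bool)
img[2:5, 2:5] = True                     # one 3x3 obstacle
print(detect_obstacles(img))             # [1, 3.0, 3.0, 1.0]
```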
5.6 Results
In this section, both implementations are run over some of the images in the database reserved for
testing and the results are presented. Representative cases have been selected in order to extract
sound conclusions about the advantages and limitations of each method. In the case of the faster R-CNN,
the detected obstacle is labeled with the confidence value of the detection. The computational time
(obtained by running the detection several times and computing the mean) is annotated in the caption of
each figure. In Figures 5.11, 5.12 and 5.13, the first column shows the results for the faster R-CNN
implementation while the second shows the results for the SVM + edge detection implementation. Again,
the results were obtained on a computer with an AMD Ryzen 5 2600X six-core (3.60 GHz) CPU, 16 GB of RAM
and an NVIDIA Quadro P2000 GPU.
The images in Figure 5.11 show some results in different generic scenarios.
But what happens if the obstacles are too close? Figure 5.12 shows the results in this case, where the
SVM + edge detection implementation is most likely to struggle. Data association will be needed to
determine which detections correspond to each target.
On the contrary, if they are too far away, as shown in Figure 5.13, the faster R-CNN does not reach the
same level of confidence when detecting; in some cases it cannot even distinguish the features of an
aircraft and thus the detection fails.
(a) Average time = 0,4680 s (b) Average time = 7,0793 s
(c) Average time = 0,3856 s (d) Average time = 6,9824 s
(e) Average time = 0,3598 s (f) Average time = 7,4521 s
Figure 5.11: Results and detection time for different images
(a) Average time = 0,3361 s (b) Average time = 7,2339 s
(c) Average time = 0,3726 s (d) Average time = 7,1355 s
(e) Average time = 0,3326 s (f) Average time = 6,8890 s
Figure 5.12: Results and detection time for close obstacles
(a) Average time = 0,4112 s (b) Average time = 6,8174 s
(c) Average time = 0,3824 s (d) Average time = 7,2569 s
(e) Average time = 0,2123 s (f) Average time = 7,4500 s
Figure 5.13: Results and detection time for far away obstacles
The implementation of the SVM + edge detection is meant to detect not only aircraft but any obstacle
above the horizon line. Thus, Figure 5.14 shows different results in cases when something else is out
there, such as a tree or a building. In order to detect those obstacles with the faster R-CNN
algorithm, it would need to be trained for those classes (tree or building, for instance); therefore,
it could be extended.
(a) Tree detections (b) Building detections
Figure 5.14: Other obstacle detection with the SVM + edge implementation
Table 5.2 summarizes the main characteristics of each implementation.
Faster R-CNN SVM + edge detection
Average Time [s] 0,3623 7,1441
Range Short - medium Medium - long
Other obstacles No (needs to be trained for more classes) Yes (only above the horizon line)
Table 5.2: Comparison of both implementations
Therefore, the following conclusion can be reached: the faster R-CNN implementation should be used as
the main resource for aircraft detection, since it is faster and provides better results at short
range, while the SVM + edge detection implementation should be run from time to time to warn the system
of distant threats. Nevertheless, other configurations can be implemented:
• A faster R-CNN trained for different types of obstacles would overcome the limitation of only
detecting aircraft, while still performing detections at short-medium range, provided the time is not
compromised.
• An edge detection algorithm alone (without the SVM sky segmentation capability), if the camera is
always pointing at clear sky, would overcome the time limitation, performing detections in 0,1 to 0,2
seconds.
Chapter 6
Simulation environment
To test a functional SAA system, a physical robot needs to be assembled and programmed. In the
Enhanced Guidance, Navigation and Control for Autonomous Air Systems based on Deep Learning and
Artificial Intelligence project developed at CfAR, the robot shall be a quadrotor UAV. In this chapter,
Section 6.1 presents the chosen way to control the robot: Robot Operating System (ROS). However,
real robots require logistics, including lab space, battery recharging and operation licensing, and
carry the risk of damage both to themselves and to the people and property around them. Thus, a smart
approach is to simulate the robot and validate the algorithms using simulators. Section 6.2 presents
the simulator chosen to test the UAS: Gazebo. Finally, Section 6.3 shows how all of this can be used to
test the algorithms implemented in Chapter 5.
This chapter aims to provide some insight into both the testing and simulation strategies that will be
used to validate the SAA system.
6.1 ROS
ROS is an open source robotic software system that can be used without licensing fees. Being open
source, its source code is accessible and can be modified according to one's needs, and it is possible
to contribute to the software by adding new modules. One of the main purposes of ROS is to provide
communication between the user, the computer's operating system and the external hardware (sensors,
robots, etc.). One of the benefits of ROS is hardware abstraction and its ability to control a robot
without the user having to know all of the details of the robot. For instance, to move a robot around,
a ROS command can be issued, or custom scripts in Python or C++ can cause the robot to respond as
commanded. The scripts can, in turn, call various control programs that cause the actual motion of the
robot, respecting its kinematic and dynamic limitations.
Essentially, ROS is a framework for writing robot software, having already a collection of tools, li-
braries and conventions that aim to simplify the task of creating complex and robust robot behavior
across a wide variety of robotic platforms [58]. The software is structured as a large number of small
programs that rapidly pass messages between each other. Creating robust robot software is a difficult
task, and ROS was built from the ground up to encourage collaborative robotics software development.
This way, different organizations can make use of each other's expertise, allowing faster development
of complex robots.
6.1.1 ROS Graph
One of the original problems that motivated the development of ROS was a ”fetching robot” use case.
Solving this problem led to several observations about many robotics software applications, which
became some of the design goals of ROS [59]:
• The application task can be decomposed into many independent subsystems (e.g. navigation,
computer vision, grasping, etc.);
• These subsystems can be used for other tasks (e.g. security patrol, cleaning, delivering mail, etc.);
• With proper hardware and geometry abstraction layers, the vast majority of application software
can run on any robot.
These design goals can be illustrated by the fundamental rendering of a ROS system: its graph. A
simple ROS graph is represented in Figure 6.1, where the ellipses represent nodes and the arrows
represent message streams between nodes, i.e. the edges.
Figure 6.1: Example of a ROS graph
A ROS system is made up of a variety of programs, called nodes, running simultaneously and
communicating with one another via messages. The ROS design idea is that each node is an independent
module that interacts with other nodes using the ROS communication capabilities. For that purpose, ROS
makes use of roscore so nodes can find each other and transmit messages. Every node connects to
roscore at startup to register details of the message streams it publishes and the streams it wishes to
subscribe to. Messages between nodes are transmitted peer-to-peer; roscore is only used by nodes to
find their peers.
6.1.2 ROS Packages, ROS Topics and rosrun
ROS is organized into packages, which contain a combination of data, code and documentation. The
ROS ecosystem includes thousands of publicly available packages in open repositories, encouraging their
use among interested parties.
Although a ROS system consists of a number of independent nodes forming a graph, these nodes by
themselves are typically not very useful. The nodes become relevant when they start communicating,
exchanging information and data. The most common way to do that in ROS is through topics, which
consist of a stream of messages with a defined type: in ROS, all messages on the same topic must be of
the same data type. A good practice is to name a topic in a way that describes the messages that are
being transmitted.
Topics implement a publish/subscribe communication mechanism, one of the most common ways to exchange
data in a distributed system. Before nodes start to transmit data over topics, they must first
advertise, i.e. announce, both the topic name and the type of messages that are going to be sent. Then,
they can start to publish, i.e. send, the actual data on the topic. Nodes that want to receive messages
on a topic can subscribe to that topic through roscore.
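The publish/subscribe pattern can be illustrated with a minimal Python sketch; this toy broker stands in for the matchmaking role of roscore and is not actual ROS code (in real ROS, roscore only performs the matchmaking and the messages then flow peer-to-peer):

```python
class Broker:
    """Toy stand-in for roscore: matches publishers and subscribers by topic
    name and delivers every published message to the topic's subscribers."""
    def __init__(self):
        self.subscribers = {}

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers.get(topic, []):
            callback(message)

broker = Broker()
received = []
broker.subscribe("/camera/image_raw", received.append)   # one "node" subscribes
broker.publish("/camera/image_raw", "frame 0")           # another "node" publishes
print(received)   # ['frame 0']
```

The topic name used here is only an example; the decoupling is the point: the publisher never needs to know who, if anyone, is listening.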
Since the ROS community is distributed, its software is organized in packages that are independently
developed by community members. Chasing down long paths to packages in order to execute programs would
become tiresome in large file systems, since nodes can be deeply buried in large directory
hierarchies. To automate this task, ROS provides a command-line utility, called rosrun, that searches a
package for the requested program and passes it any parameters supplied.
6.2 Gazebo
As said in the introductory note, a lot of the trouble related to operating real robots can be avoided
by using simulated robots. Although, at first glance, this seems to defeat the whole purpose of
robotics, software robots are extraordinarily useful. Simulations can model as much or as little of
reality as desired: sensors and actuators can be modeled as ideal devices or can incorporate various
levels of distortion, errors and unexpected faults. Also, due to the nature of the final system
developed at CfAR (path planning and control algorithms), simulated robots are typically required to
test the algorithms under realistic scenarios in a safe way.
Thanks to the isolation level provided by the messaging interfaces of ROS, the majority of the robot’s
software graph can be run identically whether it is controlling a real robot or a simulated one. The various
nodes are launched and simply find one another and establish connections. Simulation input and output
streams connect to the graph in the place of the device drivers of the real robot. Although some tuning
is often required when transitioning to a physical robot, ideally the structure of the software will be the
same, and often the simulation can be modified to reduce the number of parameter tweaks required
when transitioning between simulation and reality [58].
The chosen simulator for this project was Gazebo, which is a 3D physics simulator based on rigid-
body dynamics, allowing for a good computational performance. Gazebo is capable of real-time sim-
ulation with relatively simple worlds and robots and, with some care, can produce physically plausible
behavior.
ROS integrates closely with Gazebo through the gazebo_ros package, which provides a Gazebo
plugin module that allows bidirectional communication between Gazebo and ROS. Simulated sensor
and physics data can stream from Gazebo to ROS and actuator commands can stream from ROS back
to Gazebo. In fact, by choosing consistent names and data types for the data streams, it is possible for
Gazebo to exactly match the ROS API of a robot. When this is achieved, all of the robot software above
the device-driver level can be run identically both on the real robot, and (after parameter tuning) in the
simulator.
Figure 6.2: Prototyping
For the current application, Gazebo and the available ROS packages allow the integration of
different sensors into a quadrotor model and easy access to their outputs. Figure 6.3(a) shows a
simple model of a quadrotor carrying an optical camera and a LIDAR, spawned inside an office. Figure
6.3(b) shows what the operator perceives of the environment (walls and free passages), to be compared
with what the robot senses through the LIDAR in Figure 6.3(c) (an obstacle that reflected the light,
identified in red, or a path with no obstacle). Figure 6.3(d) shows the access to the hardware
information, in this case the image from a visual camera.
(a) Gazebo scenario with a default quadrotor
(b) Upper view of the environment (c) LIDAR-mapped environment
(d) Image received from the visual camera
Figure 6.3: Quadrotor carrying an optical camera and a LIDAR and their information
6.3 Simulation
To test the capability of the UAS to acquire pictures of the environment and classify them according
to the obstacles they contain, a simulation was conducted in Gazebo. Setting up the simulation requires
three elements:
• A world with obstacles.
• A UAV with a visual camera.
• A script to save and analyze the data collected.
6.3.1 World
A simple environment was set up for simulation purposes, mainly because of the computational
resources needed to run Gazebo. A basic scenario with a blue sky and grass ground was first defined,
to which some obstacles were added. These obstacles comprised a set of pine trees already modeled
within the Gazebo Graphical User Interface (GUI). For the airborne obstacles, an online model
was used [60]. This model was selected for its resemblance to a commercial aircraft while
having smaller dimensions, which makes the simulation run more smoothly. Nevertheless, the simulation
could be improved by choosing a model closer to a commercial aircraft. Two of these UAVs were spawned
and set to describe a lift-off-like maneuver in a loop. An example of the world is depicted in Figure 6.4.
Figure 6.4: Simulation environment
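A world of this kind can be described in Gazebo's SDF format. The sketch below is illustrative only, not the exact file used in the thesis; it assumes the stock `sun`, `ground_plane` and `pine_tree` models from the Gazebo model database (the grass appearance would come from the ground plane's material):

```xml
<?xml version="1.0"?>
<sdf version="1.6">
  <world name="saa_world">
    <!-- Sky light and flat ground -->
    <include><uri>model://sun</uri></include>
    <include><uri>model://ground_plane</uri></include>
    <!-- One of the pine trees available in the Gazebo model database;
         pose is x y z roll pitch yaw -->
    <include>
      <uri>model://pine_tree</uri>
      <pose>10 5 0 0 0 0</pose>
    </include>
  </world>
</sdf>
```

Additional `<include>` blocks, one per obstacle, build up the rest of the scenario; the intruder UAVs were added as separate models with their own motion plugins.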
6.3.2 UAV
To simulate the UAV, a simple CAD model was developed and converted to a file type that Gazebo can
interpret as a solid body subject to the laws of physics. To collect pictures of the environment,
a camera model was also implemented with the characteristics listed in Table 6.1. The camera takes 30
pictures per second, composed of square pixels. Noise is modeled as a Gaussian process with zero
mean and a standard deviation of 0,007. The UAV used in the simulation is represented in Figure 6.5.
Parameter Value
Update rate [fps] 30
Resolution [pixels x pixels] 500 x 500
Pixel format R8G8B8
Field of View [◦ ] 114,6
Range [m] 200
Table 6.1: Camera characteristics
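The noise model of Table 6.1 can be sketched as follows. This assumes the Gaussian noise is additive per channel on intensities normalised to [0, 1] (a common interpretation of Gazebo's camera noise parameter); the random image standing in for a frame is of course illustrative. The snippet also derives the angle subtended by one pixel, a useful figure when judging at what distance an intruder shrinks below one pixel.

```python
import numpy as np

RESOLUTION = 500          # 500 x 500 square pixels (Table 6.1)
SIGMA = 0.007             # std. deviation of the zero-mean Gaussian noise
FOV_DEG = 114.6           # horizontal field of view

rng = np.random.default_rng(seed=0)
clean = rng.uniform(size=(RESOLUTION, RESOLUTION, 3))   # stand-in frame in [0, 1]

# Additive zero-mean Gaussian noise, clipped back to the valid intensity range.
noisy = np.clip(clean + rng.normal(0.0, SIGMA, clean.shape), 0.0, 1.0)

# Angle per pixel (uniform approximation; a pinhole projection would vary
# slightly across the image).
deg_per_pixel = FOV_DEG / RESOLUTION
print(f"{deg_per_pixel:.4f} deg/pixel")
```

With these parameters each pixel covers roughly 0,23°, so a 10 m wingspan intruder at the 200 m camera range spans on the order of a dozen pixels, which is consistent with the small-target behaviour discussed in Chapter 5.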
Figure 6.5: Simulated UAV
6.3.3 Data collection
The images collected by the camera are published on the image_raw ROS topic. However, to analyze the
pictures, it is necessary to convert them from ROS messages to a format suitable for processing. To accomplish
that, the open-source computer vision library OpenCV [61] is used. The images taken were saved to
a specific folder in order to be classified by the implemented algorithms. It was also possible to record
multiple frames, save them and play them back as a video by using rosbag.
Some results of the implementations developed in Chapter 5 running over the images collected from
this simulation environment are shown in Figure 6.6.
(a) Faster R-CNN on the simulation environment (b) SVM + edge detection on the simulation environment
Figure 6.6: Algorithms run over images collected from the simulation environment
Although the intruder UAV model does not closely resemble a commercial aircraft, the intruders are
detected in both cases; in the case of the SVM and edge detection implementation, the tops of the trees
overlooking the horizon are detected as well. This demonstrates Gazebo's suitability as a platform for
testing the developed algorithms.
Chapter 7
Conclusion
This document presents the research done for the Enhanced Guidance, Navigation and Control for Au-
tonomous Air Systems based on Deep Learning and Artificial Intelligence project. In Chapter 2, the most
important concepts and theoretical aspects of SAA were presented and an architecture for the project
was proposed. Chapter 3 and Chapter 4 explain the basics of the different non-cooperative sensors and
data fusion algorithms, respectively, pointing out the aspects to take into account depending on the final ap-
plication. The implementation of machine and deep learning to provide a detection capability with visual
cameras is rigorously discussed in Chapter 5, including a comparison of both techniques. Finally, Chap-
ter 6 discusses the most important aspects of the simulation environment that will be used to test all the
algorithms within the project, including an example of an environment used to test the image processing
algorithms developed in this thesis. Together, these contributions constitute a solid starting point for the project.
7.1 Achievements
The definition and direction of the project are now clear.
Among the different non-cooperative sensors, three are particularly suitable for the project: radars,
LIDARs and visual EO cameras. Guidance is given on when to use each one. Combinations
of radar and camera for long-range operations, or LIDAR and camera for short-range operations,
are desirable since they compensate for each other's limitations. An interesting approach to the camera
and radar combination was presented in Section 4.6, where the radar's limited resolution
is complemented by the camera while keeping the computational cost low, since the block in charge of
processing the information provided by the camera does not have to work on the entire image.
Chapter 4, dedicated to data fusion, is a complete guide for the future researcher, addressing the
steps to follow and the aspects to take into account for the application within the frame of this project.
A good choice of algorithms could be k-means for data association, an EKF for state estimation and fuzzy
logic for decision making.
Regarding the experimental work carried out in Chapter 5, on the one hand it has been shown that
machine learning is an effective technique for removing ground clutter from the image, although the
algorithm remains computationally heavy. On the other hand, deep learning techniques show good
performance in detecting aircraft and can be trained for other obstacle categories. A thorough study of
which parameters to use to ensure a properly trained detector was carried out in order to obtain
the best results. In terms of results, the Faster R-CNN implementation is a robust solution
for short-range operations, while the SVM and edge detection implementation obtains
better results at long range, although it takes more time to compute; an occasional scan with the latter
could therefore be used while the Faster R-CNN serves as the primary source of information.
Finally, a simulation environment is being prepared. ROS and Gazebo are a good framework for robot op-
erations and simulations in the context of the project, thanks to the possibility of extending the algorithms
tested there to a real robot and the easy integration of different sensing devices, such as a camera or a
LIDAR.
7.2 Future Work
The most immediate activity recommended as future work is to complete the experimental work by ex-
tending the Matlab algorithms to run with ROS and Gazebo in real time. Using sockets, or porting
the code to Python or C++, can serve this purpose.
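The socket route can be sketched as follows: one length-prefixed TCP message per frame, with a Python receiver standing in for the ROS side and a client standing in for the Matlab process. The port, framing convention and names are illustrative assumptions, not taken from the thesis.

```python
import socket
import struct
import threading

# Receiver ("ROS side"): accept one connection and read one framed message.
# Framing: 4-byte big-endian payload length, then the payload bytes.
def serve_one(server, received):
    conn, _ = server.accept()
    size = struct.unpack(">I", conn.recv(4))[0]
    buf = b""
    while len(buf) < size:
        buf += conn.recv(size - len(buf))
    received.append(buf)
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))       # let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

received = []
t = threading.Thread(target=serve_one, args=(server, received))
t.start()

# Sender ("Matlab side"): transmit one fake frame. In practice this would be
# the encoded image bytes produced by the detection pipeline.
payload = b"fake-frame-1"
client = socket.socket()
client.connect(("127.0.0.1", port))
client.sendall(struct.pack(">I", len(payload)) + payload)
client.close()

t.join()
server.close()
print(received[0])
```

The same length-prefixed framing works from Matlab's `tcpclient` interface, and the receiving loop slots naturally into a ROS node that republishes each frame on a topic.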
Regarding the Faster R-CNN implementation, it would be desirable to extend it to other classes of
obstacles, such as birds, trees or buildings. For the SVM and edge detection implementation, it would
be interesting to try to achieve faster performance by tuning the parameters used to train the SVM.
The next step in the project would be to test other sensors, such as a LIDAR or a radar, to figure
out how to process their information and, once that is done, to follow the advice in Chapter 4 to perform data
fusion with the information produced by the camera processing carried out in this thesis.
Eventually, the final obstacle states obtained after data fusion will be sent to the online path
planning algorithm, where a trajectory avoiding the obstacles is generated and optimized. Integrating
the navigation sensors' information within this framework would also be desirable, because the final
goal of the Enhanced Guidance, Navigation and Control for Autonomous Air Systems based on Deep
Learning and Artificial Intelligence project is to design an effective SAA system for real UAS; all this
information would then be sent to the flight management system, which would pass the flight plan to the
autopilot. Work in this area could therefore also be done.
Bibliography
[1] S. Waterman. Drones over U.S. get OK by Congress. The Washington Times,
https://www.washingtontimes.com/news/2012/feb/7/coming-to-a-sky-near-you/, February 2012.
[2] P. Angelov. Sense and Avoid in UAS. Wiley, 1st edition, 2012. ISBN:978-0-470-97975-4.
[3] G. Fasano, D. Accardo, A. Moccia, and D. Moroney. Sense and avoid for unmanned air-
craft systems. IEEE Aerospace and Electronic Systems Magazine, 31:82–110, 11 2016. doi:
10.1109/MAES.2016.160116.
[4] F. S. Team. Airspace, special use airspace and TFRs. Federal Aviation Administration,
https://www.faasafety.gov/gslac/ALC/course content.aspx?cID=42&sID=505.
[5] sUAS Aviation Rulemaking Committee. Comprehensive set of recommendations for sUAS regula-
tory development. Apr. 2009.
[6] S. Ramasamy, R. Sabatini, and A. Gardi. Avionics sensor fusion for small size unmanned aircraft
sense-and-avoid. In 2014 IEEE Metrology for Aerospace (MetroAeroSpace), pages 271–276, May
2014. doi: 10.1109/MetroAeroSpace.2014.6865933.
[7] R. Arteaga. Automatic dependent surveillance broadcast ADS-B sense-and-avoid system. NASA,
https://ntrs.nasa.gov/search.jsp?R=20160007776, June 2013.
[8] C. T. Howell, T. M. Stock, P. J. Wehner, and H. A. Verstynen. Simulation and flight test capability
for testing prototype sense and avoid system elements. In 2012 IEEE/AIAA 31st Digital Avionics
Systems Conference (DASC), pages 8A1–1–8A1–16, Oct 2012. doi: 10.1109/DASC.2012.6382430.
[9] F. Castanedo. A review of data fusion techniques. TheScientificWorldJournal, 2013:704504, 10
2013. doi: 10.1155/2013/704504.
[10] M. McLean. What polar pattern should i record my podcast with? The Podcast Host,
https://www.thepodcasthost.com/equipment/microphone-polar-patterns/, July 2016.
[11] G. Fasano, D. Accardo, A. Moccia, G. Carbone, U. Ciniglio, F. Corraro, and S. Luongo. Multisensor
based fully autonomous non-cooperative collision avoidance system for UAVs. Journal of Aerospace
Computing, Information and Communication, 5:338–360, 10 2008. doi: 10.2514/1.35145.
[12] J. A. Jackson, J. D. Boskovic, and D. Diel. Sensor fusion for sense and avoid for small UAS without
ADS-B. In 2015 International Conference on Unmanned Aircraft Systems (ICUAS), pages 784–793,
June 2015. doi: 10.1109/ICUAS.2015.7152362.
[13] H. F. Durrant-Whyte. Sensor models and multisensor integration. I. J. Robotic Res., 7:97–113, 12
1988. doi: 10.1177/027836498800700608.
[14] B. V. Dasarathy. Sensor fusion potential exploitation-innovative architectures and illustrative appli-
cations. Proceedings of the IEEE, 85(1):24–38, Jan 1997. ISSN 0018-9219. doi: 10.1109/5.554206.
[15] R. C. Luo, Chih-Chen Yih, and Kuo Lan Su. Multisensor fusion and integration: approaches,
applications, and future research directions. IEEE Sensors Journal, 2(2):107–119, April 2002.
ISSN 1530-437X. doi: 10.1109/JSEN.2002.1000251.
[16] Huimin Chen, T. Kirubarajan, and Y. Bar-Shalom. Performance limits of track-to-track fusion versus
centralized estimation: theory and application [sensor fusion]. IEEE Transactions on Aerospace
and Electronic Systems, 39(2):386–400, April 2003. ISSN 0018-9251. doi: 10.1109/TAES.2003.
1207252.
[17] Y. Bar-Shalom and E. Tse. Tracking in a cluttered environment with probabilistic data association.
Automatica, 11(5):451 – 460, 1975. ISSN 0005-1098. doi: https://doi.org/10.1016/0005-1098(75)
90021-7. URL http://www.sciencedirect.com/science/article/pii/0005109875900217.
[18] K. Chang, C. Chong, and Y. Bar-Shalom. Joint probabilistic data association in distributed sensor
networks. In 1985 American Control Conference, pages 817–822, June 1985. doi: 10.23919/ACC.
1985.4788729.
[19] D. Reid. An algorithm for tracking multiple targets. IEEE Transactions on Automatic Control, 24(6):
843–854, December 1979. ISSN 0018-9286. doi: 10.1109/TAC.1979.1102177.
[20] R. Streit and T. E. Luginbuhl. Maximum likelihood method for probabilistic multihypothesis tracking.
Proceedings of SPIE - The International Society for Optical Engineering, 2235, 07 1994. doi:
10.1117/12.179066.
[21] I. J. Cox and S. L. Hingorani. An efficient implementation of Reid's multiple hypothesis tracking
algorithm and its evaluation for the purpose of visual tracking. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 18(2):138–150, Feb 1996. ISSN 0162-8828. doi: 10.1109/34.481539.
[22] M. E. Liggins, Chee-Yee Chong, I. Kadar, M. G. Alford, V. Vannicola, and S. Thomopoulos. Dis-
tributed fusion architectures and algorithms for target tracking. Proceedings of the IEEE, 85(1):
95–107, Jan 1997. ISSN 0018-9219. doi: 10.1109/JPROC.1997.554211.
[23] R. Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering
(ASME), 82D:35–45, 01 1960. doi: 10.1115/1.3662552.
[24] R. Luo and M. Kay. Data fusion and sensor integration: State-of-the-art 1990s. Data fusion in
robotics and machine intelligence, pages 7–135, 1992.
[25] G. Bishop, G. Welch, et al. An introduction to the Kalman filter. Proc of SIGGRAPH, Course, 8
(27599-23175):41, 2001.
[26] G. Fasano, D. Accardo, A. E. Tirri, A. Moccia, and E. D. Lellis. Radar/electro-optical data fusion
for non-cooperative UAS sense and avoid. Aerospace Science and Technology, 46:436
– 450, 2015. ISSN 1270-9638. doi: https://doi.org/10.1016/j.ast.2015.08.010. URL http:
//www.sciencedirect.com/science/article/pii/S1270963815002540.
[27] E. A. Wan and R. Van Der Merwe. The unscented Kalman filter for nonlinear estimation. In
Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control
Symposium (Cat. No.00EX373), pages 153–158, Oct 2000. doi: 10.1109/ASSPCC.2000.882463.
[28] R. Labbe. Kalman and Bayesian Filters in Python. 2015.
[29] S. Ganeriwal, R. Kumar, and M. B. Srivastava. Timing-sync protocol for sensor networks. In
Proceedings of the 1st International Conference on Embedded Networked Sensor Systems,
SenSys ’03, pages 138–149, New York, NY, USA, 2003. ACM. ISBN 1-58113-707-9. doi:
10.1145/958491.958508. URL http://doi.acm.org/10.1145/958491.958508.
[30] M. Manzo, T. Roosta, and S. Sastry. Time synchronization attacks in sensor networks. In Pro-
ceedings of the 3rd ACM Workshop on Security of Ad Hoc and Sensor Networks, SASN ’05, pages
107–116, New York, NY, USA, 2005. ACM. ISBN 1-59593-227-5. doi: 10.1145/1102219.1102238.
URL http://doi.acm.org/10.1145/1102219.1102238.
[31] J. K. Uhlmann. Covariance consistency methods for fault-tolerant distributed data fusion. Informa-
tion Fusion, 4(3):201 – 215, 2003. ISSN 1566-2535. doi: https://doi.org/10.1016/S1566-2535(03)
00036-8. URL http://www.sciencedirect.com/science/article/pii/S1566253503000368.
[32] D. Crisan and A. Doucet. A survey of convergence results on particle filtering methods for practi-
tioners. IEEE Transactions on Signal Processing, 50(3):736–746, March 2002. ISSN 1053-587X.
doi: 10.1109/78.984773.
[33] J. Martinez-del-Rincon, C. Orrite-Urunuela, and J. E. Herrero-Jaraba. An efficient particle filter for
color-based tracking in complex scenes. In 2007 IEEE Conference on Advanced Video and Signal
Based Surveillance, pages 176–181, Sep. 2007. doi: 10.1109/AVSS.2007.4425306.
[34] Y. Bar-Shalom. Update with out-of-sequence measurements in tracking: exact solution. IEEE
Transactions on Aerospace and Electronic Systems, 38(3):769–777, July 2002. ISSN 0018-9251.
doi: 10.1109/TAES.2002.1039398.
[35] M. Orton and A. Marrs. A Bayesian approach to multi-target tracking and data fusion with out-of-
sequence measurements. In IEE Target Tracking: Algorithms and Applications (Ref. No. 2001/174),
volume Seminar, pages 15/1–15/5 vol.1, Oct 2001. doi: 10.1049/ic:20010241.
[36] M. L. Hernandez, A. D. Marrs, S. Maskell, and M. R. Orton. Tracking and fusion for wireless sensor
networks. In Proceedings of the Fifth International Conference on Information Fusion. FUSION
2002. (IEEE Cat.No.02EX5997), volume 2, pages 1023–1029 vol.2, July 2002. doi: 10.1109/ICIF.
2002.1020924.
[37] P. C. Mahalanobis. On the generalized distance in statistics. National Institute of Science of India,
1936.
[38] S. J. Julier, J. K. Uhlmann, and D. Nicholson. A method for dealing with assignment ambiguity. In
Proceedings of the 2004 American Control Conference, volume 5, pages 4102–4107 vol.5, June
2004. doi: 10.23919/ACC.2004.1383951.
[39] C. Coue, T. Fraichard, P. Bessiere, and E. Mazer. Multi-sensor data fusion using Bayesian
programming: An automotive application. In Intelligent Vehicle Symposium, 2002. IEEE, volume 2, pages
442–447 vol.2, June 2002. doi: 10.1109/IVS.2002.1187989.
[40] M. Liggins II, D. Hall, and J. Llinas. Handbook of multisensor data fusion: theory and practice. CRC
press, 2017.
[41] G. M. Provan. The validity of Dempster-Shafer belief functions. International Journal of Approximate
Reasoning, 6(3):389 – 399, 1992. ISSN 0888-613X. doi: https://doi.org/10.1016/0888-613X(92)
90032-U. URL http://www.sciencedirect.com/science/article/pii/0888613X9290032U.
[42] A. P. Dempster. A Generalization of Bayesian Inference, pages 73–104. Springer Berlin Heidelberg,
Berlin, Heidelberg, 2008. ISBN 978-3-540-44792-4. doi: 10.1007/978-3-540-44792-4 4. URL https:
//doi.org/10.1007/978-3-540-44792-4_4.
[43] G. Shafer. A mathematical theory of evidence, volume 42. Princeton university press, 1976.
[44] H. Wu and M. Siegel. Sensor fusion using Dempster-Shafer theory. 03 2003.
[45] Huadong Wu, M. Siegel, and S. Ablay. Sensor fusion using Dempster-Shafer theory II: static
weighting and Kalman filter-like dynamic weighting. In Proceedings of the 20th IEEE Instrumentation
Technology Conference (Cat. No.03CH37412), volume 2, pages 907–912 vol.2, May 2003. doi:
10.1109/IMTC.2003.1207885.
[46] G. Zhao, X. Xiao, J. Yuan, and G. W. Ng. Fusion of 3D-LIDAR and camera data for scene parsing.
Journal of Visual Communication and Image Representation, 25(1):165 – 183, 2014. ISSN 1047-3203.
doi: https://doi.org/10.1016/j.jvcir.2013.06.008. URL http://www.sciencedirect.com/science/
article/pii/S1047320313001211. Visual Understanding and Applications with RGB-D Cameras.
[47] H. Durrant-Whyte and T. C. Henderson. Multisensor Data Fusion, pages 585–610. 01 2008. doi:
10.1007/978-3-540-30301-5 26.
[48] L. Forlenza, G. Fasano, D. Accardo, and A. Moccia. Flight performance analysis of an image
processing algorithm for integrated sense-and-avoid systems. International Journal of Aerospace
Engineering, 2012, 06 2012. doi: 10.1155/2012/542165.
[49] M. Copeland. What’s the difference between artificial intelligence, machine learning, and deep
learning? Nvidia, https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-
machine-learning-deep-learning-ai/, July 2016.
[50] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy,
A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recogni-
tion Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015. doi:
10.1007/s11263-015-0816-y.
[51] S. Saha. A comprehensive guide to convolutional neural networks—the eli5 way. To-
wards Data Science, https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-
neural-networks-the-eli5-way-3bd2b1164a53, December 2018.
[52] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object
detection and semantic segmentation. 2014 IEEE Conference on Computer Vision and Pattern
Recognition, pages 580–587, 2014.
[53] R. Girshick. Fast R-CNN. In The IEEE International Conference on Computer Vision (ICCV),
December 2015.
[54] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with
region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6):
1137–1149, June 2017. ISSN 0162-8828. doi: 10.1109/TPAMI.2016.2577031.
[55] R. Gholami and N. Fakhari. Chapter 27 - support vector machine: Principles, parameters, and
applications. In P. Samui, S. Sekhar, and V. E. Balas, editors, Handbook of Neural Computa-
tion, pages 515 – 535. Academic Press, 2017. ISBN 978-0-12-811318-9. doi: https://doi.org/
10.1016/B978-0-12-811318-9.00027-2. URL http://www.sciencedirect.com/science/article/
pii/B9780128113189000272.
[56] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The
PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascal-
network.org/challenges/VOC/voc2012/workshop/index.html.
[57] C. Wolf. Position estimation based on a horizon. https://es.mathworks.com/matlabcentral/fileexchange/52955-
position-estimation-based-on-a-horizon.
[58] M. Quigley, B. Gerkey, and W. D. Smart. Programming Robots with ROS: a practical introduction to
the Robot Operating System. O'Reilly Media, Inc., 2015.
[59] C. Fairchild and T. L. Harman. ROS robotics by example. Packt Publishing Ltd, 2016.
[60] E. Gonzalez. Uav/uas concept. https://grabcad.com/library/uav-uas-concept-1, June 2019.
[61] OpenCV. https://opencv.org, July 2019.
[62] L. Rodriguez Salazar, R. Sabatini, A. Gardi, and S. Ramasamy. A novel system for non-cooperative
UAV sense-and-avoid. 04 2013.
Appendix A
Sensor models
This appendix aims to show some of the radar, LIDAR and camera sensors considered during the
research and presented as candidates for the project. The most important aspects of each one will be
highlighted.
A.1 Radar models
The first one (A.1) is a radar from Fortem Technologies whose main characteristic is that it already provides
the track to the user: the information is processed on the radar itself, so no separate processing module
is needed. It also has a considerable range.
Figure A.1: True View Radar
The second one (A.2) is from Analog Devices and is presented as a kit. The Demorad kit is not
really intended as a turnkey solution, but rather as an evaluation platform or starting point for develop-
ing a custom radar system, for prototyping a radar for the application and for developing radar algorithms.
The Demorad board is light enough to be mounted on the UAV, but it requires a power supply and a USB
connection to the processor for radar data transfer and data processing.
Figure A.2: Demorad Radar
The next radar (A.3) is already available at CfAR and it has been used in other projects.
Figure A.3: Chemring Radar
The last one (A.4), from Aerotenna, stands out for its 360° FOV in azimuth; nevertheless, its range
is too low, which limits its applications.
Figure A.4: 360° Sense and Avoid Radar
A.2 LIDAR models
Regarding the LIDAR models, CfAR has the M8 LIDAR (A.5), from Quanergy.
Figure A.5: M8 LIDAR
There are also upgraded models of this sensor, whose main difference is a longer range, up to 200 m.
The M8 outputs the angle, distance, intensity and synchronized timestamps of each point in the cloud.
Another LIDAR candidate could be the Ultra Puck (A.6), from Velodyne, where each packet contains:
• Time of flight distance measurement
• Calibrated reflectivity measurement
• Rotation angles
• Synchronized time stamps
Figure A.6: Ultra Puck LIDAR
A.3 Camera models
The Flea3 line of USB3 Vision, GigE Vision and FireWire cameras, from FLIR, offers a variety of CMOS
and CCD image sensors in a compact package. The Flea3 leverages a variety of Sony, ON Semi and
e2v sensors ranging from 0,3 MP to 5,0 MP and from 8 FPS to 120 FPS. These cameras have been used
in some SAA systems, such as the one described in [62]. Nevertheless, there have been improvements
since then, and the latest sensors and most advanced feature sets can be found in the Blackfly S and
Oryx families, also from FLIR. There are many different models with different characteristics within each
family.
On the one hand, Blackfly S cameras include both automatic and precise manual control over image
capture and on-camera pre-processing, and they are available in GigE, USB3, cased and board-level
versions. Their resolution ranges from 0,4 to 20 megapixels and their frame rate from 18 to 522 frames
per second. Each model also has a different pixel size and resolution, and can be found in mono or color,
although for the current computer vision application a color camera would be necessary.
Figure A.7: Blackfly S USB3 camera
On the other hand, Oryx cameras are only available with a GigE (Gigabit Ethernet) interface. They
range from 27 to 162 FPS and from 5 to 31 MP. As in the previous family, each model also has a different
pixel size and resolution, and can be found in mono or color. Their main advantages are IEEE 1588
clock synchronization, for precise timing across multiple devices, and a color correction matrix, gamma,
saturation and sharpness controls that reduce host-side processing requirements.
Figure A.8: Oryx 10GigE camera