
Universidad Politécnica de Madrid
Escuela Técnica Superior de Ingenieros de Telecomunicación

Visual Object Tracking in Challenging Situations using a Bayesian Perspective

Seguimiento visual de objetos en situaciones complejas mediante un enfoque bayesiano

Ph.D. Thesis
Tesis Doctoral

Carlos Roberto del Blanco Adán
Ingeniero de Telecomunicación

2010


Departamento de Señales, Sistemas y Radiocomunicaciones
Escuela Técnica Superior de Ingenieros de Telecomunicación

Visual Object Tracking in Challenging Situations using a Bayesian Perspective

Seguimiento visual de objetos en situaciones complejas mediante un enfoque bayesiano

Ph.D. Thesis
Tesis Doctoral

Author:
Carlos Roberto del Blanco Adán
Ingeniero de Telecomunicación
Universidad Politécnica de Madrid

Advisor:
Fernando Jaureguizar Núñez
Doctor Ingeniero de Telecomunicación
Associate Professor, Dpto. de Señales, Sistemas y Radiocomunicaciones
Universidad Politécnica de Madrid

2010


DOCTORAL THESIS (TESIS DOCTORAL)

Visual Object Tracking in Challenging Situations using a Bayesian Perspective

Seguimiento visual de objetos en situaciones complejas mediante un enfoque bayesiano

Author: Carlos Roberto del Blanco Adán
Advisor: Fernando Jaureguizar Núñez

Thesis committee appointed by the Rector of the Universidad Politécnica de Madrid on the . . . . day of . . . . . . . . . . . . 2010.

Chair:     . . . . . . . . . . . . . . . . . . . .
Member:    . . . . . . . . . . . . . . . . . . . .
Member:    . . . . . . . . . . . . . . . . . . . .
Member:    . . . . . . . . . . . . . . . . . . . .
Secretary: . . . . . . . . . . . . . . . . . . . .

The defense and reading of the thesis took place on the . . . . day of . . . . . . . . . . . . 2010 at . . . . . . . . . . . . . . . . . . . .

Grade: . . . . . . . . . . . . . . . . . . . .

THE CHAIR        THE MEMBERS        THE SECRETARY


To Vanessa, to my parents, to my siblings.


Acknowledgements

I would like to thank the great number of people who have shared my journey along the path of this thesis. I will begin with my wife Vanessa, who has given me so much support and who, on more than one occasion, has had to put up with the most irascible version of myself. I will continue with my parents and siblings, who have asked me so many times whether I had much left before finishing the thesis. Next come all the members and visitors of the GTI, with a special mention for Fernando, Narciso and Luis, who have suffered the ravages of my articles written in that very Spanish English. Not to mention the effort and migraines that reading this thesis has cost Fernando, my advisor, thanks to which he must now hate Mr. Bayes. I would certainly like to write a few lines about each of my GTI colleagues, with whom I have shared so much caffeine and who have made my working life so pleasant. However, that would mean writing not just a thesis volume but the entire Salvat encyclopedia. I am therefore obliged to do something more alternative, as well as of doubtful usefulness: a table with the ages of the GTI, in which all my colleagues can be seen (or at least that has been my intention) along the temporal journey of my thesis.

This work has been partially supported by the Ministerio de Ciencia e Innovación of the Spanish government by means of a Formación del Personal Investigador fellowship and the projects TIN2004-07860 (Medusa) and TEC2007-67764 (SmartVision).


Era           Period          Specimens                            Origin
Proterozoic   -               Narciso, Fernando, Luis,             Spain
                              Francisco, Julián, Nacho
Archean       -               Marcos N., Juan Carlos, Carlos R.,   Spain, USA
                              Marcos A., Usoa, Shagniq
Paleozoic     Cambrian        Carlos C., Daniel A., Sharko         Spain, Macedonia
              Ordovician      Raúl, Jon, Ángel                     Spain
              Silurian        Irena, Kristina, Binu                Macedonia, India
              Devonian        Nerea, Pieter                        Spain, Belgium
              Carboniferous   Pablo, Víctor, Gian Luca             Spain, Italy
              Permian         Hui, Xioadan, Yi, Yang               China
Mesozoic      Triassic        Shankar, Ravi, Gogo, Antonio         India, Macedonia, Brazil
              Jurassic        Filippo, Maykel, Esther              Italy, Cuba, Spain
              Cretaceous      Sasho, César                         Macedonia, Spain
Cenozoic      Paleocene       Daniel B., Claire                    Spain, France
              Eocene          Lihui, Yu, Ivana                     China, Macedonia
              Oligocene       Toni, Richard, Carlos G.             Spain, Peru
              Miocene         Manuel, Massimo, Jesús               Spain, Italy
              Pliocene        Rafa, Sergio                         Spain
              Pleistocene     Su, Wenjia, Xiang, Iviza             China, Macedonia
              Holocene        Samira, Abel, Pratik, Srimanta       Iran, Spain, India


Abstract

The increasing availability of powerful computers and high-quality video cameras has allowed the proliferation of video-based systems, which perform tasks such as vehicle navigation, traffic monitoring, surveillance, etc. A fundamental component of these systems is the visual tracking of objects of interest, whose main goal is to estimate the object trajectories in a video sequence. For this purpose, two different kinds of information are used: detections obtained by the analysis of the video streams and prior knowledge about the object dynamics. However, this information is usually corrupted by sensor noise, varying object appearance, illumination changes, cluttered backgrounds, object interactions, and camera ego-motion.

While reliable algorithms exist for tracking a single object in constrained scenarios, object tracking remains a challenge in uncontrolled situations involving multiple interacting objects, heavily cluttered scenarios, moving cameras, and complex object dynamics. The aim of this dissertation has been to develop efficient tracking solutions for two such complex tracking situations. The first consists of tracking a single object in heavily cluttered scenarios with a moving camera. To address this situation, an advanced Bayesian framework has been designed that jointly models the object and camera dynamics. As a result, it can satisfactorily predict the evolution of a tracked object in situations with high uncertainty about the object location. In addition, the algorithm is robust to background clutter, avoiding tracking failures due to the presence of similar objects.

The other tracking situation focuses on the interactions of multiple objects observed with a static camera. To tackle this problem, a novel Bayesian model has been developed that manages complex object interactions by means of an advanced object dynamic model sensitive to object interactions. This is achieved by inferring the occlusion events, which in turn trigger different choices of object motion. The tracking algorithm can also handle false and missing detections through a probabilistic data association stage.

Excellent results have been obtained using publicly available databases, proving the efficiency of the developed Bayesian tracking models.


Resumen

The growing availability of powerful computers and high-quality cameras has allowed the proliferation of video-based systems for vehicle navigation, traffic monitoring, video surveillance, etc. An essential part of these systems is object tracking, whose main objective is the estimation of trajectories in video sequences. To this end, two types of information are used: the detections obtained from the analysis of the video and the prior knowledge of the object dynamics. However, this information is usually distorted by sensor noise, variation in the appearance of the objects, illumination changes, highly cluttered scenes, and camera motion.

While reliable algorithms exist for tracking a single object in controlled scenarios, tracking is still a challenge in unconstrained situations characterized by multiple interacting objects, highly cluttered scenarios, and moving cameras. In this thesis, the objective has been the development of efficient tracking algorithms for two especially complicated situations. The first consists of tracking a single object in highly cluttered scenes with a moving camera. To deal with this situation, a sophisticated Bayesian framework has been designed that jointly models the dynamics of the camera and the object. This makes it possible to satisfactorily predict the evolution of the object position in situations of great uncertainty. Moreover, the algorithm is robust to cluttered backgrounds, avoiding errors due to the presence of similar objects.

The other situation considered has focused on the interactions of objects observed with a static camera. To this end, a novel Bayesian model has been developed that manages the interactions by means of an advanced dynamic model. This is based on the inference of occlusions between objects, which in turn give rise to different types of object motion. The algorithm is also able to handle missing and false detections through a probabilistic data association stage.

Excellent results have been obtained on several databases, which proves the efficiency of the developed Bayesian tracking models.


Contents

List of Figures
List of Tables

1 Introduction

2 Bayesian models for object tracking
  2.1 Tracking with moving cameras
  2.2 Tracking of multiple interacting objects

3 Bayesian Tracking with Moving Cameras
  3.1 Optimal Bayesian estimation for object tracking
      3.1.1 Particle filter approximation
  3.2 Bayesian tracking framework for moving cameras
  3.3 Object tracking in aerial infrared imagery
      3.3.1 Particle filter approximation
      3.3.2 Results
            3.3.2.1 Strong ego-motion situation
            3.3.2.2 High uncertainty ego-motion situation
            3.3.2.3 Global tracking results
  3.4 Object tracking in aerial and terrestrial visible imagery
      3.4.1 Particle filter approximation
      3.4.2 Results
  3.5 Conclusions

4 Bayesian tracking of multiple interacting objects
  4.1 Description of the multiple object tracking problem
  4.2 Bayesian tracking model for multiple interacting objects
      4.2.1 Transition pdfs
      4.2.2 Likelihood
  4.3 Approximate inference based on Rao-Blackwellized particle filtering
      4.3.1 Kalman filtering of the object state
      4.3.2 Particle filtering of the data association and object occlusion
  4.4 Object detections
  4.5 Results
      4.5.1 Qualitative results
      4.5.2 Quantitative results
  4.6 Conclusions

5 Conclusions and future work
  5.1 Conclusions
  5.2 Future work

6 Appendix
  6.1 Conditional independence and d-separation

References


List of Figures

3.1 Graphical model for the Bayesian object tracking
3.2 Consecutive frames of an aerial infrared sequence
3.3 Multimodal LoG filter response
3.4 Likelihood distribution
3.5 Initial translational transformations
3.6 Probability values for the ego-motion hypothesis
3.7 Metropolis-Hastings sampling of the likelihood distribution
3.8 Particle approximation of the posterior pdf
3.9 SIR resampling of the posterior pdf
3.10 Kernel density estimation and state estimation
3.11 Object tracking result
3.12 Intermediate results for a situation of strong ego-motion
3.13 Tracking results for the BEH algorithm under strong ego-motion
3.14 Tracking results for the DEH algorithm under strong ego-motion
3.15 Tracking results for the NEH algorithm under strong ego-motion
3.16 Intermediate results for a situation greatly affected by the aperture problem
3.17 Tracking results for the BEH algorithm in a situation greatly affected by the aperture problem
3.18 Tracking results for the DEH algorithm in a situation greatly affected by the aperture problem
3.19 Tracking results for the NEH algorithm in a situation greatly affected by the aperture problem
3.20 Example of the similarity measurement between image regions
3.21 Example of feature correspondence
3.22 Representation of the affine transformation hypothesis
3.23 Samples of the object position
3.24 Samples of ellipses enclosing the object
3.25 Weighted sample representation of the posterior pdf
3.26 Tracking results with a camera mounted on a car
3.27 Tracking results with a camera mounted on a helicopter

4.1 Set of detections yielded by multiple detectors
4.2 Data association between detections and objects
4.3 Object dynamic model
4.4 Graphical model for multiple object tracking
4.5 Graphical model for the initial time step
4.6 Restrictions imposed on the associations between detections and objects
4.7 Restrictions imposed on the occlusions among objects
4.8 Color histograms of two object categories
4.9 Similarity maps of the color histograms
4.10 Computed detections from the red-dressed team
4.11 Computed detections from the black-and-white-dressed team
4.12 Tracking results for a simple object cross
4.13 Marginalization of the posterior pdf over one specific object
4.14 Marginalization of the posterior pdf over one specific object
4.15 Tracking results for a complex object cross
4.16 Marginalization of the posterior pdf over one specific object
4.17 Marginalization of the posterior pdf over one specific object
4.18 Marginalization of the posterior pdf over one specific object
4.19 Tracking results for an overtaking action
4.20 Marginalization of the posterior pdf over one specific object
4.21 Marginalization of the posterior pdf over one specific object
4.22 Marginalization of the posterior pdf over one specific object

6.1 Concepts of d-separation and descendants


List of Tables

2.1 Tracking problems related to the data association
3.1 Quantitative results for object tracking with a moving camera in infrared imagery
3.2 Quantitative results for object tracking with a moving camera in visible imagery
4.1 Quantitative results for interacting objects 1/2
4.2 Quantitative results for interacting objects 2/2


    Chapter 1

    Introduction

The evolution and spread of technology have allowed the proliferation of video-based systems, which make use of powerful computers and high-quality video cameras to automatically perform increasingly demanding tasks such as vehicle navigation, traffic monitoring, human-computer interaction, motion-based recognition, security and surveillance, etc. Visual object tracking is a fundamental part of all of the previous tasks, and also of the field of computer vision in general. This fact has motivated a great deal of interest in object tracking algorithms. The ultimate goal of tracking algorithms is to estimate the object trajectories in a video sequence. For this purpose, two different kinds of information are used: the video streams acquired by the camera sensor and the prior knowledge about the tracked objects and the environment. The video-stream-based information is used to compute object detections in each frame, also known as observations or measurements. The detection process uses the most distinctive appearance features, such as color, gradient, texture, and shape, to minimize the probability of false detections and at the same time to maximize the detection probability. However, the object appearance can undergo significant variations that cause noisy detections and even missing detections, i.e. tracked objects that have not been detected. The appearance variations can be produced by articulated or deformable objects, illumination changes due to weather conditions (typical in outdoor applications), and variations in the camera point of view. Object interactions, such as partial and total occlusions, are another source of noisy and missing detections. On the other hand, scene structures similar to the objects of interest can cause false detections, thus confusing the tracking process. To alleviate these detection shortcomings,


the tracking also relies on the available prior information in order to constrain the trajectory estimation problem. This kind of information is mainly the object dynamics, which is used to predict the evolution of the object trajectories. The modeling of the object dynamics can be a very difficult task, especially in situations in which objects undergo complex interactions. On the other hand, object dynamic information is only meaningful for static or quasi-static cameras, since, in the case of moving cameras, a global motion called ego-motion is induced in the image that corrupts the trajectory predictions. As a result, the camera dynamics must also be modeled, which makes the tracking more complex and increases the uncertainty in the trajectory estimation.

While there exist reliable algorithms for the tracking of a single object in constrained scenarios, object tracking is still a challenge in uncontrolled situations involving multiple interacting objects, heavily cluttered scenarios, moving cameras, objects with varying appearance, and complex object dynamics. In this dissertation, the main aim has been the development of efficient tracking solutions for two of these complex tracking situations. The first one consists of tracking a single object in heavily cluttered scenarios with a moving camera. For this purpose, an advanced Bayesian framework has been designed that jointly models the object and camera dynamics. This makes it possible to satisfactorily predict the evolution of the tracked object in situations of high uncertainty, in which several object locations are possible because of the combined dynamics of the object and the camera. In addition, the algorithm is robust to background clutter, avoiding tracking failures due to the presence of objects in the background that are similar to the tracked one. The inference of the tracking information in the proposed Bayesian model cannot be performed analytically, i.e. there is no closed-form expression to directly compute the required tracking information. This situation arises from the fact that the dynamic and observation processes involved in the Bayesian tracking framework are non-linear and non-Gaussian. In order to deal with this problem, a suboptimal inference method has been derived that makes use of the particle filtering technique to compute an accurate approximation of the object trajectory.
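The particle filtering idea can be made concrete with a minimal sketch of a generic bootstrap (SIR) filter. This is not the specific filter developed in this dissertation; the one-dimensional random-walk dynamics and Gaussian detection likelihood below are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sir_step(particles, weights, z, dynamics, likelihood):
    """One prediction/update/resampling cycle of a bootstrap particle filter."""
    # Prediction: propagate every particle through the (possibly non-linear) dynamics.
    particles = dynamics(particles)
    # Update: reweight each particle by the likelihood of the current detection z.
    weights = weights * likelihood(z, particles)
    weights = weights / weights.sum()
    # Resampling (SIR): duplicate likely particles, discard unlikely ones.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Illustrative 1-D run: random-walk dynamics, Gaussian likelihood, repeated detections at 3.0.
dynamics = lambda p: p + rng.normal(0.0, 0.3, p.shape)
likelihood = lambda z, p: np.exp(-0.5 * ((z - p) / 1.0) ** 2) + 1e-12
particles = rng.uniform(-10.0, 10.0, 2000)
weights = np.full(2000, 1.0 / 2000)
for _ in range(10):
    particles, weights = sir_step(particles, weights, 3.0, dynamics, likelihood)
# The particle cloud concentrates around the repeatedly observed location.
```

The state estimate at each step can then be taken as the weighted particle mean, or from a kernel density estimate of the particle set.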

The other unrestricted tracking situation focuses on the interactions of multiple objects observed with a static camera. To successfully tackle this problem, a novel recursive Bayesian model has been developed to explicitly manage complex object interactions. This is accomplished by an advanced object dynamic model that is sensitive to the object interactions involving long-term occlusions of two or more objects. For this purpose, the proposed Bayesian tracking model uses a random variable to predict the occlusion events, which in turn trigger different choices of object motion. The tracking algorithm is also able to handle false and missing detections through a probabilistic data association stage, which efficiently computes the correspondence between the unlabeled detections and the tracked objects. Regarding the inference of the tracking information in the proposed Bayesian model for interacting objects, two major issues have been carefully addressed. The first one is the mathematical derivation of the posterior distribution of the object tracking information, which has been a challenging task due to the complexity of the tracking model. The second issue, closely related to the first one, arises from the fact that the derived mathematical expression for the posterior distribution does not have an analytical form due to the complex integrals involved. This situation is caused by the non-linear and non-Gaussian character of the stochastic processes involved in the Bayesian tracking model, i.e. the dynamic, observation and occlusion processes. Consequently, the inference has to be accomplished by means of suboptimal methods, such as particle filtering. However, the high dimensionality of the tracking problem, proportional to the number of tracked objects and object detections, causes the accuracy of the approximate posterior distribution to be very poor. To overcome this drawback, a novel suboptimal inference method has been developed which combines the particle filtering technique with a variance reduction technique called Rao-Blackwellization. This makes it possible to obtain an accurate approximation of the object trajectories in high-dimensional state spaces involving multiple tracked objects.
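The Rao-Blackwellization idea can be sketched on a toy switching linear-Gaussian model: particles sample only a discrete mode variable (standing in for the occlusion and association variables of the actual model), while the continuous object state conditioned on each particle is filtered exactly with a Kalman filter. All matrices and the two-mode structure below are illustrative assumptions, not the model derived in Chap. 4:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D switching model: mode 0 = free motion, mode 1 = interaction/occlusion,
# modeled here simply as higher process noise. All values are illustrative.
F = np.array([[1.0]]); H = np.array([[1.0]]); R = np.array([[0.2]])
Q = {0: np.array([[0.1]]), 1: np.array([[2.0]])}
MODE_TRANS = np.array([[0.9, 0.1], [0.3, 0.7]])  # P(mode_t | mode_{t-1})

def rbpf_step(modes, means, covs, z):
    """One Rao-Blackwellized particle filter step: sample the discrete mode,
    then run an exact Kalman predict/update for the continuous state."""
    N = len(modes)
    logw = np.zeros(N)
    for i in range(N):
        # Particle part: sample the discrete mode from its transition prior.
        modes[i] = rng.choice(2, p=MODE_TRANS[modes[i]])
        # Rao-Blackwellized part: exact Kalman recursion given the sampled mode.
        m = F @ means[i]
        P = F @ covs[i] @ F.T + Q[int(modes[i])]
        S = H @ P @ H.T + R
        v = z - H @ m
        K = P @ H.T / S
        means[i] = m + K @ v
        covs[i] = (np.eye(1) - K @ H) @ P
        # Weight by the marginal (innovation) likelihood of the detection.
        logw[i] = -0.5 * (v @ v / S + np.log(2.0 * np.pi * S)).item()
    w = np.exp(logw - logw.max()); w /= w.sum()
    idx = rng.choice(N, size=N, p=w)  # resample
    return modes[idx], means[idx], covs[idx]

# Illustrative run: a stationary object repeatedly detected at position 1.0.
modes = np.zeros(500, dtype=int)
means = np.zeros((500, 1))
covs = np.ones((500, 1, 1))
for _ in range(10):
    modes, means, covs = rbpf_step(modes, means, covs, 1.0)
```

Because the continuous state is integrated out analytically, the particles only have to explore the low-dimensional discrete space, which is the source of the variance reduction.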

The organization of the dissertation is as follows. In Chap. 2, a survey of the most remarkable object tracking techniques for multiple objects is presented, placing special emphasis on Bayesian models, strategies for handling moving cameras, and the management of multiple objects. The developed recursive Bayesian model for tracking a single object in heavily cluttered scenarios with a moving camera is described in Chap. 3. At the end of the chapter, tracking results of the proposed Bayesian framework are presented for two kinds of applications, one involving aerial infrared imagery, and another dealing with both terrestrial and aerial visible imagery. In Chap. 4, the developed Bayesian tracking solution for multiple interacting objects is presented, along with a test bench to evaluate the efficiency of the tracking under object interactions. Lastly, conclusions and future lines of research are set out in Chap. 5.


    Chapter 2

Bayesian models for object tracking

Visual object tracking is a fundamental task in a wide range of military and civilian applications, such as surveillance, security and defense, autonomous vehicle navigation, robotics, behavior analysis, traffic monitoring and management, human-computer interfaces, video retrieval, and many more. Visual tracking can be defined as the problem of estimating the trajectories of a set of objects of interest in a video sequence as they move around the scene. In a typical tracking application there are one or more object detectors that generate a set of noisy measurements or detections at discrete time instants. The uncertainty of the detection process arises from the noise of the camera sensor, changes in the scene illumination, variations in the appearance of the objects, non-rigid and/or articulated objects, and the loss of information caused by the projection of the 3D world onto the 2D image plane. The tracking algorithm must be able to handle the uncertainty in the detection process in order to assign consistent labels to the tracked objects in each frame of a video sequence. This process can be simplified by imposing certain constraints on the motion of the objects. For this purpose, a dynamic model can be used to predict the motion of the objects, restricting in this way the spatio-temporal evolution of the trajectories. Nonetheless, the dynamic model is only an approximation of the underlying object dynamics, which can indeed be very complex. As a result, the tracking algorithm has to manage different sources of information (detections and object dynamics), taking into account their respective uncertainties, to efficiently estimate the object trajectories.


Bayesian estimation is the most commonly used framework in visual tracking, and also in other contexts such as radar and sonar. This framework models the tracking problem, and all its sources of uncertainty, in a probabilistic way: sensor noise, inaccurate dynamic models, environmental clutter, etc. From a Bayesian perspective, the aim is to compute the posterior distribution over the object state, which is a vector containing all the desired tracking information, such as position, velocity, etc. This posterior distribution encodes all the information necessary to efficiently compute an estimate of the object state. The computation of the posterior distribution is usually performed recursively via two cyclic stages: prediction and update. Thus, the computation is efficient, since only the previous estimate of the posterior distribution and the set of detections at the current time step are required. The prediction stage evolves the posterior distribution at the previous time step according to the object dynamics, obtaining as a result the predicted posterior distribution at the current time step. The update stage makes use of the available detections at the current time step to correct the predicted posterior distribution by means of the likelihood model of the object detector.

    In single object tracking with static cameras, the main difficulty arises from the factthat realistic models for the object dynamics and detection processes are often non-

    linear and non-Gaussian, which leads to a posterior distribution without a closed-form

    analytic expression. In fact, only in a limited number of cases there exist close-form ex-

    pressions. The most well-known closed-form expression is the Kalman filter (1), which

    is obtained when both the dynamic and likelihood models are linear and Gaussian.

    Grid based approaches (2) overcome the limitations imposed on Kalman filter by re-

    stricting the state space to be discrete and finite. If any of the previous assumptions

    does not hold, the exact computation of posterior distribution is not possible, and it

    becomes necessary to resort to approximate inference methods that computes an ap-

    proximation of the posterior distribution. The extended Kalman filter (1) linearizes

    models with weak non-linearities using the first term in a Taylor expansion, so that

    the Kalman filter expression can be still applied. Nonetheless, the performance of the

    extended Kalman filter rapidly decreases as the non-linearities becomes more severe.

    The unscented Kalman filter (3; 4) has proved to be more efficient in models that

    are moderately non-linear. It recursively propagates a set of selected sigma points to

    maintain the second order statistics of the posterior distribution. Both approximate

    6

  • 8/3/2019 Th_Visual Object Tracking in Challenging Situations Using a Bayesian Perspective

    27/162

    solutions, extended and unscented Kalman filters, assume that the underlying posterior

    distribution is Gaussian. But if this assumption does not hold (e.g. the distribution is

    heavily skewed or multimodal), the accuracy of the estimation can be randomly poor.

    The Gaussian sum filter (5) was one of the first attempts to deal with non-Gaussian

    models, approximating the posterior distribution by a mixture of Gaussians. The main

    limitation of the Gaussian sum filter is that linear approximations are required, as in

    the extended Kalman filter. Another limitation is the combinatorial growth of the

    number of Gaussian components in the mixture over time. An alternative solution

    for non-linear non-Gaussian models that does not need linearization is obtained by

    approximate-Grid based methods (2; 6). These methods approximate the continuous

    state space by a finite and fixed grid, and then they apply numerical integration for

    computing the posterior distribution. The grid must be sufficiently dense to compute

    an accurate approximation of the posterior distribution. However, the computational

    cost increases dramatically with the dimensionality of the state space and becomes

    impractical for dimensions larger than four. An additional disadvantage of grid-based

    methods is that the state space cannot be partitioned unevenly in order to improve

    the resolution in regions of high density probability. All these shortcomings are over-

    come by the particle filtering technique (2; 7; 8), also known as Sequential Monte Carlo

    method (9; 10; 11), condensation algorithm (12; 13), or bootstrap filtering (14). It is

    a numerical integration technique that simulates the posterior distribution by a set of

    weighted samples, known as particles, that are propagating recursively along the time.

    The samples are drawn from a proposal distribution, that is the key component of the

    algorithm, and evaluated by means of the dynamic and likelihood models. The particle

    filter has become very successful in a wide range of tracking applications due to its

    efficiency, flexibility, and easy of implementation. Moreover, its computational cost is

    theoretically independent of the dimension of the state space.

    To sum up, the previous tracking approaches have proved to be efficient and reliable

    solutions for single object tracking provided that:

    they fulfill the assumptions of linearity/non-linearity Gaussianity/non-Gaussinity

    for which they were conceived,

    the cameras are static, i.e. with no motion, and

    7

  • 8/3/2019 Th_Visual Object Tracking in Challenging Situations Using a Bayesian Perspective

    28/162

    2. BAYESIAN MODELS FOR OBJECT TRACKING

    there is always a unique detection for the tracked object, which only occurs in

    constrained scenarios where there is total control about the number and types of

    objects that compose the scene.

    In the rest of situations, the tracking task is still a challenge, which is receiving a great

    deal of research attention because of the wide range of potential applications that can

    be developed. The main contribution of this dissertation is the development of efficient

    and reliable algorithms for the tracking of objects in challenging situations. Specifically,

    the research has been focused on two situations: the single object tracking with moving

    cameras, and the multiple interacting object tracking with static cameras.

    In the first situation, the moving camera induces a global motion in the scene,

    called ego-motion, that corrupts the spatio-tempotal continuity of the video sequence.

    As a consequence, the object dynamic information is not useful anymore, since the

    camera motion is not considered, and the tracking performance is seriously reduced. In

    Sec. 2.1, a thorough review of the main techniques that address the ego-motion problem

    for single object tracking with moving cameras is presented.

    In the other considered situation, the tracking algorithm has to manage several

    interacting objects in an environment with static cameras. The difficulty arises from

    the fact that in each time step there is a set of unlabeled detections generated from

    the detectors. This means that the correspondence between objects and detections

    is not known, and therefore a data association stage is required. This fact violates

    the assumption that there is always a unique detection per object, since potentially

    whatever detection can be associated with an object. In fact, the data association

    can be very complex since the number of possible associations is combinatorial with

    the number of objects and detections. Furthermore, there can be false detections and

    missing detections that increase even more the complexity of the data association.

    The false detections arise from the noise of the camera sensor and the scene clutter

    (similar structures to the tracked object in the background). On the other hand, object

    occlusions and strong variations in the object appearance can cause that one or more

    of the objects are not detected, the so-called missing detections. These phenomena

    can also occur for single object tracking in unconstrained scenarios, in which there is

    a unique object, but there can be none, one or multiple detections. For example, the

    nearest neighbor Kalman filter (15) handles the data association problem by selecting

    8

  • 8/3/2019 Th_Visual Object Tracking in Challenging Situations Using a Bayesian Perspective

    29/162

    2.1 Tracking with moving cameras

    the closest detection to the predicted object trajectory, which is used to update the

    posterior distribution of the object state. Unlike the previous method that only uses a

    unique detection, the probabilistic data association filter (16; 17) updates the posterior

    distribution utilizing all the detections that are close to the predicted object trajectory.

    This is accomplished by averaging the innovation terms of the Kalman filter resulting

    from the set of detections. This approach maintains the Gaussian character of the

    posterior distribution. On the other hand, the data association in single object tracking

    can be considered as a specific case of the data association of multiple object tracking,

    where the number of tracked objects in the scene is just one. There exist a lot ofscientific literature in the field of multiple object tracking, and recently there has been

    a revival of interest due to the recent developments in particle filtering and recursive

    Bayesian models in general. In Sec. 2.2, the main multi-object tracking techniques are

    presented, focusing on the problem of the data association.

    2.1 Tracking with moving cameras

    In video based applications in which the video acquisition system is mounted on a

    moving aerial platform (such as a plane, a helicopter, or an Unmanned Aerial Vehicle),

    a mobile robot, a vehicle, etc., the acquired video sequences undergo a random global

    motion, called ego-motion, that prevents the use of the object dynamic information to

    restrict the object position in the scene. As a consequence, the tracking performance

    can be dramatically reduced. The ego-motion problem has been addressed in different

    manners in the scientific literature. They can be split into two categories: approaches

    based on the assumption of low ego-motion, and those based on the ego-motion esti-

    mation.

    Approaches assuming low ego-motion consider that the motion component due to

    the camera is not very significant in comparison with the object motion. In this context,

    some works assume that the spatio-temporal connectivity of the object is preserved

    along the sequence (18; 19; 20), i.e. the image regions associated with the tracked object

    are spatially overlapped in consecutive frames. Then, the tracking is performed using

    morphological connected operators. In cases where the previous assumption does not

    hold, the most common approach is to search for the object in a bounded area centered

    in the location where it is expected to find the object, according to its dynamics.

    9

  • 8/3/2019 Th_Visual Object Tracking in Challenging Situations Using a Bayesian Perspective

    30/162

    2. BAYESIAN MODELS FOR OBJECT TRACKING

    In (21; 22), an exhaustive search is performed in a fixed-size image region centered in

    the previous object location. In (23), the initial search location is estimated using a

    Kalman filter, and then the search is performed deterministically using the Mean Shift

    algorithm (24). Other authors (25; 26) propose a stochastic search based on particle

    filtering, which is able to manage multiple initial locations for the search. However,

    all these methods lose effectiveness as the displacement induced by the ego-motion

    increases. The reason is the size of the search area must be enlarged to accommodate

    the expected camera ego-motion, which produces that the probability that the tracking

    can be distracted by false candidates increases dramatically.The other category of approaches based on the ego-motion estimation are able

    to deal with strong ego-motion situations, in which the camera motion is at least as

    significant as the object motion, and even more. They aim to compute the camera ego-

    motion between consecutive frames in order to compensate it, and thus recovering the

    spatio-temporal correlation of the video sequence. The camera ego-motion is modeled

    by a geometric transformation, typically an affine or projective one, whose parameters

    are estimated by means of an image registration technique. The existing works differ

    in the specific image registration technique used to compute the parameters of thegeometric transformation. Extensive reviews of image registration techniques can be

    found in (27; 28), where the first one tackles all kind of vision based applications, while

    the second one is focused on aerial imagery. According to them, a possible classification

    of the image registration techniques is: those based on features and those based on

    area (i.e. image regions). Feature based image registration techniques detect and

    match distinctive image features between consecutive frames to estimate a geometric

    transformation, which represents the camera ego-motion model. In (29), an object

    detection and tracking system with a moving airborne platform is described, which

    uses a feature based approach to estimate an affine camera model. In (30), the KLT

    method (31) is used to infer a bilinear camera model in an application that detects

    moving objects from a mobile robot. In the field of FLIR (Forward Looking InfraRed)

    imagery, the works (32; 33; 34) describe a detection and tracking system of aerial

    targets mounted on an airborne platform that uses a robust statistic framework to

    match edge features in order to estimate an affine camera model. This system is able

    to successfully handle situations in which the camera motion estimation is disturbed by

    the presence of independent moving objects, provided that there is a minimum number

    10

  • 8/3/2019 Th_Visual Object Tracking in Challenging Situations Using a Bayesian Perspective

    31/162

    2.1 Tracking with moving cameras

    of detected features belonging to the background. In situations in which the detection

    of distinctive features is particularly complicated, because the acquired images are low

    textured and structured, an area-based image registration technique is used to estimate

    the parameters of the camera model. In (35), a perspective camera model is computed

    by means of an optical flow algorithm to detect moving objects in an application of

    aerial visual surveillance. An optical flow algorithm is also used in (36) to estimate the

    parameters of a pseudo perspective camera model, which is utilized to create panoramic

    image mosaics. The same approach is followed in (37; 38) for a tracking application ofterrestrial targets in airborne FLIR imagery. In (39; 40), a target detection framework is

    presented for FLIR imagery that minimizes a SSD (Sum of Squares Differences) based

    error function to estimate an affine camera model. A similar framework of camera

    motion compensation is used in (41) for tracking vehicles in aerial infrared imagery,

    but utilizing a different minimization algorithm. In (42), the Inverse Compositional

    Algorithm is used to obtain the parameters of an affine camera model for a tracking

    application of vehicles in aerial imagery. Unlike the feature based image registration

    techniques, the area based techniques are not robust to the presence of independent

    moving objects, which can drift the ego-motion estimation. In addition, they require

    that the involved images are closely aligned to achieve satisfactory results.

    All the previous approaches, independently of the used camera ego-motion compen-

    sation technique, have in common that they compute at most one parametric model to

    represent the ego-motion between consecutive frames. However, in real applications,

    the ego-motion computation can be quite challenging, because there can be several

    feasible solutions, i.e. several camera geometric transformations, and not necessarily

    the solution with less error is the correct one. This situation arises as a consequence

    of several phenomena, such us the aperture problem (43) (related to low structured

    or textured scenes), the presence of independent moving objects, changes in the scene,

    and limitations of the own camera ego-motion technique. In Chap. 3, an efficient and

    reliable Bayesian framework is proposed to deal with the uncertainty in the estimation

    of the camera ego-motion for tracking applications.

    11

  • 8/3/2019 Th_Visual Object Tracking in Challenging Situations Using a Bayesian Perspective

    32/162

    2. BAYESIAN MODELS FOR OBJECT TRACKING

    2.2 Tracking of multiple interacting objects

    Multiple object tracking can be sought as the generalization of single object tracking,

    in the sense that the main goal is to recover the trajectories of multiple objects from

    a video sequence, rather than only one trajectory from an unique object. However,

    techniques of multiple object tracking are fundamentally different from those of sin-

    gle object tracking, due to the particular problems that arise in the presence of two

    or more objects. In multiple object tracking, the object detections are unlabeled and

    unordered, i.e. the true correspondence between objects and detections is unknown.

    The estimation of the true correspondence, called data association, suffers from the

    combinatorial explosion of the possible associations, in which the computational cost

    inevitably grows exponentially with the number of objects. On the other hand, data

    association is a stochastic process in which the estimation of the true detection as-

    sociation can be extremely difficult due to the involved uncertainty. Furthermore, in

    real situations there can be none, one, or several detections per object. As a result,

    there can be false detections and missing detections, in spite of the fact that the goal

    of the detector is both to minimize the probability of false alarms and to maximize the

    detection probability. This fact increases the complexity of the data association prob-

    lem. The false detections arise from scene structures similar to the objects of interest,

    which can obfuscate the tracking process. The missing detections can be originated

    from changes in the object appearance, which in turn are caused by articulated or de-

    formable objects, illumination changes due to weather conditions (typical in outdoor

    applications), and variations in the camera point of view. Another source of missing

    detections are the partial and total occlusions involved in the object interactions. All

    of these phenomena are also responsible of the noisy character of the detection process.

    Tab. 2.1 summarizes the mentioned sources of disturbances along with their effects,

    and the derived data association problems.

    A great deal of strategies have been proposed in the scientific literature to solve

    the data association problem. These can be divided into single-scan and multiple-scan

    approaches. Single-scan approaches perform the data association considering only the

    set of available detections in a specific time step, while the multiple-scan approaches

    make use of the detections acquired in a temporal interval, comprising several time

    steps. Multiple-scan approaches consider that tracks are basically a sequence of noisy

    12

  • 8/3/2019 Th_Visual Object Tracking in Challenging Situations Using a Bayesian Perspective

    33/162

    2.2 Tracking of multiple interacting objects

    Disturbance Effect Data association Problem

    Changes in the camera Variations in the Missing detections,

    point of view object appearance noisy detections

    Articulated or Variations in the Missing detections,

    deformable objects object appearance noisy detections

    Illumination changes Variations in the Missing detections,

    object appearance noisy detections

    Ob ject interactions Partial or Missing detections,

    total occlusions noisy detections

    Scene structures similar Presence of clutter False detectionsto the objects of interest

    Table 2.1: Disturbances in the detection process, their effects, and the resulting problems

    in data association.

    detections. Thus, the multiple object tracking consists in seeking the optimal paths

    in a trellis formed by the temporal sequence of detections. In this way, the data as-

    sociation problem is cast to one of association of sequence of detections. Techniques

    that accomplish this task are the Viterbi algorithm (44; 45; 46), multiple scan assign-

    ment (47; 48), network theoretic algorithms (49), and the expectation-maximization

    algorithm (EM) (50). The precedent approaches compute a single solution that is

    considered the best one, discarding a lot of feasible hypotheses that could be the true

    solution. To alleviate this situation, some approaches (51; 52) compute the best N solu-

    tions in order to minimize the risk of an incorrect trajectory estimation. An additional

    problem is the computational cost. It is known that the multiple-scan approaches are

    NP-hard problems in combinatorial optimization, i.e. their complexity is exponential

    with the number of objects and detections. The most popular solution to tackle this

    problem is the Lagrangian relaxation (53; 54), wherein the N dimensional assignment

    problem is divided into a set of assignment problems of lower dimensionality. Another

    approach (55) transforms the integer programming problem, posed by the multiple-

    scan assignment, into a linear programming problem by relaxing the constraints for

    an integer solution. This allows to efficiently solve the problem in polynomial time

    through well-known algorithms, such as the interior point method (56).

    Inside the group of single-scan approaches, the simplest one is the global nearest

    13

  • 8/3/2019 Th_Visual Object Tracking in Challenging Situations Using a Bayesian Perspective

    34/162

    2. BAYESIAN MODELS FOR OBJECT TRACKING

    neighbor algorithm (57), also known as the 2D assignment algorithm, which computes

    a single association between detections and objects by minimizing a distance based

    cost function. The main problem of this approach is that many feasible associations

    are discarded. On the other hand, the multiple hypotheses tracker (MHT) (58; 59)

    attempts to keep track of all the possible associations along the time. As it occurs

    with the multiple-scan approaches, the complexity of the problem is NP-hard because

    the number of association grows exponentially over time, and also with the number

    of objects and detections. Therefore, additional methods are required to establish a

    trade-off between the computational complexity and the handling of multiple associa-tion hypotheses. In this respect, one of the most popular methods is the joint prob-

    abilistic data association filter (JPDAF) (60; 61), which performs a soft association

    between detections and objects. This is carried out by combining all the detections

    with all the objects, in such a way that the contribution of each detection to each

    object depends on the statistical distance between them. This method prunes away

    many unfeasible hypotheses, but also restricts the data association distribution to be

    Gaussian, which limits the applicability of the technique. Subsequent works (62; 63)

    try to overcome this limitation by modeling the data association distribution by a mix-ture of Gaussians. However, heuristics techniques are necessary to reduce the number

    of components to make the algorithm computationally manageable. The probabilistic

    multiple hypotheses tracker (PMHT) (64; 65) is another alternative to estimate the best

    data associations hypotheses at a moderate computational cost. It assumes that the

    data association is an independent process to work around the problems with pruning.

    Nevertheless, the performance is similar to that of the JPDAF, although the compu-

    tational cost is higher. The data association problem has been also addressed with

    particle filtering, which allows to deal with arbitrary data association distributions in a

    natural way. Theoretically, the algorithms based in particle filtering have the ability to

    manage the best data association hypotheses with a computational cost independently

    of the number of objects and detections. The computed association hypotheses consti-

    tute an approximation of the true data association distribution, and the approximation

    is more accurate as the number of hypotheses increases. In practice, the performance

    of the particle filtering techniques depends on the ability to correctly sample associa-

    tion hypotheses from a proposal distribution called importance density. In (66; 67), a

    Gibbs sampler is used to sample the data association hypotheses. In a similar way, a

    14

  • 8/3/2019 Th_Visual Object Tracking in Challenging Situations Using a Bayesian Perspective

    35/162

    2.2 Tracking of multiple interacting objects

    Markov Chain Monte Carlo (MCMC) (68; 69; 70) scheme has been used for drawing

    samples that simulate the underlying data association distribution. The main problem

    with these samplers is that they are iterative methods that need an unknown number

    of iterations to converge. This fact makes them inappropriate for online applications.

    Some works (71; 72) overcome this limitation by means of the design of an efficient

    and non-iterative proposal distribution that depends on the specific characteristic of

    the underlying dynamic and likelihood processes of the tracking system. The accuracy

    of the estimation achieved by techniques based on particle filtering depends on the size

    of the dimension of the state space. For high dimensional spaces, the accuracy canbe quite low. In order to deal with this drawback, a technique of variance reduction,

    called Rao-Blackwellization, has been used in (73), which improves the accuracy of

    the estimated object trajectories for a given number of samples or hypotheses. An

    alternative to the particle filtering is the probability hypothesis density (PHD) filter

    that can also address missing and false detections like the particle filtering. However,

    the computational cost is exponential with the number of objects. In order to reduce

    the complexity from exponential to linear, the full posterior distribution is simplified

    by its first-order moment in (74). Nonetheless, this approach is only satisfactory formultivariate distributions that can be reasonable approximated by its first moment,

    which can be an excessive limitation for some tracking applications.

    The previous works have been designed to track multiple objects with restricted

    kinds of interactions among them. For instance, these works are able to handle object

    interactions involving trajectory changes but without occlusions, such as a situation

    with two people who stop one in front the other. In this case the object detections are

    used to efficiently correct the object trajectories. Another kind of interaction that is

    successfully addressed involves object occlusions but without trajectory changes, such

    as a situation with two people who cross each other maintaining their paths. In this

    case, the data association stage can manage the missing detections during the occlusion,

    relying on their trajectories are unchanged in order to predict their tracks. However, in

    complex object interactions involving trajectory changes and occlusions, the previous

    approaches are prone to fail because the occluded objects have not available detections

    to correct their trajectories. This limitation arises from the fact the main tracking

    techniques for multiple objects have been developed for radar and sonar applications,

    in which the dynamics of the tracked objects have physical restrictions that make

    15

  • 8/3/2019 Th_Visual Object Tracking in Challenging Situations Using a Bayesian Perspective

    36/162

    2. BAYESIAN MODELS FOR OBJECT TRACKING

    impossible the complex interactions that arise in visual tracking. Moreover, in the field

    of radar and sonar, the objects are handled as point targets that cannot be occluded.

    Some works have proposed strategies to deal with the specific problems that arise in

    the field of visual tracking. In (75; 76), the data association hypotheses are drawn

    using a sampling technique that is able to handle split object detections, i.e. group

    of detections that have been generated from the same object. The split detections

    are typical from background subtraction techniques (77), which are used to detect

    moving objects in video sequences. In (78), a specific approach for handling object

    interactions that involve occlusions and changes in trajectories is presented. It createsvirtual detections of possible occluded objects to cope with the changes in trajectories

    during the occlusions. However, since the occlusion events are not explicitly modeled,

    tracking errors can appear when a virtual detection is associated to an object that is

    actually not occluded. In order to improve the performance of the tracking of multiple

    objects in the field of computer vision, a novel Bayesian approach that explicitly models

    the occlusion phenomenon has been developed. This approach is able to track complex

    interacting objects whose trajectories change during the occlusions. Chap. 4 describes

    in detail the proposed visual tracking for multiple interacting models.

    16

  • 8/3/2019 Th_Visual Object Tracking in Challenging Situations Using a Bayesian Perspective

    37/162

    Chapter 3

    Bayesian Tracking with Moving

    Cameras

    This chapter starts with a brief overview of the optimal Bayesian framework for gen-

    eral object tracking (Sec. 3.1), explaining also the basics of the particle filtering, an

    approximate inference technique. Next, the developed Bayesian tracking framework for

    moving cameras is presented in Sec. 3.2, which models the camera motion in a prob-

    abilistic way. Lastly, Secs. 3.3 and. 3.4 show respectively how to apply the proposed

    Bayesian model to two visual tracking applications for moving cameras: the first one

    focused on aerial infrared imagery, and the second one for aerial and terrestrial visible

    imagery.

    3.1 Optimal Bayesian estimation for object tracking

    The Bayesian approach for object tracking aims to estimate a state vector xt that

    evolves over time using a sequence of noisy observations z1:t = {zi|i = 1,...,t} up

    to time t. The state vector contains all the relevant information for the tracking at

    time step k, such as the object position, velocity, size, appearance, etc. The noisy

    observations z1:t (also called measurements or detections) are obtained by one or more

    detectors, which analyze the video sequence information acquired by the camera to

    either directly compute the object position, or indirectly obtain relevant features that

    can related to the object position, such as motion, color, texture, edges, corners, etc.

From a Bayesian perspective, some degree of belief in the state x_t at time t is calculated using the available prior information (about the object, the camera and the scene) and the set of observations z_{1:t}. Therefore, the tracking problem can be formulated as the estimation of the posterior probability density function (pdf) of the state of the object, p(x_t | z_{1:t}), conditioned on the set of observations, where the initial pdf p(x_0 | z_0) ≡ p(x_0) is assumed to be known. This probabilistic model for object tracking can be represented by a graph (see Fig. 3.1), called a graphical model, in which the random variables are represented by nodes, and the probabilistic relationships among the variables by arrows.

    Figure 3.1: Graphical model for the Bayesian object tracking.

For efficiency purposes, the estimation of the posterior pdf p(x_t | z_{1:t}) is recursively performed through two stages: the prediction of the most probable state vectors using the prior information, and the update (or correction) of the prediction based on the observations. The prediction stage involves computing the prior pdf of the state, p(x_t | z_{1:t-1}), at time t via the Chapman-Kolmogorov equation

p(x_t | z_{1:t-1}) = ∫ p(x_t, x_{t-1} | z_{1:t-1}) dx_{t-1} = ∫ p(x_t | x_{t-1}) p(x_{t-1} | z_{1:t-1}) dx_{t-1},   (3.1)

where p(x_{t-1} | z_{1:t-1}) is the posterior pdf at the previous time step, and p(x_t | x_{t-1}) is the state transition probability, which encodes the prior information, for example the object dynamics along with its uncertainty. The state transition probability is defined by a possibly nonlinear function of the state x_{t-1} and an independent identically distributed noise process v_{t-1}

x_t = f_t(x_{t-1}, v_{t-1}).   (3.2)


The update stage aims to reduce the uncertainty of the prediction, p(x_t | z_{1:t-1}), using the new available observation z_t (observations are available at discrete times) through the Bayes rule

p(x_t | z_{1:t}) = p(z_t | x_t) p(x_t | z_{1:t-1}) / p(z_t | z_{1:t-1}),   (3.3)

where p(z_t | x_t) is the likelihood distribution that models the observation process, i.e. it assesses the degree of support that the observation z_t lends to the prediction x_t. The likelihood is given by a possibly nonlinear function of the state x_t and an independent identically distributed noise process n_t

z_t = h_t(x_t, n_t).   (3.4)

The denominator of Eq. (3.3) is simply a normalization constant given by

p(z_t | z_{1:t-1}) = ∫ p(z_t, x_t | z_{1:t-1}) dx_t = ∫ p(z_t | x_t) p(x_t | z_{1:t-1}) dx_t.   (3.5)

The posterior p(x_t | z_{1:t}) embodies all the available statistical information, allowing the computation of an optimal estimate of the state vector x_t, which contains the desired tracking information. Commonly used estimators are the Maximum A Posteriori (MAP) and the Minimum Mean Square Error (MMSE), given respectively by

MAP:  x̂_t = argmax_{x_t} p(x_t | z_{1:t})   (3.6)

MMSE: x̂_t = E[x_t | z_{1:t}]   (3.7)

Nevertheless, the optimal solution for the posterior probability, given by Eq. 3.3, cannot be determined analytically in practice, due to the nonlinearities and non-Gaussianities of the prior information and observation models. Therefore, suboptimal methods must be used to obtain an approximate solution. In Sec. 3.1.1, a powerful and popular suboptimal method, called Particle Filtering, is described.
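The difference between the two estimators is easiest to see on a multi-modal posterior. The following sketch (not part of the thesis; the grid and mode parameters are invented for illustration) discretizes a bimodal posterior and computes both estimates:

```python
import numpy as np

# Hypothetical discretized posterior over a 1-D position grid: a narrow
# dominant mode at x = 2 and a broad secondary mode around x = 8.
x = np.linspace(0.0, 10.0, 1001)
posterior = (1.2 * np.exp(-0.5 * ((x - 2.0) / 0.2) ** 2)
             + 1.0 * np.exp(-0.5 * ((x - 8.0) / 1.0) ** 2))
posterior /= posterior.sum()                # normalize on the grid

x_map = x[np.argmax(posterior)]             # MAP: location of the highest mode
x_mmse = np.sum(x * posterior)              # MMSE: posterior mean
```

Note that on a multi-modal posterior the MMSE estimate can fall between modes, in a region of low probability, whereas the MAP estimate always sits on a mode.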


    3.1.1 Particle filter approximation

The Particle Filter is an approximate inference method based on Monte Carlo simulation for solving Bayesian filters. In contrast to other approximate inference methods, such as Extended Kalman Filters, Unscented Kalman Filters and Hidden Markov Models, Particle Filtering is able to deal with continuous state spaces and nonlinear/non-Gaussian processes (9), which arise in a natural way in real tracking situations. The Particle Filtering technique approximates the posterior probability p(x_t | z_{1:t}) by a set of N_S weighted random samples (or particles) {x_t^i, i = 1,...,N_S} (2)

p(x_t | z_{1:t}) ≈ (1/c) Σ_{i=1}^{N_S} w_t^i δ(x_t − x_t^i),   (3.8)

where δ(x) is the Dirac delta function, {w_t^i, i = 1,...,N_S} is the set of weights associated with the samples, and c = Σ_{i=1}^{N_S} w_t^i is a normalization factor. As the number of samples becomes very large, this approximation approaches the true posterior pdf.

The samples x_t^i and the weights w_t^i are obtained using the concept of importance sampling (2; 79), which aims to reduce the variance of the approximation given by Eq. (3.8) through Monte Carlo simulation. The set of samples {x_t^i, i = 1,...,N_S} is drawn from a proposal distribution function q(x_t | x_{t-1}, z_t), called the importance density. The optimal q(x_t | x_{t-1}, z_t) should be proportional to p(x_t | z_{1:t}) and should have the same support (the support of a function is the set of points where the function is not zero), in which case the variance would be zero. But this is only a theoretical solution, since it would imply that p(x_t | z_{1:t}) is known. In practice, a proposal distribution as similar as possible to the posterior pdf is chosen, but there is no standard solution, since it depends on the specific characteristics of the tracking application. The choice of the proposal distribution is a key component in the design of Particle Filters, since the quality of the estimation of the posterior pdf depends on the ability to find an appropriate proposal distribution.

The weights w_t^i associated with each sample x_t^i are recursively computed by (2)

w_t^i = w_{t-1}^i · p(z_t | x_t^i) p(x_t^i | x_{t-1}^i) / q(x_t^i | x_{t-1}^i, z_t).   (3.9)


The importance sampling principle has a serious drawback, called the degeneracy problem (2): after a few iterations, all the weights except one have an insignificant value. In order to overcome this problem, several resampling techniques have been proposed in the scientific literature, which introduce an additional sampling step that replicates the more probable samples. A popular resampling strategy is the Sampling Importance Resampling (SIR) algorithm, which makes a random selection of the samples at each time step according to their weights. Thus, the samples with higher weights are selected several times, while those with an insignificant weight are discarded. After SIR resampling, all the samples have the same weight.
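As an illustration only (a generic 1-D sketch, not the tracker developed in this thesis), a SIR-style particle filter that resamples at every step can be written as:

```python
import numpy as np

rng = np.random.default_rng(0)

def sir_step(particles, weights, transition, likelihood, z):
    """One SIR iteration: resample, propagate, reweight (generic names)."""
    n = len(particles)
    # --- resampling: draw particle indices proportionally to their weights
    idx = rng.choice(n, size=n, p=weights)
    particles = particles[idx]
    # --- prediction: propagate each particle through the dynamic model
    particles = transition(particles)
    # --- update: reweight with the likelihood of the new observation
    weights = likelihood(z, particles)
    weights /= weights.sum()
    return particles, weights

# Toy 1-D constant-position model with Gaussian process/observation noise.
transition = lambda p: p + rng.normal(0.0, 0.5, size=p.shape)
likelihood = lambda z, p: np.exp(-0.5 * ((z - p) / 1.0) ** 2) + 1e-300

particles = rng.normal(0.0, 5.0, size=2000)
weights = np.full(2000, 1.0 / 2000)
for z in [4.0, 4.2, 3.9, 4.1]:          # observations of a target near x = 4
    particles, weights = sir_step(particles, weights, transition, likelihood, z)

estimate = np.sum(weights * particles)  # MMSE estimate from the particle set
```

The small floor added to the likelihood avoids an all-zero weight vector when every particle falls far from the observation.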

    3.2 Bayesian tracking framework for moving cameras

In video sequences acquired by a moving camera, the perceived motion of the objects is composed of the objects' own motion and the camera motion. Consequently, the camera motion must be estimated in order to obtain the object position. Accordingly, the state vector x_t = {d_t, g_t} must contain not only the object dynamics d_t (position and velocity over the image plane), but also the camera dynamics g_t, i.e. the camera ego-motion. The posterior pdf of the state vector is recursively expressed by the equations

p(x_t | z_{1:t}) = p(z_t | x_t) p(x_t | z_{1:t-1}) / p(z_t | z_{1:t-1})   (3.10)

p(x_t | z_{1:t-1}) = ∫ p(x_t | x_{t-1}) p(x_{t-1} | z_{1:t-1}) dx_{t-1}.   (3.11)

The transition probability p(x_t | x_{t-1}) = p(d_t, g_t | d_{t-1}, g_{t-1}) encodes the information about the object and camera dynamics, along with their uncertainty. If the camera motion is not considered, the object dynamics can be modeled by the linear function

d_t = M d_{t-1},   (3.12)

where M is a matrix that represents a first-order linear system of constant velocity. This object dynamic model is a reasonable approximation for a wide range of object tracking applications, provided that the camera frame rate is high enough. The camera dynamics is modeled by a geometric transformation g_t that ideally is a projective camera model, although, depending on the camera and scene disposition, it can be simplified to an affine or Euclidean transformation. For example, in aerial tracking systems, an affine geometric transformation is a satisfactory approximation of the projective camera model, since the depth relief of the objects in the scene is small enough compared to the average depth, and the field of view is also small (80). The joint dynamic model for the camera and the object is expressed as the composition of both individual models

d_t = g_t ∘ (M d_{t-1}).   (3.13)

Based on this joint dynamic model, the transition probability p(x_t | x_{t-1}) can be expressed as

p(x_t | x_{t-1}) = p(d_t, g_t | d_{t-1}, g_{t-1}) = p(d_t | d_{t-1}, g_{t-1:t}) p(g_t | d_{t-1}, g_{t-1}) = p(d_t | d_{t-1}, g_t) p(g_t),   (3.14)

where it has been assumed that, on the one hand, the current object position is conditionally independent of the camera motion in the previous time step (as the proposed joint dynamic model states), and, on the other hand, the current camera motion is conditionally independent of both the camera motion and the object position in previous time steps. This last assumption results from the fact that the camera ego-motion is completely random, not following any specific pattern. The probability term p(d_t | d_{t-1}, g_t) models the uncertainty of the proposed joint dynamic model as

p(d_t | d_{t-1}, g_t) = N(d_t; g_t ∘ (M d_{t-1}), σ_tr²),   (3.15)

where N(x; μ, σ²) is a Gaussian or Normal distribution of mean μ and variance σ². Thus, the term σ_tr² represents the unknown disturbances of the joint dynamic model.
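To make Eq. 3.15 concrete, the following sketch (illustrative names and values, assuming a 2×3 affine camera transform and a state d = [x, y, vx, vy]; applying the linear part of the transform to the velocity components is a modeling choice made here for illustration) draws one sample from the transition model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Constant-velocity model M for a state d = [x, y, vx, vy] (dt = 1 frame).
M = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

def apply_camera(g, d):
    """Apply a 2x3 affine camera transform g to the state d."""
    pos = g[:, :2] @ d[:2] + g[:, 2]   # affine warp of the position
    vel = g[:, :2] @ d[2:]             # linear part rotates/scales the velocity
    return np.concatenate([pos, vel])

def sample_transition(d_prev, g, sigma_tr=1.0):
    """Draw d_t ~ N(g ∘ (M d_{t-1}), sigma_tr^2), as in Eq. 3.15."""
    mean = apply_camera(g, M @ d_prev)
    return mean + rng.normal(0.0, sigma_tr, size=mean.shape)

g = np.array([[1.0, 0.0, 3.0],         # pure camera translation of (+3, -2) px
              [0.0, 1.0, -2.0]])
d_prev = np.array([10.0, 20.0, 1.0, 0.5])
d_t = sample_transition(d_prev, g)
```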

The other probability term in Eq. 3.14, p(g_t), expresses the probability that a specific geometric transformation represents the true camera motion between consecutive time steps. It is typically computed by a deterministic approach using an image registration algorithm (27), which amounts to expressing p(g_t) as

p(g_t) = δ(g_t − g_t^j),   (3.16)


where g_t^j is the geometric transformation obtained by the image registration technique. However, this approximation can fail in situations where the aperture problem (43; 81) is significant and/or the assumption of a single global motion does not hold, for instance, in the presence of independently moving objects. Under these circumstances there are several putative geometric transformations that can explain the camera ego-motion. Moreover, the best geometric transformation according to some error or cost function is not necessarily the actual camera ego-motion, due to the noise and non-linearities involved in the estimation process. In order to deal satisfactorily with this situation, g_t is treated as a random variable, rather than as a parameter computed in a deterministic way. The specific computation of p(g_t) depends on the tracking application and the type of imagery. Two different methods are proposed in Secs. 3.3 and 3.4 for infrared and visible imagery, respectively. In any case, both compute an approximation of p(g_t) as

p(g_t) ≈ Σ_{j=1}^{N_g} w_t^j δ(g_t − g_t^j),   (3.17)

where N_g is the number of geometric transformations used to represent p(g_t), {g_t^j | j = 1,...,N_g} are the best candidate transformations to model the camera ego-motion, and w_t^j is the weight of g_t^j, which evaluates how well the transformation represents the camera ego-motion.
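Sampling from a p(g_t) of this form reduces to drawing an index from a discrete distribution over the candidate transformations. A minimal sketch (the candidate matrices and their scores are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical candidate camera motions (2x3 affine matrices) and their
# scores, as produced by some registration front-end (illustrative values).
candidates = [np.array([[1.0, 0.0, 3.1], [0.0, 1.0, -2.0]]),
              np.array([[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]),
              np.array([[1.0, 0.0, -5.0], [0.0, 1.0, 4.0]])]
scores = np.array([0.70, 0.25, 0.05])     # unnormalized weights w_t^j
weights = scores / scores.sum()

def sample_g():
    """Draw one g_t from the discrete mixture of Eq. 3.17."""
    j = rng.choice(len(candidates), p=weights)
    return candidates[j]

draws = [sample_g() for _ in range(1000)]
# The most likely candidate should dominate the drawn set.
frac_best = np.mean([np.array_equal(g, candidates[0]) for g in draws])
```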

The likelihood function p(z_t | x_t) in Eq. 3.10 depends on the kind of imagery and on the type of object being tracked. Two different models have been developed: one based on the detection of blob regions for infrared imagery, and another based on color histograms for visible video sequences, which are described respectively in Secs. 3.3 and 3.4. In general terms, the resulting likelihood will be non-Gaussian, nonlinear and multi-modal, due to the presence of clutter and of objects similar to the tracked one.

The initial pdf p(x_0 | z_0) ≡ p(x_0), called the prior, can be initialized as a Gaussian distribution using the information given by an object detection algorithm, as in (18; 19; 32; 33; 34; 39; 40). Another alternative is to use the ground-truth information (if available) to initialize a delta function δ(x_0).


    3.3 Object tracking in aerial infrared imagery

This section presents the developed object tracking approach for aerial infrared imagery. In contrast to visual-range images, infrared images have low signal-to-noise ratios, objects poorly contrasted against the background, and non-repeatable object signatures. These drawbacks, along with the competing background clutter and the illumination changes due to weather conditions, make the tracking task extremely difficult. On the other hand, the unpredictable camera ego-motion, resulting from the fact that the camera is on board an aerial platform, distorts the spatio-temporal correlation of the video sequence, negatively affecting the tracking performance.

All the aforementioned problems are addressed by a tracking strategy based on the Bayesian tracking framework for moving cameras proposed in Sec. 3.2. Accordingly, the posterior pdf of the state vector, p(x_t | z_{1:t}), is recursively computed by Eqs. 3.10 and 3.11.

The transition probability p(x_t | x_{t-1}), which encodes the joint camera and object dynamic model, is given by Eq. 3.14, where the prior probability p(g_t) of the geometric transformation depends on the specific type of imagery. For the present tracking application dealing with infrared imagery, the probability p(g_t^j) of a specific geometric transformation g_t^j is based on the quality of the image alignment between consecutive frames achieved by g_t^j. The quality of the image alignment (or of the ego-motion compensation) is computed by means of the Mean Square Error function, mse(x, y), between the current frame I_t and the previous frame I_{t-1} warped by the transformation g_t^j. Thus, the probability p(g_t^j) is mathematically expressed as

p(g_t^j) = N(mse(I_t, g_t^j ∘ I_{t-1}); 0, σ_g²),   (3.18)

where N(x; μ, σ²) is a Gaussian distribution of mean μ and variance σ², and σ_g² is the expected variance of the image alignment process. Notice that I_t is an infrared intensity image.
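The weighting of Eq. 3.18 can be sketched as follows. For simplicity, the affine warp is replaced here by an integer translation (np.roll), the frames are synthetic noise images, and sigma_g is an invented value; the Gaussian is left unnormalized since only relative weights matter:

```python
import numpy as np

def mse(a, b):
    return np.mean((a.astype(float) - b.astype(float)) ** 2)

def transform_prior(curr, prev_warped, sigma_g=0.5):
    """Unnormalized p(g_t^j) per Eq. 3.18: Gaussian on the alignment MSE."""
    e = mse(curr, prev_warped)
    return np.exp(-0.5 * (e / sigma_g) ** 2)

# Synthetic example: the "camera" shifts the frame by (2, 3) pixels.
rng = np.random.default_rng(3)
prev = rng.normal(size=(64, 64))
curr = np.roll(prev, shift=(2, 3), axis=(0, 1))

# Two candidate transformations, modeled here as integer translations only.
good = np.roll(prev, shift=(2, 3), axis=(0, 1))   # correct compensation
bad = np.roll(prev, shift=(0, 0), axis=(0, 1))    # no compensation

p_good = transform_prior(curr, good)
p_bad = transform_prior(curr, bad)
```

As expected, the candidate that correctly compensates the ego-motion receives a weight close to 1, while the uncompensated candidate is heavily penalized.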

Figure 3.2: Two consecutive frames of an infrared sequence acquired by an airborne camera.

Finding an observation model for the likelihood p(z_t | x_t) in airborne infrared imagery that appropriately describes the object appearance and its variations over time is quite challenging, due to the aforementioned characteristics of infrared imagery. The most robust and reliable object property is the presence of bright regions, or at least regions that are brighter than their surrounding neighborhood, which typically correspond to the engine and exhaust-pipe area of the object. Based on this fact,

the likelihood function uses an observation model that aims to detect the main bright regions of the target. This is accomplished by a rotationally symmetric Laplacian of Gaussian (LoG) filter, characterized by a sigma parameter that is tuned to the smallest dimension of the object, so that the filter response is maximal in bright regions with a size similar to that of the tracked object. The main handicap of this observation model is its lack of distinctiveness, since any bright region of adequate size can be the target object. As a consequence, the resulting LoG filter response is strongly multi-modal. This fact, coupled with the camera ego-motion, dramatically complicates a reliable estimation of the state vector. This situation is illustrated in Figs. 3.2 and 3.3. The first one, Fig. 3.2, shows two consecutive frames, (a) and (b), of an infrared sequence acquired by an airborne camera, in which the target object has been enclosed by a rectangle. Fig. 3.3 shows the LoG filter response related to Fig. 3.2(b), where the image itself has been projected over the filter response for a better interpretation. The multi-modality is clearly observed, and in theory any of the modes could be the right object position. Moreover, if only the object dynamics is considered, the closest mode to the predicted object location (marked by a vertical black line) is not the true object location, because of the effects of the camera ego-motion.


    Figure 3.3: Multimodal LoG filter response related to Fig. 3.2(b).

    Figure 3.4: Likelihood distribution related to Fig. 3.3.

The likelihood probability can be simplified as

p(z_t | x_t) = p(z_t | d_t, g_t) = p(z_t | d_t),   (3.19)

assuming that z_t is conditionally independent of g_t given d_t. Then, p(z_t | d_t) is expressed by the Gaussian distribution

p(z_t | d_t) = N(z_t; H d_t, σ_L²),   (3.20)

where z_t is the LoG filter response of the frame I_t, H is a matrix that selects the positional information of the object, and the variance σ_L² is set so as to highlight the main modes of z_t while discarding the less significant ones. This is illustrated in Fig. 3.4, where only the most significant modes of Fig. 3.3 are highlighted.
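A minimal sketch of the LoG observation model (synthetic frame and illustrative sizes; scipy's gaussian_laplace plays the role of the rotationally symmetric LoG filter):

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

rng = np.random.default_rng(4)

# Synthetic infrared-like frame: background noise plus a bright blob of
# roughly the object size (sigma = 3 pixels) centered at row 40, col 25.
frame = 0.05 * rng.standard_normal((96, 96))
yy, xx = np.mgrid[0:96, 0:96]
frame += np.exp(-((yy - 40) ** 2 + (xx - 25) ** 2) / (2 * 3.0 ** 2))

# Rotationally symmetric LoG filtering; the sign is flipped so that bright
# regions produce positive peaks. sigma is matched to the object size.
z = -gaussian_laplace(frame, sigma=3.0)
peak = np.unravel_index(np.argmax(z), z.shape)   # strongest mode of z_t
```

In a real sequence z would contain several comparable modes (clutter, similar objects), which is precisely why the response is fed into the probabilistic model of Eq. 3.20 instead of being thresholded.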


As both the dynamic and observation models are nonlinear and non-Gaussian, the posterior pdf cannot be analytically determined, and therefore the use of approximate inference methods is necessary. In the next section, a Particle Filtering strategy is presented to obtain an approximate solution of the posterior pdf.

    3.3.1 Particle filter approximation

The posterior pdf p(x_t | z_{1:t}) is approximated by means of a Particle Filter as

p(x_t | z_{1:t}) ≈ (1/c) Σ_{i=1}^{N_S} w_t^i δ(x_t − x_t^i),   (3.21)

where the samples x_t^i are drawn from a proposal distribution based on the likelihood and the prior probability of the camera motion

q(x_t | x_{t-1}, z_t) = p(z_t | d_t) p(g_t),   (3.22)

which is an efficient simplification of the optimal, but intractable, importance density function (9)

q(x_t | x_{t-1}, z_t) = p(x_t | x_{t-1}, z_t).   (3.23)

The samples x_t^i = {d_t^i, g_t^i} are drawn from the proposal distribution by a hierarchical sampling strategy, which first draws samples g_t^i from p(g_t), and then draws samples d_t^i from p(z_t | d_t). The sampling procedure for obtaining samples g_t^i from p(g_t) is based on the image registration algorithm presented in (82). This method assumes an initial geometric transformation t_t^i, and then uses the whole image intensity information to compute a global affine transformation g_t^i, which is a candidate for representing the true camera motion. The method explicitly accounts for global variations in image intensities in order to be robust to illumination changes. However, the computed candidate g_t^i will only be a reasonable approximation of the camera motion if the initial geometric transformation t_t^i is close to the geometric transformation that represents the actual camera motion. This means that the image in the previous time step, warped by the initial transformation, must be closely aligned with the current image to achieve a satisfactory result. This limitation derives from the optimization strategy used in the image registration algorithm, which converges to the closest mode given an initial transformation.


As a consequence, if the two images are not closely aligned, the computed solution will probably correspond to a local mode that does not represent the true camera motion. By default, t_t^i is a 3×3 identity matrix that represents the previous image without warping. This approach is inefficient in airborne visual tracking, since the camera can undergo strong displacements that cannot be satisfactorily compensated. To overcome this problem, the previous image registration technique has been improved by using several initial geometric transformations {t_t^i | i = 1,...,N_S}, obtaining in turn a set of camera ego-motion candidates {g_t^i | i = 1,...,N_S}. The set of initial transformations is computed so that at least one of them is relatively close to the actual camera motion, allowing the image registration algorithm to effectively compute the correct geometric transformation. In this context, the concept of closeness between geometric transformations depends, on the one hand, on the magnitude of the camera motion, and, on the other hand, on the capability of the image registration algorithm itself to rectify misaligned images. For example, the ideal situation would be that the magnitude of the camera motion were lower than the maximum displacement that the image registration algorithm is able to rectify. For the purpose of measuring the magnitude of the camera motion, a subset of video sequences belonging to the AMCOM dataset (see Sec. 3.3.2) has been used as a training set to compute the actual camera motion. These sequences have been acquired by different infrared cameras on board a plane. The computation of the camera motion has been supervised by a user, who not only guides the image alignment, but also evaluates whether the achieved result is accurate enough to be considered the real camera motion. As a result, a set of affine transformations is obtained, which describe the typical camera movements. Regarding the image registration algorithm,