Biometric Identification from Forensic Video Surveillance Evidence
Miguel Reis Moitinho de Almeida
Thesis to obtain the Master of Science Degree in Engenharia Electrotécnica e de Computadores
Supervisor: Doctor Paulo Luís Serras Lobato Correia
Examination Committee
Chairperson: Doctor José Eduardo Charters Ribeiro da Cunha Sanguino
Supervisor: Doctor Paulo Luís Serras Lobato Correia
Members of the Committee: Doctor João Miguel da Costa Magalhães
Carlos Filipe Bento Gregório
May 2015
Acknowledgments
Thank you to:
• The European Cooperation in Science and Technology (COST) for providing me with the
opportunity to learn from the best;
• Peter Kastmand Larsen for his good humour, scientific insight and shared experience;
• Laboratório de Polícia Científica, mainly to Carlos Gregório, Francisco Calado, Ana Cristina Correia and Gisela Rosa for their precious insight;
• Pedro Tomás for publicly providing the template for this dissertation and thus saving me a lot of time on formatting;
• All the contributors who made OpenCV such a great tool;
• J for making dinner and tea, while cheering me up, and of course for putting up with me;
• My father, not for nagging me one too many times, but for helping me review this document
and for putting up with cross compilation and linking questions;
• David Pereira for the painstaking review and morale boosting;
• Last but not least, to Paulo Lobato Correia for not giving up on me, putting up with me and giving me all the tools to successfully conclude this thesis.
Abstract
The tools available to the forensic crime scene investigators responsible for video and biometric analysis are either expensive or simply not designed for the job, while a great deal of academic work is produced in these areas. This created the opportunity to join both worlds. To accomplish this approach, this work proposes the creation of a unified forensic video and biometric data extraction platform.
With this in mind, the objective of this project is to design and implement a platform capable of integrating both video and biometric extraction capabilities. To exemplify the capabilities of this platform, some of the main day-to-day tools used by the target users were identified and implemented. These needs were identified and evaluated in cooperation with the Scientific Police Lab of the Polícia Judiciária and the Unit of Forensic Anthropology of the Faculty of Health Sciences of the University of Copenhagen, the latter thanks to the support of the European Cooperation in Science and Technology.
The analysis of the implemented tools, the description of the ones currently being implemented, and the selection of the ones that should be implemented in the near future are presented. This project intends to create cooperation ties between universities and forensic investigators, not only from the labs that took part in it, but also from others around the world.
Keywords
Biometry, Video Analysis, Forensic Analysis, Motion Detection, Feature Extraction, BioFoV
Resumo
Considerando que as ferramentas informáticas disponíveis para os investigadores forenses de vídeo de vigilância são dispendiosas ou não estão talhadas para a análise forense de vídeo, e que tanto trabalho é produzido pela academia nesta área, levantou-se a necessidade de juntar estas duas vertentes. Com o objectivo de proceder a tal aproximação, este trabalho propõe a criação de uma plataforma de análise forense de vídeo e extracção de dados biométricos de indivíduos que neles figurem.
Para tal, este projecto tem como objectivo projectar e implementar uma plataforma que seja capaz de integrar tanto a capacidade de processar vídeos como a de extrair características biométricas dos mesmos. Para exemplificar as capacidades desta plataforma, algumas das principais ferramentas necessárias para o dia a dia dos seus utilizadores alvo foram identificadas e implementadas. Essas mesmas necessidades foram identificadas e avaliadas em conjunto com o Laboratório de Polícia Científica da Polícia Judiciária e com a Unidade de Antropologia Forense da Faculdade de Ciências da Saúde da Universidade de Copenhaga, esta última graças ao apoio da Cooperação Europeia em Ciências e Tecnologia.
É feita a avaliação das ferramentas já implementadas, a descrição das que estão a ser implementadas, e o delineamento das que deverão ser integradas num futuro próximo. Este projecto pretende fomentar a cooperação entre universidades e investigadores forenses, não só dos laboratórios que nele participaram, como de outros de todo o mundo.
Palavras Chave
Biometria, Análise de Vídeo, Análise Forense, Detecção de Movimento, Extracção de Características, BioFoV
Contents
1 Introduction
  1.1 Motivation
  1.2 Objectives
  1.3 Main Contributions
  1.4 Dissertation Outline
2 State of the art
  2.1 Video Analysis Methodology
    2.1.1 Laboratório de Polícia Científica da Polícia Judiciária (Portuguese Criminal Police Laboratory)
    2.1.2 Unit of Forensic Anthropology – Faculty of Health Sciences of the University of Copenhagen
  2.2 Evidence Analysis
    2.2.1 Impossibility of Identification
    2.2.2 Subject Exclusion
    2.2.3 Likelihood Ratios
  2.3 Biometry
    2.3.1 Soft Biometrics
      2.3.1.A Gender
      2.3.1.B Ethnicity
      2.3.1.C Race
      2.3.1.D Clothing
      2.3.1.E Height
      2.3.1.F Weight
      2.3.1.G Gait
      2.3.1.H Silhouette Matching
      2.3.1.I Tattoos
    2.3.2 Hard Biometrics
      2.3.2.A Fingerprint
      2.3.2.B Palmprint
      2.3.2.C Hand Veins
      2.3.2.D Face
  2.4 Camera Calibration
    2.4.1 Camera Calibration With Camera Access
    2.4.2 Camera Calibration Without Camera Access
    2.4.3 Stereo Calibration
    2.4.4 Colour Calibration
  2.5 Photogrammetry
    2.5.1 Single Camera Photogrammetry
    2.5.2 Stereo Camera Pair Photogrammetry
  2.6 Video Filtering
    2.6.1 Denoising
    2.6.2 Deinterlacing
    2.6.3 Sharpening
    2.6.4 Superresolution
  2.7 Evidence Documentation
  2.8 Software Bundles
    2.8.1 Calibration and Photogrammetry
    2.8.2 Analysis
    2.8.3 Forevid (Video Processing and Documentation)
  2.9 Summary
3 Proposed Video Analysis System
  3.1 Workflow
    3.1.1 Video Processing
    3.1.2 Biometric Analysis
  3.2 Video Processing
    3.2.1 Event Detection
    3.2.2 Noisy Area Exclusion
    3.2.3 Camera Calibration
  3.3 Biometric Data Extraction
    3.3.1 Tattoos
    3.3.2 Height
    3.3.3 Face Detection
4 Implementation
  4.1 Technologies Used
  4.2 Modularity of the Framework
  4.3 Data Classes
    4.3.1 Video
    4.3.2 Camera
    4.3.3 Event
    4.3.4 Frame
    4.3.5 Snapshot
    4.3.6 Feature
    4.3.7 Individual
  4.4 User Interface Classes
    4.4.1 Player
    4.4.2 Drawable
  4.5 GUI Implementation
  4.6 Height Calculation
  4.7 Motion Detection
  4.8 Camera Calibration
  4.9 Other Useful Features
    4.9.1 Export Events
    4.9.2 Print Tool
  4.10 Computational Requirements
  4.11 Documentation
    4.11.1 Technical Documentation
    4.11.2 User Manual
  4.12 Open Sourcing the Project
  4.13 Build System and Continuous Integration
    4.13.1 Make Rules
    4.13.2 Continuous Integration
    4.13.3 Cross Compilation Support
    4.13.4 Static Build
5 Results
  5.1 Testing Setup
  5.2 Datasets Used
    5.2.1 Long Video
    5.2.2 Short Video for Height Measurement
    5.2.3 Videos for Calibration
  5.3 Calibration
    5.3.1 Precision
  5.4 Event Detection
    5.4.1 Precision
    5.4.2 Performance
  5.5 Height Measurement
  5.6 Feature Detection
  5.7 GUI Validation
6 Conclusions
  6.1 Future Work
    6.1.1 Implementation of New Features
    6.1.2 Improvements on Third Party Libraries
    6.1.3 Support More Platforms
    6.1.4 Integration Tests
List of Figures
2.1 Silhouette superimposition
2.2 Macbeth ColorChecker
2.3 Single camera
2.4 Stereo camera pair
2.5 Denoising
2.6 Interlacing
2.7 Sharpening
3.1 Workflow
3.2 Event classification
3.3 Noisy areas rejection data flow
3.4 Tattoo segmentation results
3.5 Planar height measurement with reference
3.6 Feature detection flowchart
4.1 High level design
4.2 Data classes relationship
4.3 Background subtraction example
4.4 User Interface
4.5 Event classification dialogue
4.6 Video separated in events
4.7 Camera calibration dialog
5.1 Calibration results
5.2 Measured height
5.3 Feature extraction results
List of Tables
4.1 Video codec comparison
5.1 Calibration precision test results
5.2 Event detector performance
5.3 Height measurement results
5.4 Face detection classifiers performance
List of Algorithms
3.1 Camera Calibration
Acronyms
API Application Programming Interface
BioFoV Biometric Forensic Video analyzer
CI Continuous Integration
COST European Cooperation in Science and Technology
CPU Central Processing Unit
CUDA Compute Unified Device Architecture
FOV Field Of View
FPS Frames Per Second
GPU Graphics Processing Unit
GUI Graphical User Interface
HFYU Huffman Lossless Codec
IPP Intel® Integrated Performance Primitives
LPC Laboratório de Polícia Científica
MR Merge Request
MSc Master of Science
OpenCL Open Computing Language
OpenCV Open Source Computer Vision
OS Operating System
PDF Portable Document Format
PJ Polícia Judiciária
SLAM Simultaneous Localization And Mapping
UI User Interface
XVID XVID MPEG-4
1 Introduction
Contents
1.1 Motivation
1.2 Objectives
1.3 Main Contributions
1.4 Dissertation Outline
This chapter gives a short introduction to the project, which is followed by a brief review of the state of the art in every scientific area explored. The two subsequent chapters detail the design and implementation of the program and the creation of the associated project. In the final chapters, the results obtained from the program using real data are presented and evaluated, followed by conclusions on what was done and on what is still left to do in the context of the proposed project.
1.1 Motivation
Given the lack of flexible tools to perform forensic video analysis in a crime scene investigation context, and observing that there is real-world interest in having such a tool, it was decided to use this Master of Science (MSc) thesis to create a program that addresses this need.
The intent was not to write something that would be forgotten or left unused after this work is completed, but instead to provide an extensible tool that may serve as a one-stop shop for third-party algorithms to be implemented on, used as an integrated program that can bring other students' projects closer to real-world usage.
1.2 Objectives
The objective of this thesis is to provide an extensible software platform that will be a useful tool for forensic investigators, fitting their needs and thereby minimizing the required workload. Furthermore, it will allow future students, members of the scientific community and members of the open source community in general to see their algorithms implemented in a real-world application.
In order to get the project some initial traction, most of the effort of this thesis was focused on automating a set of usually time consuming tasks in the video analysis process. This way, less effort is required from the investigator to perform tasks that were once laborious, leaving more time to focus on the aspects that require the expert's insight and contributing to an overall increase in the quality of the analysis.
After the initial traction was achieved, the focus turned to investigating which soft biometric
measurements can be extracted from surveillance videos typically available from crime scenes to
aid in the recognition task.
1.3 Main Contributions
The output of this thesis is neither a program nor an algorithm, but a project that aims to change the paradigm of the software tools used in police investigations. All the development tools, Continuous Integration (CI) and Makefiles were designed to allow the project to be easily tinkered with, and to ensure its quality and continuity.
Until now, most tools used for criminal video analysis and investigation were not designed for the job, so investigators have to resort to commercial video and image editing tools, jumping from one to the other without any kind of data integration. This was observed both at Dr. Peter Kastmand Larsen's lab in the Unit of Forensic Anthropology of the Faculty of Health Sciences of the University of Copenhagen, and at the Laboratório de Polícia Científica (LPC) (Portuguese Scientific Police Lab), who have both used and tested the program. This thesis aims to provide a useful tool for these specific teams, and for others that may decide to use it.
This project will also allow future students to see their theses and projects integrated in a real project with real-world implications, improving the judicial system by providing investigators with bleeding-edge algorithms and tools to streamline their work.
1.4 Dissertation Outline
After this introduction, the current state of the art of video analysis techniques and of the biometrics used for identification in a crime scene investigation environment is laid out. This is followed by the proposed solution to fill the gap of specially crafted software for biometric analysis, the implementation details of such a project, and the performance and quality results obtained. The document ends by presenting suggestions for future work and project continuity, together with the conclusions reached from the preparation of this thesis.
2 State of the art
Contents
2.1 Video Analysis Methodology
2.2 Evidence Analysis
2.3 Biometry
2.4 Camera Calibration
2.5 Photogrammetry
2.6 Video Filtering
2.7 Evidence Documentation
2.8 Software Bundles
2.9 Summary
This chapter starts by describing the current video analysis process of both forensic teams that collaborated in this project. It then explores the biometric features that were considered and analysed in this work. Finally, some of the most used video analysis software solutions available on the market are analysed in order to pinpoint their strongest features and where they lack capabilities that would be useful to an investigator.
2.1 Video Analysis Methodology
Since the target users of this project are crime scene investigators, the best way to fit their needs is to understand how they work with the tools currently available and how their daily efforts can be optimized. For this reason, the following two investigation facilities shared their experience, working methods, and needs:
• Laboratório de Polícia Científica (LPC) da Polícia Judiciária (PJ) – Portuguese criminal police laboratory;
• Unit of Forensic Anthropology from the Faculty of Health Sciences of the University of Copenhagen.
2.1.1 Laboratório de Polícia Científica da Polícia Judiciária (Portuguese Criminal Police Laboratory)
The cooperation with the LPC is central to this project, whose objective is to provide them with useful tools for data analysis. They provided an essential dataset of real crime cases, with the respective surveillance videos and analysis reports, which for confidentiality reasons will not be disclosed or used as test cases in this dissertation.
The current video analysis methodology used by the LPC is very laborious, time consuming, and makes use of methods which are not ideal for the task. Techniques observed in the reports provided for analysis and in the investigators' procedure descriptions included:
• Watching whole overnight or weekend footage from a surveillance camera in real time in order to pinpoint the time of the crime, which consumes many man-hours and is one of the main bottlenecks of video analysis.
• Using non-ideal image editing techniques to superimpose suspects' photos onto video frames showing the perpetrator, mainly due to the lack of tools to correctly cross-relate biometric data.
One of the biggest hurdles described by the investigators is the difficulty of getting access to an untampered crime scene in time. This problem will not be tackled in this project, since it is not a technical issue, but a legal one.
2.1.2 Unit of Forensic Anthropology – Faculty of Health Sciences of the University of Copenhagen
Integrated in the European Cooperation in Science and Technology (COST) Action IC 1106 [1], the opportunity arose to spend a week and a half in Copenhagen, Denmark, with the objective of getting acquainted with the methods and needs of experts who have been successfully presenting soft biometric data obtained from surveillance videos in court.
Peter Larsen was the expert who shared his working methods the most. In the cases observed, gait analysis by observation and silhouette superimposition were the most used techniques, along with standing pose comparison. Other cases also showed the use of a software suite called PhotoModeler for height estimation and other measurements.
2.2 Evidence Analysis
When analysing any case, there is the need to cross-reference all the evidence to reach a meaningful conclusion. In the forensic community there is a proposed scale for the levels of evidential weight, which helps investigators describe in court the outcome of their investigation [2]. This scale is defined as follows:
Identification When it is certain that the suspect has the same identity as the perpetrator;
Strongly Indicated When several traits are found to point towards suspect and perpetrator having congruent identity;
Indicated When few traits are found to point towards suspect and perpetrator having congruent
identity;
Cannot Be Excluded When it is not possible to perform a comparison between suspect and
perpetrator due to the (low) quality of the surveillance recordings;
Very Little Speaks In Favour Of When there is very little reason to believe that suspect and
perpetrator have congruent identity;
Elimination When it is certain that perpetrator and suspect cannot have congruent identity.
In the following sections the impossibility of identification, how biometric evidence can easily
be used for subject exclusion, and the difficulty of cross referencing evidence in order to give
meaningful likelihood ratios are discussed.
2.2.1 Impossibility of Identification
One of the most valuable lessons learnt from the COST Action contact with the Unit of Forensic
Anthropology from the Faculty of Health Sciences of the University of Copenhagen was that, no
matter how good a biometric feature is, it can never lead to identification on its own. It is essential to keep this in mind when compiling the data obtained from the forensic analysis to reach a conclusion.
A good example is DNA itself, which is taken as the most identifying feature of any living being. Even if a suspect's DNA has been found to match the culprit's, it is impossible to state with full certainty that the suspect and the culprit are one and the same, since the suspect may have a twin, even without knowing of his existence. To overcome this limitation and draw rightful conclusions from the evidence gathered, it is essential to combine hard biometrics with other kinds of evidence, such as soft biometrics or any other sort of proof.
In the example of the unknown twin, soft biometrics can weigh a lot on the final decision. A simple scar, a tattoo or a person's gait may not identify a person, but they help support the hypothesis that the suspect is, or is not, the culprit.
2.2.2 Subject Exclusion
Even though it is not possible to identify anybody with a biometric measurement on its own, that same measurement can be used to exclude a suspect with absolute certainty. Even though this may sound like something that can only be used by the defence to exonerate the suspect, it must be seen as a way to narrow down the suspect list. This is hard to do if the investigators themselves are acquainted with the case and convinced that a suspect is the perpetrator, which can lead to involuntary extrapolations.
2.2.3 Likelihood Ratios
The gathering of bare biometric data is not enough to extract a meaningful conclusion about a suspect. Therefore, there is the need to cross-relate the measurements and output a likelihood ratio that represents the certainty that the data gathered belongs to the suspect.
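As an illustration of this concept (the formula below is the standard forensic formulation, not one defined in this thesis), the likelihood ratio weighs how probable the observed evidence E is under the two competing hypotheses:

LR = P(E | Hp) / P(E | Hd)

where Hp is the prosecution hypothesis (suspect and perpetrator have congruent identity) and Hd the defence hypothesis (they do not). Values above 1 favour Hp, values below 1 favour Hd, and the further the ratio is from 1 the stronger the evidential weight.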
2.3 Biometry
Biometrics are measurements related to human traits or features. There are two categories of
biometric features, hard and soft. Both these categories are explored in this section in order to
pinpoint which are suitable to be used in a video forensic environment.
2.3.1 Soft Biometrics
Soft biometrics are a specific type of biometrics related to how people usually distinguish each
other, from physical traits such as hair colour, height or tattoos to behavioural characteristics
like gait or posture. The article “Bag of Soft Biometrics for Person Identification” [3] makes an
extensive survey of soft biometric traits and the relation between these.
2.3.1.A Gender
Gender is the classic soft biometric. It is one of the characteristics humans use to distinguish
each other, and one that almost all the population has.
2.3.1.B Ethnicity
By definition, individuals are of the same ethnicity if they identify each other as belonging to
the same cultural group, which usually implies having the same beliefs and daily habits, and these
same individuals may or may not share a common ancestor.
This biometric trait is useful in the forensic environment if the ethnicity in question has distinctive traits such as characteristic clothing, characteristic haircuts or body paintings such as the bindi. On the other hand, it is an especially delicate feature, since some ethnicities are associated with certain behaviours, which may lead to assumptions that are based not on evidence, but on a preconceived stereotype.
2.3.1.C Race
Race can be defined as a set of physical characteristics shared by a set of individuals with a
common ancestor. This feature is usually associated with other biometric features such as the
position of the eyes, shape of the nose and ears, height, skin tone, among others.
In the forensic environment, race can be more useful than ethnicity since, even if the perpetrators of a crime are dressed in plain black clothes, for example, they cannot change their physical traits. As with ethnicity, race is a biometric feature associated with certain stereotypes and, like all evidence, may lead to erroneous conclusions if it is not analysed objectively.
2.3.1.D Clothing
Since nowadays most people wear clothes from big brands that sell in large quantities, clothing is rarely a unique trait. Identifying a person just because they are wearing a specific shirt is therefore out of the question, but assigning a high likelihood to a specific combination of shirt, coat, shoes and jeans can help the case.
There is also the problem of the perpetrator giving away his clothes, for instance to a homeless person, who could then be identified as the culprit, slowing the case analysis or accusing the wrong individual. This can be avoided by giving clothing a low weight in the likelihood ratio estimation, and by using more than just the clothing as evidence. Clothing is nevertheless a very useful biometric feature: if there were witnesses to the crime, it does not need to be recorded by a surveillance camera to yield a partial description, since it is something humans tend to notice in people they do not know.
2.3.1.E Height
A person's height is difficult to measure precisely, since even in ideal conditions measurements of the same individual's height can end up with very different values. This is caused by five main factors:
• The first and most significant factor is pose, which is very difficult to correct and causes large errors in height measurements, since a slight relaxation of the posture can easily reduce one's height by over 5 centimetres.
• Gait can also hinder the measurement process, since head height does not remain constant during a whole walk cycle. Because both legs may be apart from each other, the vertical projection of the top of the head on the floor can be difficult to estimate and can therefore induce error.
• Shoe soles, which can be of unknown thickness, can cause significant errors in height measurement. If the shoes have been previously identified (2.3.1.D), their contribution can be compensated in the total measurement.
• Hair or head wear can complicate the process of pinpointing the top of the head, possibly introducing error in the measured height.
• An individual's spine is compressed during the day and expands during the night, which causes a person to "shrink" by 1 to 2 centimetres from morning to evening.
Even though it is hardly an identifying feature, height can be used in the forensic environment to exclude a subject or to contribute to the certainty of a suspect being the perpetrator.
2.3.1.F Weight
A very interesting feature that was also considered in the beginning was weight estimation, mostly based on [4].
Because the supporting studies are not yet robust enough, the data obtained through video analysis could not be of practical use, since it would not be valid in court.
Moreover, in the dataset provided by the Portuguese police, most individuals are dressed in more than one layer of clothing, making it very hard to take waist measurements and therefore to give any sort of rightful estimation.
2.3.1.G Gait
Gait is sometimes an identifying feature. An example of this was a case analysed by Dr. Peter Kastmand Larsen, from the University of Copenhagen, in which the perpetrator had an injury from a motorbike accident which left him limping in a very particular way.
Gait recognition was also considered as an identifying feature. However, due to the very low frame rates and bad camera angles (most security footage is filmed from above), it is very difficult to get a good idea of an individual's gait. This is worsened by the erratic movement and tense posture most individuals display while committing a crime.
2.3.1.H Silhouette Matching
When an accurate measurement of the perpetrator is not possible and there are suspects, a
method used in Denmark is silhouette matching, illustrated in figure 2.1.
Much like gait analysis, it tries to match posture and body volume, but only in two dimensions, whereas gait also takes variations in time into consideration.
The silhouette extraction and superimposition method currently used in Denmark is performed manually, making it very laborious to extract and match more than a couple of promising frames from a video.
Figure 2.1: Silhouette superimposition
2.3.1.I Tattoos
Tattoos are a very interesting and largely unexplored area of soft biometrics, one that has only very recently been researched [5]. They are very useful to group and cross-reference individuals, especially gang members.
On the other hand, some gangs, such as the Yakuza, do not allow their members to show their tattoos in public at all, making this biometric feature useless on surveillance video and restricting its usefulness to the processing of a subject in a police precinct, in order to cross-reference him with a known database.
2.3.2 Hard Biometrics
Hard biometrics are non-behavioural biometrics and traditionally unmodifiable traits.
2.3.2.A Fingerprint
The fingerprint is the traditional and most studied hard biometric feature. Its computational analysis is based on minutia matching and has been thoroughly studied [6]. Minutiae are features of the fingerprint that can be easily recognized; the three main types are ridge endings, bifurcations and dots. In the specific case of surveillance video forensics, fingerprints have very little to no use, and are therefore not applied further in this project.
2.3.2.B Palmprint
Much like fingerprints, palmprints have been used traditionally among the forensic community and are a proven technology widely used with very good results [7]. And again, much like fingerprints, they do not have many uses in surveillance forensics.
2.3.2.C Hand Veins
Palm vein pattern recognition is one of the topics currently under heavy research in biometry
[8]. It provides hard evidence and may become a good biometric trait, namely for authentication. Besides providing a hard-to-copy pattern, it is extremely hard to fake, since one would need to simulate the blood flowing through the veins and arteries. This approach can provide very good results on a limited budget, as shown in [9].
2.3.2.D Face
The face is a proven biometric trait that is widely used for recognizing an individual, not only in
human-to-human interactions, but also in human-to-machine interactions. Recently it has been a very active field of study, making use of deep neural networks to perform multi-view face detection [10] [11]. Face proportions are a hard biometric feature used by the police for identification, but the method of identification still relies mostly on human judgement.
2.4 Camera Calibration
From the cases presented by the LPC, there were two common situations which required camera calibration. In the first, the camera remained intact and untouched after the crime took place, and it is possible to calibrate it. In the second, the camera was either destroyed in the course of the crime, tinkered with afterwards, or is not accessible for bureaucratic reasons. For simplicity, and considering the characteristics of the dataset provided by the Portuguese police, all cameras considered in this thesis are static cameras with fixed zoom.
2.4.1 Camera Calibration With Camera Access
In the first situation it is easy to go to the crime scene and calibrate the camera precisely using a calibration pattern, enabling a good calibration with low errors. One of the existing methods to achieve this uses multiple orientations of the same planar pattern [12]. This camera calibration process does not require specialized hardware, since all that is needed is a planar pattern, easily printed on any office printer.
2.4.2 Camera Calibration Without Camera Access
Even though the first situation is the ideal one, the contacts with the Portuguese police indicate that it is generally difficult to gain access to the cameras that filmed the crime. This is mostly because the security companies that installed the cameras do not want third parties handling their systems. This reluctance to allow access to surveillance systems for crime scene analysis seems very difficult to change in Portugal. Another case where this happens is in temporary crime scenes, for example music festivals or other venues that are scheduled to be dismantled.
This situation requires the camera to be calibrated using exclusively the videos submitted for analysis. A method considered for this project was camera calibration using the plumb-line constraint and minimal Hough entropy described in [13], but since it may give bad results on low quality videos, this algorithm was not included.
2.4.3 Stereo Calibration
If two or more cameras are filming the same crime scene (from two different points of view),
it is possible to calibrate a stereo configuration with them, allowing error correction, 3D modelling
of the recorded scene, and virtual camera position and angle changes. Currently, Open Source
Computer Vision (OpenCV) implements functions to perform this action.
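As a minimal sketch of how this can be done (OpenCV 3.x-style C++ API; the point containers are assumed to hold matched chessboard detections from the two views, and the per-camera intrinsics to come from prior single-camera calibration):

#include <opencv2/calib3d.hpp>
#include <vector>

void calibrateStereoPair(
        const std::vector<std::vector<cv::Point3f>>& objectPoints,
        const std::vector<std::vector<cv::Point2f>>& imagePoints1,
        const std::vector<std::vector<cv::Point2f>>& imagePoints2,
        cv::Mat& cameraMatrix1, cv::Mat& distCoeffs1,
        cv::Mat& cameraMatrix2, cv::Mat& distCoeffs2,
        cv::Size imageSize) {
    cv::Mat R, T, E, F; // rotation, translation, essential and fundamental matrices
    cv::stereoCalibrate(objectPoints, imagePoints1, imagePoints2,
                        cameraMatrix1, distCoeffs1, cameraMatrix2, distCoeffs2,
                        imageSize, R, T, E, F,
                        cv::CALIB_FIX_INTRINSIC, // keep the individual intrinsics fixed
                        cv::TermCriteria(cv::TermCriteria::COUNT + cv::TermCriteria::EPS,
                                         100, 1e-5));
    // R and T map points from the first camera's frame to the second's; together
    // with the intrinsics they enable rectification and 3D triangulation.
}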
2.4.4 Colour Calibration
Colour is a very important part of an image, and consequently of a video. It is especially important since determining the clothing, hair or skin colour of a suspect may be crucial for the analysis. Provided that the lighting of the environment is the same in the evidence as at the time of calibration, it is possible to correct the colour distortion that may be introduced by the camera. With the use of a colour calibration pattern, such as the Macbeth ColorChecker [14] seen in figure 2.2, the camera's colour reproduction can be corrected.
Figure 2.2: Macbeth ColorChecker
2.5 Photogrammetry
After the cameras have been calibrated, it is possible to perform measurements of objects or subjects in the scene, making use of the videos or images recorded. Two distinct scenarios may arise, allowing different ways of measurement.
2.5.1 Single Camera Photogrammetry
Using a single static calibrated camera and making use of known landmarks in the scene, it is possible to reconstruct a scene in three dimensions. Figure 2.3 shows the reconstruction of a perpetrator walking into a store and stepping next to an exhibitor, which can be used as a reference to estimate the height of the perpetrator during the walk cycle. Since there is only one camera filming the scene, it is not possible to obtain depth measurements from the camera, which makes the technique presented next preferable.
2.5.2 Stereo Camera Pair Photogrammetry
Stereo photogrammetry is the technique of obtaining depth perception using two cameras, similarly to the way human eyes work. By using two static security cameras, which do not need to be identical, it is possible to perform measurements between two static points that show up on both cameras. Figure 2.4 shows an example of a camera disposition that allows for the measurement of the perpetrator. If the cameras are synced, it is even possible to take measurements of moving objects, by taking the two frames corresponding to a given instant (one from each camera) and comparing them.
Figure 2.3: Single camera
(a) Image from camera 1 (b) Image from camera 2
Figure 2.4: Stereo camera pair
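As a hedged sketch of how such a stereo measurement can be computed with OpenCV (the 3x4 projection matrices P1 and P2, i.e. each camera's intrinsics multiplied by its [R|t] pose, are assumed to come from a stereo calibration as above):

#include <opencv2/calib3d.hpp>

cv::Point3d triangulate(const cv::Mat& P1, const cv::Mat& P2,
                        const cv::Point2d& x1, const cv::Point2d& x2) {
    // pixel coordinates of the same static point, as 2x1 column matrices
    cv::Mat pts1 = (cv::Mat_<double>(2, 1) << x1.x, x1.y);
    cv::Mat pts2 = (cv::Mat_<double>(2, 1) << x2.x, x2.y);
    cv::Mat p4d;
    cv::triangulatePoints(P1, P2, pts1, pts2, p4d); // homogeneous 4x1 output
    p4d.convertTo(p4d, CV_64F);                     // normalise the element type
    double w = p4d.at<double>(3, 0);
    return cv::Point3d(p4d.at<double>(0, 0) / w,
                       p4d.at<double>(1, 0) / w,
                       p4d.at<double>(2, 0) / w);
}

The Euclidean distance between two points reconstructed this way yields a real-world measurement, for example between the top of a suspect's head and the floor.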
2.6 Video Filtering
Video filtering is a very important step that enables a better analysis. Its objective is to enhance the quality of the video, and therefore of the biometric features to be analysed afterwards. The tool most used for video processing by the police investigators who participated in this project is called Forevid [15] and will be further presented in 2.8.3.
2.6.1 Denoising
Video denoising, as the name suggests, is the removal of noise from a video. In the forensic environment this noise is usually present in videos recovered from old analogue systems, where the tapes may be worn out, or in low light situations, where the gain of the sensor amplifies noise. An example of the results of non-local means denoising [16] is presented in figure 2.5.
(a) Original image (b) Image with noise (c) Denoised image
Figure 2.5: Denoising example [16]
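As a minimal sketch of applying such a filter to a single frame with OpenCV (the filter strengths shown are illustrative and would need tuning per video):

#include <opencv2/photo.hpp>
#include <opencv2/imgcodecs.hpp>

int main() {
    cv::Mat noisy = cv::imread("frame.png"); // hypothetical input frame
    cv::Mat denoised;
    cv::fastNlMeansDenoisingColored(noisy, denoised,
                                    10.0f,   // h: luminance filter strength
                                    10.0f,   // hColor: colour filter strength
                                    7,       // template window size
                                    21);     // search window size
    cv::imwrite("frame_denoised.png", denoised);
    return 0;
}

For video, OpenCV also offers a multi-frame variant that exploits temporally adjacent frames, which tends to preserve more detail.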
2.6.2 Deinterlacing
Before defining deinterlacing, one needs to define what interlacing is. Interlacing is a video recording technique that stores only half the lines of each frame, first the odd ones, then on the next frame the even ones, and so on. This way, static video suffers no change, but sudden horizontal movements produce clear artefacts in the form of visible horizontal lines. Figure 2.6a shows a frame as captured by a camera, with interlacing. Deinterlacing uses frames that are adjacent in time and reconstructs the one that happened between them but was not recorded. By combining the two adjacent frames it is possible to reduce the noise introduced by the interlaced recording. The resulting frame is clearer, as shown in figure 2.6b.
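As a hedged illustration of the idea, the following sketch performs a very simple intra-frame deinterlace: the lines of one field are kept, and the lines of the other field are replaced by the average of their vertical neighbours. Real deinterlacers, as described above, also combine the two fields across time, which this sketch omits.

#include <opencv2/core.hpp>

cv::Mat deinterlaceSimple(const cv::Mat& frame) {
    cv::Mat out = frame.clone();
    int stride = frame.channels();
    for (int y = 1; y + 1 < frame.rows; y += 2) {   // odd (interpolated) lines
        const uchar* above = frame.ptr<uchar>(y - 1);
        const uchar* below = frame.ptr<uchar>(y + 1);
        uchar* dst = out.ptr<uchar>(y);
        for (int i = 0; i < frame.cols * stride; ++i)
            dst[i] = static_cast<uchar>((above[i] + below[i]) / 2);
    }
    return out;
}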
2.6.3 Sharpening
Sharpening consists of enhancing edges in order to make the details of an image clearer and more easily distinguishable. Figure 2.7 shows two photographs of the moon, 2.7a being the original and 2.7b the image resulting from the sharpening process.
(a) Original interlaced image (b) Deinterlaced image
Figure 2.6: Interlacing example [17]
(a) Original image (b) Sharpened image
Figure 2.7: Sharpening example [18]
The most used method to achieve this result is called unsharp masking, and it consists of taking the original image, blurring and inverting it, and performing a weighted subtraction with the original image. This is defined by equation 2.1, where α is the weight of the subtraction and black the minimum value for a channel (which in an 8-bit greyscale image is 0).

I_unsharpened = I − α(black − blur(I)), 0 ≤ α < 1    (2.1)
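As a minimal sketch, the following implements the common unsharp-mask formulation sharpened = I + α(I − blur(I)), expressed as a single weighted sum; the Gaussian sigma and the default α are illustrative values, not taken from this thesis:

#include <opencv2/imgproc.hpp>

cv::Mat unsharpMask(const cv::Mat& image, double alpha = 0.5) {
    cv::Mat blurred, sharpened;
    cv::GaussianBlur(image, blurred, cv::Size(0, 0), 3.0);  // kernel size derived from sigma
    // (1 + alpha)*I - alpha*blur(I) == I + alpha*(I - blur(I))
    cv::addWeighted(image, 1.0 + alpha, blurred, -alpha, 0.0, sharpened);
    return sharpened;
}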
2.6.4 Superresolution
Superresolution is a technique used to enhance the original resolution of a video frame by taking into consideration neighbouring frames, which may provide more information about the object under analysis [19]. This technique is especially useful for enhancing details in videos with a reasonably high framerate, or with slow-moving objects or subjects. This is because, to obtain the best results, the image detail should have the same orientation in all the frames used.
2.7 Evidence Documentation
Evidence gathering and documentation is an important part of an investigator's work, since it bundles all the evidence and the final result of the evidence analysis into a document. Systematic approaches and strict documentation protocols are essential in order to produce reproducible data when the images have suffered transformations (such as enhancements) during the analysis process. Forevid, which will be further presented in 2.8.3, is a good video documentation tool that allows users to pinpoint specific frames in the videos of the crime and comment on them, bundling them all in a single document ready to be appended to a case and presented in court as evidence.
2.8 Software Bundles
Both the Portuguese criminal police and the Danish forensic experts currently work with various software platforms which are either expensive or not tailored for surveillance video analysis. These are mostly divided into three categories: analysis, processing and documentation. There are some software bundles available, like Video Analyst® from Intergraph [20] and Kinesense's forensic video retrieval, search, analysis and reporting tool [21], that can be used in forensic video analysis. The latter's Law Enforcement (LE) suite can be compared, in terms of functionality, to the project described in this dissertation.
2.8.1 Calibration and Photogrammetry
Camera calibration is an essential task that must be performed correctly to obtain the best measurements possible, whether in 2D, 3D or colour space. The Danish experts that collaborated in this project and the LPC currently use a program called PhotoModeler [22] to perform both the calibration and the spatial measurements.
2.8.2 Analysis
Currently, in the forensic community, bare video analysis is done by a human. Besides being prone to error, this has an inherently severe problem: the manpower required to perform the task. An easy-to-operate motion detection method is needed to separate a long video into several small events in which motion is detected, easing the analysis, reducing the man-time needed to run through several hours of video, and avoiding the errors induced by such a repetitive task.
2.8.3 Forevid (Video Processing and Documentation)
Forevid is an open source software tool for video enhancement and documentation, developed by Sami Hautamaki for his MSc dissertation in cooperation with the forensic laboratory of the National Bureau of Investigation in Finland. Development stopped in late 2012, but it is still used by police departments all over Europe. The project proposed in this thesis shares Forevid's openness and target community, but has a different objective, since its purpose is biometric analysis rather than video processing.
2.9 Summary
Although there are many software tools and algorithms available for video and biometric data processing, they are either not publicly available due to copyright, or not implemented in any software tool usable by an investigator, rendering them useless for the investigation process. The proposed platform aims to fill this gap by compiling selected algorithms and tools into a single package specially crafted for the forensic investigator.
3 Proposed Video Analysis System
Contents
3.1 Workflow
3.2 Video Processing
3.3 Biometric Data Extraction
Given the problems and technologies described in the previous chapter, the goal of this project is to design a software solution that integrates both video processing capabilities and biometric analysis of those same videos, solving the current problems that investigators face, while being flexible enough to be extended to solve future issues.
In this chapter the developed solution is described in a functional way, starting with the intended workflow and proceeding to the video processing algorithms and the biometric measurements analysed during the development of this project. Chapter 4 then focuses on the implementation itself.
3.1 Workflow
This project intends to streamline the analysis of forensic video. To achieve this, the first thing to do is to define the user's intended workflow. The workflow is separated into two major parts, the first being the video processing and the second the biometric analysis (figure 3.1).
Figure 3.1: Workflow
3.1.1 Video Processing
The video processing part has two main objectives: correcting distortions and segmenting the video, so that the investigator only needs to focus on its interesting parts. This process is separated into several steps:
1. Video loading;
2. Camera correction:
(a) Camera calibration;
(b) Orientation correction;
3. Video segmentation:
(a) Noisy areas exclusion;
(b) Event detection (motion detection).
Of all these steps, only the first one is mandatory, since the video may already have been corrected for distortion and/or segmented to a specific period of interest.
3.1.2 Biometric Analysis
After processing the video, the actual biometric analysis of the perpetrators can be performed. There is no predefined workflow for every biometric feature, since each one may require very distinct user interactions and generate very different outputs, which results in a unique workflow for each biometric trait. Each biometric feature considered as a candidate for integration in this project is analysed in section 3.3.
3.2 Video Processing
Processing the video is the first step prior to analysis. It minimizes the portion of the video that needs further analysis, thereby also minimizing the time needed for it, and corrects the camera distortion and orientation, improving the results of the biometric analysis.
3.2.1 Event Detection
Event detection was the first feature to be implemented in the program, since it is the one the LPC needs the most. Its objective is to drastically reduce the amount of time the investigator needs to analyse a long video. The event detection algorithm actually consists of two separate algorithms: the first calculates the changes between consecutive video frames using background subtraction, while the second is the event classification, where the detected motion is classified as being an event or just noise.
The first algorithm, background subtraction, uses the adaptive Gaussian mixture model for background subtraction described in [23]. It needs several inputs besides the frames themselves:
Frame history Maximum length of the frame history to consider in the past of any point within
the video.
Threshold Threshold on the squared Mahalanobis distance to decide whether it is well described
by the background model [24].
Shadow detection Whether shadow detection should be enabled.
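A minimal sketch of wiring these three parameters to the OpenCV implementation of [23] (3.x-style API; the parameter values shown are placeholders, not the ones used in the actual program):

#include <opencv2/video/background_segm.hpp>
#include <opencv2/videoio.hpp>
#include <string>

void runBackgroundSubtraction(const std::string& path) {
    cv::Ptr<cv::BackgroundSubtractorMOG2> subtractor =
        cv::createBackgroundSubtractorMOG2(
            500,    // frame history length
            16.0,   // threshold on the squared Mahalanobis distance
            true);  // enable shadow detection
    cv::VideoCapture video(path);
    cv::Mat frame, fgMask;
    while (video.read(frame)) {
        // fgMask: 255 = foreground, 127 = shadow, 0 = background
        subtractor->apply(frame, fgMask);
        // fgMask is what feeds the event classifier described next
    }
}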
If the difference between frames is higher than the threshold configured for the event classifier (equation 3.1), and this difference persists for longer than a configurable minimum period of time (defined to discard possible noise or irrelevant periodic changes), it is classified as an Event (figure 3.2). After this motion period ends, in order to keep context, if there is movement again before a certain period of time runs out, the resulting Event is prolonged until the end of the newly detected period with motion.
threshold = (pixels with movement) / (total amount of pixels in image)    (3.1)
Figure 3.2: Event classification
Therefore the Event classifier also needs some configuration parameters which are:
Threshold Percentage (in area ratio) of the thresholded result of the background subtraction that
needs to be set as changed for a specific frame to be classified as having movement.
Max gap Maximum amount of frames with no movement before considering the event as over.
Min length Minimum number of frames with movement in an event. If less are detected, the
event is discarded (used to disregard possible encoder noise).
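A hedged sketch of this classification logic (the names and structure are illustrative, not taken from the implementation); it assumes the foreground mask has already been binarized, i.e. that shadow pixels have been removed:

#include <opencv2/core.hpp>

struct EventClassifier {
    double threshold;     // area ratio above which a frame "has movement"
    int    maxGap;        // frames without movement before an event is over
    int    minLength;     // frames with movement required to accept an event
    int    run = 0, gap = 0;
    bool   inEvent = false;

    bool update(const cv::Mat& fgMask) {   // returns true while an event is active
        double ratio = static_cast<double>(cv::countNonZero(fgMask)) / fgMask.total();
        if (ratio > threshold) { ++run; gap = 0; }
        else if (++gap > maxGap) { run = 0; inEvent = false; }  // event over
        if (run >= minLength) inEvent = true;                   // long enough to accept
        return inEvent;
    }
};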
3.2.2 Noisy Area Exclusion
Excluding noisy areas from automatic motion detection is a very useful way of eliminating things such as hard-coded timestamps or static regions with constant variation, such as TVs or windows. The simplest way to exclude a noisy area, once a motion detection method is in place, is to fit a static mask over the intended exclusion area of the video frames before inputting them into the motion detection algorithm, as depicted in figure 3.3. This way, the masked area has a static value, immutable for the duration of the whole video, so any motion that may happen in that area is ignored.
Figure 3.3: Noisy areas rejection data flow
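The exclusion step itself can be a one-line masking operation; a minimal sketch (assuming exclusionMask is a hypothetical single-channel image that is non-zero over the noisy areas):

#include <opencv2/core.hpp>

void applyExclusionMask(cv::Mat& frame, const cv::Mat& exclusionMask) {
    // paint the noisy area with a constant value before background subtraction,
    // so no motion can ever be detected there
    frame.setTo(cv::Scalar::all(0), exclusionMask);
}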
3.2.3 Camera Calibration
The camera calibration method chosen for this project makes use of a planar chessboard pattern, like the one in figure 5.1a, and was implemented following algorithm 3.1, where optimize_matrix() is based on the method described in [12]. Instead of asking the user to provide a set of frames with the calibration pattern, a whole video is used, from which equally spaced frames are scanned for the pattern.
Algorithm 3.1 Camera Calibration
Require: dataset
Ensure: ∃ frames ∈ dataset
for all frame ∈ dataset do
    if ∃ pattern ∈ frame then
        for all point ∈ pattern do
            find_subpixel(point)
        end for
        patterns ← patterns ∪ pattern
    else
        discard(frame)
    end if
end for
if |patterns| > 0 then
    optimize_matrix(patterns)
end if
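The following is a sketch of algorithm 3.1 with OpenCV 2.4, assuming a 7x9 inner-corner chessboard and roughly 100 equally spaced frames (see section 5.3); the file name and frame budget are illustrative:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main() {
    cv::VideoCapture cap("calibration.avi");               // hypothetical input
    const cv::Size board(7, 9);                            // inner corners
    std::vector<std::vector<cv::Point2f> > imagePts;       // accepted patterns

    const int total = (int)cap.get(CV_CAP_PROP_FRAME_COUNT);
    const int skip = std::max(1, total / 100);             // equally spaced frames
    cv::Mat frame, gray;
    for (int f = 0; cap.read(frame); ++f) {
        if (f % skip != 0) continue;
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        std::vector<cv::Point2f> corners;
        if (!cv::findChessboardCorners(gray, board, corners))
            continue;                                      // discard(frame)
        cv::cornerSubPix(gray, corners, cv::Size(11, 11), cv::Size(-1, -1),
            cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.1));
        imagePts.push_back(corners);                       // patterns <- patterns U pattern
    }
    if (imagePts.empty()) return 1;

    // One planar object-point set (unit squares) per accepted pattern.
    std::vector<cv::Point3f> obj;
    for (int y = 0; y < board.height; ++y)
        for (int x = 0; x < board.width; ++x)
            obj.push_back(cv::Point3f((float)x, (float)y, 0.f));
    std::vector<std::vector<cv::Point3f> > objectPts(imagePts.size(), obj);

    cv::Mat K, dist;                                       // optimize_matrix(patterns)
    std::vector<cv::Mat> rvecs, tvecs;
    double rms = cv::calibrateCamera(objectPts, imagePts, gray.size(),
                                     K, dist, rvecs, tvecs);
    std::cout << "re-projection error: " << rms << std::endl;
    return 0;
}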
3.3 Biometric Data Extraction
Several of the features described in 2.3 were explored in the course of this work, more precisely
tattoos, height and face. The other features previously listed (gait, clothing, gender, ethnicity,
hair colour, periocular region, fingerprints, palmprints, hand veins and iris) were not researched
further, either because they were unusable in this project, because they were not good candidates
for integration, or because there simply was no time to integrate them in this first stage of the
project.
3.3.1 Tattoos
Although it was not integrated in the final project, some work was put into creating a segmen-
tation tool based on the watershed algorithm [25] that could isolate the inked part of a tattoo. One
example of a result obtained with this algorithm is shown in figure 3.4.
It was decided not to integrate the implemented tool, or any other to detect, isolate or identify
tattoos, in the present version of the platform, mainly due to its low usefulness to the LPC in
common cases. Nevertheless, the developed tool's source code was made publicly available
in [27].
(a) Original image [26] (b) After segmentation
Figure 3.4: Tattoo segmentation results
3.3.2 Height
The simplest technique for height measurement is comparing the height being measured
to a known reference. If this reference is in the same plane as the feature being measured, then
this plane can be re-projected to the image plane, making them one and the same, thus allowing
measurements performed in the image to be extrapolated to the plane containing the reference
and the feature.
In the original image (figure 3.5a), four points of a square or rectangle are given by the user,
from left to right and top to bottom. Each point gets its coordinates averaged with those of the
adjacent horizontal and vertical points, forming a rectangle with perfectly vertical and horizontal
sides. The image transformation needed to move the points from their position in the original
image to the new rectangular shape is calculated and applied to the whole image. The result of
this process is shown in figure 3.5b. The resulting image has the advantage of being isometric on
the plane of the board, but with decoupled vertical and horizontal axes. In other words, it is
possible to make horizontal measurements in the same plane as the cork board and compare them
to the horizontal width of the board. The same principle applies to the vertical axis, but not to
any combination of the two, since the ratio between the height and width of the reference
rectangle is not taken into consideration in the re-projection. Implementation details are further
explored in 4.6. A sketch of this re-projection follows.
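A minimal sketch of this re-projection with OpenCV, assuming the four clicks arrive left to right, top to bottom ([0] top-left, [1] top-right, [2] bottom-left, [3] bottom-right); the function is illustrative, not the BioFoV code itself:

#include <opencv2/opencv.hpp>
#include <vector>

cv::Mat reprojectPlane(const cv::Mat &img, const std::vector<cv::Point2f> &src) {
    // Average each corner with its horizontal and vertical neighbours,
    // producing an axis-aligned rectangle.
    float top    = (src[0].y + src[1].y) / 2.f;
    float bottom = (src[2].y + src[3].y) / 2.f;
    float left   = (src[0].x + src[2].x) / 2.f;
    float right  = (src[1].x + src[3].x) / 2.f;
    cv::Point2f dst[4] = { cv::Point2f(left, top),    cv::Point2f(right, top),
                           cv::Point2f(left, bottom), cv::Point2f(right, bottom) };
    // Homography mapping the clicked quadrilateral onto the rectangle,
    // applied to the whole image.
    cv::Mat H = cv::getPerspectiveTransform(&src[0], dst);
    cv::Mat out;
    cv::warpPerspective(img, out, H, img.size());
    return out;
}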
(a) Original frame (b) Re-projection with reference
(c) Height measurement
Figure 3.5: Planar height measurement with reference
3.3.3 Face Detection
Since faces are widely present in the dataset provided by the LPC, and the face is a biometric
trait accepted in Portuguese courts, this was the first feature to be focused on. Before this thesis,
the laboratories that cooperated with this project had no way to extract a facial dataset from a
video; if there was the need to extract more than one face from a video, the solution was to
hand-select them from each frame. To overcome this problem, a feature detector needed to be
integrated in the project. The feature detector chosen was initially proposed in [28] and later
improved in [29].
The method used for processing a video with the objective of extracting all the features that
match a certain classifier is described in figure 3.6. It was designed with face detection in mind,
but supports any kind of feature, as long as it is described by a cascade classifier. A sketch of
this detection loop follows the flowchart below.
Figure 3.6: Feature detection flowchart
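A minimal sketch of this loop, assuming one of the frontal-face cascade files shipped with OpenCV; the local file names are illustrative:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main() {
    cv::CascadeClassifier cascade;                 // cascade-based detector [28, 29]
    if (!cascade.load("haarcascade_frontalface_alt_tree.xml")) return 1;

    cv::VideoCapture cap("event.avi");             // hypothetical exported Event
    cv::Mat frame, gray;
    for (long f = 0; cap.read(frame); ++f) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        cv::equalizeHist(gray, gray);              // normalise lighting before detection
        std::vector<cv::Rect> faces;
        cascade.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(30, 30));
        for (size_t i = 0; i < faces.size(); ++i)  // each hit becomes a Snapshot
            std::cout << "frame " << f << ": face at (" << faces[i].x << ", "
                      << faces[i].y << ") " << faces[i].width << "x"
                      << faces[i].height << std::endl;
    }
    return 0;
}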
4 Implementation
Contents
4.1 Technologies Used
4.2 Modularity of the Framework
4.3 Data Classes
4.4 User Interface Classes
4.5 GUI Implementation
4.6 Height Calculation
4.7 Motion Detection
4.8 Camera Calibration
4.9 Other Useful Features
4.10 Computational Requirements
4.11 Documentation
4.12 Open Sourcing the Project
4.13 Build System and Continuous Integration
Following the system features described in chapter 3, the implementation details are described
in the following sections, starting with the technologies used and the implementation of the
program itself, followed by the documentation and additional details on some of the project's
decisions, such as open sourcing it. The program that resulted from this MSc thesis was dubbed
Biometric Forensic Video analyzer (BioFoV).
4.1 Technologies Used
The implementation of this project had two main technical sides: the first being the User
Interface (UI) and the need to provide a fully working cross-platform and modular framework; the
second being the video and image processing functionality. The ability to run the program under
Microsoft Windows® is a requirement for this project; therefore, in order to ease development and
provide full cross-platform support, the chosen libraries also needed to be cross-platform. To
cover these needs, two libraries were used:
• Qt [30]:
A cross-platform application and UI framework for C++. It is used to enable cross-platform
operation of the UI and system-dependent functionality, such as access to configured
printers and the file system;
• OpenCV [31]:
The most well known and actively developed open source computer vision and machine
learning software library. From the set of tools provided by OpenCV, the ones used are
explained and the need for them justified in this section. The final version of the program
includes OpenCV 2.4.11, which was the latest stable version released to date. Version 3.0.0
beta was released prior to the conclusion of this thesis, and even though it promises large
performance improvements (up to nine times faster processing and heterogeneous
Central Processing Unit (CPU) + Graphics Processing Unit (GPU) processing), it was not
used due to its instability and API non-uniformity at the time. As soon as version 3.0.0 of
OpenCV is released, it should be considered as a replacement for the current version, but that
is left for future work. OpenCV's C++ Application Programming Interface (API) is used to
perform all of the video and image operations;
To aid the development itself, and to ensure code quality and project continuity, other tech-
nologies were used:
• Doxygen [32]:
A documentation generator for C++, used for documenting the source code of the project
and mapping the class hierarchy;
• GIT [33]:
A widely used version control system, used to keep track of modifications on the source
code of the project, and enable an agile development method;
• GitHub [34]:
A GIT hosting service, free for open source projects, used to host the code, enable social
interaction between developers, provide issue reporting and tracking, host releases of the
project, and host a wiki with the user manual and development instructions for whoever
wants to contribute;
• Travis CI [35]:
A CI platform, free for open source projects, used to run tests on the committed code to
ensure that there are no destructive changes to the program during development.
4.2 Modularity of the Framework
The program is implemented with modularity and expansion in mind, so that future work can
easily be implemented on top of the existing features, enhancing the original feature set. A set of
functional modules is the base of this project, as illustrated in figure 4.1. A Graphical User Inter-
face (GUI) is made available in order to be usable by forensic analysts of different backgrounds.
Figure 4.1: Proposed modular architecture – high level design (User Interface: menus, player, controls, dialogs; Data Classes: Frame, Event, Mask, Video; Modules: event detector, feature detector, measurements, ...)
4.3 Data Classes
As the name suggests, the data classes are where the data gathered from the analysis of the
videos is stored and processed. As illustrated in figure 4.2, there are five distinct data classes:
Video, Camera, Event, Snapshot and Feature.
Figure 4.2: Data classes relationship
4.3.1 Video
The Video class is responsible for everything that is video related. The most immediate of
the operations is video reading, which includes opening a video file and decoding it to be further
processed. It can also apply the motion detection algorithm described in 3.2.1, creating a set of
Events associated with this same video.
4.3.2 Camera
This class comprises every aspect related to camera calibration and description. With it, it is
possible to calibrate a camera with a chessboard pattern and perform transformations such as
flipping a video or image (vertically and/or horizontally). It can also export its parameters to a file
in order to later import them to another video.
4.3.3 Event
The event class keeps a set of frames and snapshots together. These frames should be
contiguous since an event is supposed to be an interval in time, but for usability reasons they don’t
need to be. For example, exporting several events in one video file can be achieved by merging
the events in question and exporting the resulting Event. An Event is in its essence two series
of frames, being the first one of Frames (figure 4.3a), and the second one of the corresponding
Snapshots (figure 4.3b) that resulted from the background subtraction calculation. This allows for
the isolation of the object of interest (figure 4.3c).
4.3.4 Frame
The Frame class is a representation of a single frame of interest of a Video or Event. An
instance is related to either the Video or the Event object it belongs to. The frames themselves
are stored in individual files on disk; even though this takes some extra hard disk space, it
allows for swifter video seeking, since it eliminates the need for time consuming video decoding
processes.
4.3.5 Snapshot
This class works as a mask for a Frame object, in order to specify a certain region of interest.
For the moment, this class is used by the Face class to specify where in the frame the face is,
and to store the background subtraction result (figure 4.3b), but it can easily be applied in other
situations.
(a) RGB (b) Mask
(c) Masked
Figure 4.3: Background subtraction example
4.3.6 Feature
The Feature class describes something about an Individual (4.3.7). It encloses a description and a
pointer to an individual, and is used as a common interface for other classes. For the time being,
only one feature is implemented, but others can easily be built using this one as a template.
Face The Face class implements the Feature class, and it is a first example of how it can be
used. It bundles several Snapshots that delimit a person’s face in several different frames.
4.3.7 Individual
The purpose of this class is to bundle all the features from a certain individual. It was left
out of the current implementation since feature bundling (such as face clustering) has not been
implemented yet.
4.4 User Interface Classes
The following classes implement an easy to use interface to access and interact with the data
stored in the classes described in section 4.3.
4.4.1 Player
A Player is a base class that works as a common interface for the Video, Event and Frame
classes, defining common methods to obtain images from these classes and skim through them
without the need to worry about each class' particular data structures. This interface is used by
the VideoPlayer class to obtain images and show them to the user, and by the image export tool
to save isolated images to disk. A sketch of this interface is given below.
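A minimal sketch of such an interface, with hypothetical method names (the real class lives in the BioFoV sources):

#include <opencv2/core/core.hpp>

class Player {
public:
    virtual ~Player() {}
    virtual unsigned long getLengthFrames() const = 0; // frames available for playback
    virtual cv::Mat getFrame(unsigned long index) = 0; // random access to one image
    virtual double getFrameRate() const = 0;           // playback timing
};

// Video, Event and Frame each implement this interface, so the VideoPlayer
// widget and the image export tool can treat them uniformly.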
4.4.2 Drawable
The Drawable class is the base class for everything that is drawn on top of an image (Player).
It provides a flexible and consistent way to handle mouse clicks for inserting points in the UI, and
to draw the measurement output homogeneously across all the classes (see the sketch after the
list below). Currently, several classes extend Drawable:
Angle: The angle measured in the image plane.
Length: The length measured in the image plane.
Width: The width measured in the image plane.
Height: The height measured in the image plane.
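A minimal sketch of the Drawable base class, again with hypothetical names; each measurement above would override isComplete() and draw():

#include <opencv2/core/core.hpp>
#include <vector>

class Drawable {
public:
    virtual ~Drawable() {}
    // Called by the UI for each mouse click on the player image.
    virtual void addPoint(const cv::Point &p) { points.push_back(p); }
    // True once enough points were collected for the measurement.
    virtual bool isComplete() const = 0;
    // Draws the measurement (lines, labels) on top of the current frame.
    virtual void draw(cv::Mat &frame) const = 0;
protected:
    std::vector<cv::Point> points;
};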
Figure 4.4: User Interface
4.5 GUI Implementation
Even though the implementation of the GUI is not of much technical interest in an academic
environment, it is an essential component of this project, which aims to be usable by people
with limited programming skills. Figure 4.4 shows all the UI components that belong to the main
window. In this figure, two videos are loaded into the program, the event detector has been
applied to one of them, and one frame was extracted from the last of the resulting events. A
modifier has been applied to this last frame, in this case a re-projection of the plane of the pin
board to the image plane.
Toolbar (1) Can be configured with any function available in the program menus;
Video tab (2) Tree structure listing all the videos in the current project and their respective as-
sociated objects, such as detected events, extracted frames, and measurements on these
frames;
Feature tab (3) Tab for feature detection output, currently called Faces since it is the only feature
detected (shown collapsed in this figure);
Selection details (4) Information about selected item(s) (in figure 4.4 it is showing information
about the event E1);
Video player (5) Handles any class which implements the Player interface, enabling or disabling
the playback buttons and sliders depending on the number of frames available in the currently
playing item.
4.6 Height Calculation
As described in 3.3.2, the current implementation of the height calculation makes use of planar
references, which allow re-projecting the image so that measurements in the reference plane can
be compared to the perpetrator.
4.7 Motion Detection
The background subtraction algorithm previously described in 3.2.1 has several implementations
integrated in OpenCV, allowing it to use the CPU, Compute Unified Device Architecture
(CUDA) or Open Computing Language (OpenCL), or even switch between them depending on
the hardware the program is running on. When the automatic event classification is selected, the
dialogue shown in figure 4.5 asks for five essential parameters that guide both algorithms, plus a
sixth which was left on the GUI to allow further tweaking in certain situations. For the background
subtraction method these variables are:
Frame history: Length of the history.
default = 20 s · FPS_video
Threshold: Threshold on the squared Mahalanobis distance used to decide whether a pixel is well
described by the background model [24].
default = 50
Shadow detection: Whether shadow detection should be enabled.
default = false
The Event classifier likewise requires:
(Area) threshold: Percentage of the thresholded result of the background subtraction that needs
to be set as changed for a specific frame to be classified as having movement.
default = 1%
Max frames with no movement: Maximum number of frames with no movement before consid-
ering the event as over.
default = 5 s · FPS_video
Min frames/event: Minimum number of frames with movement in an event. If fewer are detected,
the event is discarded (used to disregard possible encoder noise).
default = 2 s · FPS_video
Figure 4.5: Event classification dialogue
Even if this detection process is somewhat slow (refer to 5.4.2), it is an unattended process,
meaning it does not need any type of supervision and can be left running through the night,
freeing the user to do something else.
Figure 4.6: Video separated in events
4.8 Camera Calibration
The video calibration tool in BioFoV implements OpenCV's camera calibration function. As
reference points for this calibration, a chessboard pattern is used, with configurable width and
height in terms of the pattern's inner corners, which are automatically detected in a calibration
video of the same camera, as shown in figure 4.7.
Horizontal and vertical image flips are also considered part of the calibration, since they reflect
on the intrinsic parameters of the camera. This is a key feature, since it is not uncommon for
surveillance cameras to be mounted upside down, in which case the video needs to be flipped
prior to analysis.
Figure 4.7: Camera calibration dialog
4.9 Other Useful Features
During the development process there was the need to implement some features that
complement the program's usability and fulfil some of the day-to-day needs of the users. Among
these are the ability to export events to video files, and to print the frame being shown in the
video player.
4.9.1 Export Events
Two codecs for encoding were tested: Huffman Lossless Codec (HFYU) and XVID MPEG-4
(XVID). Given the qualitative characteristics presented in table 4.1, the obvious choice was the
XVID encoder. Even though XVID is not lossless, for the tested videos the quality of the
output was more than enough to preserve the video details, while producing an output file many
times smaller, and thus better suited for evidence archiving. A sketch of the export step follows.
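A minimal sketch of event export with OpenCV's VideoWriter; `frames` stands in for the Event's frame sequence and `fps` for the source frame rate, both illustrative:

#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

void exportEvent(const std::vector<cv::Mat> &frames, double fps,
                 const std::string &path) {
    if (frames.empty()) return;
    cv::VideoWriter out(path, CV_FOURCC('X', 'V', 'I', 'D'),  // XVID MPEG-4
                        fps, frames[0].size());
    for (size_t i = 0; i < frames.size(); ++i)
        out << frames[i];                                      // encode frame by frame
}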
Codec   Quality       File Size
HFYU    Perfect       Huge
XVID    Really good   Tiny
Table 4.1: Video codec comparison
4.9.2 Print Tool
A print tool was implemented that uses the system-defined printers (on whichever platform
the program is running) and prints the frame being displayed on the player. All printing
functions are handled by Qt's QPrinter class, making the tool platform independent and hassle free.
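A minimal sketch of such a tool with Qt, assuming the current player frame has already been converted to a QImage; the function name is hypothetical:

#include <QImage>
#include <QPainter>
#include <QPrintDialog>
#include <QPrinter>

void printFrame(const QImage &frame) {
    QPrinter printer;
    QPrintDialog dialog(&printer);                 // let the user pick a system printer
    if (dialog.exec() != QDialog::Accepted) return;
    QPainter painter(&printer);
    QRect page = painter.viewport();               // fit the frame to the page,
    QSize size = frame.size();                     // keeping its aspect ratio
    size.scale(page.size(), Qt::KeepAspectRatio);
    painter.setViewport(page.x(), page.y(), size.width(), size.height());
    painter.setWindow(frame.rect());
    painter.drawImage(0, 0, frame);
}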
4.10 Computational Requirements
In terms of processing power there are two different approaches. In the CPU approach, the
more cores available and the more powerful the processing unit, the better. In the GPU approach,
with a modest CPU and a decent GPU with CUDA or OpenCL support, the image and video
processing load can be offloaded to the GPU.
4.11 Documentation
In order to enable new users and developers to get properly onboarded on how to use and
develop BioFoV there is the need to create a user manual and to document the technical details
about the project’s structure.
4.11.1 Technical Documentation
The source code is documented making use of Doxygen [32]. It outputs a PDF file for printing
as well as an HTML folder that makes documentation browsing and function or class lookup easy.
The make rule for generating the documentation is named doc.
4.11.2 User Manual
The user manual is available as a wiki on the project's GitHub page. The wiki format is the one
that best allows the users themselves to browse the documentation and contribute to improving
it.
4.12 Open Sourcing the Project
This project had to be developed from scratch, since there were no open source tools it
could be built upon. Making the source and documentation available will enable anybody who
does not want to start from scratch to pick up where the project stands, and to contribute to one
uniform platform that will get better with each person's contribution. For that purpose, the source
code was made publicly available [36]. This enables future development by third parties, allowing
other students and the scientific community to make their contribution, and all investigators
to take advantage of this tool.
4.13 Build System and Continuous Integration
In order to make the program easy to build, deploy, test, release and install, a build system
bundling all the libraries used by BioFoV was created. It consists of a set of Makefiles
and a CI system.
4.13.1 Make Rules
A set of Makefiles has been made available to enable easy compilation of the project and its
dependencies. Both OpenCV and Qt are automatically configured and compiled, so that a less
technical user may start developing a module without worrying about the linkage details of
these libraries.
4.13.2 Continuous Integration
The CI services from Travis CI [35] ensure that the program keeps compiling after every
commit, avoiding potential regressions. Regression tests were not implemented, even though it
would be good to have them; since for now this is a small project, it is easy to review every
commit and make sure it does not break anything. Once the project gets larger, every Merge
Request (MR) should be accompanied by its own set of tests showing what it fixes, or that it does
not break other features.
4.13.3 Cross Compilation Support
The way the project is built takes into consideration the user's preferred Operating
System (OS), but it is beneficial to be able to compile it for a platform other than the one
where it is being developed. Therefore, cross compilation is an important piece in making this
work available to anyone.
4.13.4 Static Build
Statically linking the program reduces the effort needed by a non-technical user to install it,
and it proved to be a major plus when presenting the program to new users, since all they
had to do was double click an icon to get it running. When there was still no static build, new
users were reluctant to use the program, since it was hard to update; due to the rapid iteration
and continuous deployments, updating was a task they would have to do often.
5 Results
Contents
5.1 Testing Setup
5.2 Datasets Used
5.3 Calibration
5.4 Event Detection
5.5 Height Measurement
5.6 Feature Detection
5.7 GUI Validation
5.1 Testing Setup
The test machine used to run the benchmarks has an Intel Core i7-3770 clocked at 3.40 GHz
with 16 GB of RAM (4x4096 MB @ 1600 MHz), running the Linux distribution Kubuntu 14.10.
5.2 Datasets Used
Since none of the videos provided by the LPC can be disclosed, some test videos were
recorded to be used in the public tests presented in this document. These videos' characteristics
are described in this section.
5.2.1 Long Video
The main test video for event detection needs to be extremely long in order to show the anal-
ysis boost the tool can give to the users. For that reason, a video was recorded with 2430553
frames at a frame rate of 10 Frames Per Second (FPS), resulting in a total time of approximately
3 days, 20 hours and 9 minutes. The video codec used to encode this test video was H.264, with
a resolution of 640 by 480 pixels, similar to what can be found in most surveillance systems. It
was recorded in a research computer room, where the camera was strapped to the ceiling (upside
down) and pointing at the door the whole time. Due to the weakness of the webcam's arm,
the image slowly slides down during the footage.
5.2.2 Short Video for Height Measurement
A short video was shot in the same conditions and with the same encoding as the one
described in 5.2.1. Instead of focusing on sheer length, this one shows an individual exiting the
room, coming back in, and standing next to a couple of objects. This video is used for short event
detection tests (where two events must be detected) and for height measurement tests.
5.2.3 Videos for Calibration
For the camera calibration testing, four videos provided by Dr. Peter Kastmand Larsen were
used. These videos were shot using a GoPro Hero camera in various settings: narrow, medium,
wide and super-wide Field Of View (FOV), all with a resolution of 1920 by 1080 pixels at 29
FPS. These videos show a chessboard pattern with 7x9 inner corners moving across all sides of
the canvas.
5.3 Calibration
The re-projection errors of the calibration tests performed on the previously described videos
are shown in figure 5.1. These tests were performed using 100 frames equally spaced throughout
the video, in order to optimize the distribution of the pattern locations in the image; for example,
if a video had 2000 frames, the frame skip value was 20. Only one iteration of the algorithm was
performed, since extra iterations showed no benefit while significantly increasing the
computational time.
5.3.1 Precision
The re-projection errors for the GoPro in the different modes are shown in table 5.1. As is
visible from the distortion in figure 5.1h, and from the increasing re-projection error in table 5.1,
the wider the FOV becomes, the poorer the fit of the camera model used. To tackle
this issue, another camera model, specially tailored for fisheye cameras, should be used. Such a
camera model is conveniently implemented in OpenCV, but was not yet integrated in
BioFoV.
Camera mode   Re-projection error
Narrow        2.015
Medium        5.772
Wide          9.656
Super wide    9.102
Table 5.1: Calibration precision test results
Making a video where the pattern is moved through the whole canvas of the cam-
era proved to be a challenging task, since users asked to move the pattern uniformly to all
corners of the image could not do it properly without video feedback, which is not available in
most cases for surveillance cameras. The usage of a pre-defined image dataset chosen by the
user could be a way to solve this problem; the downside of this alternative is that choosing the
frames would take more time. Another way to solve the issue is to detect the pattern in every
single frame and, instead of using all of the detected patterns, pick a set widespread through the
whole canvas. This second option is a better, though more unconventional, approach, since it
can provide the quality of the dataset and the ease of calibration at the same time. The only
downside would be the processing time required to detect the pattern in every single frame, but
this is not a relevant problem, since it only has to be done once for each camera, and the
calibration video is not a long one.
(a) Narrow FOV before calibration (b) Narrow FOV after calibration
(c) Medium FOV before calibration (d) Medium FOV after calibration
(e) Wide FOV before calibration (f) Wide FOV after calibration
(g) Super wide FOV before calibration (h) Super wide FOV after calibration
Figure 5.1: Calibration results
5.4 Event Detection
The events detected using the video described in 5.2.1 are triggered by two types of changes:
people entering and leaving the room, and the corridor lights being turned on or off, the latter
being picked up because the door has a glass window on top that leads to the corridor. Since
the computer room has two walls that are in fact large windows facing both south and east, the
full length of the video includes large differences in lighting caused by the sun; these posed no
problems to the event detection algorithm, since they were not sudden changes.
5.4.1 Precision
The resulting events lack context, since the period stored in each resulting event corresponds
only to the interval where motion was detected. To improve the context, it would be a good idea
to include in the event a configurable number of frames, or period of time, before and after the
motion was detected.
5.4.2 Performance
Scanning the same video for events required allocating a total of 336 MB of
memory. This number does not scale with the size of the video, but with the number of
events detected and their respective lengths. The performance was similar using the CPU and the
GPU, given that the test machine has a high end CPU and a low end GPU. Table 5.2 shows the real
run time on the test machine, along with the CPU time needed to complete the analysis.
The speedup is the time it would take a human to analyse the full length of the video, over the
time it took the program to analyse it and split it into separate events (equation 5.1).

\text{Speedup} = \frac{\text{Video Length}}{\text{Real Time}} \quad (5.1)
Video file   Length (time)   CPU time   Real time   Speedup
Long         3d20h09min      31h43min   6h5min      15x
Table 5.2: Event detector performance
Given the fairly equal time results obtained with both computational methods, we can assume
that using a couple of high end GPUs side by side would result in the best performance for a given
budget. But due to compatibility issues introduced by the cross-platform requirement, and the fact
that not every computer has a decent or compatible GPU, the CPU option was the one chosen.
5.5 Height Measurement
The height measurement was tested using the video described in 5.2.2; the measured heights
and the respective ground truth are presented in table 5.3.
Subject                    Board   Person
Height (meters)            0.91    1.72
Measured Height (pixels)   184     341
Measured Height (meters)   –       1.68
Table 5.3: Height Measurement results
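As a sanity check, the measured height in table 5.3 is consistent with a simple proportional scale on the re-projected plane, using the board as reference (a plausible reconstruction, assuming the measurement reduces to this ratio):

h_{person} \approx h_{board} \cdot \frac{341\ \text{px}}{184\ \text{px}} = 0.91\ \text{m} \cdot 1.853 \approx 1.69\ \text{m}

which matches the reported 1.68 m up to rounding of the intermediate pixel measurements.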
This shows an error of little more than 2%. Still, small variations in where the user
considers the vertical projection of the top of the head on the floor to be, and where the top of the
head itself is, may result in higher measurement errors. These ambiguous areas are highlighted
with a red line in figure 5.2. From these, we can estimate a worst case scenario for the height
deviation, which in this case ranges from 1.55 m to 1.73 m. Because of this, the steps of selecting
a reference for re-projecting a plane and of inputting the top of the head and its projection on the
floor must be done with the best precision possible.
Figure 5.2: Measured height
5.6 Feature Detection
Four different frontal face classifiers were used, all of which ship with OpenCV. As can be
observed from the comparison in table 5.4, alt_tree has the best precision, but the one with the
best recall is the alt2 classifier. There is therefore a trade-off in this situation: depending
on the time available to go through all the false positives versus the thoroughness intended, the
user can choose one classifier or the other. Note that on a per-frame analysis it is difficult to
evaluate whether a frame contains a complete or frontal face, since it is hard to define at what
angle a face is considered to be frontal. Therefore, the recall values calculated in table 5.4
consider the sum of the maximum number of faces detected for each individual over the total
amount of frontal faces present in the video.
Classifier   Sub. 1   Sub. 2   Faces Detected   False Positives   Precision   Recall
default      76       22       98               169               0.367       0.98
alt          77       21       98               48                0.671       0.98
alt2         78       21       99               64                0.607       0.99
alt_tree     71       19       90               3                 0.968       0.90
Maximum      78       22

Table 5.4: Face detection classifiers performance
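A plausible reconstruction of how these values were computed, where d_i denotes the number of faces detected for subject i and d_i^max the corresponding maximum:

\text{precision} = \frac{\text{faces detected}}{\text{faces detected} + \text{false positives}}, \qquad \text{recall} = \frac{\sum_i d_i}{\sum_i d_i^{max}}

For example, for the default classifier: precision = 98 / (98 + 169) ≈ 0.367 and recall = (76 + 22) / (78 + 22) = 0.98, matching the values in table 5.4.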
Some of the faces detected in the test video are shown in figure 5.3, some of which are
in the same frame. Furthermore, taking a close look at the false positives, some resemblance to
a human face can be found.
Figure 5.3: Feature extraction results (using the alt_tree classifier)
5.7 GUI Validation
Since this project was developed with and for crime scene investigators, it has been in both
the Danish and Portuguese teams' hands since the very first stages, helping in daily tasks.
6 Conclusions
Contents
6.1 Future Work
The original objective of this thesis was to create a program able to extract
biometric features from surveillance videos. Given that the result is a program
with a graphical user interface that streamlines the video analysis process and integrates
several biometric feature detectors and extractors in one single tool, it is safe to conclude that, by
not giving biometry itself all the focus, but instead focusing on usability and on the integration of
future work, the project was a success and will continue to be.
As the result of the international cooperative effort to create this program and give it continuity,
a project team was created which will continue its development. To support it, and to make the
discussion of the algorithms to be integrated public, the BioFoV GitHub [34] repository was
created, which already hosts an unstable build of the program.
6.1 Future Work
This final section presents the vision for the future of this project, along with some useful
features which have not been implemented yet but for which interest arose.
6.1.1 Implementation of New Features
The project is constructed in a way that makes it fairly easy to implement new features on top
of the existing ones. To enhance the current feature set, these are some of the ideas that came
up during this first stage of the project, but for which there was no time:
1. Face matching.
Based on the extracted faces, there should be a module to analyse the face set
of a certain video and split it into groups, each subset corresponding to a certain
person. This is currently being researched by another MSc student (Joao Satiro), who will
then integrate his algorithms in the program.
2. File format.
In order to enable saving the state of the program and resuming or reviewing the analysis
afterwards, a file format has to be defined. This part of the project was neither defined nor
implemented, and should be considered a top priority for future developers.
3. Automatic silhouette extraction and matching.
As laid out in 2.3.1.H, silhouette extraction and matching is currently a manual process.
Optimizing it, by taking advantage of the Snapshots extracted by the event detector, and
automatically matching them by minimizing the distance between the given silhouette and
the ones in the video, would not only drastically reduce the effort needed to perform this
task, but also improve the quality of the match, since all the frames would be scrutinized and
the minimal distance between silhouettes would be computed instead of estimated by hand.
This distance can also be used to quantify the quality of the result.
4. Camera calibration using the plumb-line constraint and minimal Hough entropy [13].
This method for calibrating a camera may prove to be very useful if the camera has been
destroyed, since it uses the existing lines in the image to perform the calibration.
5. 3D point cloud visualization.
With the latest advances in OpenCV, more precisely the Viz module introduced in OpenCV
2.4.9, it should be fairly easy to extend the current video player class of the UI and imple-
ment 3D visualization capability. This will only prove useful if 3D data is made avail-
able, either from the scene reconstruction module, or from manually inserting points and planes
in the environment, thus reconstructing the scene.
6. 3D scene reconstruction.
The implementation of the 3D point cloud visualization will enable the integration of a scanned
3D point cloud, which can be obtained with a depth camera and a loop closing Simultaneous
Localization And Mapping (SLAM) algorithm after the crime has taken place. This can be
used for temporary crime scenes which are volatile and will disappear (short events such as
music festivals), creating a 3D model of the scene which can be used in the investigation
after the crime scene is gone.
7. Report generation.
Of the video report tools analysed, both Kinesense's LE and Forevid have a reporting and
note taking feature useful for case documentation, where the user can create notes on the
various events detected or on specific frames, which can then be processed into a Portable
Document Format (PDF) file that can easily be presented in a court hearing.
6.1.2 Improvements on Third Party Libraries
Since this project is highly dependent on third party libraries, it takes advantage of any im-
provements, either in performance or in quality of results, that may be implemented in them. Cur-
rently, OpenCV 3.0 beta includes a subset of Intel® Integrated Performance Primitives (IPP) and
promises to bring considerable speed-ups to the library [37]. It also includes the license to re-
distribute applications that use IPP-accelerated OpenCV, therefore allowing the distribution of a
pre-built version of BioFoV with the benefits of IPP while keeping the installation process simple.
6.1.3 Support More Platforms
Currently, only Microsoft Windows and GNU/Linux are supported by BioFoV. There are benefits
in also supporting at least Apple's OS X. This was not possible during this first stage, mostly due
to inexperience with this OS and the lack of a build and testing platform.
The creation of a computer specifically designed and optimized to run this program may be
well received by police departments once the platform has enough features, since it could
provide them with easy support and optimal performance. This may also be a way to finance the
project, if there is the need to do so.
6.1.4 Integration Tests
In order to provide a full set of tests that thoroughly evaluate code quality, the implementation
of integration, unit, acceptance and regression tests is a priority, with special emphasis on the
regression ones.
References
[1] European Cooperation in Science and Technology, "Integrating biometrics and forensics for the digital age,"
accessed: 2014-08-24. [Online]. Available: www.cost.eu/domains_actions/ict/Actions/IC1106
[2] N. Lynnerup and P. Larsen, “Gait as evidence,” Biometrics, IET, vol. 3, no. 2, pp. 47–54, June
2014.
[3] A. Dantcheva, C. Velardo, A. D’Angelo, and J.-L. Dugelay, “Bag of soft biometrics for
person identification - New trends and challenges.” Multimedia Tools Appl., vol. 51, no. 2,
pp. 739–777, 2011. [Online]. Available: http://dblp.uni-trier.de/db/journals/mta/mta51.html#
DantchevaVDD11
[4] D. Cao, C. Chen, D. Adjeroh, and A. Ross, “Predicting Gender and Weight from Human
Metrology using a Copula Model,” 2012.
[5] A. K. Jain, R. Jin, and J.-E. Lee, “Tattoo Image Matching and Retrieval.” IEEE Computer,
vol. 45, no. 5, pp. 93–96, 2012. [Online]. Available: http://dblp.uni-trier.de/db/journals/
computer/computer45.html#JainJL12
[6] R. Bansal, P. Sehgal, and P. Bedi, “Minutiae Extraction from Fingerprint Images - a Review,”
CoRR, vol. abs/1201.1422, 2012. [Online]. Available: http://dblp.uni-trier.de/db/journals/corr/
corr1201.html#abs-1201-1422
[7] S. Minaee and A. Abdolrashidi, “On The Power of Joint Wavelet-DCT Features for
Multispectral Palmprint Recognition,” Sep. 2014. [Online]. Available: http://arxiv.org/abs/
1409.7818v1;http://arxiv.org/pdf/1409.7818v1
[8] S. M. Lajevardi, A. Arakala, S. Davis, and K. J. Horadam, “Hand vein authentication using
biometric graph matching,” IET Biometrics, vol. 3, no. 4, pp. 302–313, 2014.
[9] J. R. G. Neves and P. L. Correia, “Hand Veins Recognition System,” in VISAPP’14, 2014, pp.
122–129.
[10] S. S. Farfade, M. Saberian, and L.-J. Li, “Multi-view Face Detection Using Deep Convolutional
Neural Networks,” Feb. 2015. [Online]. Available: http://arxiv.org/abs/1502.02766v1;http:
//arxiv.org/pdf/1502.02766v1
[11] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "DeepFace: Closing the Gap to Human-Level
Performance in Face Verification," in Conference on Computer Vision and Pattern Recognition
(CVPR), 2014.
[12] Z. Zhang, “A flexible new technique for camera calibration.” IEEE Trans. Pattern
Anal. Mach. Intell., vol. 22, no. 11, pp. 1330–1334, 2000. [Online]. Available:
http://doi.ieeecomputersociety.org/10.1109/34.888718
[13] E. Rosten and R. Loveland, “Camera distortion self-calibration using the plumb-line
constraint and minimal hough entropy,” CoRR, vol. abs/0810.4426, 2008. [Online]. Available:
http://dblp.uni-trier.de/db/journals/corr/corr0810.html#abs-0810-4426
[14] C. S. McCamy, H. Marcus, and J. G. Davidson, “A color-rendition chart,” J. Appl. Photogr.
Eng., vol. 2, no. 3, pp. 95–99, Summer 1976.
[15] F. Project, “Forevid Forensic video analysis for everyone,” accessed: 2014-06-23. [Online].
Available: www.forevid.org
[16] A. Buades, B. Coll, and J.-M. Morel, "Non-Local Means Denoising,"
Image Processing On Line, vol. 1, 2011. [Online]. Available: http://dx.doi.org/10.5201/
ipol.2011.bcm_nlm
[17] Wikipedia, "Deinterlaced vs interlaced image — Wikipedia, The Free Encyclopedia," 2007,
[Online; accessed 26-March-2015]. [Online]. Available: https://commons.wikimedia.org/wiki/
File:Deinterlaced_vs_interlaced_image.gif
[18] R. C. Gonzalez and R. E. Woods, Digital Image Processing (3rd Edition), 3rd ed. Prentice
Hall, Aug. 2007.
[19] D. Mitzel, T. Pock, T. Schoenemann, and D. Cremers, "Video Super Resolution Using Duality
Based TV-L1 Optical Flow," in DAGM-Symposium, ser. Lecture Notes in Computer Science,
J. Denzler, G. Notni, and H. Süße, Eds., vol. 5748. Springer, 2009, pp. 432–441. [Online].
Available: http://dx.doi.org/10.1007/978-3-642-03798-6_44
[20] Intergraph Corporation, http://www.intergraph.com, December 2013.
[21] Kinesense, http://www.kinesense-vca.com, December 2013.
[22] PhotoModeler, “Close-range photogrammetry and image-based modelling,” accessed:
2014-06-25. [Online]. Available: www.photomodeler.com
[23] Z. Zivkovic, “Improved adaptive gaussian mixture model for background subtraction,” in
Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference
on, vol. 2, Aug 2004, pp. 28–31 Vol.2.
[24] N. Friedman and S. Russell, “Image segmentation in video sequences: A probabilistic ap-
proach,” in Proceedings of the Thirteenth Conference Annual Conference on Uncertainty in
Artificial Intelligence (UAI-97). San Francisco, CA: Morgan Kaufmann, 1997, pp. 175–181.
[25] F. Meyer, “Color image segmentation,” in Image Processing and its Applications, 1992.,
International Conference on, 1992, pp. 303–306.
[26] GangInk, "GangInk," accessed: 2015-03-26. [Online]. Available: http://gangink.com/index.
php?pr=KRAZY_GETDOWN_BOYS
[27] M. M. de Almeida, “Tattoo Segmentation Tool,” accessed: 2015-04-08. [Online]. Available:
https://github.com/BioFoV/tattoo-segmentation
[28] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,”
Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE
Computer Society Conference on, vol. 1, pp. I–511–I–518 vol.1, 2001.
[29] J. A. Belward, "An exponential version of filon's rule," J. Comput. Appl. Math., vol. 14,
no. 3, pp. 461–466, Mar. 1986. [Online]. Available: http://dx.doi.org/10.1016/0377-0427(86)90081-6
[30] Qt Project, “Qt cross-platform application and UI framework,” accessed: 2015-03-28.
[Online]. Available: http://www.qt.io/
[31] itseez, “OpenCV open source computer vision and machine learning software library,”
accessed: 2014-06-24. [Online]. Available: opencv.org
[32] D. van Heesch, "Doxygen Generate documentation from source code," accessed:
2014-06-24. [Online]. Available: www.stack.nl/~dimitri/doxygen
[33] L. Torvalds, “GIT free and open source distributed version control system,” accessed:
2014-06-24. [Online]. Available: git-scm.com
[34] “Build software better, together.” accessed: 2014-08-29. [Online]. Available: github.com
[35] Travis CI, "Travis CI," accessed: 2015-03-02. [Online]. Available: https://travis-ci.org
[36] M. M. de Almeida, "BioFoV," accessed: 2015-02-10. [Online]. Available: https://github.com/BioFoV/BioFoV
[37] itseez, “OpenCV 3.0 beta,” accessed: 2015-03-01.