Biometric Identification from Forensic Video Surveillance Evidence
Miguel Reis Moitinho de Almeida
Thesis to obtain the Master of Science Degree in Engenharia Electrotécnica e de Computadores
Supervisor: Doctor Paulo Luís Serras Lobato Correia
Examination Committee
Chairperson: Doctor José Eduardo Charters Ribeiro da Cunha Sanguino
Supervisor: Doctor Paulo Luís Serras Lobato Correia
Members of the Committee: Doctor João Miguel da Costa Magalhães
Carlos Filipe Bento Gregório
May 2015
Acknowledgments
Thank you to:
• The European Cooperation in Science and Technology (COST) for providing me with the
opportunity to learn from the best;
• Peter Kastmand Larsen for his good humour, scientific insight and shared experience;
• Laboratório de Polícia Científica, mainly to Carlos Gregório, Francisco Calado, Ana Cristina Correia and Gisela Rosa for their precious insight;
• Pedro Tomás for publicly providing the template for this dissertation and thus saving me a lot of time on formatting;
• All the contributors who made OpenCV such a great tool;
• J for making dinner and tea, while cheering me up, and of course for putting up with me;
• My father, not for nagging me one too many times, but for helping me review this document
and for putting up with cross compilation and linking questions;
• David Pereira for the painstaking review and morale boosting;
• Last but not least, to Paulo Lobato Correia for not giving up on me, putting up with me and giving me all the tools to successfully conclude this thesis.
Abstract
The tools available to the forensic crime scene investigators responsible for video and biometric analysis are either expensive or simply not designed for the job, while a great deal of academic work is produced in these areas. This created the opportunity to join both worlds. To accomplish this approach, this work proposes the creation of a unified forensic video and biometric data extraction platform.
With this in mind, the objective of this project is to design and implement a platform capable of integrating both video and biometric extraction capabilities. To exemplify the capabilities of this platform, some of the main day-to-day tools used by the target users were identified and implemented. These needs were identified and evaluated in cooperation with the Scientific Police Lab of the Polícia Judiciária and the Unit of Forensic Anthropology of the Faculty of Health Sciences of the University of Copenhagen, the latter thanks to the support of the European Cooperation in Science and Technology.
The analysis of the implemented tools, the description of the ones currently being implemented, and the selection of the ones that should be implemented in the near future are presented. This project intends to create cooperation ties between universities and forensic investigators, not only from the labs that took part in it, but also from others around the world.
Keywords
Biometry, Video Analysis, Forensic Analysis, Motion Detection, Feature Extraction, BioFoV
Resumo
Considerando que as ferramentas informáticas disponíveis para os investigadores forenses de vídeo de vigilância são dispendiosas ou não estão talhadas para a análise forense de vídeo, e que tanto trabalho é produzido pela academia nesta área, levantou-se a necessidade de juntar estas duas vertentes. Com o objectivo de proceder a tal aproximação, este trabalho propõe a criação de uma plataforma de análise forense de vídeo e extracção de dados biométricos de indivíduos que neles figurem.
Para tal, este projecto tem como objectivo projectar e implementar uma plataforma que seja capaz de integrar tanto a capacidade de processar vídeos como a de extrair características biométricas dos mesmos. Para exemplificar as capacidades desta plataforma, algumas das principais ferramentas necessárias para o dia a dia dos seus utilizadores alvo foram identificadas e implementadas. Essas mesmas necessidades foram identificadas e avaliadas em conjunto com o Laboratório de Polícia Científica da Polícia Judiciária e com a Unidade de Antropologia Forense da Faculdade de Ciências da Saúde da Universidade de Copenhaga, esta última graças ao apoio da Cooperação Europeia em Ciências e Tecnologia.
É feita a avaliação das ferramentas já implementadas, a descrição das que estão a ser implementadas, e o delineamento das que deverão ser integradas num futuro próximo. Este projecto pretende fomentar a cooperação entre universidades e investigadores forenses, não só dos laboratórios que nele participaram, como de outros de todo o mundo.
Palavras Chave
Biometria, Análise de Vídeo, Análise Forense, Detecção de Movimento, Extracção de Características, BioFoV
Contents
1 Introduction
  1.1 Motivation
  1.2 Objectives
  1.3 Main Contributions
  1.4 Dissertation Outline
2 State of the art
  2.1 Video Analysis Methodology
    2.1.1 Laboratório de Polícia Científica da Polícia Judiciária (Portuguese Criminal Police Laboratory)
    2.1.2 Unit of Forensic Anthropology – Faculty of Health Sciences of the University of Copenhagen
  2.2 Evidence Analysis
    2.2.1 Impossibility of Identification
    2.2.2 Subject Exclusion
    2.2.3 Likelihood Ratios
  2.3 Biometry
    2.3.1 Soft Biometrics
      2.3.1.A Gender
      2.3.1.B Ethnicity
      2.3.1.C Race
      2.3.1.D Clothing
      2.3.1.E Height
      2.3.1.F Weight
      2.3.1.G Gait
      2.3.1.H Silhouette Matching
      2.3.1.I Tattoos
    2.3.2 Hard Biometrics
      2.3.2.A Fingerprint
      2.3.2.B Palmprint
      2.3.2.C Hand Veins
      2.3.2.D Face
  2.4 Camera Calibration
    2.4.1 Camera Calibration With Camera Access
    2.4.2 Camera Calibration Without Camera Access
    2.4.3 Stereo Calibration
    2.4.4 Colour Calibration
  2.5 Photogrammetry
    2.5.1 Single Camera Photogrammetry
    2.5.2 Stereo Camera Pair Photogrammetry
  2.6 Video Filtering
    2.6.1 Denoising
    2.6.2 Deinterlacing
    2.6.3 Sharpening
    2.6.4 Superresolution
  2.7 Evidence Documentation
  2.8 Software Bundles
    2.8.1 Calibration and Photogrammetry
    2.8.2 Analysis
    2.8.3 Forevid (Video Processing and Documentation)
  2.9 Summary
3 Proposed Video Analysis System
  3.1 Workflow
    3.1.1 Video Processing
    3.1.2 Biometric Analysis
  3.2 Video Processing
    3.2.1 Event Detection
    3.2.2 Noisy Area Exclusion
    3.2.3 Camera Calibration
  3.3 Biometric Data Extraction
    3.3.1 Tattoos
    3.3.2 Height
    3.3.3 Face Detection
4 Implementation
  4.1 Technologies Used
  4.2 Modularity of the Framework
  4.3 Data Classes
    4.3.1 Video
    4.3.2 Camera
    4.3.3 Event
    4.3.4 Frame
    4.3.5 Snapshot
    4.3.6 Feature
    4.3.7 Individual
  4.4 User Interface Classes
    4.4.1 Player
    4.4.2 Drawable
  4.5 GUI Implementation
  4.6 Height Calculation
  4.7 Motion Detection
  4.8 Camera Calibration
  4.9 Other Useful Features
    4.9.1 Export Events
    4.9.2 Print Tool
  4.10 Computational Requirements
  4.11 Documentation
    4.11.1 Technical Documentation
    4.11.2 User Manual
  4.12 Open Sourcing the Project
  4.13 Build System and Continuous Integration
    4.13.1 Make Rules
    4.13.2 Continuous Integration
    4.13.3 Cross Compilation Support
    4.13.4 Static Build
5 Results
  5.1 Testing Setup
  5.2 Datasets Used
    5.2.1 Long Video
    5.2.2 Short Video for Height Measurement
    5.2.3 Videos for Calibration
  5.3 Calibration
    5.3.1 Precision
  5.4 Event Detection
    5.4.1 Precision
    5.4.2 Performance
  5.5 Height Measurement
  5.6 Feature Detection
  5.7 GUI Validation
6 Conclusions
  6.1 Future Work
    6.1.1 Implementation of New Features
    6.1.2 Improvements on Third Party Libraries
    6.1.3 Support More Platforms
    6.1.4 Integration Tests
List of Figures
2.1 Silhouette superimposition
2.2 Macbeth ColorChecker
2.3 Single camera
2.4 Stereo camera pair
2.5 Denoising
2.6 Interlacing
2.7 Sharpening
3.1 Workflow
3.2 Event classification
3.3 Noisy areas rejection data flow
3.4 Tattoo segmentation results
3.5 Planar height measurement with reference
3.6 Feature detection flowchart
4.1 High level design
4.2 Data classes relationship
4.3 Background subtraction example
4.4 User Interface
4.5 Event classification dialogue
4.6 Video separated in events
4.7 Camera calibration dialog
5.1 Calibration results
5.2 Measured height
5.3 Feature extraction results
List of Tables
4.1 Video codec comparison
5.1 Calibration precision test results
5.2 Event detector performance
5.3 Height measurement results
5.4 Face detection classifiers performance
List of Algorithms
3.1 Camera Calibration
Acronyms
API Application Programming Interface
BioFoV Biometric Forensic Video analyzer
CI Continuous Integration
COST European Cooperation in Science and Technology
CPU Central Processing Unit
CUDA Compute Unified Device Architecture
FOV Field Of View
FPS Frames Per Second
GPU Graphics Processing Unit
GUI Graphical User Interface
HFYU Huffman Lossless Codec
IPP Intel® Integrated Performance Primitives
LPC Laboratório de Polícia Científica
MR Merge Request
MSc Master of Science
OpenCL Open Computing Language
OpenCV Open Source Computer Vision
OS Operating System
PDF Portable Document Format
PJ Polícia Judiciária
SLAM Simultaneous Localization And Mapping
UI User Interface
XVID XVID MPEG-4
1 Introduction
Contents
1.1 Motivation
1.2 Objectives
1.3 Main Contributions
1.4 Dissertation Outline
This chapter gives a short introduction to the project, which is followed by a brief review of the state of the art in every scientific area explored. The two subsequent chapters detail the design and implementation of the program and the creation of the associated project. In the final chapters, the results obtained from the program using real data are presented and evaluated, followed by conclusions on what was done and on what is still left to do in the context of the proposed project.
1.1 Motivation
Given the lack of flexible tools to perform forensic video analysis in a crime scene investigation context, and observing that there is real-world interest in having such a tool, it was decided to use this Master of Science (MSc) thesis to create a program that addresses this need.
The intent was not to write something that would be forgotten or left unused after this work is completed, but instead to provide an extensible tool that may serve as a one-stop shop for third-party algorithms to be implemented on, used as an integrated program that can bring other students' projects closer to real-world usage.
1.2 Objectives
The objective of this thesis is to provide an extensible software platform that will be a useful tool for forensic investigators, fitting their needs and thereby minimizing the required workload. Furthermore, it will allow future students, members of the scientific community and members of the open source community in general to see their algorithms implemented in a real-world application.
In order to get the project some initial traction, most of the effort of this thesis was focused on automating a set of usually time consuming tasks in the video analysis process. This way, less effort is required from the investigator to perform tasks that were once laborious, leaving more time to focus on the aspects that require the expert's insight and contributing to an overall increase in the quality of the analysis.
After the initial traction was achieved, the focus turned to investigating which soft biometric
measurements can be extracted from surveillance videos typically available from crime scenes to
aid in the recognition task.
1.3 Main Contributions
The output of this thesis is neither a program nor an algorithm, but a project that aims to change the paradigm of the software tools used in police investigations. All the development tools, Continuous Integration (CI) and Makefiles were designed to allow the project to be easily tinkered with, and to ensure its quality and continuity.
Until now, most tools used for criminal video analysis and investigation were not designed for the job, so investigators have to resort to commercial video and image editing tools, jumping from one to the other without any kind of data integration. This was observed both at Dr. Peter Kastmand Larsen's lab in the Unit of Forensic Anthropology of the Faculty of Health Sciences of the University of Copenhagen, and at the Laboratório de Polícia Científica (LPC) (Portuguese Scientific Police Lab), who have both used and tested the program. This thesis aims to provide a useful tool for these specific teams, and for others that may decide to use it.
This project will also allow future students to see their theses and projects integrated in a real project with real-world implications, improving the judicial system by providing investigators with bleeding-edge algorithms and tools to streamline their work.
1.4 Dissertation Outline
After this introduction, the current state of the art of video analysis techniques and of the biometrics used for identification in a crime scene investigation environment is laid out. This is followed by the proposed solution to fill the gap of specially crafted software for biometric analysis, the implementation details of such a project, and the performance and quality results obtained. The document ends by presenting suggestions for future work and project continuity, together with the conclusions reached from the preparation of this thesis.
2 State of the art
Contents
2.1 Video Analysis Methodology
2.2 Evidence Analysis
2.3 Biometry
2.4 Camera Calibration
2.5 Photogrammetry
2.6 Video Filtering
2.7 Evidence Documentation
2.8 Software Bundles
2.9 Summary
This chapter starts by describing the current video analysis process of both forensic teams that collaborated in this project. It then explores the biometric features that were considered and analysed in this work. Finally, some of the most used video analysis software solutions available on the market are analysed in order to pinpoint their strongest features and where they lack capabilities that would be useful to an investigator.
2.1 Video Analysis Methodology
Since the target users of this project are crime scene investigators, the best way to fit their needs is to understand how they work with the tools currently available and how their daily efforts can be optimized. For this reason, the following two investigation facilities shared their experience, working methods, and needs:
• Laboratório de Polícia Científica (LPC) da Polícia Judiciária (PJ) – Portuguese criminal police laboratory;
• Unit of Forensic Anthropology from the Faculty of Health Sciences of the University of Copenhagen.
2.1.1 Laboratório de Polícia Científica da Polícia Judiciária (Portuguese Criminal Police Laboratory)
The cooperation with the LPC is central to this project, whose objective is to provide them with useful tools for data analysis. They provided an essential dataset of real crime cases, with the respective surveillance videos and analysis reports, which for confidentiality reasons will not be disclosed or used as test cases in this dissertation.
The current video analysis methodology used by the LPC is very laborious, time consuming, and makes use of methods which are not ideal for the task. Techniques observed in the reports provided for analysis and in the investigators' procedure descriptions included:
• Watching whole overnight or weekend footage from a surveillance camera in real time in order to pinpoint the time of the crime, which consumes many man-hours and is one of the main bottlenecks of video analysis.
• Using non-ideal image editing techniques to superimpose suspects' photos onto video frames showing the perpetrator, mainly due to the lack of tools to correctly cross-relate biometric data.
One of the biggest hurdles described by the investigators is the difficulty of getting access to an untampered crime scene in time. This problem will not be tackled in this project, since it is not a technical issue, but a legal one.
2.1.2 Unit of Forensic Anthropology – Faculty of Health Sciences of the University of Copenhagen
Integrated in the European Cooperation in Science and Technology (COST) Action IC 1106 [1], the opportunity arose to spend a week and a half in Copenhagen, Denmark, with the objective of getting acquainted with the methods and needs of experts who have been successfully presenting soft biometric data obtained from surveillance videos in court.
Peter Larsen was the expert who shared his working methods the most. In the cases observed, gait analysis by observation and silhouette superimposition were the most used techniques, along with standing pose comparison. Other cases also showed the use of a software suite called PhotoModeler for height estimation and other measurements.
2.2 Evidence Analysis
When analysing any case, there is the need to cross-reference all the evidence to reach a meaningful conclusion. In the forensic community there is a proposed scale for the levels of evidential weight, which helps investigators describe in court the outcome of their investigation [2]. This scale is defined as follows:
Identification When it is certain that the suspect has the same identity as the perpetrator;
Strongly Indicated When several traits are found to point towards suspect and perpetrator having congruent identity;
Indicated When few traits are found to point towards suspect and perpetrator having congruent
identity;
Cannot Be Excluded When it is not possible to perform a comparison between suspect and
perpetrator due to the (low) quality of the surveillance recordings;
Very Little Speaks In Favour Of When there is very little reason to believe that suspect and
perpetrator have congruent identity;
Elimination When it is certain that perpetrator and suspect cannot have congruent identity.
In the following sections the impossibility of identification, how biometric evidence can easily
be used for subject exclusion, and the difficulty of cross referencing evidence in order to give
meaningful likelihood ratios are discussed.
2.2.1 Impossibility of Identification
One of the most valuable lessons learnt from the COST Action contact with the Unit of Forensic
Anthropology from the Faculty of Health Sciences of the University of Copenhagen was that, no
matter how good a biometric feature is, it can never lead to identification on its own. It is essential to keep this in mind when compiling the data obtained from the forensic analysis to reach a conclusion.
A good example is DNA itself, which is taken as the most identifying feature of any living being. Even if a suspect's DNA has been found to match the culprit's, it is impossible to state with full certainty that the suspect and the culprit are one and the same, since the suspect may have a twin, even without knowing of his existence. To overcome this limitation and draw rightful conclusions from the evidence gathered, it is essential to combine hard biometrics with other kinds of evidence, such as soft biometrics or any other sort of proof.
In the example of the unknown twin, soft biometrics can weigh a lot on the final decision. A simple scar, a tattoo or a person's gait may not identify a person, but they help support the hypothesis that the suspect is, or is not, the culprit.
2.2.2 Subject Exclusion
Even though it is not possible to identify anybody with a biometric measurement on its own, that same measurement can be used to exclude a suspect with absolute certainty. Even though this may sound like something that can only be used by the defence to exonerate the suspect, it must be seen as a way to narrow down the suspect list. This is hard to do if the investigators themselves are acquainted with the case and convinced that a suspect is the perpetrator, which can lead to involuntary extrapolations.
2.2.3 Likelihood Ratios
The gathering of bare biometric data is not enough to extract a meaningful conclusion about a suspect. Therefore, there is the need to cross-relate the measurements and output a likelihood ratio that represents the certainty that the data gathered belongs to the suspect.
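As an illustration of this concept (the formula below is the standard forensic formulation, not one defined in this thesis), the likelihood ratio weighs how probable the observed evidence E is under the two competing hypotheses:

LR = P(E | Hp) / P(E | Hd)

where Hp is the prosecution hypothesis (suspect and perpetrator have congruent identity) and Hd the defence hypothesis (they do not). Values above 1 favour Hp, values below 1 favour Hd, and the further the ratio is from 1 the stronger the evidential weight.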
2.3 Biometry
Biometrics are measurements related to human traits or features. There are two categories of
biometric features, hard and soft. Both these categories are explored in this section in order to
pinpoint which are suitable to be used in a video forensic environment.
2.3.1 Soft Biometrics
Soft biometrics are a specific type of biometrics related to how people usually distinguish each
other, from physical traits such as hair colour, height or tattoos to behavioural characteristics
like gait or posture. The article “Bag of Soft Biometrics for Person Identification” [3] makes an
extensive survey of soft biometric traits and the relation between these.
2.3.1.A Gender
Gender is the classic soft biometric. It is one of the characteristics humans use to distinguish
each other, and one that almost all the population has.
2.3.1.B Ethnicity
By definition, individuals are of the same ethnicity if they identify each other as belonging to
the same cultural group, which usually implies having the same beliefs and daily habits, and these
same individuals may or may not share a common ancestor.
This biometric trait is useful in the forensic environment if the ethnicity in question has distinctive traits such as characteristic clothing, characteristic haircuts or body paintings such as the bindi. On the other hand, it is an especially delicate feature, since some ethnicities are associated with certain behaviours, which may lead to assumptions that are based not on evidence, but on a preconceived stereotype.
2.3.1.C Race
Race can be defined as a set of physical characteristics shared by a set of individuals with a
common ancestor. This feature is usually associated with other biometric features such as the
position of the eyes, shape of the nose and ears, height, skin tone, among others.
In the forensic environment, race can be more useful than ethnicity since, even if the perpetrators of a crime are dressed in plain black clothes, for example, they cannot change their physical traits. As with ethnicity, race is a biometric feature associated with certain stereotypes and, like all evidence, may lead to erroneous conclusions if it is not analysed objectively.
2.3.1.D Clothing
Since nowadays most people wear clothes from big brands that sell in large quantities, clothing is rarely a unique trait. Identifying a person just because they are wearing a specific shirt is therefore out of the question, but assigning a high likelihood to a specific combination of shirt, coat, shoes and jeans can help the case.
There is also the problem of the perpetrator giving away his clothes, for instance to a homeless person, who could then be identified as the culprit, slowing the case analysis or accusing the wrong individual. This can be avoided by giving clothing a low weight in the likelihood ratio estimation, and by using more than just the clothing as evidence. Clothing is nevertheless a very useful biometric feature: if there were witnesses to the crime, it does not need to be recorded by a surveillance camera to yield a partial description, since it is something humans tend to notice in people they do not know.
2.3.1.E Height
A person's height is difficult to measure precisely, since even in ideal conditions measurements of the same individual's height can end up with very different values. This is caused by five main factors:
• The first and most significant factor is pose, which is very difficult to correct and causes large errors in height measurements, since a slight relaxation of the posture can easily reduce one's height by over 5 centimetres.
• Gait can also hinder the measurement process, since head height does not remain constant during a whole walk cycle. Because both legs may be apart from each other, the vertical projection of the top of the head on the floor can be difficult to estimate and can therefore induce error.
• Shoe soles, which can be of unknown thickness, can cause significant errors in height measurement. If the shoes have been previously identified (2.3.1.D), their contribution can be compensated in the total measurement.
• Hair or head wear can complicate the process of pinpointing the top of the head, possibly introducing error in the measured height.
• An individual's spine is compressed during the day and expands during the night, which causes a person to "shrink" by 1 to 2 centimetres from morning to evening.
Even though it is hardly an identifying feature, height can be used in the forensic environment to exclude a subject or to contribute to the certainty of a suspect being the perpetrator.
2.3.1.F Weight
A very interesting feature that was also considered in the beginning was weight estimation, mostly based on [4].
Because the supporting studies are not yet robust enough, the data obtained through video analysis could not be of practical use, since it would not be valid in court.
Moreover, in the dataset provided by the Portuguese police, most individuals are dressed in more than one layer of clothing, making it very hard to take waist measurements and therefore to give any sort of rightful estimation.
2.3.1.G Gait
Gait is sometimes an identifying feature. An example of this was a case analysed by Dr. Peter Kastmand Larsen, from the University of Copenhagen, in which the perpetrator had an injury from a motorbike accident which left him limping in a very particular way.
Gait recognition was also considered as an identifying feature. However, due to the very low frame rates and bad camera angles (most security footage is filmed from above), it is very difficult to get a good idea of an individual's gait. This is worsened by the erratic movement and tense posture most individuals display while committing a crime.
2.3.1.H Silhouette Matching
When an accurate measurement of the perpetrator is not possible and there are suspects, a
method used in Denmark is silhouette matching, illustrated in figure 2.1.
Much like gait analysis, it tries to match posture and body volume, but only in two dimensions, whereas gait also takes variations in time into consideration.
The silhouette extraction and superimposition method currently used in Denmark is performed manually, making it very laborious to extract and match more than a couple of promising frames from a video.
Figure 2.1: Silhouette superimposition
2.3.1.I Tattoos
Tattoos are a very interesting and largely unexplored area of soft biometrics, one that has only very recently been researched [5]. They are very useful to group and cross-reference individuals, especially gang members.
On the other hand, some gangs, such as the Yakuza, do not allow their members to show their tattoos in public at all, making this biometric feature useless on surveillance video and restricting its usefulness to the processing of a subject in a police precinct, in order to cross-reference him with a known database.
2.3.2 Hard Biometrics
Hard biometrics are non-behavioural biometrics and traditionally unmodifiable traits.
2.3.2.A Fingerprint
The fingerprint is the traditional and most studied hard biometric feature. Its computational analysis is based on minutia matching and has been thoroughly studied [6]. Minutiae are features of the fingerprint that can be easily recognized; the three main types are ridge endings, bifurcations and dots. In the specific case of surveillance video forensics, fingerprints have very little to no use, and are therefore not applied further in this project.
2.3.2.B Palmprint
Much like fingerprints, palmprints have been used traditionally among the forensic community and are a proven technology widely used with very good results [7]. And again, much like fingerprints, they do not have many uses in surveillance forensics.
2.3.2.C Hand Veins
Palm vein pattern recognition is one of the topics currently under heavy research in biometry
[8]. It provides hard evidence and may become a good biometric trait, namely for authentication. Besides providing a hard-to-copy pattern, it is extremely hard to fake, since one would need to simulate the blood flowing through the veins and arteries. This approach can provide very good results on a limited budget, as shown in [9].
2.3.2.D Face
The face is a proven biometric trait that is widely used for recognizing an individual, not only in
human-to-human interactions, but also in human-to-machine interactions. Recently it has been a very active field of study, making use of deep neural networks to perform multi-view face detection [10] [11]. Face proportions are a hard biometric feature used by the police for identification, but the method of identification still relies mostly on human judgement.
2.4 Camera Calibration
From the cases presented by the LPC, there were two common situations which required camera calibration. In the first, the camera remained intact and untouched after the crime took place, and it is possible to calibrate it. In the second, the camera was either destroyed in the course of the crime, tinkered with afterwards, or is not accessible for bureaucratic reasons. For simplicity, and considering the characteristics of the dataset provided by the Portuguese police, all cameras considered in this thesis are static cameras with fixed zoom.
2.4.1 Camera Calibration With Camera Access
In the first situation it is easy to go to the crime scene and calibrate the camera precisely using a calibration pattern, enabling a good calibration with low errors. One of the existing methods to achieve this uses multiple orientations of the same planar pattern [12]. This camera calibration process does not require specialized hardware, since all that is needed is a planar pattern, easily printed on any office printer.
2.4.2 Camera Calibration Without Camera Access
Even though the first situation is the ideal one, the contacts with the Portuguese police indicate that it is generally difficult to gain access to the cameras that filmed the crime. This is mostly because the security companies that installed the cameras do not want third parties handling their systems. This reluctance to allow access to surveillance systems for crime scene analysis seems very difficult to change in Portugal. Another case where this happens is in temporary crime scenes, for example music festivals or other venues that are scheduled to be dismantled.
This situation requires the camera to be calibrated using exclusively the videos submitted for analysis. A method considered for this project was camera calibration using the plumb-line constraint and minimal Hough entropy described in [13], but since it may give bad results on low quality videos, this algorithm was not included.
2.4.3 Stereo Calibration
If two or more cameras are filming the same crime scene (from two different points of view),
it is possible to calibrate a stereo configuration with them, allowing error correction, 3D modelling
of the recorded scene, and virtual camera position and angle changes. Currently, Open Source
Computer Vision (OpenCV) implements functions to perform this action.
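As a minimal sketch of how this can be done (OpenCV 3.x-style C++ API; the point containers are assumed to hold matched chessboard detections from the two views, and the per-camera intrinsics to come from prior single-camera calibration):

#include <opencv2/calib3d.hpp>
#include <vector>

void calibrateStereoPair(
        const std::vector<std::vector<cv::Point3f>>& objectPoints,
        const std::vector<std::vector<cv::Point2f>>& imagePoints1,
        const std::vector<std::vector<cv::Point2f>>& imagePoints2,
        cv::Mat& cameraMatrix1, cv::Mat& distCoeffs1,
        cv::Mat& cameraMatrix2, cv::Mat& distCoeffs2,
        cv::Size imageSize) {
    cv::Mat R, T, E, F; // rotation, translation, essential and fundamental matrices
    cv::stereoCalibrate(objectPoints, imagePoints1, imagePoints2,
                        cameraMatrix1, distCoeffs1, cameraMatrix2, distCoeffs2,
                        imageSize, R, T, E, F,
                        cv::CALIB_FIX_INTRINSIC, // keep the individual intrinsics fixed
                        cv::TermCriteria(cv::TermCriteria::COUNT + cv::TermCriteria::EPS,
                                         100, 1e-5));
    // R and T map points from the first camera's frame to the second's; together
    // with the intrinsics they enable rectification and 3D triangulation.
}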
2.4.4 Colour Calibration
Colour is a very important part of an image, and consequently of a video. It is especially important since determining the clothing, hair or skin colour of a suspect may be crucial for the analysis. Provided that the lighting of the environment is the same in the evidence as at the time of calibration, it is possible to correct the colour distortion that may be introduced by the camera. With the use of a colour calibration pattern, such as the Macbeth ColorChecker [14] seen in figure 2.2, the camera's colour reproduction can be corrected.
Figure 2.2: Macbeth ColorChecker
2.5 Photogrammetry
After the cameras have been calibrated, it is possible to perform measurements of objects or subjects in the scene, making use of the videos or images recorded. Two distinct scenarios may arise, allowing different ways of measurement.
2.5.1 Single Camera Photogrammetry
Using a single static calibrated camera and making use of known landmarks in the scene, it is possible to reconstruct a scene in three dimensions. Figure 2.3 shows the reconstruction of a perpetrator walking into a store and stepping next to an exhibitor, which can be used as a reference to estimate the height of the perpetrator during the walk cycle. Since there is only one camera filming the scene, it is not possible to obtain depth measurements from the camera, which makes the technique presented next preferable.
2.5.2 Stereo Camera Pair Photogrammetry
Stereo photogrammetry is the technique of obtaining depth perception using two cameras, similarly to the way human eyes work. By using two static security cameras, which do not need to be identical, it is possible to perform measurements between two static points that show up on both cameras. Figure 2.4 shows an example of a camera disposition that allows for the measurement of the perpetrator. If the cameras are synced, it is even possible to take measurements of moving objects, by taking the two frames corresponding to a given instant (one from each camera) and comparing them.
Figure 2.3: Single camera
(a) Image from camera 1 (b) Image from camera 2
Figure 2.4: Stereo camera pair
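As a hedged sketch of how such a stereo measurement can be computed with OpenCV (the 3x4 projection matrices P1 and P2, i.e. each camera's intrinsics multiplied by its [R|t] pose, are assumed to come from a stereo calibration as above):

#include <opencv2/calib3d.hpp>

cv::Point3d triangulate(const cv::Mat& P1, const cv::Mat& P2,
                        const cv::Point2d& x1, const cv::Point2d& x2) {
    // pixel coordinates of the same static point, as 2x1 column matrices
    cv::Mat pts1 = (cv::Mat_<double>(2, 1) << x1.x, x1.y);
    cv::Mat pts2 = (cv::Mat_<double>(2, 1) << x2.x, x2.y);
    cv::Mat p4d;
    cv::triangulatePoints(P1, P2, pts1, pts2, p4d); // homogeneous 4x1 output
    p4d.convertTo(p4d, CV_64F);                     // normalise the element type
    double w = p4d.at<double>(3, 0);
    return cv::Point3d(p4d.at<double>(0, 0) / w,
                       p4d.at<double>(1, 0) / w,
                       p4d.at<double>(2, 0) / w);
}

The Euclidean distance between two points reconstructed this way yields a real-world measurement, for example between the top of a suspect's head and the floor.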
2.6 Video Filtering
Video filtering is a very important step that enables a better analysis. Its objective is to enhance the quality of the video, and therefore of the biometric features to be analysed afterwards. The tool most used for video processing by the police investigators who participated in this project is called Forevid [15] and will be further presented in 2.8.3.
2.6.1 Denoising
Video denoising, as the name suggests, is the removal of noise from a video. In the forensic environment this noise is usually present in videos recovered from old analogue systems, where the tapes may be worn out, or in low light situations, where the gain of the sensor amplifies noise. An example of the results of non-local means denoising [16] is presented in figure 2.5.
(a) Original image (b) Image with noise (c) Denoised image
Figure 2.5: Denoising example [16]
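As a minimal sketch of applying such a filter to a single frame with OpenCV (the filter strengths shown are illustrative and would need tuning per video):

#include <opencv2/photo.hpp>
#include <opencv2/imgcodecs.hpp>

int main() {
    cv::Mat noisy = cv::imread("frame.png"); // hypothetical input frame
    cv::Mat denoised;
    cv::fastNlMeansDenoisingColored(noisy, denoised,
                                    10.0f,   // h: luminance filter strength
                                    10.0f,   // hColor: colour filter strength
                                    7,       // template window size
                                    21);     // search window size
    cv::imwrite("frame_denoised.png", denoised);
    return 0;
}

For video, OpenCV also offers a multi-frame variant that exploits temporally adjacent frames, which tends to preserve more detail.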
2.6.2 Deinterlacing
Before defining deinterlacing, one needs to define what interlacing is. Interlacing is a video recording technique that stores only half the lines of each frame, first the odd ones, then on the next frame the even ones, and so on. This way, static video suffers no change, but sudden horizontal movements produce clear artefacts in the form of visible horizontal lines. Figure 2.6a shows a frame as captured by a camera, with interlacing. Deinterlacing uses frames that are adjacent in time and reconstructs the one that happened between them but was not recorded. By combining the two adjacent frames it is possible to reduce the noise introduced by the interlaced recording. The resulting frame is clearer, as shown in figure 2.6b.
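As a hedged illustration of the idea, the following sketch performs a very simple intra-frame deinterlace: the lines of one field are kept, and the lines of the other field are replaced by the average of their vertical neighbours. Real deinterlacers, as described above, also combine the two fields across time, which this sketch omits.

#include <opencv2/core.hpp>

cv::Mat deinterlaceSimple(const cv::Mat& frame) {
    cv::Mat out = frame.clone();
    int stride = frame.channels();
    for (int y = 1; y + 1 < frame.rows; y += 2) {   // odd (interpolated) lines
        const uchar* above = frame.ptr<uchar>(y - 1);
        const uchar* below = frame.ptr<uchar>(y + 1);
        uchar* dst = out.ptr<uchar>(y);
        for (int i = 0; i < frame.cols * stride; ++i)
            dst[i] = static_cast<uchar>((above[i] + below[i]) / 2);
    }
    return out;
}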
2.6.3 Sharpening
Sharpening consists of enhancing edges in order to make the details of an image clearer and more easily distinguishable. Figure 2.7 shows two photographs of the moon, 2.7a being the original and 2.7b the image resulting from the sharpening process.
(a) Original interlaced image (b) Deinterlaced image
Figure 2.6: Interlacing example [17]
(a) Original image (b) Sharpened image
Figure 2.7: Sharpening example [18]
The most used method to achieve this result is called unsharp masking, and it consists of taking the original image, blurring and inverting it, and performing a weighted subtraction with the original image. This is defined by equation 2.1, where α is the weight of the subtraction and black the minimum value for a channel (which in an 8-bit greyscale image is 0).

I_unsharpened = I − α(black − blur(I)), 0 ≤ α < 1    (2.1)
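As a minimal sketch, the following implements the common unsharp-mask formulation sharpened = I + α(I − blur(I)), expressed as a single weighted sum; the Gaussian sigma and the default α are illustrative values, not taken from this thesis:

#include <opencv2/imgproc.hpp>

cv::Mat unsharpMask(const cv::Mat& image, double alpha = 0.5) {
    cv::Mat blurred, sharpened;
    cv::GaussianBlur(image, blurred, cv::Size(0, 0), 3.0);  // kernel size derived from sigma
    // (1 + alpha)*I - alpha*blur(I) == I + alpha*(I - blur(I))
    cv::addWeighted(image, 1.0 + alpha, blurred, -alpha, 0.0, sharpened);
    return sharpened;
}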
2.6.4 Superresolution
Superresolution is a technique used to enhance the original resolution of a video frame by taking into consideration neighbouring frames, which may provide more information about the object under analysis [19]. This technique is especially useful for enhancing details in videos with a reasonably high framerate, or with slow-moving objects or subjects. This is because, to obtain the best results, the image detail should have the same orientation in all the frames used.
2.7 Evidence Documentation
Evidence gathering and documentation is an important part of an investigator's work, since it bundles all the evidence and the final result of the evidence analysis into a document. Systematic approaches and strict documentation protocols are essential in order to produce reproducible data when the images have suffered transformations (such as enhancements) during the analysis process. Forevid, which will be further presented in 2.8.3, is a good video documentation tool that allows users to pinpoint specific frames in the videos of the crime and comment on them, bundling them all in a single document ready to be appended to a case and presented in court as evidence.
2.8 Software Bundles
Both the Portuguese criminal police and the Danish forensic experts currently work with various software platforms which are either expensive or not tailored for surveillance video analysis. These are mostly divided into three categories: analysis, processing and documentation. There are some software bundles available, like Video Analyst® from Intergraph [20] and Kinesense's forensic video retrieval, search, analysis and reporting tool [21], that can be used in forensic video analysis. The latter's Law Enforcement (LE) suite can be compared, in terms of functionality, to the project described in this dissertation.
2.8.1 Calibration and Photogrammetry
Camera calibration is an essential task that must be performed correctly to obtain the best measurements possible, whether in 2D, 3D or colour space. The Danish experts that collaborated in this project and the LPC currently use a program called PhotoModeler [22] to perform both the calibration and the spatial measurements.
2.8.2 Analysis
Currently, in the forensic community, bare video analysis is done by a human. Besides being prone to error, this has an inherently severe problem: the manpower required to perform the task. An easy-to-operate motion detection method is needed to separate a long video into several small events in which motion is detected, easing the analysis, reducing the man-time needed to run through several hours of video, and avoiding the errors induced by such a repetitive task.
2.8.3 Forevid (Video Processing and Documentation)
Forevid is an open source software tool for video enhancement and documentation, developed by Sami Hautamaki for his MSc dissertation in cooperation with the forensic laboratory of the National Bureau of Investigation in Finland. Development stopped in late 2012, but it is still used by police departments all over Europe. The project proposed in this thesis shares Forevid's openness and target community, but has a different objective, since its purpose is biometric analysis rather than video processing.
2.9 Summary
Although there are many software tools and algorithms available for video and biometric data processing, they are either not publicly available due to copyright, or not implemented in any software tool usable by an investigator, rendering them useless for the investigation process. The proposed platform aims to fill this gap by compiling selected algorithms and tools into a single package specially crafted for the forensic investigator.
3 Proposed Video Analysis System
Contents
3.1 Workflow
3.2 Video Processing
3.3 Biometric Data Extraction
Given the problems and technologies described in the previous chapter, the goal of this project is to design a software solution that integrates both video processing capabilities and biometric analysis of those same videos, solving the current problems that investigators face, while being flexible enough to be extended to solve future issues.
In this chapter the developed solution is described in a functional way, starting with the intended workflow and proceeding to the video processing algorithms and the biometric measurements analysed during the development of this project. Chapter 4 then focuses on the implementation itself.
3.1 Workflow
This project intends to streamline the analysis of forensic video. To achieve this, the first thing to do is to define the user's intended workflow. The workflow is separated into two major parts, the first being the video processing and the second the biometric analysis (figure 3.1).
Figure 3.1: Workflow
3.1.1 Video Processing
The video processing part has two main objectives: correcting distortions and segmenting the video, so that the investigator only needs to focus on its interesting parts. This process is separated into several steps:
1. Video loading;
2. Camera correction:
(a) Camera calibration;
(b) Orientation correction;
3. Video segmentation:
(a) Noisy areas exclusion;
(b) Event detection (motion detection).
Of all these steps, only the first one is mandatory, since the video may already have been corrected for distortion and/or segmented to a specific period of interest.
3.1.2 Biometric Analysis
After processing the video, the actual biometric analysis of the perpetrators can be performed. There is no predefined workflow for every biometric feature, since each one may require very distinct user interactions and generate very different outputs, which results in a unique workflow for each biometric trait. Each biometric feature considered as a candidate for integration in this project is analysed in section 3.3.
3.2 Video Processing
Processing the video is the first step prior to analysis. It minimizes the portion of the video that needs further analysis, thereby also minimizing the time needed for it, and corrects the camera distortion and orientation, improving the results of the biometric analysis.
3.2.1 Event Detection
Event detection was the first feature to be implemented in the program, since it is the one the LPC needs the most. Its objective is to drastically reduce the amount of time the investigator needs to analyse a long video. The event detection algorithm actually consists of two separate algorithms: the first calculates the changes between consecutive video frames using background subtraction, while the second is the event classification, where the detected motion is classified as being an event or just noise.
The first algorithm, background subtraction, uses the adaptive Gaussian mixture model for background subtraction described in [23]. It needs several inputs besides the frames themselves:
Frame history Maximum length of the frame history to consider in the past of any point within
the video.
Threshold Threshold on the squared Mahalanobis distance to decide whether it is well described
by the background model [24].
Shadow detection Whether shadow detection should be enabled.
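A minimal sketch of wiring these three parameters to the OpenCV implementation of [23] (3.x-style API; the parameter values shown are placeholders, not the ones used in the actual program):

#include <opencv2/video/background_segm.hpp>
#include <opencv2/videoio.hpp>
#include <string>

void runBackgroundSubtraction(const std::string& path) {
    cv::Ptr<cv::BackgroundSubtractorMOG2> subtractor =
        cv::createBackgroundSubtractorMOG2(
            500,    // frame history length
            16.0,   // threshold on the squared Mahalanobis distance
            true);  // enable shadow detection
    cv::VideoCapture video(path);
    cv::Mat frame, fgMask;
    while (video.read(frame)) {
        // fgMask: 255 = foreground, 127 = shadow, 0 = background
        subtractor->apply(frame, fgMask);
        // fgMask is what feeds the event classifier described next
    }
}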
If the difference between frames is higher than the threshold configured for the event classifier (equation 3.1), and this difference persists for longer than a configurable minimum period of time (defined to discard possible noise or irrelevant periodic changes), it is classified as an Event (figure 3.2). After this motion period ends, in order to keep context, if there is movement again before a certain period of time runs out, the resulting Event is prolonged until the end of the newly detected period with motion.
threshold = (pixels with movement) / (total amount of pixels in image)    (3.1)
Figure 3.2: Event classification
Therefore the Event classifier also needs some configuration parameters which are:
Threshold Percentage (in area ratio) of the thresholded result of the background subtraction that
needs to be set as changed for a specific frame to be classified as having movement.
Max gap Maximum amount of frames with no movement before considering the event as over.
Min length Minimum number of frames with movement in an event. If less are detected, the
event is discarded (used to disregard possible encoder noise).
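A hedged sketch of this classification logic (the names and structure are illustrative, not taken from the implementation); it assumes the foreground mask has already been binarized, i.e. that shadow pixels have been removed:

#include <opencv2/core.hpp>

struct EventClassifier {
    double threshold;     // area ratio above which a frame "has movement"
    int    maxGap;        // frames without movement before an event is over
    int    minLength;     // frames with movement required to accept an event
    int    run = 0, gap = 0;
    bool   inEvent = false;

    bool update(const cv::Mat& fgMask) {   // returns true while an event is active
        double ratio = static_cast<double>(cv::countNonZero(fgMask)) / fgMask.total();
        if (ratio > threshold) { ++run; gap = 0; }
        else if (++gap > maxGap) { run = 0; inEvent = false; }  // event over
        if (run >= minLength) inEvent = true;                   // long enough to accept
        return inEvent;
    }
};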
3.2.2 Noisy Area Exclusion
Excluding noisy areas from automatic motion detection is a very useful way of eliminating things such as hard-coded timestamps or static regions with constant variation, such as TVs or windows. The simplest way to exclude a noisy area, once a motion detection method is in place, is to fit a static mask over the intended exclusion area of the video frames before inputting them into the motion detection algorithm, as depicted in figure 3.3. This way, the masked area has a static value, immutable for the duration of the whole video, so any motion that may happen in that area is ignored.
Figure 3.3: Noisy areas rejection data flow
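The exclusion step itself can be a one-line masking operation; a minimal sketch (assuming exclusionMask is a hypothetical single-channel image that is non-zero over the noisy areas):

#include <opencv2/core.hpp>

void applyExclusionMask(cv::Mat& frame, const cv::Mat& exclusionMask) {
    // paint the noisy area with a constant value before background subtraction,
    // so no motion can ever be detected there
    frame.setTo(cv::Scalar::all(0), exclusionMask);
}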
3.2.3 Camera Calibration
The camera calibration method chosen for this project makes use of a planar chessboard pattern, like the one in figure 5.1a, and was implemented following algorithm 3.1, where optimize_matrix() is based on the method described in [12]. Instead of asking the user to provide a set of frames with the calibration pattern, a whole video is used, from which equally spaced frames are scanned for the pattern.
Algorithm 3.1 Camera Calibration
Require: dataset
Ensure: ∃ frames ∈ dataset
for all frame ∈ dataset do
    if ∃ pattern ∈ frame then
        for all point ∈ pattern do
            find_subpixel(point)
        end for
        patterns ← patterns ∪ pattern
    else
        discard(frame)
    end if
end for
if |patterns| > 0 then
    optimize_matrix(patterns)
end if
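The following is a sketch of algorithm 3.1 with OpenCV 2.4, assuming a 7x9 inner-corner chessboard and roughly 100 equally spaced frames (see section 5.3); the file name and frame budget are illustrative:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main() {
    cv::VideoCapture cap("calibration.avi");               // hypothetical input
    const cv::Size board(7, 9);                            // inner corners
    std::vector<std::vector<cv::Point2f> > imagePts;       // accepted patterns

    const int total = (int)cap.get(CV_CAP_PROP_FRAME_COUNT);
    const int skip = std::max(1, total / 100);             // equally spaced frames
    cv::Mat frame, gray;
    for (int f = 0; cap.read(frame); ++f) {
        if (f % skip != 0) continue;
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        std::vector<cv::Point2f> corners;
        if (!cv::findChessboardCorners(gray, board, corners))
            continue;                                      // discard(frame)
        cv::cornerSubPix(gray, corners, cv::Size(11, 11), cv::Size(-1, -1),
            cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.1));
        imagePts.push_back(corners);                       // patterns <- patterns U pattern
    }
    if (imagePts.empty()) return 1;

    // One planar object-point set (unit squares) per accepted pattern.
    std::vector<cv::Point3f> obj;
    for (int y = 0; y < board.height; ++y)
        for (int x = 0; x < board.width; ++x)
            obj.push_back(cv::Point3f((float)x, (float)y, 0.f));
    std::vector<std::vector<cv::Point3f> > objectPts(imagePts.size(), obj);

    cv::Mat K, dist;                                       // optimize_matrix(patterns)
    std::vector<cv::Mat> rvecs, tvecs;
    double rms = cv::calibrateCamera(objectPts, imagePts, gray.size(),
                                     K, dist, rvecs, tvecs);
    std::cout << "re-projection error: " << rms << std::endl;
    return 0;
}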
3.3 Biometric Data Extraction
Several of the features described in 2.3 were explored in the course of this work, more precisely
tattoos, height and face. The other features previously listed (gait, clothing, gender, ethnicity,
hair colour, periocular region, fingerprints, palmprints, hand veins and iris) were not researched
further, either because they were unusable in this project, because they were not good candidates
for integration, or because there simply was no time to integrate them in this first stage of the
project.
3.3.1 Tattoos
Although it was not integrated in the final project, some work was put into creating a segmen-
tation tool based on the watershed algorithm [25] that could isolate the inked part of a tattoo. One
example of a result obtained with this algorithm is shown in figure 3.4.
It was decided not to integrate the implemented tool, or any other to detect, isolate or identify
tattoos, in the present version of the platform, mainly due to its low usefulness to the LPC in
common cases. Nevertheless, the developed tool's source code was made publicly available
in [27].
(a) Original image [26] (b) After segmentation
Figure 3.4: Tattoo segmentation results
3.3.2 Height
The simplest technique for height measurement is comparing the height being measured
to a known reference. If this reference is in the same plane as the feature being measured, then
this plane can be re-projected to the image plane, making them one and the same, thus allowing
measurements performed in the image to be extrapolated to the plane containing the reference
and the feature.
In the original image (figure 3.5a), four points of a square or rectangle are given by the user,
from left to right and top to bottom. Each point gets its coordinates averaged with those of the
adjacent horizontal and vertical points, forming a rectangle with perfectly vertical and horizontal
sides. The image transformation needed to move the points from their position in the original
image to the new rectangular shape is calculated and applied to the whole image. The result of
this process is shown in figure 3.5b. The resulting image has the advantage of being isometric on
the plane of the board, but with decoupled vertical and horizontal axes. In other words, it is
possible to make horizontal measurements in the same plane as the cork board and compare them
to the horizontal width of the board. The same principle applies to the vertical axis, but not to
any combination of the two, since the ratio between the height and width of the reference
rectangle is not taken into consideration in the re-projection. Implementation details are further
explored in 4.6. A sketch of this re-projection follows.
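A minimal sketch of this re-projection with OpenCV, assuming the four clicks arrive left to right, top to bottom ([0] top-left, [1] top-right, [2] bottom-left, [3] bottom-right); the function is illustrative, not the BioFoV code itself:

#include <opencv2/opencv.hpp>
#include <vector>

cv::Mat reprojectPlane(const cv::Mat &img, const std::vector<cv::Point2f> &src) {
    // Average each corner with its horizontal and vertical neighbours,
    // producing an axis-aligned rectangle.
    float top    = (src[0].y + src[1].y) / 2.f;
    float bottom = (src[2].y + src[3].y) / 2.f;
    float left   = (src[0].x + src[2].x) / 2.f;
    float right  = (src[1].x + src[3].x) / 2.f;
    cv::Point2f dst[4] = { cv::Point2f(left, top),    cv::Point2f(right, top),
                           cv::Point2f(left, bottom), cv::Point2f(right, bottom) };
    // Homography mapping the clicked quadrilateral onto the rectangle,
    // applied to the whole image.
    cv::Mat H = cv::getPerspectiveTransform(&src[0], dst);
    cv::Mat out;
    cv::warpPerspective(img, out, H, img.size());
    return out;
}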
(a) Original frame (b) Re-projection with reference
(c) Height measurement
Figure 3.5: Planar height measurement with reference
3.3.3 Face Detection
Since faces are widely present in the dataset provided by the LPC, and the face is a biometric
trait accepted in Portuguese courts, this was the first feature to be focused on. Before this thesis,
the laboratories that cooperated with this project had no way to extract a facial dataset from a
video; if there was the need to extract more than one face from a video, the solution was to
hand-select them from each frame. To overcome this problem, a feature detector needed to be
integrated in the project. The feature detector chosen was initially proposed in [28] and later
improved in [29].
The method used for processing a video with the objective of extracting all the features that
match a certain classifier is described in figure 3.6. It was designed with face detection in mind,
but supports any kind of feature, as long as it is described by a cascade classifier. A sketch of
this detection loop follows the flowchart below.
Figure 3.6: Feature detection flowchart
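A minimal sketch of this loop, assuming one of the frontal-face cascade files shipped with OpenCV; the local file names are illustrative:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main() {
    cv::CascadeClassifier cascade;                 // cascade-based detector [28, 29]
    if (!cascade.load("haarcascade_frontalface_alt_tree.xml")) return 1;

    cv::VideoCapture cap("event.avi");             // hypothetical exported Event
    cv::Mat frame, gray;
    for (long f = 0; cap.read(frame); ++f) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        cv::equalizeHist(gray, gray);              // normalise lighting before detection
        std::vector<cv::Rect> faces;
        cascade.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(30, 30));
        for (size_t i = 0; i < faces.size(); ++i)  // each hit becomes a Snapshot
            std::cout << "frame " << f << ": face at (" << faces[i].x << ", "
                      << faces[i].y << ") " << faces[i].width << "x"
                      << faces[i].height << std::endl;
    }
    return 0;
}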
4 Implementation
Contents
4.1 Technologies Used
4.2 Modularity of the Framework
4.3 Data Classes
4.4 User Interface Classes
4.5 GUI Implementation
4.6 Height Calculation
4.7 Motion Detection
4.8 Camera Calibration
4.9 Other Useful Features
4.10 Computational Requirements
4.11 Documentation
4.12 Open Sourcing the Project
4.13 Build System and Continuous Integration
Following the system features described in chapter 3, the implementation details are described
in the following sections, starting with the technologies used and the implementation of the
program itself, followed by the documentation and additional details on some of the project's
decisions, such as open sourcing it. The program that resulted from this MSc thesis was dubbed
Biometric Forensic Video analyzer (BioFoV).
4.1 Technologies Used
The implementation of this project had two main technical sides: the first being the User
Interface (UI) and the need to provide a fully working cross-platform and modular framework; the
second being the video and image processing functionality. The ability to run the program under
Microsoft Windows® is a requirement for this project; therefore, in order to ease development and
provide full cross-platform support, the chosen libraries also needed to be cross-platform. To
cover these needs, two libraries were used:
• Qt [30]:
A cross-platform application and UI framework for C++. It is used to enable cross-platform
operation of the UI and system-dependent functionality, such as access to configured
printers and the file system;
• OpenCV [31]:
The most well known and actively developed open source computer vision and machine
learning software library. From the set of tools provided by OpenCV, the ones used are
explained and the need for them justified in this section. The final version of the program
includes OpenCV 2.4.11, which was the latest stable version released to date. Version 3.0.0
beta was released prior to the conclusion of this thesis, and even though it promises large
performance improvements (up to nine times faster processing and heterogeneous
Central Processing Unit (CPU) + Graphics Processing Unit (GPU) processing), it was not
used due to its instability and API non-uniformity at the time. As soon as version 3.0.0 of
OpenCV is released, it should be considered as a replacement for the current version, but that
is left for future work. OpenCV's C++ Application Programming Interface (API) is used to
perform all of the video and image operations;
To aid the development itself, and to ensure code quality and project continuity, other tech-
nologies were used:
• Doxygen [32]:
A documentation generator for C++, used for documenting the source code of the project
and mapping the class hierarchy;
• GIT [33]:
A widely used version control system, used to keep track of modifications on the source
code of the project, and enable an agile development method;
• GitHub [34]:
A GIT hosting service, free for open source projects, used to host the code, enable social
interaction between developers, provide issue reporting and tracking, host releases of the
project, and host a wiki with the user manual and development instructions for whoever
wants to contribute;
• Travis CI [35]:
A CI platform, free for open source projects, used to run tests on the committed code to
ensure that there are no destructive changes to the program during development.
4.2 Modularity of the Framework
The program is implemented with modularity and expansion in mind, so that future work can
easily be implemented on top of the existing features, enhancing the original feature set. A set of
functional modules is the base of this project, as illustrated in figure 4.1. A Graphical User Inter-
face (GUI) is made available in order to be usable by forensic analysts of different backgrounds.
Figure 4.1: Proposed modular architecture – high level design (User Interface: menus, player, controls, dialogs; Data Classes: Frame, Event, Mask, Video; Modules: event detector, feature detector, measurements, ...)
4.3 Data Classes
As the name suggests, the data classes are where the data gathered from the analysis of the
videos is stored and processed. As illustrated in figure 4.2, there are five distinct data classes:
Video, Camera, Event, Snapshot and Feature.
Figure 4.2: Data classes relationship
4.3.1 Video
The Video class is responsible for everything that is video related. The most immediate of
the operations is video reading, which includes opening a video file and decoding it to be further
processed. It can also apply the motion detection algorithm described in 3.2.1, creating a set of
Events associated with this same video.
4.3.2 Camera
This class comprises every aspect related to camera calibration and description. With it, it is
possible to calibrate a camera with a chessboard pattern and perform transformations such as
flipping a video or image (vertically and/or horizontally). It can also export its parameters to a file
in order to later import them to another video.
4.3.3 Event
The event class keeps a set of frames and snapshots together. These frames should be
contiguous since an event is supposed to be an interval in time, but for usability reasons they don’t
need to be. For example, exporting several events in one video file can be achieved by merging
the events in question and exporting the resulting Event. An Event is in its essence two series
of frames, being the first one of Frames (figure 4.3a), and the second one of the corresponding
Snapshots (figure 4.3b) that resulted from the background subtraction calculation. This allows for
the isolation of the object of interest (figure 4.3c).
4.3.4 Frame
The Frame class is a representation of a single frame of interest of a Video or Event. An
instance is related to either the Video or the Event object it belongs to. The frames themselves
are stored in individual files on disk; even though this takes some extra hard disk space, it
allows for swifter video seeking, since it eliminates the need for time consuming video decoding
processes.
4.3.5 Snapshot
This class works as a mask for a Frame object, in order to specify a certain region of interest.
For the moment, this class is used by the Face class to specify where in the frame the face is,
and to store the background subtraction result (figure 4.3b), but it can easily be applied in other
situations.
(a) RGB (b) Mask
(c) Masked
Figure 4.3: Background subtraction example
4.3.6 Feature
The Feature class describes something about an Individual (4.3.7). It encloses a description and a
pointer to an individual, and is used as a common interface for other classes. For the time being,
only one feature is implemented, but others can easily be built using this one as a template.
Face The Face class implements the Feature class, and it is a first example of how it can be
used. It bundles several Snapshots that delimit a person’s face in several different frames.
4.3.7 Individual
The purpose of this class is to bundle all the features from a certain individual. It was left
out of the current implementation since feature bundling (such as face clustering) has not been
implemented yet.
4.4 User Interface Classes
The following classes implement an easy to use interface to access and interact with the data
stored in the classes described in section 4.3.
4.4.1 Player
A Player is a base class that works as a common interface for the Video, Event and Frame
classes, defining common methods to obtain images from these classes and skim through them
without the need to worry about each class' particular data structures. This interface is used by
the VideoPlayer class to obtain images and show them to the user, and by the image export tool
to save isolated images to disk. A sketch of this interface is given below.
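A minimal sketch of such an interface, with hypothetical method names (the real class lives in the BioFoV sources):

#include <opencv2/core/core.hpp>

class Player {
public:
    virtual ~Player() {}
    virtual unsigned long getLengthFrames() const = 0; // frames available for playback
    virtual cv::Mat getFrame(unsigned long index) = 0; // random access to one image
    virtual double getFrameRate() const = 0;           // playback timing
};

// Video, Event and Frame each implement this interface, so the VideoPlayer
// widget and the image export tool can treat them uniformly.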
4.4.2 Drawable
The Drawable class is the base class for everything that is drawn on top of an image (Player).
It provides a flexible and consistent way to handle mouse clicks for inserting points in the UI, and
to draw the measurement output homogeneously across all the classes (see the sketch after the
list below). Currently, several classes extend Drawable:
Angle: The angle measured in the image plane.
Length: The length measured in the image plane.
Width: The width measured in the image plane.
Height: The height measured in the image plane.
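A minimal sketch of the Drawable base class, again with hypothetical names; each measurement above would override isComplete() and draw():

#include <opencv2/core/core.hpp>
#include <vector>

class Drawable {
public:
    virtual ~Drawable() {}
    // Called by the UI for each mouse click on the player image.
    virtual void addPoint(const cv::Point &p) { points.push_back(p); }
    // True once enough points were collected for the measurement.
    virtual bool isComplete() const = 0;
    // Draws the measurement (lines, labels) on top of the current frame.
    virtual void draw(cv::Mat &frame) const = 0;
protected:
    std::vector<cv::Point> points;
};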
Figure 4.4: User Interface
4.5 GUI Implementation
Even though the implementation of the GUI is not of much technical interest in an academic
environment, it is an essential component of this project, which aims to be usable by people
with limited programming skills. Figure 4.4 shows all the UI components that belong to the main
window. In this figure, two videos are loaded into the program, the event detector has been
applied to one of them, and one frame was extracted from the last of the resulting events. A
modifier has been applied to this last frame, in this case a re-projection of the plane of the pin
board to the image plane.
Toolbar (1) Can be configured with any function available in the program menus;
Video tab (2) Tree structure listing all the videos in the current project and their respective as-
sociated objects, such as detected events, extracted frames, and measurements on these
frames;
Feature tab (3) Tab for feature detection output, currently called Faces since it is the only feature
detected (shown collapsed in this figure);
Selection details (4) Information about selected item(s) (in figure 4.4 it is showing information
about the event E1);
Video player (5) Handles any class which implements the Player interface, enabling or disabling
the playback buttons and sliders depending on the number of frames available in the currently
playing item.
4.6 Height Calculation
As described in 3.3.2, the current implementation of the height calculation makes use of planar
references, which allow re-projecting the image so that measurements in the reference plane can
be compared to the perpetrator.
4.7 Motion Detection
The background subtraction algorithm previously described in 3.2.1 has several implementations
integrated in OpenCV, allowing it to use the CPU, Compute Unified Device Architecture
(CUDA) or Open Computing Language (OpenCL), or even switch between them depending on
the hardware the program is running on. When the automatic event classification is selected, the
dialogue shown in figure 4.5 asks for five essential parameters that guide both algorithms, plus a
sixth which was left on the GUI to allow further tweaking in certain situations. For the background
subtraction method these variables are:
Frame history: Length of the history.
default = 20 s · FPS_video
Threshold: Threshold on the squared Mahalanobis distance used to decide whether a pixel is well
described by the background model [24].
default = 50
Shadow detection: Whether shadow detection should be enabled.
default = false
The Event classifier likewise requires:
(Area) threshold: Percentage of the thresholded result of the background subtraction that needs
to be set as changed for a specific frame to be classified as having movement.
default = 1%
Max frames with no movement: Maximum number of frames with no movement before consid-
ering the event as over.
default = 5 s · FPS_video
Min frames/event: Minimum number of frames with movement in an event. If fewer are detected,
the event is discarded (used to disregard possible encoder noise).
default = 2 s · FPS_video
Figure 4.5: Event classification dialogue
Even if this detection process is somewhat slow (refer to 5.4.2), it is an unattended process,
meaning it does not need any type of supervision and can be left running through the night,
freeing the user to do something else.
Figure 4.6: Video separated in events
4.8 Camera Calibration
The video calibration tool in BioFoV implements OpenCV's camera calibration function. As
reference points for this calibration, a chessboard pattern is used, with configurable width and
height in terms of the pattern's inner corners, which are automatically detected in a calibration
video of the same camera, as shown in figure 4.7.
Horizontal and vertical image flips are also considered part of the calibration, since they reflect
on the intrinsic parameters of the camera. This is a key feature, since it is not uncommon for
surveillance cameras to be mounted upside down, in which case the video needs to be flipped
prior to analysis.
Figure 4.7: Camera calibration dialog
4.9 Other Useful Features
During the development process there was the need to implement some features that
complement the program's usability and fulfil some of the day-to-day needs of the users. Among
these are the ability to export events to video files, and to print the frame being shown in the
video player.
4.9.1 Export Events
Two codecs for encoding were tested: Huffman Lossless Codec (HFYU) and XVID MPEG-4
(XVID). Given the qualitative characteristics presented in table 4.1, the obvious choice was the
XVID encoder. Even though XVID is not lossless, for the tested videos the quality of the
output was more than enough to preserve the video details, while producing an output file many
times smaller, and thus better suited for evidence archiving. A sketch of the export step follows.
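A minimal sketch of event export with OpenCV's VideoWriter; `frames` stands in for the Event's frame sequence and `fps` for the source frame rate, both illustrative:

#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

void exportEvent(const std::vector<cv::Mat> &frames, double fps,
                 const std::string &path) {
    if (frames.empty()) return;
    cv::VideoWriter out(path, CV_FOURCC('X', 'V', 'I', 'D'),  // XVID MPEG-4
                        fps, frames[0].size());
    for (size_t i = 0; i < frames.size(); ++i)
        out << frames[i];                                      // encode frame by frame
}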
Codec   Quality       File Size
HFYU    Perfect       Huge
XVID    Really good   Tiny
Table 4.1: Video codec comparison
4.9.2 Print Tool
A print tool was implemented that uses the system-defined printers (on whichever platform
the program is running) and prints the frame being displayed on the player. All printing
functions are handled by Qt's QPrinter class, making the tool platform independent and hassle free.
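A minimal sketch of such a tool with Qt, assuming the current player frame has already been converted to a QImage; the function name is hypothetical:

#include <QImage>
#include <QPainter>
#include <QPrintDialog>
#include <QPrinter>

void printFrame(const QImage &frame) {
    QPrinter printer;
    QPrintDialog dialog(&printer);                 // let the user pick a system printer
    if (dialog.exec() != QDialog::Accepted) return;
    QPainter painter(&printer);
    QRect page = painter.viewport();               // fit the frame to the page,
    QSize size = frame.size();                     // keeping its aspect ratio
    size.scale(page.size(), Qt::KeepAspectRatio);
    painter.setViewport(page.x(), page.y(), size.width(), size.height());
    painter.setWindow(frame.rect());
    painter.drawImage(0, 0, frame);
}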
4.10 Computational Requirements
In terms of processing power there are two different approaches. In the CPU approach, the
more cores available and the more powerful the processing unit, the better. In the GPU approach,
with a modest CPU and a decent GPU with CUDA or OpenCL support, the image and video
processing load can be offloaded to the GPU.
4.11 Documentation
In order to enable new users and developers to get properly onboarded on how to use and
develop BioFoV there is the need to create a user manual and to document the technical details
about the project’s structure.
4.11.1 Technical Documentation
The source code is documented making use of Doxygen [32]. It outputs a PDF file for printing
as well as an HTML folder that makes documentation browsing and function or class lookup easy.
The make rule for generating the documentation is named doc.
4.11.2 User Manual
The user manual is available as a wiki on the project's GitHub page. The wiki format is the one
that best allows the users themselves to browse the documentation and contribute to improving
it.
4.12 Open Sourcing the Project
This project had to be developed from scratch, since there were no open source tools it
could be built upon. Making the source and documentation available will enable anybody who
does not want to start from scratch to pick up where the project stands, and to contribute to one
uniform platform that will get better with each person's contribution. For that purpose, the source
code was made publicly available [36]. This enables future development by third parties, allowing
other students and the scientific community to make their contribution, and all investigators
to take advantage of this tool.
4.13 Build System and Continuous Integration
In order to make the program easy to build, deploy, test, release and install, a build system
bundling all the libraries used by BioFoV was created. It consists of a set of Makefiles
and a CI system.
4.13.1 Make Rules
A set of Makefiles has been made available to enable easy compilation of the project and its
dependencies. Both OpenCV and Qt are automatically configured and compiled, so that a less
technical user may start developing a module without worrying about the linkage details of
these libraries.
4.13.2 Continuous Integration
The CI services from Travis CI [35] ensure that the program keeps compiling after every
commit, avoiding potential regressions. Regression tests were not implemented, even though it
would be good to have them; since for now this is a small project, it is easy to review every
commit and make sure it does not break anything. Once the project gets larger, every Merge
Request (MR) should be accompanied by its own set of tests showing what it fixes, or that it does
not break other features.
4.13.3 Cross Compilation Support
The way the project is built takes into consideration the user's preferred Operating
System (OS), but it is beneficial to be able to compile it for a platform other than the one
where it is being developed. Therefore, cross compilation is an important piece in making this
work available to anyone.
4.13.4 Static Build
Statically linking the program reduces the effort needed by a non-technical user to install it,
and it proved to be a major plus when presenting the program to new users, since all they
had to do was double click an icon to get it running. When there was still no static build, new
users were reluctant to use the program, since it was hard to update; due to the rapid iteration
and continuous deployments, updating was a task they would have to do often.
5 Results
Contents
5.1 Testing Setup
5.2 Datasets Used
5.3 Calibration
5.4 Event Detection
5.5 Height Measurement
5.6 Feature Detection
5.7 GUI Validation
5.1 Testing Setup
The test machine used to run the benchmarks has an Intel Core i7-3770 clocked at 3.40 GHz
with 16 GB of RAM (4x4096 MB @ 1600 MHz), running the Linux distribution Kubuntu 14.10.
5.2 Datasets Used
Since none of the videos provided by the LPC can be disclosed, some test videos were
recorded to be used in the public tests presented in this document. These videos' characteristics
are described in this section.
5.2.1 Long Video
The main test video for event detection needs to be extremely long in order to show the anal-
ysis boost the tool can give to the users. For that reason, a video was recorded with 2430553
frames at a frame rate of 10 Frames Per Second (FPS), resulting in a total time of approximately
3 days, 20 hours and 9 minutes. The video codec used to encode this test video was H.264, with
a resolution of 640 by 480 pixels, similar to what can be found in most surveillance systems. It
was recorded in a research computer room, where the camera was strapped to the ceiling (upside
down) and pointing at the door the whole time. Due to the weakness of the webcam's arm,
the image slowly slides down during the footage.
5.2.2 Short Video for Height Measurement
A short video was shot in the same conditions and with the same encoding as the one
described in 5.2.1. Instead of focusing on sheer length, this one shows an individual exiting the
room, coming back in, and standing next to a couple of objects. This video is used for short event
detection tests (where two events must be detected) and for height measurement tests.
5.2.3 Videos for Calibration
For the camera calibration testing, four videos provided by Dr. Peter Kastmand Larsen were
used. These videos were shot using a GoPro Hero camera in various settings: narrow, medium,
wide and super-wide Field Of View (FOV), all with a resolution of 1920 by 1080 pixels at 29
FPS. These videos show a chessboard pattern with 7x9 inner corners moving across all sides of
the canvas.
5.3 Calibration
The re-projection errors of the calibration tests performed on the previously described videos
are shown in figure 5.1. These tests were performed using 100 frames equally spaced throughout
the video, in order to optimize the distribution of the pattern locations in the image; for example,
if a video had 2000 frames, the frame skip value was 20. Only one iteration of the algorithm was
performed, since extra iterations showed no benefit while significantly increasing the
computational time.
5.3.1 Precision
The re-projection errors for the GoPro in the different modes are shown in table 5.1. As is
visible from the distortion in figure 5.1h, and from the increasing re-projection error in table 5.1,
the wider the FOV becomes, the poorer the fit of the camera model used. To tackle
this issue, another camera model, specially tailored for fisheye cameras, should be used. Such a
camera model is conveniently implemented in OpenCV, but was not yet integrated in
BioFoV.
Camera mode   Re-projection error
Narrow        2.015
Medium        5.772
Wide          9.656
Super wide    9.102
Table 5.1: Calibration precision test results
Making a video where the pattern is moved through the whole canvas of the cam-
era proved to be a challenging task, since users asked to move the pattern uniformly to all
corners of the image could not do it properly without video feedback, which is not available in
most cases for surveillance cameras. The usage of a pre-defined image dataset chosen by the
user could be a way to solve this problem; the downside of this alternative is that choosing the
frames would take more time. Another way to solve the issue is to detect the pattern in every
single frame and, instead of using all of the detected patterns, pick a set widespread through the
whole canvas. This second option is a better, though more unconventional, approach, since it
can provide the quality of the dataset and the ease of calibration at the same time. The only
downside would be the processing time required to detect the pattern in every single frame, but
this is not a relevant problem, since it only has to be done once for each camera, and the
calibration video is not a long one.
(a) Narrow FOV before calibration (b) Narrow FOV after calibration
(c) Medium FOV before calibration (d) Medium FOV after calibration
(e) Wide FOV before calibration (f) Wide FOV after calibration
(g) Super wide FOV before calibration (h) Super wide FOV after calibration
Figure 5.1: Calibration results
5.4 Event Detection
The events detected using the video described in 5.2.1 are triggered by two types of changes:
people entering and leaving the room, and the corridor lights being turned on or off, the latter
being picked up because the door has a glass window on top that leads to the corridor. Since
the computer room has two walls that are in fact large windows facing both south and east, the
full length of the video includes large differences in lighting caused by the sun; these posed no
problems to the event detection algorithm, since they were not sudden changes.
5.4.1 Precision
The resulting events lack context, since the period stored in each resulting event corresponds
only to the interval where motion was detected. To improve the context, it would be a good idea
to include in the event a configurable number of frames, or period of time, before and after the
motion was detected.
5.4.2 Performance
Scanning the same video for events required allocating a total of 336 MB of
memory. This number does not scale with the size of the video, but with the number of
events detected and their respective lengths. The performance was similar using the CPU and the
GPU, given that the test machine has a high end CPU and a low end GPU. Table 5.2 shows the real
run time on the test machine, along with the CPU time needed to complete the analysis.
The speedup is the time it would take a human to analyse the full length of the video, over the
time it took the program to analyse it and split it into separate events (equation 5.1).

\text{Speedup} = \frac{\text{Video Length}}{\text{Real Time}} \quad (5.1)
Video file   Length (time)   CPU time   Real time   Speedup
Long         3d20h09min      31h43min   6h5min      15x
Table 5.2: Event detector performance
Given the fairly equal time results obtained with both computational methods, we can assume
that using a couple of high end GPUs side by side would result in the best performance for a given
budget. But due to compatibility issues introduced by the cross-platform requirement, and the fact
that not every computer has a decent or compatible GPU, the CPU option was the one chosen.
5.5 Height Measurement
The height measurement was tested using the video described in 5.2.2; the measured heights
and the respective ground truth are presented in table 5.3.
Subject                    Board   Person
Height (meters)            0.91    1.72
Measured Height (pixels)   184     341
Measured Height (meters)   –       1.68
Table 5.3: Height Measurement results
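As a sanity check, the measured height in table 5.3 is consistent with a simple proportional scale on the re-projected plane, using the board as reference (a plausible reconstruction, assuming the measurement reduces to this ratio):

h_{person} \approx h_{board} \cdot \frac{341\ \text{px}}{184\ \text{px}} = 0.91\ \text{m} \cdot 1.853 \approx 1.69\ \text{m}

which matches the reported 1.68 m up to rounding of the intermediate pixel measurements.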
This shows an error of little more than 2%. Still, small variations in where the user
considers the vertical projection of the top of the head on the floor to be, and where the top of the
head itself is, may result in higher measurement errors. These ambiguous areas are highlighted
with a red line in figure 5.2. From these, we can estimate a worst case scenario for the height
deviation, which in this case ranges from 1.55 m to 1.73 m. Because of this, the steps of selecting
a reference for re-projecting a plane and of inputting the top of the head and its projection on the
floor must be done with the best precision possible.
Figure 5.2: Measured height
5.6 Feature Detection
Four different frontal face classifiers were used, all of which ship with OpenCV. As can be
observed from the comparison in table 5.4, alt_tree has the best precision, but the one with the
best recall is the alt2 classifier. There is therefore a trade-off in this situation: depending
on the time available to go through all the false positives versus the thoroughness intended, the
user can choose one classifier or the other. Note that on a per-frame analysis it is difficult to
evaluate whether a frame contains a complete or frontal face, since it is hard to define at what
angle a face is considered to be frontal. Therefore, the recall values calculated in table 5.4
consider the sum of the maximum number of faces detected for each individual over the total
amount of frontal faces present in the video.
Classifier   Sub. 1   Sub. 2   Faces Detected   False Positives   Precision   Recall
default      76       22       98               169               0.367       0.98
alt          77       21       98               48                0.671       0.98
alt2         78       21       99               64                0.607       0.99
alt_tree     71       19       90               3                 0.968       0.90
Maximum      78       22

Table 5.4: Face detection classifiers performance
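A plausible reconstruction of how these values were computed, where d_i denotes the number of faces detected for subject i and d_i^max the corresponding maximum:

\text{precision} = \frac{\text{faces detected}}{\text{faces detected} + \text{false positives}}, \qquad \text{recall} = \frac{\sum_i d_i}{\sum_i d_i^{max}}

For example, for the default classifier: precision = 98 / (98 + 169) ≈ 0.367 and recall = (76 + 22) / (78 + 22) = 0.98, matching the values in table 5.4.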
Some of the faces detected in the test video are shown in figure 5.3, some of which are
in the same frame. Furthermore, taking a close look at the false positives, some resemblance to
a human face can be found.
Figure 5.3: Feature extraction results (using the alt_tree classifier)
5.7 GUI Validation
Since this project was developed with and for crime scene investigators, it has been in both
the Danish and Portuguese teams' hands since the very first stages, helping in daily tasks.
6 Conclusions
Contents
6.1 Future Work
The original objective of this thesis was to create a program able to extract
biometric features from surveillance videos. Given that the result is a program
with a graphical user interface that streamlines the video analysis process and integrates
several biometric feature detectors and extractors in one single tool, it is safe to conclude that, by
not giving biometry itself all the focus, but instead focusing on usability and on the integration of
future work, the project was a success and will continue to be.
As the result of the international cooperative effort to create this program and give it continuity,
a project team was created which will continue its development. To support it, and to make the
discussion of the algorithms to be integrated public, the BioFoV GitHub [34] repository was
created, which already hosts an unstable build of the program.
6.1 Future Work
This final section presents the vision for the future of this project, along with some useful
features which have not been implemented yet but for which interest arose.
6.1.1 Implementation of New Features
The project is constructed in a way that makes it fairly easy to implement new features on top
of the existing ones. To enhance the current feature set, these are some of the ideas that came
up during this first stage of the project, but for which there was no time:
1. Face matching.
Based on the extracted faces, there should be a module to analyse the face set
of a certain video and split it into groups, each subset corresponding to a certain
person. This is currently being researched by another MSc student (Joao Satiro), who will
then integrate his algorithms in the program.
2. File format.
In order to enable saving the state of the program and resuming or reviewing the analysis
afterwards, a file format has to be defined. This part of the project was neither defined nor
implemented, and should be considered a top priority for future developers.
3. Automatic silhouette extraction and matching.
As laid out in 2.3.1.H, silhouette extraction and matching is currently a manual process.
Optimizing it, by taking advantage of the Snapshots extracted by the event detector, and
automatically matching them by minimizing the distance between the given silhouette and
the ones in the video, would not only drastically reduce the effort needed to perform this
task, but also improve the quality of the match, since all the frames would be scrutinized and
the minimal distance between silhouettes would be computed instead of estimated by hand.
This distance can also be used to quantify the quality of the result.
4. Camera calibration using the plumb-line constraint and minimal Hough entropy [13].
This method for calibrating a camera may prove to be very useful if the camera has been
destroyed, since it uses the existing lines in the image to perform the calibration.
5. 3D point cloud visualization.
With the latest advances in OpenCV, more precisely the Viz module introduced in OpenCV
2.4.9, it should be fairly easy to extend the current video player class of the UI and imple-
ment 3D visualization capability. This will only prove useful if 3D data is made avail-
able, either from the scene reconstruction module, or from manually inserting points and planes
in the environment, thus reconstructing the scene.
6. 3D scene reconstruction.
The implementation of the 3D point cloud visualization will enable the integration of a scanned
3D point cloud, which can be obtained with a depth camera and a loop closing Simultaneous
Localization And Mapping (SLAM) algorithm after the crime has taken place. This can be
used for temporary crime scenes which are volatile and will disappear (short events such as
music festivals), creating a 3D model of the scene which can be used in the investigation
after the crime scene is gone.
7. Report generation.
Of the video report tools analysed, both Kinesense's LE and Forevid have a reporting and
note taking feature useful for case documentation, where the user can create notes on the
various events detected or on specific frames, which can then be processed into a Portable
Document Format (PDF) file that can easily be presented in a court hearing.
6.1.2 Improvements on Third Party Libraries
Since this project is highly dependent on third party libraries, it takes advantage of any im-
provements, either in performance or in quality of results, that may be implemented in them. Cur-
rently, OpenCV 3.0 beta includes a subset of Intel® Integrated Performance Primitives (IPP) and
promises to bring considerable speed-ups to the library [37]. It also includes the license to re-
distribute applications that use IPP-accelerated OpenCV, therefore allowing the distribution of a
pre-built version of BioFoV with the benefits of IPP while keeping the installation process simple.
6.1.3 Support More Platforms
Currently, only Microsoft Windows and GNU/Linux are supported by BioFoV. There are benefits
in also supporting at least Apple's OS X. This was not possible during this first stage, mostly due
to inexperience with this OS and the lack of a build and testing platform.
The creation of a computer specifically designed and optimized to run this program may be
well received by police departments once the platform has enough features, since it could
provide them with easy support and optimal performance. This may also be a way to finance the
project, if there is the need to do so.
6.1.4 Integration Tests
In order to provide a full set of tests that thoroughly evaluate code quality, the implementation
of integration, unit, acceptance and regression tests is a priority, with special emphasis on the
regression ones.
References
[1] European Cooperation in Science and Technology, "Integrating biometrics and forensics for the digital age,"
accessed: 2014-08-24. [Online]. Available: www.cost.eu/domains_actions/ict/Actions/IC1106
[2] N. Lynnerup and P. Larsen, “Gait as evidence,” Biometrics, IET, vol. 3, no. 2, pp. 47–54, June
2014.
[3] A. Dantcheva, C. Velardo, A. D’Angelo, and J.-L. Dugelay, “Bag of soft biometrics for
person identification - New trends and challenges.” Multimedia Tools Appl., vol. 51, no. 2,
pp. 739–777, 2011. [Online]. Available: http://dblp.uni-trier.de/db/journals/mta/mta51.html#
DantchevaVDD11
[4] D. Cao, C. Chen, D. Adjeroh, and A. Ross, “Predicting Gender and Weight from Human
Metrology using a Copula Model,” 2012.
[5] A. K. Jain, R. Jin, and J.-E. Lee, “Tattoo Image Matching and Retrieval.” IEEE Computer,
vol. 45, no. 5, pp. 93–96, 2012. [Online]. Available: http://dblp.uni-trier.de/db/journals/
computer/computer45.html#JainJL12
[6] R. Bansal, P. Sehgal, and P. Bedi, “Minutiae Extraction from Fingerprint Images - a Review,”
CoRR, vol. abs/1201.1422, 2012. [Online]. Available: http://dblp.uni-trier.de/db/journals/corr/
corr1201.html#abs-1201-1422
[7] S. Minaee and A. Abdolrashidi, “On The Power of Joint Wavelet-DCT Features for
Multispectral Palmprint Recognition,” Sep. 2014. [Online]. Available: http://arxiv.org/abs/
1409.7818v1;http://arxiv.org/pdf/1409.7818v1
[8] S. M. Lajevardi, A. Arakala, S. Davis, and K. J. Horadam, “Hand vein authentication using
biometric graph matching,” IET Biometrics, vol. 3, no. 4, pp. 302–313, 2014.
[9] J. R. G. Neves and P. L. Correia, “Hand Veins Recognition System,” in VISAPP’14, 2014, pp.
122–129.
[10] S. S. Farfade, M. Saberian, and L.-J. Li, “Multi-view Face Detection Using Deep Convolutional
Neural Networks,” Feb. 2015. [Online]. Available: http://arxiv.org/abs/1502.02766v1;http:
//arxiv.org/pdf/1502.02766v1
[11] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "DeepFace: Closing the Gap to Human-Level
Performance in Face Verification," in Conference on Computer Vision and Pattern Recognition
(CVPR), 2014.
[12] Z. Zhang, “A flexible new technique for camera calibration.” IEEE Trans. Pattern
Anal. Mach. Intell., vol. 22, no. 11, pp. 1330–1334, 2000. [Online]. Available:
http://doi.ieeecomputersociety.org/10.1109/34.888718
[13] E. Rosten and R. Loveland, “Camera distortion self-calibration using the plumb-line
constraint and minimal hough entropy,” CoRR, vol. abs/0810.4426, 2008. [Online]. Available:
http://dblp.uni-trier.de/db/journals/corr/corr0810.html#abs-0810-4426
[14] C. S. McCamy, H. Marcus, and J. G. Davidson, “A color-rendition chart,” J. Appl. Photogr.
Eng., vol. 2, no. 3, pp. 95–99, Summer 1976.
[15] F. Project, “Forevid Forensic video analysis for everyone,” accessed: 2014-06-23. [Online].
Available: www.forevid.org
[16] A. Buades, B. Coll, and J.-M. Morel, "Non-Local Means Denoising,"
Image Processing On Line, vol. 1, 2011. [Online]. Available: http://dx.doi.org/10.5201/
ipol.2011.bcm_nlm
[17] Wikipedia, "Deinterlaced vs interlaced image — Wikipedia, The Free Encyclopedia," 2007,
[Online; accessed 26-March-2015]. [Online]. Available: https://commons.wikimedia.org/wiki/
File:Deinterlaced_vs_interlaced_image.gif
[18] R. C. Gonzalez and R. E. Woods, Digital Image Processing (3rd Edition), 3rd ed. Prentice
Hall, Aug. 2007.
[19] D. Mitzel, T. Pock, T. Schoenemann, and D. Cremers, "Video Super Resolution Using Duality
Based TV-L1 Optical Flow," in DAGM-Symposium, ser. Lecture Notes in Computer Science,
J. Denzler, G. Notni, and H. Süße, Eds., vol. 5748. Springer, 2009, pp. 432–441. [Online].
Available: http://dx.doi.org/10.1007/978-3-642-03798-6_44
[20] Intergraph Corporation, http://www.intergraph.com, December 2013.
[21] Kinesense, http://www.kinesense-vca.com, December 2013.
[22] PhotoModeler, “Close-range photogrammetry and image-based modelling,” accessed:
2014-06-25. [Online]. Available: www.photomodeler.com
[23] Z. Zivkovic, “Improved adaptive gaussian mixture model for background subtraction,” in
Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference
on, vol. 2, Aug 2004, pp. 28–31 Vol.2.
[24] N. Friedman and S. Russell, “Image segmentation in video sequences: A probabilistic ap-
proach,” in Proceedings of the Thirteenth Conference Annual Conference on Uncertainty in
Artificial Intelligence (UAI-97). San Francisco, CA: Morgan Kaufmann, 1997, pp. 175–181.
[25] F. Meyer, “Color image segmentation,” in Image Processing and its Applications, 1992.,
International Conference on, 1992, pp. 303–306.
[26] GangInk, "GangInk," accessed: 2015-03-26. [Online]. Available: http://gangink.com/index.
php?pr=KRAZY_GETDOWN_BOYS
[27] M. M. de Almeida, “Tattoo Segmentation Tool,” accessed: 2015-04-08. [Online]. Available:
https://github.com/BioFoV/tattoo-segmentation
[28] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,”
Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE
Computer Society Conference on, vol. 1, pp. I–511–I–518 vol.1, 2001.
[29] J. A. Belward, "An exponential version of filon's rule," J. Comput. Appl. Math., vol. 14,
no. 3, pp. 461–466, Mar. 1986. [Online]. Available: http://dx.doi.org/10.1016/0377-0427(86)90081-6
[30] Qt Project, “Qt cross-platform application and UI framework,” accessed: 2015-03-28.
[Online]. Available: http://www.qt.io/
[31] itseez, “OpenCV open source computer vision and machine learning software library,”
accessed: 2014-06-24. [Online]. Available: opencv.org
[32] D. van Heesch, "Doxygen Generate documentation from source code," accessed:
2014-06-24. [Online]. Available: www.stack.nl/~dimitri/doxygen
[33] L. Torvalds, “GIT free and open source distributed version control system,” accessed:
2014-06-24. [Online]. Available: git-scm.com
[34] “Build software better, together.” accessed: 2014-08-29. [Online]. Available: github.com
[35] Travis CI, "Travis CI," accessed: 2015-03-02. [Online]. Available: https://travis-ci.org
[36] M. M. de Almeida, "BioFoV," accessed: 2015-02-10. [Online]. Available: https://github.com/BioFoV/BioFoV
[37] itseez, “OpenCV 3.0 beta,” accessed: 2015-03-01.