Marcelo Damasceno de Melo

User Verification Using Behavioural Cancelable Templates

Natal-Brasil

07/03/2016

Thesis submitted to the Postgraduate Programme in Systems and Computation of the Informatics and Applied Mathematics Department of the Federal University of Rio Grande do Norte as a requirement to obtain the title of PhD in Computer Science.

Federal University of Rio Grande do Norte - UFRN

Exact and Earth Sciences Center

Informatics and Applied Mathematics Department

Postgraduate Programme in Systems and Computation

Advisor: Prof. Dr. Anne Magály de Paula Canuto

Melo, Marcelo Damasceno de. User verification using behavioural cancelable templates / Marcelo Damasceno de Melo. - Natal, 2016. 158 leaves: ill.

Advisor: Prof. Dr. Anne Magály de Paula Canuto.

Thesis (PhD) - Federal University of Rio Grande do Norte. Exact and Earth Sciences Center. Informatics and Applied Mathematics Department. Postgraduate Programme in Systems and Computation.

1. Behavioural biometrics. 2. Cancelable biometrics. 3. Template protection. 4. Non-invertible functions. 5. Cancelable functions. 6. User verification. I. Canuto, Anne Magály de Paula. II. Title.

Cataloguing-in-Publication at Source. Federal University of Rio Grande do Norte - UFRN

Library System - SISBI

Approved Thesis. Natal-Brasil, 07/03/2016:

Prof. Dr. Anne Magály de Paula Canuto

Advisor

Prof. Dr. Benjamín René Callejas Bedregal

Internal Invited Member

Prof. Dr. João Carlos Xavier Júnior

Invited Member External to the Programme

Prof. Dr. George Darmiton da Cunha Cavalcanti

External Invited Member

Prof. Dr. André Ponce de Leon F. de Carvalho

External Invited Member

This thesis is dedicated to my family, who supported me all my life through each given step.

Acknowledgements

First, I would like to thank my parents, Marcos Antônio and Luizete Damasceno. I consider this stage one of the last in my academic education. Since the beginning of this education (my undergraduate degree), my parents have been my great support, having changed their lives to accompany me in Maceió in this world that is Computer Science. So I thank my parents with all my heart and soul. Without them, I would not be completing this important stage of my life. To my brother, Marcos Augusto, "Marquinhos", whose calm way of life, taking everything in a positive light, makes me rethink life in the same way. Sometimes his image is the reverse of mine in a mirror, but the joy of living and of seeing each other grow is shared in our veins. To my wife, Alia Titara, who has been by my side since I was 15 or 16 years old, for supporting me and for enduring my one-year stay in England on her own, with our daughter Arya. I am immensely grateful for your support and for understanding that living abroad had always been a lifelong dream. I remember that every day I wanted to return to be with our daughter, you would say: "Remember what you went there to do. Soon, very soon, you will be back." And of course to my daughter Arya Myrcella, my little star, who was with me every day, even at a distance, in my heart and on my wallpaper. As my mother says, you only know what it is to be a father when you are one. To my mother-in-law, Aila Maria, who during this time supported my success in her own way. I do notice the small gestures. Thank you. And to my future son or daughter, who at this moment seems to close a cycle: from the cycle of a student to the cycle of an academic professional.

To my advisor, Anne Magaly, with her gentle manner and her understanding that everyone has their own pace. Thank you for this partnership and for introducing me to and providing me with great opportunities. Without your support, I would not have completed this much-cherished stage of my life, my doctoral degree. To Norman Poh, my supervisor during my sandwich doctorate at the University of Surrey. Norman showed me how science done seriously can bear important fruit for society, and that hard and constant work brings results. Thank you for the support during a year away from my family.

To the friends I made during this journey, Valério, Issac and Cleverton: thank you for the conversations and the moments of relaxation. To the friends who shared great moments in Surrey: Veronika, Poh Joh, Huda, Franciskos, Josy, Andreas (a Brazilian), Stathis, Costas, Anna, Paul, Fran, Cherry and Maria Luiza.

To IFRN, which allowed this work to be carried out with focus from the start. The leave of absence was of vital importance to the quality of the work done. To those who directly or indirectly contributed during this four-year journey.

Don’t Stop ’Til You Get Enough (Michael Jackson)

Summary

Verifying user identity is a problem faced daily by people and systems that authorize access based on physical characteristics and presented documents. However, effort is still needed to resolve issues related to the accuracy of the verification system and to the falsification of objects and personal information. The use of biometrics can minimize or resolve these difficulties. Unfortunately, replacing biometric traits when the security of a biometric system is compromised is difficult. Hence, non-invertible functions are proposed to distort the biometric traits, thereby adding the cancelable property to biometric data.

This thesis analyses some solutions for user verification using cancelable behavioural biometrics. Specifically, the solutions are based on TouchAlytics, a dataset of strokes performed by users on the touchscreens of smartphones. The analysed solutions consist of the use of classifiers and classifier ensembles to solve the user verification problem. In addition to these solutions, this thesis analyses the use of multiple cancelable biometric templates with a classifier ensemble (Multiple-Privacy Protection Schemes). This thesis also presents a multifaceted discussion of the impact of the key on a cancelable function. To this end, we present an empirical study that uses either a single key or different keys to protect users' biometric templates. In addition, a study of the impact of key length on the accuracy of a biometric verification system is discussed. Based on the results found, the generation and storage of the cancelable key on mobile phones is analysed.

We demonstrate that the use of transformation functions generally provides performance similar or superior to that of unprotected templates (Original data), except with the BioHashing function. The use of multiple protected templates processed by ensembles outperformed the results obtained with classifiers and ensembles. Moreover, we show that biometric systems using cancelable templates preserve user privacy, i.e., they provide low False Acceptance and False Rejection Rates in unknown-key attacks and performance similar to unprotected templates in known-key attacks. Based on the key length experiments, we observe continuous improvement as the key length increases for the Interpolation and BioHashing functions in both key attacks. The Double Sum function showed a smaller improvement, but importantly its performance does not decrease as the key length increases. In conclusion, based on our results, we propose the use of a single system-generated user key for user authentication on mobile devices using multiple cancelable biometric templates.

Keywords: Behavioural Biometrics, Cancelable Biometrics, Template Protection, Non-invertible Functions, Cancelable Functions, User Verification, Smartphone Authentication, Key Length, Key Storage.

Abstract

The verification of user identity is a problem faced daily by people and systems that authorize access based on physical characteristics and documents. However, effort is still needed to resolve issues related to the accuracy of the verification system and to the falsification of objects and personal information. Biometrics aims to reduce or resolve these difficulties. Unfortunately, replacing biometric traits after a compromise is difficult. Hence, non-invertible functions can be used to protect biometric traits. Consequently, these functions allow biometric templates to become cancelable (revocable).

This thesis analyses some solutions to user verification using cancelable behavioural biometrics. Specifically, our solutions are based on TouchAlytics, a dataset of touchscreen strokes performed on smartphones. The analysed solutions are based on single classifiers and ensemble systems to solve the user verification problem. In addition, this thesis analyses the use of multiple cancelable templates with an ensemble system (Multiple-Privacy Protection Schemes). This thesis also discusses the impact of the key in cancelable functions. To this end, we perform an empirical study using a single user key or different user keys in cancelable functions. Moreover, the impact of key length on the accuracy of the biometric system is studied. Based on the results, the generation and storage of cancelable keys on mobile devices are discussed.

In summary, we have demonstrated that the use of transformation functions usually provides similar or better performance than unprotected biometric data, except for the BioHashing function. In addition, the use of multiple protected templates processed by an ensemble system outperformed the previous results with single classifiers and ensemble systems. Moreover, we showed that biometric systems with cancelable templates preserve user privacy, i.e., they provide lower False Acceptance and False Rejection errors in Unknown Key attacks and performance similar to unprotected biometric samples in Known Key attacks. Based on the key length experiments, we observe a perceptible continuous improvement as the key length increases for the Interpolation and BioHashing methods in both key knowledge attacks. In contrast, Double Sum shows minor improvements, but importantly its performance does not decrease as the key length increases. In conclusion, based on our findings, we propose the use of a single system-generated user key to authenticate users on mobile devices using multiple cancelable biometric templates.

Keywords: Behavioural Biometrics, Cancelable Biometrics, Template Protection, Non-invertible Functions, User Verification, Mobile Authentication, Key Length, Key Storage.

List of Figures

Figure 1 – Identification and Verification Task using Biometric System Modules. Source: (37)
Figure 2 – Illustration of BioConvolving procedure using W = 3. Source: (47)
Figure 3 – An example of a decision tree. Source: (2)
Figure 4 – Example of a Multi-Layer Perceptron
Figure 5 – An illustration of the general framework of an ensemble system
Figure 6 – Organization of the Methodology
Figure 7 – Generation of cancelable biometric samples for each user
Figure 8 – Inter Session BoxPlots
Figure 9 – Inter Week BoxPlots
Figure 10 – Intra Session BoxPlots
Figure 11 – Better Ensemble Boxplots for Original and Cancelable Functions
Figure 12 – Flowchart represents the system architecture
Figure 13 – BoxPlot of MPPS scenarios using horizontal strokes
Figure 14 – BoxPlot of MPPS scenarios using scrolling strokes
Figure 15 – BoxPlot of relative change of EER(%) using horizontal scenarios
Figure 16 – BoxPlot of relative change of EER(%) using scrolling scenarios
Figure 17 – Cross Match Rate Attack scenarios Illustration. a) The attacker inserts a protected template f(x_j, k_j^2) in order to access application 1. b) The attacker inserts a protected template f(x_j, k_j^1) in order to access application 2.
Figure 18 – DET curve of all scenarios protected with Interpolation transformation function. (a) Scrolling Strokes; (b) Horizontal Strokes.
Figure 19 – DET curve of all scenarios protected with BioHashing transformation function. (a) Scrolling Strokes; (b) Horizontal Strokes.
Figure 20 – DET curve of all scenarios protected with BioConvolving transformation function. (a) Scrolling Strokes; (b) Horizontal Strokes.
Figure 21 – DET curve of all scenarios protected with DoubleSum transformation function. (a) Scrolling Strokes; (b) Horizontal Strokes.
Figure 22 – DET curves for Interpolation - Known Key Attack. (a) Heterogeneous - Scrolling Strokes; (b) Homogeneous - Scrolling Strokes; (c) Heterogeneous - Horizontal Strokes; (d) Homogeneous - Horizontal Strokes.
Figure 23 – DET curves for Interpolation - Unknown Key Attack. (a) Heterogeneous - Scrolling Strokes; (b) Homogeneous - Scrolling Strokes; (c) Heterogeneous - Horizontal Strokes; (d) Homogeneous - Horizontal Strokes.
Figure 24 – DET curves for BioHashing - Known Key Attack. (a) Heterogeneous - Scrolling Strokes; (b) Homogeneous - Scrolling Strokes; (c) Heterogeneous - Horizontal Strokes; (d) Homogeneous - Horizontal Strokes.
Figure 25 – DET curves for BioHashing - Unknown Key Attack. (a) Heterogeneous - Scrolling Strokes; (b) Homogeneous - Scrolling Strokes; (c) Heterogeneous - Horizontal Strokes; (d) Homogeneous - Horizontal Strokes.
Figure 26 – DET curves for BioConvolving - Known Key Attack. (a) Heterogeneous - Scrolling Strokes; (b) Homogeneous - Scrolling Strokes; (c) Heterogeneous - Horizontal Strokes; (d) Homogeneous - Horizontal Strokes.
Figure 27 – DET curves for BioConvolving - Unknown Key Attack. (a) Heterogeneous - Scrolling Strokes; (b) Homogeneous - Scrolling Strokes; (c) Heterogeneous - Horizontal Strokes; (d) Homogeneous - Horizontal Strokes.
Figure 28 – DET curves for DoubleSum - Known Key Attack. (a) Heterogeneous - Scrolling Strokes; (b) Homogeneous - Scrolling Strokes; (c) Heterogeneous - Horizontal Strokes; (d) Homogeneous - Horizontal Strokes.
Figure 29 – DET curves for DoubleSum - Unknown Key Attack. (a) Heterogeneous - Scrolling Strokes; (b) Homogeneous - Scrolling Strokes; (c) Heterogeneous - Horizontal Strokes; (d) Homogeneous - Horizontal Strokes.

List of Tables

Table 1 – Average Equal Error Rate - Scrolling Strokes
Table 2 – Average Equal Error Rate - Horizontal Strokes
Table 3 – Comparative Analysis of Ensemble structures and Single Classifiers for Scrolling dataset
Table 4 – Comparative Analysis of Ensemble structures and Single Classifiers for Horizontal dataset
Table 5 – Mean Results using Scrolling Traits
Table 6 – Mean Results using Horizontal Traits
Table 7 – Voting - EER - Percentage. Source: (16)
Table 8 – Mathematical Formulation of Key Knowledge Attack in Homogeneous and Heterogeneous scenarios
Table 9 – Cross Match Effort Points - shows the effort in points which an attacker needs to spend to invade a biometric system protected by MPPS
Table 10 – EER by scenario and Template Protection Method using Scrolling Strokes classified by SVM
Table 11 – EER by scenario and Template Protection Method using Horizontal Strokes classified by SVM
Table 12 – Heterogeneous vs. Homogeneous Statistical Test using Known Key Attack Scores
Table 13 – Heterogeneous vs. Homogeneous Statistical Test using Unknown Key Attack Scores

List of Abbreviations and Acronyms

ANN Artificial Neural Network

PIN Personal Identification Number

FMR False Matching Rate

FNMR False Non-Matching Rate

DCS Dynamic Classifier Selection

LCA Local Class Accuracy

EER Equal Error Rate

FAR False Acceptance Rate

FRR False Rejection Rate

MPPS Multi-Privacy Protection Schemes

MLP Multi-Layer Perceptron

SVM Support Vector Machine

kNN K-Nearest Neighbours

NB Naive Bayes

DET Detection Error Trade-off

ROC Receiver Operating Characteristics Curve

MBS Multibiometric System

Contents

1 INTRODUCTION
1.1 Research Problem and Motivations
1.2 Thesis Statement
1.3 Contributions
1.4 Published Papers
1.5 Thesis Organization

2 BIOMETRIC BASED RECOGNITION
2.1 Biometrics
2.1.1 Biometric Modalities
2.2 Biometric System
2.2.1 Verification and Identification Task
2.2.2 Biometric System Metrics
2.3 Advantages and Limitation of Biometrics
2.4 Multibiometric System
2.4.1 Multibiometric Factors
2.5 Chapter Conclusion

3 TEMPLATE PROTECTION
3.1 Introduction
3.2 Biometric Cryptosystems
3.3 Feature Transformation
3.4 Cancelable Biometrics
3.5 Cancelable Functions
3.5.1 Interpolation
3.5.2 BioHashing
3.5.3 BioConvolving
3.5.4 Double Sum
3.6 Function of Key in Cancelable Functions
3.6.1 Key Storage
3.6.2 Key Generation
3.6.3 Key Length
3.7 Multi-Biometric Template Protection
3.8 Conclusion

4 STATISTICAL LEARNING AND ENSEMBLE METHODS
4.1 Statistical Learning
4.1.1 Evaluation of Classifiers
4.2 Classification Methods
4.2.1 k-NN
4.2.2 Decision Tree
4.2.3 Naive-Bayes
4.2.4 Artificial Neural Network
4.2.5 Support Vector Machine
4.3 Ensemble Systems
4.3.1 Learning Strategies in Ensemble Systems
4.3.2 Ensemble Systems for Cancelable Biometrics
4.4 Chapter Conclusion

5 USER AUTHENTICATION USING A CANCELABLE BEHAVIOURAL MODALITY: SINGLE CLASSIFIERS AND ENSEMBLE SYSTEMS
5.1 Touchalytics
5.2 Methodology
5.2.1 Data Preprocessing Module
5.2.2 Decision Module for Single Classifiers and Ensemble Systems
5.2.2.1 Single Classifier
5.2.2.2 Ensemble System
5.3 Single Classifier Experiments
5.3.1 Contribution
5.4 Ensemble System Experiments
5.4.1 Contributions
5.5 Chapter Conclusion

6 MULTI-PRIVACY PROTECTION SCHEMES
6.1 Introduction
6.2 Methodology
6.3 Results and Discussion
6.4 Chapter Conclusion

7 IMPACT OF KEY STORAGE IN CANCELABLE FUNCTIONS
7.1 Introduction
7.1.1 Contributions and Contrast with the Literature
7.1.2 Our findings
7.2 Notation and Methodology
7.2.1 Original Domain
7.2.2 Transformed Domain
7.3 Homogeneous and Heterogeneous Key
7.4 Experimental Scenarios
7.4.1 Scenario 1 - Heterogeneous Key - Unknown Key
7.4.2 Scenario 2 - Heterogeneous Key - Known Key
7.4.3 Scenario 3 - Homogeneous Key - Unknown Key
7.4.4 Scenario 4 - Homogeneous Key - Known Key
7.5 Methodology
7.6 Analysis of Cross Match Attack Effort
7.7 Baseline Performance (unprotected samples) vs. Attack Scenarios
7.7.1 Interpolation
7.7.2 BioHashing
7.7.3 BioConvolving
7.7.4 Double Sum
7.8 Comparison of the Security/Performance of the Biometric System using Homogeneous and Heterogeneous Key Scenario
7.8.1 Interpolation
7.8.2 BioHashing
7.8.3 BioConvolving
7.8.4 DoubleSum
7.9 MPPS and the use of a single key
7.10 Chapter Conclusion

8 IMPACT OF KEY LENGTH IN CANCELABLE FUNCTIONS
8.1 Introduction
8.1.1 Findings
8.2 Methods to Increase the Key Length
8.2.1 Interpolation
8.2.2 BioHashing
8.2.3 BioConvolving
8.2.4 Double Sum
8.3 Results and Discussion
8.3.1 Interpolation
8.3.2 BioHashing
8.3.3 BioConvolving
8.3.4 DoubleSum
8.4 Chapter Conclusion

9 CONCLUSIONS
9.1 Use of Single and Ensemble System
9.2 Multi-Privacy Protection Schemes
9.3 Attacks Based on Key Knowledge
9.4 Key Length
9.5 Implication of Our Findings
9.5.1 Recommended Biometric System Configuration
9.5.2 Key Management
9.6 Thesis Limitation
9.7 Future Work
9.8 Acknowledgements

References

1 Introduction

Currently, most computer systems use a username-password method for user authentication (35). The username-password combination has several flaws regarding security, logistics and reliability in the process of storing and using authentication credentials.

For instance, the username-password method brings problems such as the reuse of a single username-password combination across different systems and increased user stress from having to remember long and complex passwords. If a combination reused across services is compromised, all of those services are compromised as well. Moreover, having many different complex passwords increases user stress and leads users to write them down on physical media, such as post-it notes. Thus, authentication methods need to be secure and accurate, and must allow user credentials to vary without the inconvenience of long and complex passwords.

Biometrics has been studied in order to minimize the problems caused by username-password authentication methods (25). Biometrics can be considered the science of establishing a person's identity using his or her anatomical or behavioural traits. Biometric traits have a number of desirable properties, such as reliability, convenience, universality, and so forth. Due to these characteristics, biometrics has been increasingly studied in recent years and is currently present in several authentication systems, such as machines that control employee entrance or access to critical systems, among others (72).

Biometric modalities can be divided into two categories: physical and behavioural. Physical modalities are usually related to body features, such as face, fingerprint, iris, hand geometry, among others. Unlike physical biometrics, behavioural biometrics are related to user behaviour or actions (76). Behavioural biometrics use conduct patterns, such as gait or human-computer interactions (93). Behavioural modalities can be considered non-intrusive, i.e., users are not aware that information is being collected.

Unfortunately, biometric traits are user-sensitive, i.e., in the case of a leak, the user cannot change his or her traits in order to replace the enrolled biometric sample. Thus, biometric systems need to protect the enrolled templates against well-known attacks such as replay attacks, template linkage and spoofing (53). Therefore, biometric systems need to implement secure methods to avoid these attacks and to revoke a compromised template. Biometric template protection methods protect individual characteristics. Consequently, template protection is a security feature that needs to be addressed in biometric-based authentication systems (38). In addition, protection methods need to provide a way to revoke a compromised template. Therefore, template protection methods preserve user information and allow a compromised biometric template to be revoked.

Due to biometric characteristics, biometric traits are permanently associated with a user. Template protection methods have been increasingly adopted to address the security of biometric templates (38). Cancelable functions transform or intentionally distort biometric data in order to protect sensitive user information (64). The distorted biometric features are then used in place of the original features. Thus, if the system is compromised, the templates are removed and a new enrolment process is executed in order to create new cancelable templates for the same enrolled users.

Cancelable physical modalities have been widely reported in the literature. In (44, 55), the authors show that cancelable methods provide similar or worse performance than biometric systems without template protection. However, little has been done for behavioural modalities because they are considered complex and they change over time. Thus, this thesis uses a behavioural modality due to its novelty and complexity (87, 93).

Cancelable methods usually require a biometric template and a user key in order to produce a protected template (46). In other words, the user key has an important role in the generation of the protected biometric template. Few studies have evaluated the effect of the key on the recognition performance of biometric systems (46, 62). Therefore, the user key must be studied in order to understand its impact on biometric system performance, since the key is user-specific information and is used to generate secure templates.
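The role of the key can be illustrated with a minimal sketch in Python. It is a simplified, BioHashing-style keyed random projection in which the directions are not orthonormalised; it is not the implementation evaluated in this thesis. The function names, the feature values and the 32-bit output length are assumptions made purely for illustration.

```python
import random


def keyed_projection(features, key, n_bits=32):
    """Map a feature vector to a bit string using key-seeded random directions.

    Simplified BioHashing-style sketch (assumption): the random directions
    are NOT orthonormalised, unlike the published method.
    """
    rng = random.Random(key)  # the user key seeds the pseudo-random directions
    bits = []
    for _ in range(n_bits):
        direction = [rng.gauss(0.0, 1.0) for _ in features]
        score = sum(f * d for f, d in zip(features, direction))
        bits.append(1 if score >= 0.0 else 0)  # keep only the sign of the projection
    return bits


def hamming_distance(a, b):
    """Count the positions at which two protected templates differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical stroke features; real features would come from the touch data.
sample = [0.42, -1.3, 0.07, 2.1, -0.55]

old_template = keyed_projection(sample, key=1234)    # enrolment
same_key_again = keyed_projection(sample, key=1234)  # verification with the same key
new_template = keyed_projection(sample, key=9876)    # re-enrolment after revocation
```

With the same key, the same sample always maps to the same protected template. Issuing a new key changes the random directions, so the re-enrolled template is expected to differ from the revoked one even though the underlying biometric sample is unchanged.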

Although the use of cancelable biometrics bridges the gap between the convenience of biometric authentication and its security vulnerabilities, cancelable functions produce noisy data and depend on a user key to encode a biometric sample (72, 73). The use of cancelable templates tends to decrease the performance of biometric-based systems, since the inter variance of cancelable templates is usually higher than that of unprotected templates. Therefore, powerful recognition methods need to be used to deal with the noise of protected templates.

Classifiers have been used to categorize users using their biometric templates (46, 73). However, standard classification methods are not powerful enough to recognize protected genuine templates. Therefore, the fusion of information or decisions has been used to improve the recognition performance of biometric systems (36).

Although a number of methods exist to combine information sources, as described in (36), few studies systematically explore the effectiveness of multiple cancelable functions or protection schemes in the context of information fusion. The literature on information fusion considers the following classes of systems:


• multimodal system, which combines multiple biometric traits collected by different sensors;

• multi-algorithmic system, which uses different algorithms on the same biometric modality;

• multi-sample system, which combines several samples or instances of the same biometric modality;

• multi-sensor system, which recognises a user based on one biometric modality through the combination of different sensors (of the same modality).

A special case of a multi-algorithmic biometric system is the use of several biometric feature representations, leading to a multi-feature biometric system. Although some progress on biometric information fusion includes the combination of auxiliary, non-discriminatory information such as user-specific characteristics and biometric sample quality (67), as well as the combination of multiple biometrics in the context of template protection (74), the use of multiple protection schemes is not often systematically evaluated, at least not in the context of information fusion.

The use of behavioural modalities has attracted some research effort (93). However, template protection methods decrease the recognition performance of biometric systems. Thus, new methods need to be proposed and implemented in order to keep a desirable performance in biometric systems that use protected biometric templates, in particular with behavioural modalities. In addition, the user key must be evaluated in order to understand its contribution to the security and performance of the biometric system. These subjects are the main research objects of this thesis.

1.1 Research Problem and Motivations

Biometrics is a class of methods for the identification and verification of users based on behavioural and physical characteristics. Unfortunately, biometric systems may suffer from security faults such as spoofing (79) and fraudulent access to enrolled biometric samples (72). Moreover, the exploitation of unprotected templates can reveal the user identity totally or partially and, consequently, cause a privacy breach. Thus, template protection methods should be applied in order to protect the personal information present in biometric samples.

Cancelable methods require a user key to protect biometric samples. From now on, we also refer to a user key used in cancelable methods as a cancelable key. A cancelable key is user-dependent and has a strong impact on template protection (49). Therefore, the key has an important role in the performance and quality of cancelable methods.

Biometrics is a solution that does not require passwords or devices in order to verify or identify a user. However, the claim that template protection solutions do not need passwords or tokens/smartcards is false. Therefore, new template protection methods that do not require such passwords or devices need to be developed. Based on this, one of our hypotheses is that a classifier trained using a specific user key is biased towards the key instead of the biometric features. Consequently, a biometric system classifier has a high probability of recognizing the key rather than the biometric features.

The use of tokens, smartcards and passwords suffers from problems related to inappropriate sharing or loss of the device/password. In possession of such information, an impostor can increase his rate of successful entry. As a consequence, a token/smartcard solution exposes the biometric system. Therefore, solutions which do not use tokens, smartcards and user keys must be explored.

Few studies discuss how to store a cancelable function key. Among the reviewed studies, only (39) deals with key storage issues. The authors report that the hash key can be stored in smart cards, tokens or inside the device in an encrypted format. In summary, key storage issues such as usability and security need to be investigated.

Most studies in the biometric literature lack a security analysis of the template transformation. Most template protection studies assume that a protected template is free of decoding and linkage attacks. Unfortunately, this is not true (46). Therefore, the vulnerability of a template transformation scheme to intrusion and linkage attacks should be reported (56).

Nevertheless, the use of cancelable methods to protect biometric samples is well accepted in the community. However, it suffers from problems such as accuracy loss and key attacks. In order to improve performance and security, several solutions have been proposed, but they have not been fully described and discussed in one single document.

1.2 Thesis Statement

Based on the research problems and motivations discussed, we formulate the following research hypothesis:

We can develop an authentication method which uses a cancelable behavioural modality and is able to outperform traditional biometric authentication methods. Furthermore, we can develop an empirical study of the impact of key storage and key length on the performance of biometric systems.

The main objective of this thesis is to propose an authentication method which uses cancelable biometric templates, in particular of a behavioural modality. In addition, we expect to outperform the existing methods when possible. Cancelable data and behavioural modalities have an inherent complexity; hence, we expect that the use of ensemble systems and of multiple cancelable functions at the same time yields better performance than standard methods based on single classifiers. Another novelty presented in this thesis is a study of the impact of the key length and storage methods on the performance and usability of cancelable methods. The biometrics field is well researched but there is still room for innovation, such as our contributions on the use of multiple cancelable functions and the proposal to use a single system key instead of a different key for each enrolled user.

1.3 Contributions

During the literature review, it was observed that few studies analysed the use of cancelable functions applied to a behavioural modality. Thus, the main contribution of this thesis is to propose and analyse a new authentication method which uses a behavioural modality and is able to match or outperform standard methods (which use unprotected templates). Moreover, as specific contributions of this thesis regarding the use of cancelable functions, we can cite:

1. Use of single classifiers. We show in Chapter 5, Section 5.3, that single classifiers were used to classify genuine and impostor users using cancelable templates. Unfortunately, single classifiers do not show reasonable performance, which leads to the use of more powerful methods such as ensemble systems.

2. Use of Ensemble Systems. We show in Chapter 5, Section 5.4, that ensemble systems were used with the same objective as contribution 1. Ensemble systems exceed the accuracy achieved by single classifiers. Thus, the use of multiple classifiers is suggested rather than single classifiers.

3. Use of multiple cancelable functions. We propose in Chapter 6 the use of multiple cancelable functions applied to the same biometric modality, in particular a behavioural modality. We show that the use of multiple cancelable versions of a user template increases the security and the performance of the classifier. Furthermore, in this proposal we use the classifier scores to fuse the different outputs instead of decision fusion. Decision fusion was performed in Chapter 5.

4. Systematic study about key storage issues. The storage of the required key is relevant because the user has to remember the key or carry a token that contains it. In general, these requirements would defeat the purpose of biometrics as a convenient mechanism that cannot be lost or forgotten. Specifically, we consider the scenario of storing the key of a cancelable function in a consumer-grade mobile device which is also the biometric authentication device. This scenario is widespread (85) but a best practice has not been discussed by any empirical study. The discussion and the proposed solution to key storage issues and key attacks are presented in Chapter 7.

5. Systematic study about the use of homogeneous and heterogeneous keys. Commonly, each user adopts his own key to protect his data (46, 49). However, to the best of our knowledge, there is no study that evaluates the idea of using a single key for all users in a cancelable method. Thus, this thesis evaluates the performance of cancelable functions using a single key for all users and a different key for each user.

6. Systematic study about key knowledge attack. The key required by cancelable methods can be shared or leaked to biometric system attackers. Consequently, impostors are able to increase the rate of false acceptance. In Chapter 7, Section 7.3, we propose that the best solution is to use cancelable methods and to store the key in an unprotected format. The key does not affect user privacy in non-invertible functions because it is hard to decode the protected template in a feasible time. Unfortunately, this solution allows a non-authorized user to use this key to authenticate himself as a different user. Therefore, a biometric system that uses cancelable functions with an unprotected key needs to be powerful enough to identify the biometric features of an impostor.

7. Definition of usability and intrusion threat metrics based on the classifier. A biometric system which uses cancelable biometrics suffers from security vulnerabilities such as an increase in false rejection errors and linkage attacks. Hence, this thesis analyses the user verification performance and defines metrics related to usability and intrusion threats in order to evaluate the biometric system vulnerabilities.

8. Systematic Study of Key Length. The key length affects the performance of cancelable functions. Papers which analyse BioHashing (42) and BioConvolving (47) showed that the length of the key affects performance positively and negatively, respectively. However, no study has been done in relation to Interpolation and Double Sum. Thus, we observe the relation between the length of the key and the accuracy of Interpolation and Double Sum. We conclude that the key length has no direct relation with the accuracy, because increasing its size does not increase the accuracy of user verification using the analysed biometric modality.

9. Implication of our findings in mobile authentication. Based on our findings, we suggest that the authentication system of a mobile device should use touch screen interaction as authentication information. In addition, we propose that the system protects the generated template with cancelable functions and stores the single key used in an unprotected format. This proposal is based, specifically, on the results achieved by ensemble systems using a homogeneous key under known key attacks.

1.4 Published Papers

The published study (15) provides the first insights into the behavioural modality (touch screen interaction dataset), cancelable functions and classifiers. In that study, we show that it is possible to use a cancelable behavioural modality. However, there is room for further development. Section 5.3 presents the methodology used and the results achieved.

Subsequently, we decided to invest in ensemble methods in order to analyse ways to improve the achieved results. As a result, the second study was written to report and discuss the results achieved by ensemble methods (17). The study compares different decision fusion rules to determine which rule is appropriate to the problem. Moreover, an extended version of this article was published in the Journal of Information Assurance & Security (16).

We observed that the ensemble methods outperformed single classifiers. Ensemble methods use different classifiers in order to solve the same problem. Thus, we decided to try this technique using different cancelable versions of the same biometric template with an ensemble system. We named this idea Multi-Privacy Protection Schemes (MPPS). MPPS is a combination of cancelable versions of the same dataset in order to increase the performance of a biometric system (18). The published study reports the idea, methodology and results of this approach. Chapter 6 discusses MPPS in detail.

At this moment, we have just submitted a study which presents and discusses our proposal to resolve some issues related to the key required by cancelable functions. Specifically, we compare the performance of two different key storage solutions and the performance of these solutions under key knowledge attacks, i.e., attacks performed by impostors who know or do not know the key used by a genuine user. Chapters 7 and 8 present our proposal and the achieved results for the discussed problems.

1.5 Thesis Organization

The thesis is organized in 10 chapters and it is structured as follows.

Chapter 1 [Introduction] Presents the main concepts used, the contributions of this thesis and an overview of the published studies.

Chapter 2 [Biometric Recognition] Presents the origins, motivation, basic concepts and state of the art of biometrics.

Chapter 3 [Template Protection] Presents the origins, basic concepts and state of the art of template protection and cancelable functions.

Chapter 4 [Statistical Learning and Ensemble Methods] Discusses the origins and concepts of the algorithms and ensemble methods used.

Chapter 5 [User Authentication Using a Cancelable Behavioural Modality: Single Classifiers and Ensemble Systems] Presents and discusses the dataset, methodology and achieved results. The results of single classifiers and ensemble methods are compared in order to determine which approach is best suited to the problem.

Chapter 6 [Multi-Privacy Protection Schemes] Presents the concept, methodology and results of multiple cancelable functions applied to the same modality. We named this approach Multi-Privacy Protection Schemes.

Chapter 7 [Impact of Key Storage in Cancelable Functions] Investigates the impact of key storage. We compare the performance using a different key for each user or a unique key for all users of a biometric system. A further investigation emulates an attack in which an impostor knows or does not know the key used by a genuine user. The achieved results are discussed and solutions for these problems are proposed.

Chapter 8 [Impact of Key Length in Cancelable Functions] Investigates the impact of the key length. We compare the performance of the analysed cancelable functions using different key lengths. As in Chapter 7, we emulate an attack in which an impostor knows or does not know the key used by a genuine user. The achieved results are discussed.

Chapter 9 [Conclusions] Contributions, limitations, opportunities for future work, and the thesis conclusions.


2 Biometric Based Recognition

Many systems require the correct identification of an authorized user before granting access to the corresponding system features. Unfortunately, authentication methods present different problems related to security and usability. Biometrics is one of the technologies used to minimize such problems.

Biometric recognition refers to the automatic recognition of individuals using biological characteristics. Biological characteristics are categorized into physical and behavioural features. Thus, biometric recognition is based on possession instead of knowledge, e.g. an individual presents his fingerprint (possession) instead of a password (knowledge). In this chapter we summarize the main concepts, strengths and limitations of biometrics.

2.1 Biometrics

It is fundamental to humans to use biological characteristics to recognize other humans or animals (37). Biological recognition is a crucial skill inherent to human beings. Due to its success, system developers want to emulate this ability in computers.

A biological characteristic (physical or behavioural) is considered a biometric if it satisfies the following properties:

• Universality: all individuals should have this biological characteristic;

• Distinctiveness: two people should be different according to this characteristic;

• Permanence: biological properties should have low variance over time;

• Collectability: biological characteristic must be measured, preferably in an easy way.

Unfortunately, such requirements are not fully satisfied. For example, the fingerprint modality does not satisfy the permanence and universality requirements, because fingerprints suffer damage over time and some individuals have biological conditions in which fingerprint ridges are not evident.

A biometric system should present the following properties:

• Performance: refers to the accuracy and speed of the system and the factors that influence them;

• Acceptability: refers to usability metrics which measure acceptance by people in daily use;

• Circumvention: refers to how secure the biometric system is against fraudulent methods.

Due to these characteristics, biometric systems are used in identification and authentication solutions. An identification solution compares an unknown biometric sample against a biometric database. In contrast, an authentication service compares a claimed biometric sample against the enrolled sample of the claimed identity in a biometric database. In short, an identification method compares a biometric sample against all enrolled samples (1:n comparisons) and an authentication method compares the claimed sample against only one enrolled sample (1:1 comparison).

In summary, a biometric system must use biometric characteristics which satisfy the biometric requirements. In addition, it should be precise, fast, well accepted by users and secure against deceitful techniques.

Biometrics has been used in different types of applications. Its usage can be classified into three fields: (1) Commercial: application login, ATM; (2) Government: national ID card, passport, border control; and (3) Forensic: body recognition, missing people, criminal identification. Therefore, biometrics has been used with success in different fields, and improvements in biometrics present potential benefits for people's lives.

2.1.1 Biometric Modalities

Physical biometrics represent the biological characteristics known as “human parts”. Examples of physical biometrics are fingerprint, iris, hand and blood pressure. These modalities are well known by researchers and used in industry (9). Due to their wide use they are known as classical modalities.

In contrast, behavioural modalities use behaviour patterns in order to recognize an individual. Examples of such modalities are gait, signature, key typing, computer usage and touch screen strokes (93). Such biometrics are less widely used due to high variance among sessions and collectability challenges. Consequently, behavioural modalities have high potential for use and open problems in the literature.

There are a number of biometric modalities, each with strengths and limitations. No single biometric modality can satisfy all the biometric and application requirements. A biometric modality is chosen according to properties such as ease of collection, infrastructure costs and discrimination power. That is one of the reasons why some biometric modalities have been adopted by the industry: fingerprint, face and voice recognition. On the other hand, some modalities have been adopted by high-accuracy applications such as body identification and fraud detection.


2.2 Biometric System

A biometric system is pattern recognition software capable of matching enrolled biometric samples against claimed user samples. In other words, a biometric system classifies or identifies a genuine user based on features present in a biometric sample. In this section we present and explain the main modules present in a biometric system.

A biometric system is composed mainly of four modules: acquisition, feature extraction, matcher and database (37).

1. Acquisition: collects biometric samples in order to transform them into a processed digital format. The biometric samples are collected using sensors. The sensors can be in the same location as the biometric system or remote.

2. Feature extraction module: responsible for extracting the features of raw biometric samples. The extracted features should satisfy the performance and security requirements. Examples of extracted features are position, velocity and angle. In addition, the feature extraction module transforms the extracted features into a format compatible with the system, known as a biometric template. Usually, the acquisition and feature extraction modules are implemented inside sensors.

3. Matcher: responsible for comparing templates and deciding whether a claimed biometric template belongs to an enrolled user. The system decision is based on the similarity score between the query and enrolled templates.

4. Biometric database: stores and manages the enrolled biometric templates. Normally a quality check is performed before storing a biometric template in the database. This step is recommended to assure the quality of the stored biometric templates. Usually, multiple user templates are stored in the database over time. This set of templates reflects the variation of the biometric sample over time.

Each module plays a crucial role in a biometric system and all the described modules are required for it to work properly.

2.2.1 Verification and Identification Task

Figure 1 illustrates how a biometric system works in the verification and identification tasks. A verification problem can be formally described as the comparison of an enrolled template $x_j$ and a query template $x'_j$, for $x_j, x'_j \in X_j$ and $j \in \{1, \ldots, J\}$, where $J$ is the number of users and $X_j$ is the set of all possible templates of user $j$. A verification task outputs a score $s = \mathrm{match}(x_j, x'_j)$, with $0 \le \mathrm{match}(x_j, x'_j) \le 1$. The match function measures the similarity between $x_j$ and $x'_j$; thus, the score $s$ is termed the similarity or matching score. If $s \ge \tau$, where $\tau \in \mathbb{R}$ is a predefined threshold, the biometric templates $x_j$ and $x'_j$ belong to the same user; otherwise the query template $x'_j$ does not belong to user $j$. The upper part of Figure 1 illustrates the verification process.

An identification problem is formally similar to the verification problem. The second part of Figure 1 illustrates the identification task. Instead of comparing a query template $x'$ against a single $x_j$, the identification task compares $x'$ against all the enrolled templates $x_1, \ldots, x_J$ stored in the database. The query template $x'$ is assigned to the user $j^* = \arg\max_j \mathrm{match}(x', x_j)$ if $\mathrm{match}(x', x_{j^*}) \ge \tau$; otherwise the system refutes the presence of an enrolled user compatible with the query template. Thus, an identification task performs $J$ comparisons. In contrast, a verification problem performs only one comparison.

Figure 1 – Identification and Verification Task using Biometric System Modules. Source: (37)
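The two tasks can be sketched in a few lines of Python. This is only an illustration: the `match` function below is a hypothetical similarity (one over one plus the Euclidean distance), chosen because it stays within the interval required of a matching score, and the enrolled users and templates are invented.

```python
def match(enrolled, query):
    """Toy similarity in (0, 1]: 1 / (1 + Euclidean distance). Hypothetical choice."""
    distance = sum((p - q) ** 2 for p, q in zip(enrolled, query)) ** 0.5
    return 1.0 / (1.0 + distance)


def verify(query, enrolled, tau):
    """1:1 comparison: accept the claim if the score reaches the threshold tau."""
    return match(enrolled, query) >= tau


def identify(query, database, tau):
    """1:n comparison: return the best-scoring user, or None if no score reaches tau."""
    best_user = max(database, key=lambda user: match(database[user], query))
    if match(database[best_user], query) >= tau:
        return best_user
    return None


# Hypothetical enrolled templates, one per user.
database = {"alice": [0.1, 0.9], "bob": [0.8, 0.2]}
print(verify([0.8, 0.2], database["alice"], tau=0.9))
print(identify([0.75, 0.25], database, tau=0.5))
```

Verification performs a single comparison against the claimed identity, whereas identification scans every enrolled template before applying the threshold.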

2.2.2 Biometric System Metrics

The matching score $s = \mathrm{match}(x'_j, x_j)$ measures how similar the query template $x'_j$ and the enrolled template $x_j$ are. Thus, the higher the score, the more confident the system is that the two templates belong to the same user. The distribution of scores generated by pairs of samples from the same person is called the genuine distribution, and the distribution from different users is called the impostor distribution.

A biometric system generates two different types of errors:

1. two templates from different persons are classified as from the same individual (false match);

2. two templates from the same individual are classified as from different users (false non-match).

The rates of these errors are called the False Acceptance Rate (FAR) and the False Rejection Rate (FRR), respectively.

Due to the threshold τ, there is a trade-off between FAR and FRR. If τ is small, the system is more permissive and has a higher FAR. In contrast, if τ is large, the system is more restrictive, i.e., it has a higher FRR. Usually, the biometric system performance is exhibited using a ROC plot. A ROC plot shows the FAR and FRR values for different τ values.

Published results often report the minimum FAR and FRR. However, FAR and FRR alone are not sufficient measures because they are application dependent. Thus, most studies report the value at which FAR and FRR are the same. This point is known as the Equal Error Rate (EER). EER is application independent and provides a realistic metric to compare new results with published ones.
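The relation between the threshold, FAR, FRR and EER can be illustrated with a short sketch. The score lists below are invented, and the EER is only approximated by scanning the observed scores as candidate thresholds and taking the point where FAR and FRR are closest, which is an assumption of this example and not the evaluation protocol of this thesis.

```python
def far_frr(genuine, impostor, tau):
    """FAR: fraction of impostor scores accepted; FRR: fraction of genuine scores rejected."""
    far = sum(s >= tau for s in impostor) / len(impostor)
    frr = sum(s < tau for s in genuine) / len(genuine)
    return far, frr


def approximate_eer(genuine, impostor):
    """Scan observed scores as thresholds and return (EER estimate, threshold)."""
    best = None
    for tau in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, tau)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, tau)
    return best[1], best[2]


genuine = [0.9, 0.8, 0.7, 0.4]   # hypothetical genuine distribution
impostor = [0.1, 0.3, 0.5, 0.2]  # hypothetical impostor distribution
eer, tau = approximate_eer(genuine, impostor)
print(eer, tau)  # 0.25 0.5
```

Lowering the threshold below 0.5 in this toy example raises FAR and lowers FRR, and raising it does the opposite, which is the trade-off described above.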

The definition of the threshold value of a biometric system is application dependent. Some applications require low false acceptance. One example is an ATM: a large number of false acceptances generates a huge loss of money, whereas a large number of false rejections only upsets loyal customers. In contrast, some applications require a low number of false rejections, for example a criminal investigation that wants to recognize potential suspects of a crime. In this case, it is important not to miss any suspect, because missing a true suspect can damage the police work. Thus, it is important to set an appropriate threshold depending on the application characteristics.

2.3 Advantages and Limitation of Biometrics

Knowledge and token based methods have disadvantages that biometrics does not present. Knowledge based methods such as user name/password can be forgotten or shared, or present usability problems when the passwords are long or complex. In addition, passwords do not guarantee that the owner is the one providing the password. The same problem happens with token/card based methods. These devices can be shared, forgotten or replicated. These problems generate security flaws that can damage the whole application.


Biometrics cannot be lost, shared or forgotten. In addition, on-line biometric systems require users to be present at the authentication point. Thus, problems found in knowledge and token based methods are not present in biometric solutions.

Biometric systems present some limitations, such as inconvenience in some scenarios. Suppose a biometric identification system operates with FAR = 0.001. In a scenario such as criminal identification in an airport, 200 false alarms are generated for every 200,000 people. Therefore, in some scenarios the lack of accuracy in the biometric field presents problems such as user inconvenience and privacy disruption.
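The arithmetic behind this example is simply the false acceptance rate multiplied by the number of people screened. The sketch below assumes, as a simplification, one comparison per person; the function name is hypothetical.

```python
def expected_false_alarms(far, n_people):
    """Expected number of false alarms, assuming one comparison per person."""
    return far * n_people


# FAR = 0.001 over 200,000 travellers gives about 200 false alarms.
print(expected_false_alarms(0.001, 200_000))
```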

2.4 Multibiometric System

A Multibiometric System (MBS) uses different templates in order to identify or authenticate an individual (36). An MBS can be categorized in different ways based on (1) the use of different modalities, (2) modality instances, (3) different samples and (4) different matching and feature extraction algorithms. Thus, an MBS is able to resolve some limitations of unimodal biometric systems. Multibiometric systems have been used in different applications (29, 58, 80).

The use of different modalities or samples minimizes the unimodal limitations. Unfortunately, biometric modalities do not fully satisfy the biometric requirements such as universality, distinctiveness, permanence and collectability. For example, the combined use of fingerprint and iris minimizes the permanence problem related to fingerprint and the collection problem present in iris sampling. The use of an MBS resolves some limitations and increases the performance and security of the biometric system.

A Multibiometric System increases performance by using several sources of information provided by the different biometric modalities. In addition, an MBS can improve security because it is hard, for example, to produce several spoofed samples at the same time. Moreover, an MBS can implement a multiple challenge-response method.

A biometric multiple challenge-response method allows a biometric system to request different modalities in real time in order to authenticate an individual. For example, an ATM can ask a user to provide specific fingers and an iris. Thus, an MBS can increase the security level by providing tools against spoofing and replication attacks.

2.4.1 Multibiometric Factors

A series of factors should be evaluated in order to implement a Multibiometric System. The factors involved are related to the number of modalities, the mode of operation, how the information will be integrated, and the trade-off between accuracy and costs.

The number and type of biometric modalities should be driven by the nature of the application, the related financial and computational costs and the relation among the modalities. It is important to highlight that there is a direct relation between the number of modalities and system cost. For example, in an automatic gate which uses voice, iris and gait, the biometric system must provide three different sensors: a microphone, an iris scanner and a video camera, respectively. Consequently, financial and computational costs increase with the number of modalities.

A multimodal biometric system can operate in three modes: serial, parallel and hierarchical (59). Serial mode processes a new biometric modality only after the previous modality has been completely processed. This mode is mainly used to filter identities using a fast modality first and then modalities which provide more distinctiveness. On the other hand, parallel mode receives all the biometric templates at the same time. Serial mode presents a shorter interaction time in an authentication task because the biometric system does not need to use all the modalities in order to verify a genuine user. In contrast, parallel mode requires all the modalities. Finally, in hierarchical mode, the modalities and matching algorithms are organized in a tree format.
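The difference in interaction cost between serial and parallel operation can be sketched as follows. The decision rules are assumptions made for illustration only: the serial version accepts as soon as one modality scores at least `tau`, the parallel version compares the mean of all scores with `tau`, and each matcher is a hypothetical callable returning a similarity score.

```python
def serial_verify(matchers, tau):
    """Query one modality at a time and stop at the first score >= tau."""
    used = 0
    for matcher in matchers:
        used += 1
        if matcher() >= tau:
            return True, used
    return False, used


def parallel_verify(matchers, tau):
    """Query every modality, then compare the mean score with tau."""
    scores = [matcher() for matcher in matchers]
    return sum(scores) / len(scores) >= tau, len(scores)


# Hypothetical matchers: each call returns a similarity score in [0, 1].
matchers = [lambda: 0.95, lambda: 0.60, lambda: 0.70]
print(serial_verify(matchers, tau=0.9))    # stops after the first modality
print(parallel_verify(matchers, tau=0.9))  # always uses all three modalities
```

The second element of each result is the number of modalities the user had to present, which is where the shorter interaction time of serial mode comes from.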

A multimodal biometric system can be categorized according to how the modalities are integrated. The possible biometric integrations are:

1. Sensor: information originating from different sensors is combined;

2. Biometric Modalities: different biometric modalities are combined. For example, in a smartphone, an authentication method can combine the face and the voice of the phone owner (52);

3. Multiple Sample: different samples of the same biometric are provided to the system. For example, fingerprint samples of different fingers or different face pictures under different light conditions;

4. Multiple Representation and Algorithms: use of different representations (biometric features) of the same biometric sample or different matching functions/algorithms (41).

A Multimodal Biometric System can fuse the provided templates at three levels (13). Some authors also call these decision levels. The levels are:

1. Feature Fusion: the data provided by the different biometric samples are fused in afeature set format. Usually this fusion method aggregates the features in a single vector.


However, this fusion level faces some challenges because the features must be in thesame format, which is difficult in some applications.

2. Score Fusion: the final decision of the biometric system is calculated using the matchingscore of different matching algorithms. Each matching algorithm receives a templateand the output score is used in the fusion calculation. Some fusion rules are used suchas: min, max, min-max, median, sum-rule.

3. Decision Fusion: this decision level synthesizes the final decision from individual decisionsfrom independent biometric system.
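As an illustration of score-level fusion, the following minimal Python sketch combines the scores of several matchers with the min, max, median and sum rules. The function names, the threshold convention (accept when the fused score reaches the threshold) and the reading of "min-max" as a per-matcher normalization step are assumptions made for this example, not definitions taken from the cited works.

```python
from statistics import median


def min_max_normalize(score, lo, hi):
    """Map a raw matcher score to [0, 1] given that matcher's score range (assumed known)."""
    if hi == lo:
        raise ValueError("score range must not be empty")
    return (score - lo) / (hi - lo)


def fuse_scores(scores, rule="sum"):
    """Combine the matching scores of several matchers into a single score."""
    rules = {"min": min, "max": max, "median": median, "sum": sum}
    if rule not in rules:
        raise ValueError(f"unknown fusion rule: {rule}")
    return rules[rule](scores)


def fused_decision(scores, threshold, rule="sum"):
    """Accept the claimed identity when the fused score reaches the threshold."""
    return fuse_scores(scores, rule) >= threshold
```

In this sketch the scores are assumed to be similarity scores already expressed on a common scale; with distance scores the comparison in `fused_decision` would be reversed.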

2.5 Chapter Conclusion

Biometrics is an important field for the identification and verification of users. It was discussed that a biometric modality and a biometric system should satisfy some properties. Unfortunately, there is no optimal biometric modality or biometric system which fully satisfies all the properties. Therefore, there are some challenges and limitations to be explored.

Some limitations of unimodal biometric systems are resolved by multibiometric systems, which use different biometric “views” of an individual. The use of different samples/modalities increases the security and the accuracy of the biometric system. However, increasing the number of biometric modalities also increases financial and computational costs and user inconvenience.

One important point is to protect the biometric template against improper use. Unauthorized access to biometric templates raises privacy issues such as access to personal information. Furthermore, it is not possible to replace an unprotected biometric template because an individual's biometric trait is unique. Thus, a biometric advantage becomes a security limitation.

Template protection methods aim to resolve the security and privacy issues related to biometric templates. The next chapter discusses the concept, method types, advantages and limitations of template protection techniques.

One of the contributions of this thesis is to study the impact of template protection methods on unimodal and multibiometric systems. We aim to compare the system metrics obtained with only one biometric modality, with different protected versions of a biometric template (multibiometric proposal) and with multiple matching algorithms. Chapters 5 and 6 discuss our observations in detail.


3 Template Protection

3.1 Introduction

A biometric template is a digital representation of a biometric sample. Consequently, a template contains sensitive user information such as health and personal information. Due to the strong link between individuals and biometric traits, the exposure of enrolled templates compromises system security and the user’s privacy (61). Furthermore, a biometric trait is unique and, consequently, hard to change when compromised. Therefore, storing a biometric template in its original format is not an ideal design because it exposes the biometric system and its users. In addition, it will be almost impossible to use the same biometric trait again due to its uniqueness.

A biometric template has some vulnerabilities (38) such as: (1) a template can be replaced by an impostor’s template; (2) a physical spoof can be created from a stored template; (3) a stolen template can be replayed. Thus, biometric systems must implement template protection methods in order to minimize these vulnerabilities and preserve user privacy.

Template protection is a set of methods specialized in preserving the privacy and biometric characteristics of users (75). Protected templates should not reveal sensitive user information, such as the full or partial identity of the user, nor allow cross-matching between biometric systems. Therefore, template protection methods do not provide sensitive biometric data to attackers.

A template database which implements a template protection method stores only protected templates rather than templates in their original form (72). Furthermore, such a database does not store personal information but information generated from biometric traits. Thus, the adoption of template protection avoids vulnerability points related to the biometric template and the template database. Therefore, if the biometric system is compromised, the user’s information remains protected because only protected templates are stored.

According to (38), an ideal protected template presents the following four properties:

1. Diversity: a protected template does not allow cross-matching of users among biometric systems;

2. Revocability: if a template has been compromised, it is possible to revoke it and issue a new one based on the same biometric sample;


3. Security: it is computationally hard to obtain the original biometric features from a protected template;

4. Performance: the accuracy of the biometric system should not decrease due to the use of protected templates.

Unfortunately, as of 2008, no protection method in the literature provided all these properties (38).

In most operational biometric systems, a biometric template is protected using encryption techniques (61, 64). However, encryption techniques show three main drawbacks: (1) the biometric template is secure only as long as the encryption key is secure; (2) for each authentication, the secured template needs to be decrypted in order to be matched against a query template, so at some moment the unprotected template is available; (3) matching cannot happen in the encrypted domain because encryption methods are sensitive to distortion. Therefore, new protection methods such as feature transformation and biometric cryptosystems have been developed.

Template protection schemes can be classified into two categories: (i) feature transformation (56) and (ii) biometric cryptosystems (57). Feature transformation schemes transform or intentionally distort the original biometric samples to protect the user’s data. Hence, only the distorted data is stored and used in the biometric system. In contrast, biometric cryptosystems secure a cryptographic key using biometric features or generate a cryptographic key directly from the biometric features (38).

Unfortunately, template protection methods decrease the performance of biometric systems (27). The authors obtained a higher equal error rate (EER) with protected templates than with their unprotected versions. In addition, six algorithms achieve an EER below 0.3% using unprotected templates in the FingerPrint Verification Competition (FVC-STD-1.0), whereas the lowest EER using protected templates was 1.54% (9). Therefore, powerful template protection methods which provide information security and performance comparable to that of unprotected databases need to be developed.

In summary, biometric template protection methods ensure that sensitive user information is protected. The next two sections present cryptosystems and feature transformation methods in detail. In addition, we present the cancelable methods used in the experiments performed in this thesis.


3.2 Biometric Cryptosystems

Biometric cryptosystems are methods which use stored public information, known as helper data. The helper data is used to extract the cryptographic key kj from a query biometric x′j during the matching process. However, this helper data must not reveal relevant information about the biometric template. Thus, cryptosystems generate and store biometric-dependent information called helper data.

Cryptosystems have been successfully applied to some modalities such as fingerprint (45), face (90), iris (91) and multibiometrics (60). However, such studies present low accuracy rates and some limitations compared to the use of unprotected templates. Despite these problems, the cryptosystem literature continues to be explored in order to overcome them.

Biometric cryptosystem methods are mainly classified into two categories: key-binding and key generation schemes. Key-binding methods generate the helper data through a key binding process. In contrast, in key generation schemes the helper data is generated from the original biometric data, and the cryptographic key is then generated from the helper data.

Unfortunately, cryptosystem schemes present usability and security problems. Key binding does not allow revocability and diversity. Moreover, key generation schemes do not produce keys with high stability and entropy. Consequently, cryptosystem methods do not satisfy several properties that should be present in optimal protected templates. Therefore, due to these problems, we focus this thesis on another template protection category: feature transformation methods.

3.3 Feature Transformation

Feature transformation is an important class of template protection methods. These methods use a function f to protect a biometric template xj using a user-specific key kj; thus, f(xj, kj) generates a protected template for a user j ∈ {1, . . . , n}. The key kj is generated randomly or from a user password. Due to the characteristics of function f, f(xj, kj) ≠ f(xj, kp) whenever kj ≠ kp.

As previously discussed, a transformed template f(xj, kj) is stored in the template database and used instead of the original template xj. Therefore, a biometric system which uses a feature transformation approach accepts a query template x′j if classify(f(x′j, kj), θj) > τ, using a trained user classifier classify(. . . , θj) in the transformed space f(x′j, kj), where θj are the training parameters of user j for a specific classifier and τ is the decision threshold.


Feature transformation methods are classified as salting or non-invertible methods. The function f is invertible in salting feature transformation schemes (38), i.e., an attacker can recover the original template xj from a stolen protected template f(xj, kj): f⁻¹(f(xj, kj), kj) = xj, since kj is known and f(xj, kj) ∈ Xj. Therefore, the user key kj must be protected in salting methods. In contrast, a non-invertible transformation function is generally a one-way function, i.e., it is hard to invert a transformed template f(xj, kj) even with a known key kj: xj ≠ f⁻¹(f(xj, kj), kj). Thus, an attacker cannot reconstruct the biometric sample or discover the full or partial identity of the user in a feasible computational time, even in possession of the user key kj. In other words, with a non-invertible function the template is still secure if a user-specific key is compromised. Therefore, feature transformation techniques using non-invertible functions are ideal to protect biometric data.

Cryptosystems and feature transformation methods have their own strengths and limitations. The main challenge for cryptosystems is to develop methods which decrease the linkability generated by key generation schemes. For feature transformation techniques, the challenge is to find a method which provides non-invertibility and is tolerant to intra-user variation. One way to resolve these limitations is to use hybrid solutions (24). For example, a hybrid solution can secure a feature-transformed template using a cryptosystem method. However, hybrid solutions are not the focus of this thesis.

Cancelable methods are template protection methods in which the protected templates can be cancelled (revoked) in case of a security breach. A new protected template can then be generated by changing the key value kj. From now on, we use the term cancelable biometrics for salting and non-invertible functions which satisfy this property. This thesis uses non-invertible functions because they overcome limitations present in cryptosystem methods.

3.4 Cancelable Biometrics

Biometric features are user-specific and hard to replace once stolen. In other words, biometric characteristics are unique to each user, and compromised biometric features are sometimes impossible to change or adapt. Therefore, a template must be revocable.

Cancelable templates are obtained by using a user parameter (key) kj in a cancelable function f(xj, kj). If another key k′j is used, a different template is generated, i.e. f(xj, kj) ≠ f(xj, k′j). Thus, it is possible to generate different templates using different keys. This characteristic allows a biometric template to be revoked, and two different templates f(xj, kj) and f(xj, k′j) should not match.


Usually, the security problems faced by biometric systems are template replacement and template replay attacks (7). Hence, biometric templates must be stored using a protection scheme that provides all the template properties: (1) Diversity; (2) Revocability; (3) Security and (4) Performance (38). Unfortunately, it is difficult to define a template protection method that satisfies all these properties.

The main limitation of cancelable functions is the trade-off between discriminability and security. Cancelable functions decrease discriminability due to the noise they generate (75). Consequently, powerful matching methods and cancelable functions must be developed to balance this trade-off: a feature transformation function should preserve the discriminability of the biometric features while making it hard to obtain the original template from a transformed one.

One way to preserve both the discriminability and the security of protected templates is to use different cancelable biometric systems at the same time. Unfortunately, the use of multiple algorithms (cancelable methods) requires a different key to be presented to each cancelable method. As a negative consequence, the user needs to carry or remember several keys, which makes the interaction with the system inconvenient.

This thesis shows that it is possible to increase the accuracy of biometric systems while maintaining security. However, the Multi-Privacy Protection Scheme proposed by us (18) suffers from this interaction problem because an individual needs to present n different keys kj ∈ Rm | j = 1, . . . , n, with m ∈ N, one to each cancelable function fc. In view of this fact, Chapter 7 evaluates the use of a unique key in a multi-algorithm system. As a consequence, the individual does not need to remember several keys because the key can be stored inside the biometric system or in a smart card.

The next section presents the cancelable functions used in this thesis. It focuses on how each transformation function works, its strengths and its limitations.

3.5 Cancelable Functions

Cancelable functions transform biometric templates into a protected format. It is computationally hard to obtain the original format from a cancelable template (38). In addition, it is possible to generate new cancelable templates from the same biometric sample by changing user-specific parameters.

Unfortunately, high variance and noise are unintended consequences present in cancelable templates. Consequently, user verification using cancelable templates becomes more difficult than with unprotected templates. Thus, verification and authentication systems must provide better recognition performance than systems using non-modified data, due to the complexity present in cancelable biometric samples.

In this section, we analyse four different cancelable functions: Interpolation, BioHashing, BioConvolving and Double Sum. These cancelable functions have been chosen due to their performance, simplicity and non-invertible characteristics. The cited cancelable methods can be applied to physical and behavioural modalities (7). Indeed, our previous studies (15, 16, 17, 18) show that it is possible to apply these cancelable functions to a behavioural modality using different settings.

3.5.1 Interpolation

This technique generates a new biometric model by extracting points of a function obtained by interpolating the attributes. The interpolation process is based on polynomial interpolation, and the attributes in question compose the original biometric model xj. The protected template M is generated by applying the key kj to the interpolated polynomial, M = fI(kj). Thus, the main idea of the interpolation method is to obtain an interpolated polynomial fI from an unprotected template xj and then generate a new cancelable template by evaluating it at the key kj.

Although the method is simple and fast, inverting the polynomial function is difficult, which yields a good security level. Therefore, interpolation satisfies two out of the four feature transformation requirements: revocability and security. Revocability, because it is possible to cancel an enrolled template and replace it with a new template using a new key k′j ≠ kj; and security, because it is difficult to decode a cancelable template.

The following steps describe how the interpolation method works:

1. Given the original biometric template xj ∈ Rn for user j, where n is the number of attributes of xj, a function fI is obtained by interpolating the attributes of the biometric template. Thus, one function fI is created for each user. In order to approximate the discrete data well, it is important to use a polynomial with a sufficiently high degree g, usually the highest degree supported by the system;

2. Within the range of the function domain, a vector of random numbers kj ∈ Rd is generated for all biometric data. kj contains uniformly distributed pseudo-random numbers, where d is its dimension. The dimension d is chosen empirically and can interfere in the behaviour of the model (usually d = n);

3. The random vector kj is individually applied to function fI, generating as output the transformed model M. The revocability of this model depends exclusively on the creation of a new random vector in case the current vector is corrupted or stolen. The interpolation technique can be adapted to any biometric modality, provided that it yields a biometric feature vector. Also, the interpolation model depends on some variables, such as the size d (step 2) of the random vector and the degree g of the polynomial interpolator.

The interpolation method has some limitations. First, the interpolated polynomial cannot fit the biometric template xj perfectly. Thus, noise generated by the interpolation step is added to the protected template. Second, the interpolation step takes some time to protect a group of users. Thus, it is ideal that the interpolation method is applied during the enrollment process.
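The three steps can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: the attribute index 0, ..., n − 1 is taken as the abscissa of each attribute, the polynomial is the exact Lagrange interpolator through all n points (rather than a fit of a chosen degree g), and the key is drawn from an integer seed. The function names are hypothetical.

```python
import random


def lagrange_interpolator(values):
    """Return f_I, the polynomial passing through the points (i, values[i])."""
    n = len(values)

    def f_i(t):
        total = 0.0
        for i in range(n):
            term = values[i]
            for m in range(n):
                if m != i:
                    term *= (t - m) / (i - m)
            total += term
        return total

    return f_i


def interpolation_template(x, seed, d=None):
    """Protect template x by evaluating its interpolating polynomial at key points."""
    d = len(x) if d is None else d               # usually d = n
    f_i = lagrange_interpolator(x)               # step 1: interpolate the attributes
    rng = random.Random(seed)                    # assumption: key derived from a seed
    key = [rng.uniform(0, len(x) - 1) for _ in range(d)]  # step 2: uniform key in the domain
    return [f_i(t) for t in key]                 # step 3: M = f_I(k_j)
```

Revoking a template in this sketch amounts to calling `interpolation_template` again with a different seed. For long feature vectors a high-degree interpolating polynomial oscillates strongly, which is one source of the noise mentioned in the text.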

3.5.2 BioHashing

BioHashing is a two-factor authenticator based on inner products between a tokenized pseudo-random key kj and the user-specific biometric sample xj (46). This algorithm transforms the original biometric xj into an invertible binary sequence fB(xj, kj). This binary sequence is generated from the inner products between the original biometric vector and each pseudo-random orthonormal vector oj ∈ Rn | j = 1, ..., m obtained by applying the Gram-Schmidt algorithm to the key kj.

The BioHashing technique has been used in physical modalities such as fingerprint, palm and face (84). This thesis, however, uses the technique in a behavioural modality. We use the original BioHashing algorithm, which is invertible, but we suggest that future work use the adaptation proposed in (7). This adaptation uses n different thresholds τj | j = 1, ..., n and a genetic algorithm to find the optimal thresholds τj. The original algorithm works according to the following steps:

1. Given the original biometric model xj ∈ Rn, with n being the number of attributes of xj, a set of pseudo-random vectors kj ∈ Rn | j = 1, ..., m is generated, where m is the number of vectors of dimension n and m ≤ n;

2. The Gram-Schmidt algorithm is applied to kj ∈ Rn | j = 1, ..., m to obtain m orthonormal vectors oj ∈ Rn | j = 1, ..., m;

3. The inner product between the original biometric vector xj and each pseudo-random orthonormal vector oj is calculated, 〈xj | oj〉 | j = 1, ..., m, where 〈• | •〉 indicates the inner product operation;


4. An m-bit BioHashing model is then created through a binary discretization of the values obtained in the inner products, b = bj | j = 1, ..., m, where:

$$
b_j =
\begin{cases}
1, & \text{if } \langle x_j \mid o_j \rangle \le \tau \\
0, & \text{if } \langle x_j \mid o_j \rangle > \tau
\end{cases}
\tag{3.1}
$$

with τ being an empirically determined threshold. In this work we chose τ = 0.5.
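A minimal Python sketch of the four steps follows. It assumes that the pseudo-random vectors are drawn uniformly from [−1, 1] with a generator seeded by the user token, and it uses the modified Gram-Schmidt variant for numerical stability; both choices, as well as the function names, are assumptions of this example. The discretization follows Equation 3.1 as written above.

```python
import random


def gram_schmidt(vectors):
    """Orthonormalise linearly independent vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for o in basis:
            proj = sum(a * b for a, b in zip(w, o))
            w = [a - proj * b for a, b in zip(w, o)]
        norm = sum(a * a for a in w) ** 0.5
        basis.append([a / norm for a in w])
    return basis


def biohash(x, seed, m=None, tau=0.5):
    """Return the m-bit BioHashing code of template x for the key given by seed."""
    n = len(x)
    m = n if m is None else m
    if m > n:
        raise ValueError("m must not exceed the number of attributes n")
    rng = random.Random(seed)                                   # assumption: seeded token
    k = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(m)]  # step 1
    o = gram_schmidt(k)                                         # step 2
    products = [sum(a * b for a, b in zip(x, oj)) for oj in o]  # step 3
    return [1 if p <= tau else 0 for p in products]             # step 4, Eq. 3.1
```

Two codes produced this way would typically be compared with the Hamming distance; the classifiers used in this thesis are described in later chapters.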

The performance of the BioHashing function depends exclusively on the variable m, i.e. the number of pseudo-random orthonormal vectors. The number of vectors is determined empirically and corresponds to the number of attributes of each transformed template. Normally, m = n (39).

3.5.3 BioConvolving

BioConvolving was originally proposed for signatures. This method generates the protected template through linear combinations of sub-parts of the original biometric template xj (47). Basically, BioConvolving uses a random transformation key kj to divide each original biometric sequence into W non-overlapping segments. The protected template fBC(xj, kj) is then generated through a linear convolution among the W obtained segments. A general description of BioConvolving is presented as follows:

1. Define the number W of segments of the original biometric sequence;

2. Randomly select W − 1 values dw between 1 and 99 and rank them in increasing order, i.e., dw > dw−1. The selected values are arranged in a vector d = [d0, . . . , dW ], with d0 = 0 and dW = 100. The vector d represents the key of the BioConvolving transformation;

3. Convert the values dw into bw using the function $b_w = \mathrm{round}\left(\frac{d_w}{100}\, n\right)$, $w = 0, \ldots, W$, where n is the number of attributes and round returns the nearest integer;

4. Divide the original sequence xj ∈ Rn into W segments xj(w) of length Nw = bw − bw−1, each lying in the interval [bw−1, bw] of attributes;

5. Apply the linear convolution of the segments xj(q), q = 1, . . . , W, to obtain the transformed function fBC = xj(1) ∗ . . . ∗ xj(W).

These steps are illustrated in Figure 2.


Figure 2 – Illustration of the BioConvolving procedure using W = 3. Source: (47)

Due to the convolution operation in step (5), the length of the transformed function is K = N − W + 1, where N = n is the length of the original sequence, which is almost the same as that of the original data. A final signal normalization is applied in order to obtain transformed functions with zero mean and unit standard deviation. Different transformed results can be obtained from the same original functions simply by varying the number of segments W or the values of the key d. The security of BioConvolving rests on the fact that a blind deconvolution problem must be solved in order to retrieve the original template (47), and blind deconvolution cannot be achieved in a feasible time. Consequently, it is not possible to retrieve the original template xj even if multiple transformed templates f(xj, kj) are stolen.
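The procedure, including the final normalization, can be sketched in Python as follows. The helper names and the seed-based key generation are assumptions of this example, and the sketch simply rejects keys that would produce an empty segment, a case the description above does not address.

```python
import random
from statistics import mean, pstdev


def make_key(w, seed):
    """Key d = [0, d_1, ..., d_{W-1}, 100] with increasing values drawn from 1..99."""
    rng = random.Random(seed)
    inner = sorted(rng.sample(range(1, 100), w - 1))
    return [0] + inner + [100]


def convolve(a, b):
    """Full linear convolution of two sequences (length len(a) + len(b) - 1)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def bioconvolve(x, d):
    """Transform sequence x with key d and normalise to zero mean, unit deviation."""
    n = len(x)
    # Note: Python's round() resolves exact .5 ties to the nearest even integer.
    b = [round(dw / 100 * n) for dw in d]
    segments = [x[b[w - 1]:b[w]] for w in range(1, len(b))]
    if any(len(s) == 0 for s in segments):
        raise ValueError("key produces an empty segment for this template length")
    result = segments[0]
    for s in segments[1:]:
        result = convolve(result, s)
    mu, sigma = mean(result), pstdev(result)
    return [(v - mu) / sigma for v in result]
```

For a sequence of n = 10 attributes and the key d = [0, 30, 60, 100], the segments have lengths 3, 3 and 4, and the transformed sequence has length 10 − 3 + 1 = 8, in agreement with K = N − W + 1.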

3.5.4 Double Sum

The Double Sum method sums each attribute value of the original biometric sample xj with two other attributes of the same sample (66). Thus, each value of the protected biometric sample is the sum of three attributes chosen at random by a random key kj. For different keys kj, different protected templates f(xj, kj) can be generated from a specific unprotected template xj. The Double Sum method is described as follows:

1. For each sample of the original biometric template xj ∈ Rn, where n is the number of attributes of xj, we generate a random vector L ∈ Rn based on a random seed; alternatively, L can be defined by user j. This random vector is responsible for redistributing the attributes of the original template xj. In other words, the original biometric model is reorganized into an intermediate biometric trait β based on the random vector L;

2. Define two vectors C1(j), C2(j) ∈ Rn, whose elements c1(j), c2(j) ≤ n ∈ N∗ are chosen at random. This random choice is made using two security keys $k_j^1$ and $k_j^2$, one for each vector C∗(j). The vectors C1(j) and C2(j) are composed of attribute positions. The generation of two sets of random values doubles the security of the cancelable transformation: two deconvolution processes are necessary in order to obtain the original template xj, and this deconvolution requires solving a double system with n variables;

3. The attributes of the reorganized biometric template β are summed with the attributes whose positions are defined by the random vectors C1(j) and C2(j). Thus, the transformed template is obtained according to the following equation:

$$
f(x_j, k_j) = \beta(j) + x_j(c_1(j)) + x_j(c_2(j)) \tag{3.2}
$$

where kj = {L, C1(j), C2(j)} are the parameters required to generate the protected template.
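Equation 3.2 can be illustrated with the short Python sketch below. It assumes that L, C1 and C2 are each a random permutation of the attribute positions, and it derives all three from a single seed for brevity, whereas the description above uses a seed for L and two separate keys for C1 and C2. The function names are hypothetical.

```python
import random


def make_double_sum_key(n, seed):
    """Return (L, C1, C2): three random permutations of the positions 0..n-1."""
    rng = random.Random(seed)  # assumption: one seed drives all three vectors
    key = []
    for _ in range(3):
        positions = list(range(n))
        rng.shuffle(positions)
        key.append(positions)
    return tuple(key)


def double_sum(x, key):
    """Apply Eq. 3.2: each output value is the sum of three attributes of x."""
    perm, c1, c2 = key
    beta = [x[p] for p in perm]                 # step 1: reorganised template
    return [beta[i] + x[c1[i]] + x[c2[i]]       # steps 2-3: add the selected attributes
            for i in range(len(x))]
```

For example, with x = [1, 10, 100], L = [2, 0, 1] and C1 = C2 = [0, 1, 2], the reorganized template is β = [100, 1, 10] and the protected template is [102, 21, 210].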

The Double Sum method is considered non-invertible because the number of possible combinations used in the sum of attribute values is very high. As an illustration, if a biometric template xj has n attributes, there are (n!)³ possible ways of generating the protected content. Therefore, it is computationally infeasible to reverse the process in a feasible time for n greater than 50 (66). In this thesis, the protected template and the key have the same dimension (n = 30).

The key kj is responsible for the biometric attribute selection. The possibility of changing the key kj in case of template exposure assures template revocability. Therefore, a new protected template can be created using a different security key k′j.

3.6 Function of Key in Cancelable Functions

As discussed before, cancelable functions require a user-specific key kj to generate a protected template f(xj, kj). Depending on the cancelable function, the key is used to generate new feature values (Interpolation, BioHashing) or to select biometric features (Double Sum, BioConvolving). Therefore, the key directly affects the generation of the protected template.

Usually, the authors of cancelable functions assume that each user has a different key kj. Thus, we suppose that the classifier (matcher) is biased towards the key instead of the biometric information. In order to evaluate this learning bias, we analyse in this thesis the use of a unique key ks for all users. Our hypothesis is that a protected template generated with a system key ks provides more biometric information than a protected template generated with a user-specific key kj. Consequently, the classifier would be biased towards the biometric features rather than the key.

The use of a unique system key can decrease the inter-user variability. Since our proposal is to use a unique key for all users, the protected templates generated for different users can be more similar than in the standard design (a different kj for each user). We believe that the information present in the key increases the inter-user variability and thus influences the discrimination of cancelable templates among users. As a consequence, the false rejection rate (FRRT) and the false acceptance rate (FART) in the transformed domain will possibly increase. Therefore, our proposal to use a unique system key ks needs to deal with this limitation and provide methods to overcome it.

We suppose that the use of a unique key will not affect the intra-user variability. With heterogeneous keys, i.e. a different key for each user, all protected templates of user j use the same key kj. In our proposal to use a homogeneous key ks, the same holds for all protected templates of user j. Therefore, with a homogeneous key the intra-user variability remains the same because the key ks does not change for a specific user j.

In this section we discussed the role of the key in cancelable functions. We discussed that the user key has an important role in the selection and generation of the protected template. In addition, we observed that the key affects the inter-user and intra-user variability. Consequently, the key affects the False Acceptance Rate and the False Rejection Rate. In the next subsections, we discuss how the literature proposes to handle the storage, generation and length of the key. These aspects are important for a protected template because they directly affect the performance and the security of the biometric system.

3.6.1 Key Storage

Biometrics is presented in the literature as a solution that cannot be shared and that requires the individual to be present in order to be authenticated. In addition, in a biometric system, a user does not need to remember passwords or carry tokens/smart cards. The biometric sample plays the role of the “password/token”. In other words, the biometric sample proves to the system that the user is genuine due to his unique biological characteristics.

Token and smartcard solutions allow users to share or lose their credentials. Thus, an impostor in possession of the token and a spoofed biometric sample can increase his chance of gaining access. As a consequence, tokens/smartcards expose the biometric system.

The use of cancelable functions requires the use of a key. Consequently, the advantage of biometric authentication of not needing to remember passwords or carry tokens/smartcards no longer holds, i.e., cancelable biometrics requires a key for its proper use. Therefore, this biometric advantage needs to be recovered.

Few studies discuss how to store a cancelable function key. Among the reviewed studies, only one discusses key storage issues (39). The authors report that the hash key (BioHashing key) can be stored in smart cards, tokens or inside the device in an encrypted format. In this subsection, we discuss key storage issues such as usability and security. In addition, we propose some solutions to the limitations present in key storage solutions.

Biometric features cannot protect a cancelable function key. Biometric features are affected by different variables such as environment, light or sensor problems (38). Thus, the idea of storing a version of the key encrypted with biometric features is not good because a biometric template is not the same under different conditions. Therefore, the use of biometric samples as a key storage solution is not feasible.

One possible solution is to use a network model to store the cancelable function key. However, this solution is not perfect because computer networks have some inherent limitations such as denial of service attacks, connection problems or costs related to the network infrastructure. Thus, a network storage model can be appropriate, but counter-measures need to be implemented.

The storage solution proposed by us is to store the key in an unprotected format inside the device. As discussed before, the key does not affect user privacy in non-invertible functions because it is hard to invert the protected template in a feasible time. Unfortunately, storing the key in an unprotected format allows unauthorized users to access it. Consequently, attackers can use the unprotected key to try to authenticate themselves as a different user. Therefore, a biometric system that uses a cancelable function with an accessible unprotected key needs to be powerful enough to identify impostor biometric features.

The local key storage solution increases usability. With local key storage, the system users do not need to remember passwords or carry them in separate devices to be authenticated. As discussed in this section, the use of tokens/smartcards generates problems related to key sharing or key loss. Therefore, keeping the key inside the biometric system preserves biometric benefits such as a non-shareable secret and reduced user stress. As a consequence, the usability of the system increases, which has an impact on the adoption of biometric systems.

Based on the discussed topics, we suggest the use of cancelable functions with unprotected keys in mobile devices. This is one of the contributions of this thesis. This suggestion allows the user key to be stored in the device to avoid security and usability problems. If the device is compromised, methods to destroy the protected template and the user key can be used. Therefore, a biometric system which uses our storage proposal is usable, secure and protects the user privacy.


3.6.2 Key Generation

The key used in cancelable functions is randomly generated or provided by theuser. Unfortunately, our literature review has not identified any authors who discuss theconsequences and limitations present in key generation methods. Thus, this section proposesto discuss the potential consequences and limitations present in key generation methods.

The functions considered here (Interpolation, BioHashing, BioConvolving and Double Sum) rely on random methods that generate pseudo-random vectors from a seed. However, vectors derived from a seed are not truly random. Consequently, a biometric system with a large number of users, such as a citizen registration system, can generate similar protected templates for different users, which can increase the False Acceptance Rate for those users. Therefore, key generation based on a random seed has limitations that interfere with biometric system performance.
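
The dependence on the seed can be illustrated with a minimal sketch. It assumes a key that is simply a vector of pseudo-random numbers; the function name `key_from_seed`, the value range and the seeds are hypothetical and are not taken from the cancelable functions discussed in this thesis.

```python
import random

def key_from_seed(seed, length):
    """Derive a pseudo-random key vector from an integer seed (illustrative only)."""
    rng = random.Random(seed)  # generator whose output is fully determined by the seed
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]

key_a = key_from_seed(seed=1234, length=8)
key_b = key_from_seed(seed=1234, length=8)  # same seed: the key is reproduced exactly
key_c = key_from_seed(seed=1235, length=8)  # different seed: a different sequence

print(key_a == key_b)
print(key_a == key_c)
```

Under these assumptions, the number of distinct keys can never exceed the number of distinct seeds, which is why a very large user population makes repeated or similar keys more likely.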

A key provided by the user also has limitations that create security flaws. For example, different users may end up with similar keys, because users are not able to produce distinct, complex passwords that satisfy the requirements of a cancelable function. Therefore, keys provided by users can negatively affect the performance of biometric systems.

In this subsection, our aim is not to perform a deep analysis of key generation schemes but to draw the reader's attention to this problem. This brief discussion indicates that pseudo-random keys can affect biometric performance. Thus, this subsection introduces the problem and suggests the topic for further work.

3.6.3 Key Length

As discussed in subsections 3.6.1 and 3.6.2, the storage and the generation of the key affect the usability and the performance of the biometric system, respectively. The key length also influences biometric system performance.

Few studies have discussed this important aspect of the key. Thus, this thesis contributes a brief discussion of it. In addition, Chapter 8 presents and discusses experiments using different key lengths in the cancelable functions. In summary, we observed that the key length can affect biometric system performance either positively or negatively, and that the type of impact depends on the cancelable function.

The BioConvolving authors showed that performance decreases when the number of segments W increases (49, 50). The authors explain this phenomenon by the fact that sequences extracted from different signatures of the same user typically have different lengths. This characteristic causes an alignment problem. Therefore, segments with high variance produce convolved outputs with high variance as well.

BioHashing is able to achieve 0% EER for dimensions above 100 (42). Its performance is directly related to the dimension of the biometric sample (42); in other words, BioHashing performance increases with the number of template dimensions. However, high-dimensional data is more complex for classifiers. Another problem is that some biometric templates have few dimensions, which negatively affects the performance of the BioHashing function. Therefore, in the BioHashing approach, the dimension of the biometric sample is related to the key length.
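
The link between sample dimension and key length can be made concrete with a simplified sketch. It assumes the commonly described form of BioHashing, in which the feature vector is projected onto orthonormal pseudo-random vectors derived from the user key and each projection is thresholded into one bit. The function names, the zero threshold and the sample values are hypothetical, and the sketch is not the exact procedure evaluated in this thesis.

```python
import random

def orthonormal_basis(seed, m, n):
    """Return m orthonormal vectors of dimension n derived from a seed (Gram-Schmidt)."""
    rng = random.Random(seed)
    basis = []
    while len(basis) < m:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for b in basis:  # remove the components along the vectors already accepted
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm > 1e-12:  # skip a degenerate draw, which is extremely unlikely
            basis.append([vi / norm for vi in v])
    return basis

def biohash(features, seed, m, threshold=0.0):
    """Project the features on m key-derived directions and binarize each projection."""
    if m > len(features):
        raise ValueError("m cannot exceed the feature dimension")
    basis = orthonormal_basis(seed, m, len(features))
    return [1 if sum(f * b for f, b in zip(features, vec)) > threshold else 0
            for vec in basis]

sample = [0.8, -1.2, 0.3, 2.1, -0.5, 0.9]
code_old = biohash(sample, seed=42, m=4)
code_new = biohash(sample, seed=43, m=4)  # reissued key: the code is recomputed
```

In this sketch at most n orthonormal vectors exist in an n-dimensional space, so the number of bits m cannot exceed the feature dimension. This is one way to read the statement that a template with few dimensions limits the key length.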

Similarly, Lumini and Nanni (46) showed that the performance of BioHashing increases when the key length increases. The authors therefore propose some ideas to improve the BioHashing method and overcome the limitation imposed by its dimension. They state that (1) BioHashing improves system security by increasing the dimension of the key and (2) BioHashing is an unstable classifier.

Lumini and Nanni (46) address the BioHashing problems with a set of solutions: (1) Normalization: normalize the biometric features before applying the BioHashing procedure; (2) Threshold Variation: use several thresholds to binarize the BioHashing code; (3) Space Augmentation: iterate the BioHashing method k times on the same biometric sample in order to obtain k bit vectors bi; (4) Feature Permutation: use q permutations of the biometric sample in order to generate q bit vectors. Space Augmentation and Feature Permutation fuse the matching scores using a SUM rule. As a result, the BioHashing procedure becomes longer and more computationally complex.

Unfortunately, the literature does not present solutions to increase the key length of Interpolation and Double Sum (66). Therefore, one of our contributions is to propose and evaluate new ways to increase the key length for the Interpolation and Double Sum methods.

Increasing the key length has good (BioHashing) and bad (BioConvolving) consequences. Thus, a longer key does not guarantee a performance improvement. Therefore, it is important to evaluate the key length of cancelable methods before applying them, because the key influences the performance.

All the aspects of a user key need to be evaluated before its use in a cancelable function. In addition, these aspects can change with the biometric modality. As discussed in this section, the storage, generation and length of the key affect the performance and the usability of a biometric system. In addition, some cancelable functions perform better for biometric samples with a high number of features (BioHashing), which means a long key. Therefore, the aspects of the cancelable key are as important as the cancelable function or the biometric sample.

3.7 Multi-Biometric Template Protection

The study carried out by Rathgeb and Busch (74) discusses issues and challenges of multi-biometric template protection. The authors summarise and discuss some characteristics of multi-biometric template protection studies, which are: (1) fusion of biometric traits at feature level, score level and decision level; (2) rearrangement of the biometric representation to provide a uniform distribution of error probabilities; (3) combination of biometric modalities to increase security and accuracy; (4) use of different feature extraction methods on one biometric modality; (5) use of multi-biometric protection systems with multiple protected physical modalities. However, Rathgeb and Busch (74) do not discuss the use of behavioural biometrics in multi-biometric protection systems. We propose to fill this gap in this thesis by investigating multi-biometric template protection systems that use behavioural modalities.

The performance of biometric systems using multiple physical biometric samples is investigated by Canuto et al. (7). However, that study is restricted to physical biometric samples. The use of behavioural biometrics is more challenging than the use of physical biometrics because human behaviour is influenced by different factors (17).

The empirical experiments performed in Damasceno and Canuto (17) show that a multi-algorithm approach using cancelable functions achieves interesting results on protected and unprotected biometric samples. Unfortunately, the authors do not show the advantages of using multiple samples.

The application of template protection in a multi-algorithm approach using a 3D face modality is discussed in Kelkboom et al. (41). The authors discuss their findings at different fusion levels (feature, score and decision level). They applied the AND and OR rules to fuse the decisions of the biometric system.
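
Decision-level fusion with the AND and OR rules can be sketched as follows. The sketch assumes that each matcher outputs a Boolean accept/reject decision; the function names and the example decisions are hypothetical and do not reproduce the setup of Kelkboom et al. (41).

```python
def fuse_and(decisions):
    """Accept only if every matcher accepts (stricter rule)."""
    return all(decisions)

def fuse_or(decisions):
    """Accept if at least one matcher accepts (more permissive rule)."""
    return any(decisions)

decisions = [True, False, True]  # hypothetical outputs of three matchers
accepted_by_and = fuse_and(decisions)
accepted_by_or = fuse_or(decisions)
```

The AND rule generally trades a lower false acceptance rate for a higher false rejection rate, while the OR rule does the opposite.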

Multi-biometric template protection shows great potential to overcome the limitations of template protection. In this thesis, we use different cancelable functions to generate different protected versions of the same biometric sample. Thus, it is possible to provide different views of a biometric sample to an ensemble method. Chapter 6 shows these experiments and proposes an ensemble method based on majority vote. We observed better results than with a single template protection method.


3.8 Conclusion

This chapter introduces the template protection field. Template protection methods aim to protect sensitive user data present in biometric templates. However, such methods decrease biometric system performance or suffer from limitations such as low diversity and revocability.

Template protection methods are categorized into Cryptosystem and Feature Transformation methods. Cryptosystem methods use auxiliary information called helper data, which is used to retrieve the cryptosystem key used in the matching process. Although cryptosystems offer enough security, they offer low revocability and diversity. Feature transformation methods overcome these limitations with mechanisms that distort the biometric features. These methods use a user-specific key to cancel a biometric sample. For this reason, some feature transformation methods are known as cancelable methods.

Cancelable methods generate a new protected biometric template when user-specific parameters are changed. These methods allow compromised systems to revoke protected templates and carry out a new secure enrollment process. Cancelable methods that use non-invertible functions generate protected templates that cannot be decoded in feasible time, even by someone in possession of the key. In this chapter, we discussed the four cancelable methods used in this thesis (Interpolation, BioHashing, BioConvolving and Double Sum). These cancelable functions are non-invertible and have been successfully applied to physical modalities.

One of the contributions of this thesis is to discuss some points related to the role of the key in cancelable functions. We saw that keys in cancelable functions are user specific and can be studied according to some characteristics. We propose to discuss the generation, storage and length of cancelable keys.

We found that the characteristics of the key affect the performance, diversity and usability of a biometric system. In addition, we discussed how key storage affects usability. After this discussion, we proposed the use of non-invertible functions in a biometric system with the user-specific key stored in the device in an unprotected format. We propose this solution based on the security of non-invertible functions and the recognition power of the biometric matcher. In addition, the presence of the unprotected key in the device increases usability because an individual does not need to provide it, i.e., the key is already accessible to the biometric system.

Key generation based on random vectors influences the key diversity of a biometric system. Thus, in a heavily used biometric system, some users can have similar keys, which influences the False Acceptance Rate.

The key length also affects biometric system performance. Some studies have observed that the key length can affect the performance of the system either positively or negatively. However, the causes of this behaviour have not been analysed in depth. We propose to explore this gap in this thesis by analysing the performance of Interpolation, BioHashing, BioConvolving and Double Sum with different key lengths. In addition, we will try to identify what affects biometric system performance across different key lengths.

Lastly, we discussed the use of multi-biometric template protection. Multi-biometric template protection uses different biometric samples or modalities to improve the security and performance of a biometric system. In Chapter 6, we propose a new multi-biometric template approach that uses different template protection methods to increase security and performance. This technique is based on the idea of ensemble systems. We observed interesting results, and the proposal outperforms the results achieved using only one template protection method.

4 Statistical Learning and Ensemble Methods

This chapter introduces the main concepts of the classifiers and ensemble systems used in this thesis. The first section introduces statistical learning, which is the basis of the learning capability of the classifiers used. The second section presents the main concepts, advantages and limitations of these classifiers (k-NN, Decision Tree, Naive Bayes, Artificial Neural Networks and Support Vector Machines). Lastly, we discuss the main concepts related to ensemble systems and present some of their applications in the biometric field.

4.1 Statistical Learning

Every day, human beings face problems in which they have to predict a behaviour or classify unseen objects based on past experience. It is desirable to automate this task in order to avoid human errors. One way to automate identification and classification is to use statistical learning. Statistical learning uses statistical properties of observed data Z to assign labels to unseen objects (43). Thus, statistical learning methods predict unseen instances based on statistical measures of the objects presented to them. Consequently, a statistical learning method produces a relation between observed and unseen objects.

An object or instance x is composed of features or attributes that describe it. Normally, objects are distinguished by the combination of their attribute values.

Formally, statistical learning produces a function y = f(x) that assigns a label or a scalar value y to an unseen instance x (86). To understand how the function f(x) is determined, it is necessary to introduce some elements.

1. Random vectors x, generated independently from an unknown distribution P(x);

2. An output vector y, given by a supervisor, for each vector x, according to an unknown conditional distribution P(y|x);

3. A learning algorithm which implements a set of functions f(x, α), α ∈ Λ.

Thus, a statistical learning problem can be stated as the selection of a function from the set f(x, α), α ∈ Λ, namely the function that best predicts the supervisor's response y. The selection of the function f(x, α) is based on a training set t = {(x1, y1), . . . , (xk, yk)} composed of k observations generated according to P(x, y) = P(x)P(y|x). From now on, the term classifier has the same meaning as function f(x, α).

A discrepancy measure L(y, f(x, α)), or loss, measures how well f(x, α) approximates the supervisor's response: it quantifies the discrepancy between the supervisor's response y and the value calculated by f(x, α). Thus, the expected value of the loss, given by the risk functional, can be defined:

$$R(\alpha) = \int L(y, f(x, \alpha)) \, dP(x, y) \qquad (4.1)$$

The objective is to find the function f(x, α) that minimizes the expected loss R(α).
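
Because P(x, y) is unknown, the expected loss cannot be computed directly; a common practical substitute is the average loss over the training set, usually called the empirical risk. The sketch below assumes a squared loss, a one-parameter family f(x, α) = αx and a small finite set of candidate values for α. All of these choices, and the data, are hypothetical and serve only to illustrate the selection of a function.

```python
def squared_loss(y, prediction):
    """Discrepancy L(y, f(x, alpha)) between the response and the prediction."""
    return (y - prediction) ** 2

def empirical_risk(alpha, f, data, loss=squared_loss):
    """Average loss of f(x, alpha) over the observed pairs (x, y)."""
    return sum(loss(y, f(x, alpha)) for x, y in data) / len(data)

def f(x, alpha):
    """A one-parameter family of functions (assumed for illustration)."""
    return alpha * x

training_set = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
candidates = [0.5, 1.0, 1.5, 2.0, 2.5]  # a small finite parameter set
best_alpha = min(candidates, key=lambda a: empirical_risk(a, f, training_set))
```

With these values the selected parameter is α = 2.0, since it gives the smallest average squared error on the three training pairs.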

Statistical learning problems are categorized by the presence of y as (1) supervised or (2) unsupervised (31). A supervised problem predicts y for an unseen instance x using a labelled dataset (training dataset). Thus, the training set of a supervised learning problem contains the value of y, i.e., ts = {(x1, y1), . . . , (xk, yk)}.

In contrast, unsupervised learning problems use unlabelled data, i.e., the training set does not contain y. Hence, an unsupervised training set is composed of i unlabelled instances x only, tus = {x1, . . . , xi}. Unsupervised problems are also known as exploratory data analysis because they try to understand the behaviour and organization of the training set.

Supervised problems can be categorized into (1) regression problems and (2) classification problems, according to the nature of the label y. In regression problems, y ∈ R is a scalar value. Examples of regression problems are the prediction of a stock price or of the concentration of blood cells in a sample. In contrast, in classification problems y belongs to W = {w1, . . . , wc}. In other words, classification problems assign a discrete value y called a category. Examples of classification problems are handwritten digit recognition, user authentication and email classification (spam or non-spam).

4.1.1 Evaluation of Classifiers

The performance of f(x, α) is evaluated in three distinct steps: (1) training, (2) validation and (3) test. Normally, the sets used in the training, validation and test steps contain 70%, 10% and 20% of the objects of Z, respectively (43). The first step, training, selects the best f(x, α) based on a set of observed objects (training set). Thus, the training phase consists of selecting the function f(x, α), α ∈ Λ, that labels the training set with the best accuracy.

Secondly, the validation step presents a validation set v composed of instances that are not in the training set t, i.e., an independent set. In the validation step, the training process continues until the performance on the validation set v matches the performance of the training step. The validation step acts as a parameter tuner for functions that depend on parameters.

Finally, the test step verifies the true performance of the function f(x, α), α ∈ Λ, using a test set ts composed of unseen objects only. The test set measures the generalization power of f(x, α): a trained function with good generalization power has similar performance on seen and unseen objects. Therefore, the test step shows the true accuracy and generalization power of f(x, α).
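
The three sets can be obtained with a simple random partition of Z. The following sketch assumes the 70%/10%/20% proportions mentioned above; the function name `split_dataset`, the fixed shuffling seed and the toy dataset are hypothetical.

```python
import random

def split_dataset(objects, train=0.7, validation=0.1, seed=0):
    """Shuffle and split into disjoint training, validation and test subsets."""
    shuffled = list(objects)
    random.Random(seed).shuffle(shuffled)  # fixed seed only to make the split repeatable
    n_train = int(round(train * len(shuffled)))
    n_val = int(round(validation * len(shuffled)))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

Z = list(range(100))  # stand-in for 100 observed objects
train_set, validation_set, test_set = split_dataset(Z)
```

Every object ends up in exactly one subset, so the test set contains only objects that were never used for training or tuning.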

The training/validation/test methodology helps to avoid overtraining. An overtrained classifier learns the available data perfectly and fails on unseen data. This is undesirable because a classifier needs to achieve similar accuracy on unseen objects as well. The overtraining problem is also known as overfitting.

Different strategies have been proposed in the literature to evaluate the function f(x, α). We briefly highlight the most used.

• Hold out: Split Z into halves and use one half for training and the other half for testing. Other proportions are also used, for example, 70% for training and 30% for testing.

• Cross Validation: Z is divided into k parts of size N/k. Then, the classifier is trained using k − 1 subsets and tested with the remaining subset. This procedure is repeated k times, choosing a different subset for testing each time. The final accuracy is the average of the k estimates. Cross validation is called leave-one-out when k = N.

• Bootstrap: This evaluation method generates L sets of cardinality N by sampling with replacement from Z. The bootstrap accuracy is the average accuracy of the classifier on these L sets.

In this thesis, we use 10-fold cross validation, in which Z is divided into 10 disjoint subsets. The reported error rates are the average error rates of the classifier over these 10 subsets. In the next section, we describe the classifiers used.
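
The k-fold procedure can be sketched with index lists only. The sketch assumes that the objects have already been shuffled and that `evaluate` is a user-supplied callback which trains on the training indices and returns a score on the test indices; both function names and the callback are hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]  # k disjoint folds of near-equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_validate(n, k, evaluate):
    """Average the k per-fold estimates returned by the evaluate callback."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n, k)]
    return sum(scores) / k

splits = list(k_fold_indices(20, 10))  # 10-fold split of 20 objects
```

Each object is used for testing exactly once and for training in the other k − 1 rounds; setting k equal to n gives leave-one-out.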

4.2 Classification Methods

This thesis reports the results of machine learning algorithms applied to a biometric authentication problem. For this reason, we use different classifiers in order to select a set of classifiers appropriate to this task.


The classifiers used in this thesis were chosen to cover four learning paradigms: distance-based, search-based, probabilistic and optimization-based (22). Thus, we aim to analyse the results of the main learning paradigms in the user verification problem.

4.2.1 k-NN

Some classifiers are based on the hypothesis that similar objects tend to be concentrated in a specific region (22), while objects of different classes are concentrated in different regions. Following this similarity hypothesis, classifiers such as k-NN use the distance among objects to label unseen instances.

The k-NN method classifies a new object based on its k nearest neighbours (43). Thus, the majority class of the k nearest neighbours is assigned to the new object.

The similarity between an unseen object and its neighbours is determined by a distance. Normally, the Euclidean distance given by Equation (4.2) is used, where xi and xj are two objects in R^d and l is the index of an attribute of object x.

$$d(x_i, x_j) = \sqrt{\sum_{l=1}^{d} \left(x_i^{l} - x_j^{l}\right)^{2}} \qquad (4.2)$$

Therefore, the class assigned to a new object is the majority class of its k nearest neighbours, as determined by the distance d.
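
A minimal k-NN classifier following this description can be sketched as below. The training objects, the labels "genuine" and "impostor" and the default k = 3 are hypothetical; ties between classes are not treated specially, which is one reason an odd k is often preferred in two-class problems.

```python
from collections import Counter
import math

def euclidean(a, b):
    """Euclidean distance between two objects with the same number of attributes."""
    return math.sqrt(sum((al - bl) ** 2 for al, bl in zip(a, b)))

def knn_predict(training, x, k=3):
    """training: list of (feature_vector, label); returns the majority label of the k nearest."""
    neighbours = sorted(training, key=lambda pair: euclidean(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

training = [((1.0, 1.0), "genuine"), ((1.2, 0.8), "genuine"), ((0.9, 1.1), "genuine"),
            ((4.0, 4.2), "impostor"), ((4.1, 3.9), "impostor")]
prediction = knn_predict(training, (1.1, 1.0))
```

Note that the whole training list is kept and scanned at prediction time, which illustrates the storage cost of the method.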

k-NN can be applied to classification and regression problems. It can predict discrete values because its prediction is based on the k nearest neighbours. For regression problems, Mitchell (51) describes k-NN as a collection of local approximations with low complexity. Thus, k-NN can create local approximations of the target function.

Unfortunately, the k-NN method consumes a lot of storage and is sensitive to noise, to the distance metric and to the number of neighbours. k-NN needs to store all the training data in order to define the class of unseen objects. Thus, for big datasets, k-NN requires a lot of memory to work properly. In addition, as k-NN is a distance-based classifier, noise present in some objects interferes with its performance. Moreover, the selection of k is application dependent, so k is also a parameter that needs to be tuned. See (22, 43, 51) for more information about distance-based classifiers, especially k-NN.


4.2.2 Decision Tree

A machine learning problem can be formulated as a search problem. A search problem evaluates a set of possible solutions, represented in a specific language, using an evaluation function. Thus, it aims to find the best hypothesis expressible in that language within a hypothesis space. The decision tree algorithm is an example of a search-based algorithm: it uses the given data to search for the hypothesis that best generalizes the knowledge present in the data.

A decision tree uses a strategy known as divide and conquer. This strategy divides a complex problem into simpler problems whose solutions are easily calculated. The strategy is applied recursively until all the solutions have been found. In the end, the solution of the complex problem is composed by combining the results of the simpler problems.

A decision tree is a knowledge model composed of nodes, branches and leaves. An example of a decision tree is illustrated in Figure 3. Formally, a decision tree is a directed acyclic graph where each node is either a division node, with two or more children, or a decision node (leaf). In a decision tree, each intermediate node tests a feature xi, for example xi > 10, on the way to a leaf node. A leaf node assumes one of the wi values, i.e., a leaf is labelled with a class label wi or a scalar value. Finally, a branch is a decision route along nodes to a leaf, i.e., a decision rule. For example: if x1 > 10 ∧ x2 = 25, then wi = ‘car’.

Figure 3 – An example of a decision tree. Source: (2)
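
The way a branch acts as a decision rule can be sketched with a small hand-written tree. The tree below is hypothetical and is not the tree of Figure 3: it encodes the example rule x1 > 10 ∧ x2 = 25 → ‘car’, and the other two leaf labels are invented for illustration.

```python
# Internal nodes: (feature_index, test, subtree_if_true, subtree_if_false); leaves: plain labels.
tree = (0, lambda v: v > 10,
        (1, lambda v: v == 25, "car", "truck"),
        "bicycle")

def classify(node, x, path=()):
    """Follow one branch from the root to a leaf; return the label and the tests taken."""
    if not isinstance(node, tuple):
        return node, path  # reached a leaf
    feature, test, if_true, if_false = node
    outcome = test(x[feature])
    child = if_true if outcome else if_false
    return classify(child, x, path + ((feature, outcome),))

label, path = classify(tree, [12, 25])
```

The returned path lists the feature tests taken from the root to the leaf, which is exactly the decision rule that justifies the label.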

There are different algorithms to generate a decision tree; the most recognized are ID3 (70), ASSISTANT (10), CART (5) and C4.5 (71). Usually, the algorithms used to generate decision trees use a greedy heuristic that looks only one step ahead and never reconsiders a decision already taken (hill climbing). This type of heuristic has limitations, such as finding only locally optimal solutions. However, it allows decision trees to be constructed in linear time.

Decision trees have some limitations, such as sensitivity to noise and the possibility of generating very deep trees. Usually, algorithms that use noisy data construct decision trees with low performance. Another limitation of standard tree generation algorithms is that they can create trees with a large number of levels, which makes the analysis and interpretation of the tree complex. One way to avoid this limitation is to prune the tree.

Pruning a decision tree means replacing deep-level nodes with leaves. Thus, a tree with n levels is reduced to n − k levels. Pruning offers advantages such as better generalization power and easier interpretation of the tree.

Pruning methods can be classified as (1) pre-pruning and (2) post-pruning. In pre-pruning methods, tree generation stops when some criterion is met. Post-pruning methods, on the other hand, cut the tree after it has been constructed. Post-pruning makes it possible to find a tree with higher generalization power than the originally constructed tree. For more information about pruning methods, see (21).

Decision Trees present some advantages such as:

• Flexibility: Decision trees are non-parametric methods, i.e., they do not assume any data distribution or parameter values. Thus, a decision tree can cover the entire instance space through its exhaustive search.

• Attribute Selection: The root node and the nodes nearest to it represent the attributes that best describe the problem solution. Thus, a decision tree can be used as an attribute selection method to extract a minimal set of the most influential attributes.

• Interpretation: Each branch of a decision tree can be interpreted as a decision rule. Decision rules are easy to understand because they follow the cause-consequence form widely used by human beings.

Further explanation, applications and limitations of decision trees can be found in (22, 30, 81).


4.2.3 Naive-Bayes

Probabilistic methods assume that the probability of an event A given an event B depends not only on the relation between A and B, but also on the probability of observing A and B independently (51). The Bayes theorem defines a way to calculate the probability P(wc|x) that an object x belongs to a class wc ∈ W = {w1, . . . , wω}. The Bayes theorem is defined by Equation (4.3), where P(wc) is the a priori probability of the class wc, P(x|wc) is the probability of observing the object x given that it belongs to wc, and P(x) is the probability of observing the object x. The Bayes theorem is used to calculate the a posteriori probability of an event given its a priori probability and the new object.

$$P(w_c \mid x) = \frac{P(x \mid w_c)\, P(w_c)}{P(x)} \qquad (4.3)$$

The Naive-Bayes (NB) classifier uses the Bayes theorem to classify a new object. The NB classifier assigns the most likely class w = argmax_i P(wi|x) to a given example x. The learning process of a Naive-Bayes classifier assumes that the features are independent given the class, i.e., $P(w_c \mid x) \propto P(w_c) \prod_{j=1}^{d} P(x_j \mid w_c)$, where the normalizing term P(x) is omitted because it is the same for every class. Naive Bayes has proven effective in different applications such as text classification and medical diagnosis (51, 88). See (22, 51, 88) for more details, examples and further discussion of the Bayes theorem and the Naive Bayes classifier.
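
The decision rule can be sketched for categorical features, estimating every probability by relative frequency. The sketch is deliberately simplified: it applies no smoothing, so a value never observed with a class gives that class a score of zero. The function names and the toy e-mail examples are hypothetical.

```python
from collections import Counter, defaultdict

def train_nb(examples):
    """examples: list of (feature_tuple, label). Returns class and per-feature value counts."""
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)  # (label, feature index) -> counts of observed values
    for features, label in examples:
        for j, value in enumerate(features):
            value_counts[(label, j)][value] += 1
    return class_counts, value_counts

def predict_nb(model, x):
    """Return the class with the largest P(w) * prod_j P(x_j | w)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())

    def score(label):
        s = class_counts[label] / total  # prior P(w)
        for j, value in enumerate(x):
            s *= value_counts[(label, j)][value] / class_counts[label]  # P(x_j | w)
        return s

    return max(class_counts, key=score)

examples = [(("offer", "link"), "spam"), (("offer", "nolink"), "spam"),
            (("hello", "link"), "spam"), (("hello", "nolink"), "ham"),
            (("hello", "nolink"), "ham"), (("offer", "nolink"), "ham")]
model = train_nb(examples)
```

For the example ("offer", "link"), the score of "spam" is 1/2 · 2/3 · 2/3 and the score of "ham" is 1/2 · 1/3 · 0, so the sketch predicts "spam".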

4.2.4 Artificial Neural Network

Previously, we saw machine learning algorithms based on distance (k-NN), search (Decision Tree) and probability (Naive-Bayes). The following subsections introduce two machine learning methods based on optimization: (1) Artificial Neural Networks and (2) Support Vector Machines.

A machine learning algorithm based on optimization optimizes an objective function, which measures the learning error. Its optimization therefore aims to maximize the learning capability with respect to the data presented.

Historically, human beings have been inspired by nature to create methods that solve their problems. Inspired by nature, researchers built a structure similar to the human brain that is capable of “learning”. This model is known as an Artificial Neural Network (ANN) (22, 32).

Artificial Neural Networks are structures composed of units known as artificial neurons. The neurons are heavily interconnected and are arranged in layers, where information usually travels in one direction only (left to right, as illustrated in Figure 4). Weights are assigned to the connections among neurons. Connections with positive weights are called excitatory connections, while connections with negative weights are called inhibitory connections. These weights are the basis of ANN learning.

Artificial Neural Networks have high computing power and are capable of modelling non-linear functions. An ANN can model a non-linear function with good performance, but attention must be paid to its structure and learning algorithm. The architecture of an ANN is defined by the type, number and organization of its neurons. The learning algorithm, in turn, is responsible for updating the weights and the information processed by the ANN.

A neuron contains multiple input and output terminals, and each neuron input receives only one value. These inputs are then combined by a mathematical function fa, i.e., fa calculates the neuron output value based on the inputs provided. The function fa is known as the activation function. Different activation functions can be found in the literature; among them, we highlight the linear, threshold and sigmoid functions. For more information, see (32).
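
A single artificial neuron with the three activation functions mentioned above can be sketched as follows. The sketch assumes the usual formulation in which the inputs are combined as a weighted sum before the activation is applied; the bias term, the weights and the input values are assumptions introduced for illustration.

```python
import math

def linear(z):
    return z

def threshold(z):
    return 1.0 if z >= 0.0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs followed by the activation function f_a."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

inputs, weights = [1.0, 2.0], [0.5, -0.25]  # one positive and one negative weight
outputs = {f.__name__: neuron_output(inputs, weights, 0.0, f)
           for f in (linear, threshold, sigmoid)}
```

Here the weighted sum is 0.5 · 1.0 − 0.25 · 2.0 = 0, so the same neuron returns different outputs depending only on the activation function chosen.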

The neurons of an ANN are arranged in one or more layers. This structure is called a Multi-Layer Perceptron (MLP) (see Figure 4). In this type of network, the neurons in layer n receive as input the outputs of the neurons in layer n − 1 and, in turn, provide their outputs to the next layer. The network shown in Figure 4 contains three layers (input, hidden and output). The illustrated MLP processes ten attributes in the input layer and provides two answers in the output layer. Learning algorithms and different architectures are widely discussed in (22, 32, 88).

Figure 4 – Example of a Multi-Layer Perceptron


4.2.5 Support Vector Machine

The Support Vector Machine (SVM) (14) is a powerful tool that has attracted attention in the literature, mainly due to the performance it achieves on classical problems compared with classical algorithms such as ANNs or decision trees. The SVM is based on Statistical Learning Theory and therefore follows all the assumptions of that theory. SVMs can solve linear (Linear SVM) and non-linear (Non-Linear SVM) problems, and they can be applied to different types of learning problems (classification and regression).

Linear SVMs

The support vector machine emerged from the results achieved in statistical learning theory and from the idea of defining margins to separate objects. Linear SVMs are able to solve linearly separable problems because they use linear boundaries to separate objects belonging to two different classes. Thus, SVMs can be used instead of classical linear methods such as linear regression.

Linear SVMs with rigid margins are able to separate objects of classes +1 and −1 by a hyperplane, provided that the training set X is linearly separable. This type of support vector machine is said to have rigid margins because no training data lie between the separation margins.

The presence of noise or the nature of the problem diminishes the performance of SVMs with rigid margins. SVMs with rigid margins can be extended to deal with such problems by relaxing the requirement that no object lies between the separation margins, i.e., some objects are allowed to violate the margins. To accomplish this, every object xi of the training set X has a corresponding relaxing variable ri, which allows the object to lie between the separation margins. As a consequence, the SVM tolerates some misclassifications. See (22) for more information.
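
For reference, the relaxed-margin problem is usually written as the optimization problem below. This is the standard textbook formulation rather than a formula taken from this thesis: the weight vector w, the offset b and the number of training objects k are notation assumed here, ri are the relaxing variables described above, and C is the regularization constant that weights margin violations.

```latex
\min_{\mathbf{w},\, b,\, \mathbf{r}} \;
  \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{k} r_i
\quad \text{subject to} \quad
  y_i\,(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - r_i,
  \qquad r_i \ge 0, \qquad i = 1, \dots, k
```

Setting every ri to zero recovers the rigid-margin case, while a larger C penalizes margin violations more heavily.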

Non Linear SVMs

In some cases, a hyperplane does not divide a training set X perfectly. Even SVMs with relaxed margins cannot separate cases in which the decision boundary needs to be curved. SVMs with curved boundaries can solve such non-linear problems.

Non-linear SVMs solve non-linear problems by transforming the training data from the input space X to a space of higher dimension. Let Φ : X → ℑ be a mapping, where X is the input space and ℑ is the feature space, which has a higher dimension than X. With an appropriate mapping, the data in ℑ can be separated by a linear SVM. Two requirements must be met for the data in ℑ to become linearly separable: the transformation Φ must be non-linear, and the dimension of the feature space ℑ must be sufficiently high.


The mapping Φ is computationally expensive because the dimension of the space ℑ can be very high, even infinite (33). The use of kernel functions avoids this complexity problem. A kernel function k receives two points xi and xj of the input space and calculates the dot product of these points in the feature space ℑ.
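
The idea that a kernel returns a dot product in the feature space without computing the mapping can be checked numerically in a small case. The sketch uses the homogeneous polynomial kernel of degree 2 on two-dimensional inputs, for which an explicit mapping into three dimensions is known, and also shows an RBF kernel, for which no finite explicit mapping exists. The function names, the value of gamma and the two points are hypothetical.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit degree-2 feature map for 2-D inputs (R^2 -> R^3)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly2_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2, computed in the input space."""
    return dot(a, b) ** 2

def rbf_kernel(a, b, gamma=0.5):
    """Radial basis function kernel; its feature space is infinite-dimensional."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

xi, xj = (1.0, 2.0), (3.0, -1.0)
same = math.isclose(poly2_kernel(xi, xj), dot(phi(xi), phi(xj)))
```

Both routes give the same value for the polynomial kernel, but the kernel needs only one dot product in the input space.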

The most commonly used kernel functions are the polynomial, the radial basis function (RBF) and the sigmoid. Obtaining an SVM classifier involves choosing an appropriate kernel function and setting its parameters and the regularization constant C. The choice of function and parameters directly affects the performance of the classifier since they define the decision boundaries. Therefore, the SVM is a powerful classifier, but it needs to be tuned correctly to deliver that power.

The next section discusses the main aspects of ensemble systems: their structure and configuration, advantages and challenges.

4.3 Ensemble Systems

One way to combine the results of different classifiers on biometric data is through the use of ensemble systems, also known as multi-classifier systems or fusion of experts. Ensemble systems exploit the idea that different classifiers can offer complementary information about patterns, thereby improving the effectiveness of the overall recognition process (6, 43).

An ensemble is composed of a set of N individual classifiers (ICn), organized in parallel. Figure 5 presents the general structure of an ensemble system. The individual classifiers receive the input patterns and transmit their outputs to a combination process that is responsible for providing the final output of the recognition system.

Figure 5 – An illustration of the general framework of an ensemble system


There are two main issues to consider when designing an ensemble system: (1) the ensemble components, and (2) the combination method. Regarding the first issue, ensemble members must be chosen carefully. The appropriate choice of a set of individual classifiers is fundamental to the overall performance of an ensemble system. The ideal situation would be to choose a set of base classifiers with uncorrelated errors, so that combining them minimizes the effect of individual failures. In other words, the individual classifiers should be diverse among themselves. An ensemble can be designed using one of two main structures: heterogeneous or homogeneous. A heterogeneous structure combines different classification algorithms as individual classifiers. In contrast, a homogeneous structure combines classification algorithms of the same type. In this thesis, we use both ensemble structures.

Once a set of individual classifiers has been created, the next step is to choose an effective way to combine their outputs. According to (43), the possible ways of combining the outputs of N classifiers in an ensemble depend on the information obtained from each individual classifier (ICn). The combination methods range from the simplest approaches, which use class labels or rank values, to those that use more elaborate information, such as the support degree D (43).

4.3.1 Learning Strategies in Ensemble Systems

Combining a set of identical classifiers in an ensemble system does not yield any gain in performance. Thus, it is important to emphasize that diversity plays an important role in the design of accurate and efficient ensemble systems (43). The ideal situation, in terms of combining classifiers, would be a set of classifiers with uncorrelated errors (high diversity). Diversity in ensemble systems can be reached by using different parameter settings, different training datasets and/or different classifier types. One approach to promote diversity is the use of learning strategies, also known as learning architectures or simply architectures. The most common architectures are Bagging (4), Boosting (26), Stacking (89) and Voting (19).

Bagging is based on the idea of data resampling (4). Diversity is promoted in Bagging by using bootstrapped replicas of the training dataset. Each replica is a set of training instances drawn at random, with replacement.
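As a minimal sketch of this resampling step, the code below draws bootstrap replicas from a list of training instances. It assumes that each replica has the same size as the original training set, which is the usual convention but is not stated above; `bootstrap_replica` and `make_replicas` are illustrative names.

```python
import random


def bootstrap_replica(dataset, rng):
    """Draw len(dataset) items uniformly at random, with replacement."""
    n = len(dataset)
    return [dataset[rng.randrange(n)] for _ in range(n)]


def make_replicas(dataset, n_replicas, seed=0):
    """Build one replica per ensemble member (e.g. 6 or 12 for Bagging)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [bootstrap_replica(dataset, rng) for _ in range(n_replicas)]
```

Because sampling is done with replacement, some instances appear several times in a replica and others not at all, which is what makes the trained members differ.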

Boosting is very similar to Bagging since it also applies a resampling procedure. However, Boosting does not use a training dataset obtained by uniform random resampling. Instead, Boosting uses an adaptively adjusted probability distribution (weights) assigned to the patterns of the training set. The procedure increases the weights of incorrectly classified patterns and decreases the weights of correctly classified patterns. Thus, the boosting algorithm focuses on building classifiers that are strong on the most difficult patterns.
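The weight adjustment can be sketched as follows. The update rule is the AdaBoost-style one, which is an assumption made here for illustration; the text above does not fix a specific boosting variant, and `reweight` is a hypothetical helper name.

```python
import math


def reweight(weights, correct):
    """One AdaBoost-style update of the pattern weights.

    weights: current distribution over training patterns (sums to 1).
    correct: list of booleans, True where the last classifier was right.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    if error <= 0.0 or error >= 0.5:
        # The update is undefined or unhelpful here; keep the distribution.
        return list(weights)
    alpha = 0.5 * math.log((1.0 - error) / error)
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated]  # renormalize to a distribution
```

With four equally weighted patterns and one mistake, the misclassified pattern ends up with half of the total weight, so the next classifier concentrates on it.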

Stacking is a strategy that uses a classifier (combiner) to combine the predictions made by other classifiers (base classifiers). The base classifiers are trained using the general training data, and the combiner is trained using the predictions of the base classifiers. This structure is often used when the base classifiers are heterogeneous.

Lastly, the Voting strategy is based on voting, as in a democracy: each base classifier presents its output, and the output given by the majority of the base classifiers is the final output.
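A minimal sketch of majority voting over class labels is given below. It assumes the labels 1 (client) and 0 (impostor); breaking ties in favour of the smallest label, i.e. rejecting the claim, is a choice made here and is not specified in the text.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by most base classifiers.

    Ties are broken in favour of the smallest label (assumption).
    """
    counts = Counter(labels)
    top = max(counts.values())
    winners = sorted(label for label, c in counts.items() if c == top)
    return winners[0]
```

Using an odd number of base classifiers avoids ties in a two-class problem.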

All the learning strategies used in this thesis use the identity label and the classification score in the decision process. In other words, each learning component decides what type of user the claimed individual is (client or impostor) using the identity label. Alternatively, an ensemble system can combine the classification scores of its components.

4.3.2 Ensemble Systems for Cancelable Biometrics

Several transformation functions have been reported for different biometric modalities, such as face (3), signature (48, 63), fingerprint (44, 55), iris (40) and voice (92), among others. However, most of them use single classification/matching algorithms.

Ensemble systems can use cancelable biometrics and achieve very promising results (7, 8). These studies show that the accuracy of biometric systems using cancelable biometric samples can be improved with an ensemble system. However, the authors use only physical biometric modalities. Unlike these studies, this thesis applies a multi-privacy scheme using ensemble systems with behavioural biometrics for the user verification task.

4.4 Chapter Conclusion

This chapter introduced the main ideas and concepts of statistical learning. In addition, we presented some classification models that use statistical learning concepts, covering the main concepts, advantages and disadvantages of k-NN, Decision Tree, Naive Bayes, Artificial Neural Network and Support Vector Machine.

Ensemble systems use a number of individual classifiers to predict unseen instances and exploit the complementary information given by each individual classifier. The literature argues that ensemble methods perform better than single classifiers. In this chapter, we focused on the use of ensemble systems in the biometrics field and discussed learning strategies and the use of ensemble systems with cancelable biometrics.


5 User Authentication using a Cancelable Behavioural Modality: Single Classifiers and Ensemble Systems

5.1 Touchalytics

Smartphones are ubiquitous in a highly interconnected society. Important information about individuals and companies is stored on such devices, and its security is an important issue (12). Thus, authentication methods have to grant access only to authorized people.

The authentication process on smartphones is based on PINs (number sequences) or on drawn patterns. The main security problem of PINs is their short length: usually, a PIN is composed of 4 digits. Drawn patterns offer a useful and intuitive user interface for authentication. Basically, the user connects graphic elements to generate a path, and the path is used as the password. Unfortunately, this graphical password is vulnerable to shoulder surfing attacks (28) and has a limited length as well. Thus, the authentication methods available on smartphones need to improve in order to protect user data. We believe that the use of biometrics is a way to improve them.

Biometrics have been used successfully on smartphones. Currently, several smartphone models, such as the Apple iPhone and the Samsung Galaxy S6, include a fingerprint reader. However, a smartphone must have such a reader to be able to read fingerprints. Older models without fingerprint readers cannot offer this kind of biometric authentication, and smartphones with fingerprint readers are expensive. Behavioural biometrics can be used instead of physical modalities in order to overcome these limitations. Authentication systems based on behavioural biometrics do not need a dedicated sensor to collect biological patterns. Therefore, an authentication method based on behavioural biometrics can be used on all smartphone models.

Touchalytics is a behavioural biometric dataset collected by Frank et al. (25). The dataset consists of strokes collected on Android smartphones. The authors offer the dataset in raw and processed formats. The raw dataset contains only the data provided by the Android system, such as event code, event time and phone orientation. Based on these data, the authors also provide a processed dataset with 30 features derived from the raw dataset. We use the processed dataset in this work.


The processed dataset has 30 attributes derived from strokes performed by 41 users. Each instance in the dataset represents a user stroke. In total, the dataset contains 21158 strokes divided into 12989 scrolling and 8469 horizontal strokes. Please see (25) for more details about the information content, correlation and visualization of the features present in Touchalytics.

A stroke is encoded as a sequence of vectors $s_n = (x_n, y_n, t_n, p_n, A_n, o^{f}_n, o^{ph}_n)$, $n \in \{1, 2, \ldots, N\}$, with the location $x_n, y_n$, the time stamp $t_n$, the pressure on the screen $p_n$, the area $A_n$ occluded by the finger, the orientation $o^{f}_n$ of the finger, and the orientation $o^{ph}_n$ of the phone (landscape or portrait). Therefore, the dataset has information about the area covered, pressure, direction, velocity and acceleration of the users' strokes.
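To make the encoding concrete, the sketch below represents one stroke point as a record and derives a single feature, the mean velocity along the stroke. The field names, units and the `mean_velocity` helper are hypothetical and are not claimed to match the 30 features of the processed Touchalytics dataset.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StrokePoint:
    x: float                   # location on screen
    y: float
    t: float                   # time stamp (seconds, assumed unit)
    pressure: float            # pressure on the screen
    area: float                # area occluded by the finger
    finger_orientation: float  # orientation of the finger
    phone_orientation: str     # "landscape" or "portrait"


def path_length(stroke):
    """Sum of Euclidean distances between consecutive points."""
    return sum(math.hypot(b.x - a.x, b.y - a.y)
               for a, b in zip(stroke, stroke[1:]))


def mean_velocity(stroke):
    """Path length divided by stroke duration."""
    duration = stroke[-1].t - stroke[0].t
    if duration <= 0:
        raise ValueError("stroke must span a positive time interval")
    return path_length(stroke) / duration
```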

The strokes were captured in two phases. In the first phase, each user read three documents and answered a short questionnaire after each one. This phase was designed to record scrolling strokes, because users tended to use only this stroke direction while reading. The second phase was designed to capture horizontal strokes: users played a game in which they had to identify differences between similar images. Each user solved a second pair of images two minutes after the first game round. Scrolling and horizontal strokes were thus collected following this experimental protocol.

Touchalytics has 21158 instances divided into 12989 scrolling and 8469 horizontal instances. The authors made this division to verify similarities in the users' direction patterns (horizontal and vertical movements). All 21158 instances were used in the training and testing phases of the following usage scenarios.

In (25), the authors presented initial results using three different scenarios:

1. Inter Session: the goal is to authenticate users across multiple sessions performed on the same day. Thus, only 11888 scrolling and 7204 horizontal instances were used in this scenario.

2. Inter Week: the goal is to authenticate users in two different weeks (the period of time between these two sessions is one week). Thus, 11888 scrolling instances were used in the training phase and 801 instances were used in testing. For horizontal strokes, 7192 instances were used for training and 1265 for testing.

3. Intra Session (Short Term Authentication): the last scenario is continuous authentication within one session of using the device. After a few strokes, the device switches to classification mode and may be able to detect whether another person takes the phone. All the data are used in this scenario, independently of time.


We will use these three different scenarios in the experiments performed in the next section.

We do not treat the problem as continuous authentication. The focus of this thesis is to evaluate the accuracy of classifiers using protected biometric templates. For this reason, we simplify the problem to a standard classification problem, using 10-fold cross-validation over all the data, independently of time. Thus, the most adequate scenario for our experimental design is Intra Session. Therefore, our work partially follows the experiments performed by Frank et al. (25).

In (25), the results are reported using the Equal Error Rate (EER) metric. The EER is the error rate at the operating point where the false acceptance rate and the false rejection rate are equal. It can be obtained from the ROC curve at the point where the false positive rate (false acceptance) equals the false rejection rate (1 − true positive rate).
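A simple way to approximate the EER from classifier scores is sketched below. It assumes that higher scores indicate a genuine user and that a claim is accepted when its score reaches the threshold. Since FAR and FRR change in steps on finite data, the sketch returns the average of the two at the threshold where they are closest; this is one common convention, not necessarily the procedure used in (25).

```python
def far_frr(genuine, impostor, threshold):
    """False acceptance and false rejection rates at a given threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER by scanning every observed score as a threshold."""
    candidates = sorted(set(genuine) | set(impostor))
    candidates.append(candidates[-1] + 1.0)  # threshold that rejects everyone
    far, frr = min((far_frr(genuine, impostor, th) for th in candidates),
                   key=lambda pair: abs(pair[0] - pair[1]))
    return (far + frr) / 2.0
```

For example, with genuine scores `[0.9, 0.8, 0.3, 0.7]` and impostor scores `[0.1, 0.2, 0.85, 0.4]`, the threshold 0.7 gives FAR = FRR = 0.25, so the EER is 25%.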

Frank et al. (25) present results using k-NN and SVM (Support Vector Machine) classifiers. The mean EER ranges from 0% to 4% across all scenarios: Inter Session, Inter Week and Intra Session. The mean EER in Intra Session is 0%, which suggests that, within one session, most users do not considerably change their touch behaviour. The Inter Session EER ranges from 2% to 3% and the Inter Week EER ranges from 0% to 4%. These results indicate that behavioural biometrics (touch data) have good prospects for practical use. Therefore, we decided to investigate how the same machine learning methods (kNN and SVM) and ensemble classifiers behave using cancelable data.

5.2 Methodology

This section describes the methodology used in the experiments performed in this thesis. The methodology aims to evaluate a secure biometric system for user verification using different cancelable data. It is composed of 4 main steps:

• Data Cleaning and Processing: This step performs the selection and normalization of the features. In addition, we balance, binarize and protect the biometric templates. This step is explained in more detail in subsection 5.2.1.

• Training of Classifier: The cleaned and processed data are divided into training and test sets. In the training phase, the classifier is trained so that it is able to generalize from the training set. The test set is used in the next step.

• Evaluation of Classifier: This step evaluates the generalization power of the trained classifier using the test set. Thus, it measures the accuracy of the trained classifier on unseen instances.

• Performance Visualization: The last step provides tools and plots to visualize the classifier performance over different decision thresholds. In our case, we used tables and DET curves as performance tools.

Figure 6 illustrates the organization of the methodology.

5.2.1 Data Preprocessing Module

The data pre-processing module is the same for all the experiments executed. The experiments differ in the training and testing of the classifiers and in the performance visualization modules. This section is organized according to the experiments reported in the chapters of this thesis.

Figure 6 – Organization of the Methodology (steps: Data Cleaning and Processing, Classifier Training, Classifier Evaluation, Performance Visualization)

The data preprocessing step is responsible for cleaning, balancing and protecting the biometric data. The cleaning step removes non-informative features, such as phone id and time, from the biometric data. Afterwards, we split the dataset into binarized user-specific parts, i.e., a specific dataset is created for each user. Strokes belonging to the corresponding user (client) are labelled positive (‘1’), and strokes belonging to the other users (impostors) are labelled negative (‘0’). Thus, each dataset has positive samples (genuine samples) and negative samples (samples belonging to users other than the genuine user).

The next step is to balance each user-specific dataset. After cleaning and creating the user-specific datasets, we obtained 41 datasets, one for each user. However, each dataset contains n1 > 0 positive samples and n2 × 41 negative samples, so the 41 datasets are clearly unbalanced. The balancing step leaves each user-specific dataset with the same number of positive (genuine) and negative (impostor) samples. To adjust the unbalanced user-specific training sets, we resample the negative stroke samples so that every negative sample has a probability of $N_p / N_{nc}$ of being selected, where $N_p$ is the number of positive samples and $N_{nc}$ is the number of negative samples. With this resampling method, each user-specific classifier is trained with $N_p$ positive and $N_p$ negative samples, creating a balanced training set.
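The resampling rule can be sketched as follows, assuming the positive and negative strokes of one user are held in two lists. Each negative sample is kept independently with probability Np/Nnc, so the number of negatives kept equals Np only in expectation; `balance` is an illustrative name and the fixed seed is an assumption for reproducibility.

```python
import random


def balance(positives, negatives, seed=0):
    """Keep every negative sample with probability Np / Nnc."""
    if not negatives:
        raise ValueError("at least one negative sample is required")
    rng = random.Random(seed)
    p_keep = min(1.0, len(positives) / len(negatives))
    kept = [sample for sample in negatives if rng.random() < p_keep]
    return positives, kept
```

If an exactly balanced set is required, drawing Np negatives without replacement (for instance with `random.sample`) is a possible alternative.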

Finally, we consider the biometric protection step as part of the data pre-processing module. We applied four cancelable transformation functions to create four corresponding protected, balanced user-specific datasets. Figure 7 illustrates these 2 steps: (1) splitting and balancing the main dataset by user and (2) applying the four cancelable functions. Therefore, the output of the data pre-processing module is 41 balanced user-specific datasets, protected by cancelable functions and free of non-informative features.

The data pre-processing module is common to all the experiments performed in this thesis, except the key impact experiments, which also affect the data protection step. Thus, except in Chapters 7 and 8, we do not revisit the data preprocessing step. The next step is to model the decision module, once the cancelable biometric samples have been generated, for the single classifier and ensemble system experiments. In other words, we will present how the modules (1) classifier training, (2) classifier evaluation and (3) performance visualization were designed.

Figure 7 – Generation of cancelable biometric samples for each user

5.2.2 Decision Module for Single Classifiers and Ensemble Systems

This section describes the training and evaluation of the classifiers used (single classifiers and ensemble systems). In addition, we describe the performance visualization tools. All the experiments use the same data pre-processing step, except the experiments that evaluate the impact of the cancelable key (see Chapters 7 and 8).



5.2.2.1 Single Classifier

After the pre-processing module, a k-NN classifier with k=5 and an SVM with polynomial kernel were used. The number of nearest neighbours was chosen through 5-fold cross-validation over 1 < k < 13. The SVM parameters were tuned according to (34). Hsu et al. (34) propose using the RBF kernel and 10-fold cross-validation to find the best parameters C and γ. This proposal is widely used in the literature and is implemented in the LibSVM library (11). In addition, the classification algorithms of this investigation were taken from the WEKA¹ package.

The empirical analysis uses the 10-fold cross-validation methodology. Thus, all results refer to the mean over 10 different test sets. In addition, several experiments were executed to define the parameter values used by the supervised learning algorithms.
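The fold construction behind such an evaluation can be sketched as below. The sketch shuffles the sample indices and splits them into k folds without stratification, which is a simplification (WEKA's cross-validation stratifies by class by default); `k_fold_indices` and `cross_validated_mean` are illustrative names, and `evaluate` stands for any user-supplied function that trains on the training indices and returns a score on the test indices.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_mean(n_samples, evaluate, k=10, seed=0):
    """Average the score returned by evaluate(train, test) over the k folds."""
    scores = [evaluate(train, test)
              for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)
```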

The results are evaluated using the Equal Error Rate (EER). The EER is the point at which the False Acceptance Rate (FAR) has the same value as the False Rejection Rate (FRR). The Mann-Whitney statistical test was applied to compare the results obtained using the different cancelable functions against the Original dataset (unprotected biometric data). The reported statistical comparison uses a confidence level of 95% (α = 0.05), i.e., when the two-sided p-value is higher than 0.05, the samples are considered statistically similar.
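The decision rule of the test can be illustrated with a self-contained sketch. It computes the Mann-Whitney U statistic with average ranks for ties and a two-sided p-value from the normal approximation, without tie or continuity correction; these simplifications are assumptions made here, and a statistics package would normally be used in practice. The inputs would be, for example, the per-user EER values obtained on two datasets.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_p(a, b):
    """Two-sided p-value of the Mann-Whitney U test (normal approximation)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))
```

A p-value above 0.05 would then be read as "statistically similar", following the rule stated above.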

Box plots are used to visualize the EER achieved among the 41 users for each cancelable function. Thus, it is possible to visually compare exploratory statistical measures of the cancelable functions used.

5.2.2.2 Ensemble System

For the ensemble system experiments, we decided to examine how ensemble classifiers perform on original and cancelable data. Thus, Bagging, Stacking and Voting were selected as ensemble learning structures. The training and testing of each learning structure follow its own methodology.

The experimental analysis uses three ensemble structures: two of them come from learning strategies (Bagging and Stacking) and the third is majority voting, a traditional combination method. For Bagging, we used two different ensemble sizes: six (Bagging_6_kNN, Bagging_6_SVM) and twelve individual classifiers (Bagging_12_kNN, Bagging_12_SVM).

For Stacking, we used 6 components and two combination methods: kNN (Stacking_kSk) and Logistic Regression (Stacking_kSL). For Voting, 7 components were used. In Stacking and Voting, SVM and kNN are used in a half-by-half proportion. For simplicity, the average EER values are presented for the different learning strategies (Bagging_6, Bagging_12, Stacking and Voting).

¹ http://www.cs.waikato.ac.nz/ml/WEKA

The evaluation (10-fold cross-validation) and the performance report (EER, Mann-Whitney test and box plots) are specified as in the single classifier experiments. See subsection 5.2.2.1 for details.

5.3 Single Classifier Experiments

This section presents the results of applying the kNN and SVM classifiers to the Original and transformed datasets (four different transformation functions). See Section 3.5 and Section 4.2 for more information about the cancelable functions and the classifiers, respectively.

Tables 1 and 2 show the average Equal Error Rate (EER) for five different datasets: Original (Orig.), Interpolation (Inter.), BioHashing (BioH.), BioConvolving (BioC.) and Double Sum (DS). The tables are organized by classifier (kNN and SVM) and scenario (Inter Session (IS), Inter Week (IW) and Intra Session (ITS)). The comparison between the Original and transformed datasets was done using the Mann-Whitney statistical test.

Tables 1 and 2 also indicate the statistically lower and higher results for scrolling and horizontal strokes, respectively. Shaded cells mark results that are statistically similar to the values obtained on the Original dataset, while bold values mean that the EER obtained on the transformed dataset was statistically lower than the Original value. Lower is better here because the results are reported as an error rate (EER).

Table 1 – Average Equal Error Rate (%) - Scrolling Strokes

| Classifier | Session | Orig. | Inter. | BioH. | BioC. | DS |
|---|---|---|---|---|---|---|
| kNN | IS | 1.36 ± 3.28 | 42.26 ± 5.31 | 31.44 ± 12.56 | 2.57 ± 8.08 | 9.61 ± 5.36 |
| kNN | IW | 1.14 ± 3.22 | 38.99 ± 7.41 | 32.48 ± 13.41 | 3.23 ± 11.11 | 9.72 ± 5.66 |
| kNN | ITS | 1.36 ± 3.28 | 9.43 ± 5.56 | 33 ± 13.46 | 3.60 ± 10.89 | 9.08 ± 5.76 |
| SVM | IS | 9.04 ± 8.01 | 41.86 ± 6.47 | 32.31 ± 21.61 | 1.85 ± 7.95 | 12.48 ± 9.27 |
| SVM | IW | 9.04 ± 8.01 | 39.41 ± 39.03 | 29.84 ± 19.66 | 3.11 ± 10.88 | 12.54 ± 9.87 |
| SVM | ITS | 9.04 ± 8.01 | 11.91 ± 9.57 | 29.08 ± 21.61 | 3.18 ± 10.80 | 11.80 ± 9.89 |

BioConvolving and Double Sum perform similarly to the Original data (Tables 1 and 2). For instance, the BioConvolving dataset has four statistically similar results and six statistically lower results compared to the Original dataset. Similarly, Double Sum has five statistically similar results out of 12 possible cases. It is important to highlight the BioConvolving results obtained on the horizontal strokes dataset: this transformation function yielded statistically similar and statistically lower results using kNN and SVM, respectively (Table 2).

Table 2 – Average Equal Error Rate (%) - Horizontal Strokes

| Classifier | Session | Orig. | Inter. | BioH. | BioC. | DS |
|---|---|---|---|---|---|---|
| kNN | IS | 1.99 ± 3.61 | 42.26 ± 5.31 | 33.58 ± 11.38 | 0.258 ± 0.61 | 10.08 ± 5.41 |
| kNN | IW | 2.07 ± 3.79 | 49.70 ± 3.86 | 35.19 ± 13.02 | 0.13 ± 0.31 | 9.47 ± 5.73 |
| kNN | ITS | 1.99 ± 3.61 | 11.12 ± 6.64 | 34.10 ± 11.52 | 0.22 ± 0.77 | 9.87 ± 5.61 |
| SVM | IS | 17.43 ± 12.37 | 41.86 ± 6.47 | 38.82 ± 12.11 | 0.64 ± 0.62 | 21.56 ± 13.46 |
| SVM | IW | 17.43 ± 12.37 | 41.86 ± 6.47 | 41.02 ± 12.53 | 0.44 ± 0.46 | 22.45 ± 13.76 |
| SVM | ITS | 17.43 ± 12.37 | 24.77 ± 13.81 | 41.43 ± 9.25 | 0.52 ± 0.66 | 22.26 ± 13.87 |

Figures 8, 9 and 10 show the boxplots for the Inter Session, Inter Week and Intra Session experiments using the kNN and SVM classifiers. The boxplots are organized by dataset (Original and cancelable) and classifier to compare the Equal Error Rate among users in each dataset. Each figure contains two subfigures, one per stroke direction (scrolling and horizontal strokes).

The single-classifier results on the Original dataset show more distortions than those on the BioConvolving dataset. Figures 8, 9 and 10 show that the boxes of the Original dataset have more outliers (points in the plot) than the BioConvolving boxes. On the other hand, the BioHashing boxes show that the EER values among users are widely dispersed: their whiskers are long and their boxes are taller than the other boxes.

All SVM boxes for the Original dataset provide lower EER than kNN (Figures 8, 9 and 10). In addition, the SVM-BioConvolving box shows little variation because of its small interquartile range. This means that the SVM-BioConvolving classifier is stable, unlike the SVM-BioHashing classifier. Moreover, SVM results are more dispersed than kNN results because the SVM boxes have longer whiskers than the kNN boxes.

In the Intra Session scenario, the single classifiers (kNN and SVM) provide performance on the Interpolation and Double Sum datasets similar to that on the Original dataset (see Figures 10a and 10b). This means that the Interpolation and Double Sum functions can be used to protect behavioural data instead of using unprotected data.

The Interpolation and BioHashing datasets present the worst results (high EER values) in Intra Session. The poor results obtained by Interpolation may be due to the interpolation function used, which may not fit the Original data properly, or to the random vector applied to the interpolated function, which may cause a major disruption in the data and modify the class separation. For the BioHashing dataset, the poor performance may be due to the discretization step, which may have caused a loss of important information from the Original data (see 3.5.2). One possible solution is to set more discretization intervals or to apply optimization methods in order to find a set of optimal thresholds (7).

5.3.1 Contribution

In summary, the results demonstrate that the BioConvolving and Double Sum functions can be used in an authentication process based on behavioural biometrics without deteriorating the performance of the whole process from a statistical point of view. In the single classifier experiments, we achieved interesting results for the BioConvolving and Double Sum datasets using the kNN and SVM classifiers. For BioConvolving, the mean EER varies between 0.22% and 3.60%, and for Double Sum, the mean EER varies between 9.08% and 22.45%.

A statistical test was applied and showed that BioConvolving and Double Sum provide EER values similar to or better than those achieved using the Original dataset. The SVM classifier has statistically lower EER results on BioConvolving and statistically similar results on Double Sum, when compared with the Original dataset. Additionally, the kNN classifier has statistically similar results on BioConvolving in the Inter Session scenario for both stroke directions.

On the other hand, the Interpolation and BioHashing functions need to be explored further in order to decrease the Equal Error Rate (EER) on these datasets. For instance, improvements can focus on parameter optimization and/or multi-modal biometric processing. Such improvements might make Interpolation and BioHashing promising alternatives.

5.4 Ensemble System Experiments

Ensemble systems are used to extend the single classifier experiments. The main goal is to observe whether it is possible to achieve lower EER with the use of ensemble systems. Thus, an empirical analysis is conducted in order to validate the use of ensemble systems with cancelable behavioural biometrics. For simplicity, we use only the Intra Session experiment in this investigation. This scenario is the most suitable for applying elaborate pattern recognition structures such as ensemble systems.

The main goal of this experiment is to compare the EER values achieved by ensemble systems using cancelable data with the EER values achieved by single classifiers. To illustrate the comparison, boxplots are presented for each dataset. For simplicity, only the best result of each ensemble structure is presented in Figure 11.

Figure 11 illustrates the results obtained for the scrolling and horizontal datasets. The best ensemble system for each cancelable dataset is on the y-axis of this figure and the EER values are on the x-axis. Figure 11a shows the performance achieved on the scrolling dataset, while Figure 11b displays the results achieved using the horizontal strokes dataset.

Stacking and Bagging are the best ensemble systems on the Original and cancelable datasets (Figure 11), while Voting does not present good results compared to the other ensemble structures.

Stacking is the ensemble structure that offers the best performance on cancelable behavioural biometric data. It achieved the lowest EER values in 6 cases out of 10 (5 cases each for scrolling and horizontal strokes). BioConvolving is the cancelable function that provided the lowest EER values; its quartiles are very close to the mean (see Figure 11b).

On the other hand, BioHashing presents the worst results on both datasets (scrolling and horizontal). Once again, the discretization step is one of the possible reasons for this behaviour of the BioHashing dataset (see 3.5.2). Similar results were discussed in Section 5.3.

We now investigate whether the use of ensemble structures has a positive impact on the performance on the different biometric datasets (Original and transformed datasets). We used all the learning strategies with the same number of learning components.

Tables 3 and 4 present a comparative analysis of the ensemble structures against the single classifiers. These tables display the number of times an ensemble system achieves lower EER than the two single classifiers (kNN and SVM). The number 0 means that the ensemble structure does not achieve a lower EER than either single classifier. 1K means that the ensemble structure achieved lower EER than kNN. In the same way, 1S means that the ensemble structure provided lower EER than SVM. Finally, the number 2 means that the ensemble structure provided lower EER than both single classifiers (kNN and SVM). The last row of Tables 3 and 4 gives the number of times the ensemble structures provided lower EER values than a single classifier, for each cancelable function.
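The coding scheme and the Total row can be reproduced with a few lines. The sketch below assumes that a strictly lower EER counts as a win; `code_cell` and `column_total` are illustrative names.

```python
# Points contributed by each cell code to the Total row.
SCORE = {"0": 0, "1K": 1, "1S": 1, "2": 2}


def code_cell(ensemble_eer, knn_eer, svm_eer):
    """Encode one ensemble/dataset cell as 0, 1K, 1S or 2."""
    beats_knn = ensemble_eer < knn_eer
    beats_svm = ensemble_eer < svm_eer
    if beats_knn and beats_svm:
        return "2"
    if beats_knn:
        return "1K"
    if beats_svm:
        return "1S"
    return "0"


def column_total(cells):
    """Number of single classifiers beaten, summed over all ensembles."""
    return sum(SCORE[c] for c in cells)


# BioHashing column of Table 3 (scrolling dataset), from top to bottom.
biohashing_scrolling = ["1K", "1K", "2", "2", "2"]
total = column_total(biohashing_scrolling)  # 8, as in the Total row
```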

The ensemble structures provided lower EER than at least one single classifier (Tables 3 and 4), especially SVM. In particular, all the ensemble systems provided lower EER than the SVM classifier on the Original dataset. This is an important result because it shows that ensemble systems provide better generalization power than a single classifier.

Ensemble systems achieve lower EER than both single classifiers on BioHashing and BioConvolving, except for Bagging_6 and Bagging_12 on scrolling data. In these cases, the ensemble structures provide lower EER than the SVM classifier only.


Table 3 – Comparative Analysis of Ensemble Structures and Single Classifiers for the Scrolling dataset

| Ensemble | Original | Interpolation | BioHashing | BioConvolving | DoubleSum |
|---|---|---|---|---|---|
| Bagging_6 | 1S | 1S | 1K | 2 | 1S |
| Bagging_12 | 1S | 1S | 1K | 2 | 1S |
| Stacking_kSk | 1S | 1S | 2 | 2 | 1S |
| Stacking_kSL | 1S | 2 | 2 | 2 | 2 |
| Voting | 1S | 1S | 2 | 2 | 1S |
| Total | 5 | 6 | 8 | 10 | 6 |

Table 4 – Comparative Analysis of Ensemble Structures and Single Classifiers for the Horizontal dataset

| Ensemble | Original | Interpolation | BioHashing | BioConvolving | DoubleSum |
|---|---|---|---|---|---|
| Bagging_6 | 1S | 1S | 2 | 1S | 1S |
| Bagging_12 | 1S | 1S | 2 | 2 | 1S |
| Stacking_kSk | 1S | 1S | 2 | 2 | 1S |
| Stacking_kSL | 1S | 2 | 2 | 2 | 2 |
| Voting | 1S | 1S | 2 | 2 | 1S |
| Total | 5 | 6 | 10 | 9 | 6 |

In both stroke directions (Tables 3 and 4), ensemble systems achieve lower EER than the single classifiers. The Original, Interpolation and Double Sum datasets have the same totals in both tables (5 for Original, and 6 for Interpolation and Double Sum). In addition, on BioConvolving every ensemble provided lower EER than both single classifiers using the scrolling dataset, and BioHashing achieved the same result on the horizontal dataset.

These results show that ensemble systems with cancelable behavioural biometrics can be used instead of Original data without deteriorating the performance of biometric-based authentication systems. The use of an ensemble method is recommended instead of a single classifier in a distorted data domain. This is an important result because the cancelable characteristics add complexity and revocability to the biometric system.

We now examine which ensemble structures perform best among the analysed structures. In addition, we compare the results of the ensemble systems using cancelable methods against the same structures using the Original dataset. To perform this analysis, we report the performance using EER values and their standard deviation for each ensemble structure. We also carried out a statistical test, comparing pairwise the ensembles with and without cancelable transformations (columns 3, 4, 5 and 6 against the Original dataset in column 2, Tables 5 and 6).

Tables 5 and 6 show the EER values and standard deviations of the ensemble systems for each cancelable transformation, for scrolling and horizontal traits (as described in Section 5.1), respectively. In Tables 5 and 6, the bold numbers represent the cases where the use of a cancelable transformation caused a statistically significant increase in the accuracy of the ensemble systems. In addition, the shaded cells indicate the cases where the differences are not statistically significant (the performance on the Original dataset and on the transformed one is statistically similar).

The notation used in the Method column of Tables 5 and 6 is built from the first letters of the ensemble method, followed by the number of components in the case of Bagging, or by the names of the classifier algorithms in the case of Stacking. Bagging_6_kNN (Bag_6_k) and Bagging_6_SVM (Bag_6_S) are the bagging structures with 6 kNN and 6 SVM classifiers, respectively. Bagging_12_kNN (Bag_12_k) and Bagging_12_SVM (Bag_12_S) are the bagging structures with 12 kNN and 12 SVM classifiers. Stacking has 6 classifiers, using k-NN and SVM as component classifiers. kNN (Stacking_kNN_SVM_kNN (Stack_kSk)) and the Logistic function (Stacking_kNN_SVM_Logistics (Stack_kSL)) were used as the combination method for the decisions given by the stacking components. Finally, the Voting method with 3 k-NN and 3 SVM component classifiers was evaluated as well.

Table 5 – Mean Results using Scrolling Traits (EER in %, mean ± standard deviation)

Method     Original    Interpolation  BioHashing    BioConvolving  Double Sum
Bag_6_k    7.6 ± 4.8   8.9 ± 5.4      32.3 ± 12.6   3.3 ± 10.7     8.7 ± 5.5
Bag_12_k   7.4 ± 4.9   8.6 ± 5.1      32.4 ± 12.4   3.2 ± 10.8     8.6 ± 5.4
Bag_6_S    9.2 ± 6.4   12.4 ± 8.2     32.4 ± 19     2.3 ± 7.8      11.9 ± 8.1
Bag_12_S   9.2 ± 6.4   12.3 ± 8       31.2 ± 15.5   2.1 ± 7.8      11.7 ± 8.3
Stack_kSk  7.8 ± 5.1   10 ± 6.3       32.3 ± 13.1   3.4 ± 10.6     10 ± 6.5
Stack_kSL  7.2 ± 4.7   9 ± 5.5        32.7 ± 12.7   3.4 ± 10.9     9.1 ± 5.8
Voting     8.9 ± 6.4   10.9 ± 6.7     32.6 ± 12.6   3.6 ± 11       11.4 ± 7.5

Table 6 – Mean Results using Horizontal Traits (EER in %, mean ± standard deviation)

Method     Original    Interpolation  BioHashing    BioConvolving  Double Sum
Bag_6_k    8 ± 4.5     10.6 ± 6.3     32.8 ± 9.8    0.1 ± 0.3      8.7 ± 4.8
Bag_12_k   7.7 ± 4.3   10.3 ± 6.2     32.8 ± 10.2   0.1 ± 0.3      8.6 ± 4.6
Bag_6_S    11.1 ± 7.3  16.1 ± 9       34.4 ± 17.7   0.4 ± 0.5      13.3 ± 8.4
Bag_12_S   11.7 ± 7.8  16.1 ± 9.2     33.5 ± 26     0.3 ± 0.4      13.1 ± 8.5
Stack_kSk  8.5 ± 4.9   12.1 ± 7.3     34 ± 9.5      0.2 ± 0.4      10.7 ± 6.3
Stack_kSL  7.7 ± 4.5   10.8 ± 6.7     33.1 ± 10     0.2 ± 0.4      9.6 ± 5.4
Voting     9.7 ± 5.8   13.7 ± 7.1     33.5 ± 9.7    0.2 ± 0.4      11.9 ± 6.9


All the ensembles generated using biometric templates protected by the Interpolation and Double Sum functions achieve results that are statistically similar to those achieved on the Original datasets (Tables 5 and 6). These transformed datasets usually delivered a higher EER than the Original dataset, but the differences are not statistically significant.

EER values of BioConvolving are statistically better than the EER of the Original dataset, for all ensemble structures and both stroke directions (Tables 5 and 6). We can conclude that the use of ensemble systems in behavioural cancelable biometrics does not deteriorate the EER results when compared with the EER results achieved on Original data. In contrast, ensemble structures using BioHashing data produce the worst EER values compared to the results on the Original dataset, using both scrolling and horizontal strokes.

5.4.1 Contributions

Comparing the accuracy of the different ensemble structures, it can be seen that the accuracy of the Bagging structures was higher than that of the other two structures. This is not an expected result, since we believed that the use of heterogeneous structures would lead to an increase in the diversity level of the ensemble systems and, as a consequence, in the accuracy of these systems.

Our previous work (15) analysed the use of a single classifier (kNN and SVM) with the same transformation functions in all of Frank's experiments (Inter Session, Inter Week and Intra Session). From the ensemble results, we can conclude that the use of ensemble structures improves the performance with the Interpolation, BioConvolving and Double Sum functions on scrolling strokes, compared to the single classifier performance obtained in (15). Using scrolling strokes, the BioHashing dataset was the only case where similar performance was obtained. Using the horizontal strokes, we achieved better performance (lower EER) than the results reported in (15) for all transformed datasets (Interpolation, BioHashing, BioConvolving and Double Sum).

The Interpolation and Double Sum EER values were statistically similar to the Original results. The mean EER of the Original dataset varies from 7.4% to 11.7%, while in the Interpolation dataset the EER varies between 8.6% and 16.1%. In the Double Sum dataset, the EER varies from 8.6% to 13.3%.

In contrast, BioConvolving provided the best performance, statistically better than the performance obtained on the Original dataset. The mean EER of the BioConvolving dataset varies from 0.1% to 3.6%. The BioConvolving EER was statistically lower than that of all other datasets, for all ensemble structures. The results obtained by BioConvolving are very promising and indicate that the use of cancelable behavioural biometrics has a positive effect on biometric-based authentication systems.

5.5 Chapter Conclusion

Using ensemble systems, we demonstrated that the use of cancelable functions usually provides similar or better performance than the Original biometric data, except for the BioHashing function. In addition, the use of cancelable behavioural biometric data brings great opportunities for research, providing the advantages of behavioural biometrics and the security of cancelable biometrics. We can see that simple touch movements, even transformed (distorted), can be used as a source of user verification.

The investigation of the use of ensemble systems showed that, in general, these systems provided lower EER values than single classifiers. Furthermore, the best ensemble structures on the Original and transformed datasets were compared. This investigation showed that Bagging and Stacking are the ensemble structures with the lowest EER values. In addition, Bagging provides the lowest EER values for the BioConvolving dataset (the dataset that provided the lowest EER values). This ensemble system achieved an EER value of 0.01 in both stroke directions (scrolling and horizontal datasets).

Still in the context of ensemble systems, we built tables comparing all the ensemble structures to single classifiers. From these tables it is possible to conclude that it is better to use ensemble systems than single classifiers to verify users using cancelable behavioural data.


Figure 8 – Inter Session BoxPlots. (a) Scrolling Strokes; (b) Horizontal Strokes.

Figure 9 – Inter Week BoxPlots.

Figure 10 – Intra Session BoxPlots.

Figure 11 – Better Ensemble Boxplots for Original and Cancelable Functions. (a) Scrolling Ensemble Boxplots with Lower EER; (b) Horizontal Ensemble Boxplots with Lower EER.


6 Multi-Privacy Protection Schemes

6.1 Introduction

Even though a biometric template may contain only some extracted features, for some biometric traits, such as fingerprint, it is possible to approximately reconstruct a digital copy of the biometric trait (23). Therefore, biometric-based systems need to consider template security by providing methods to revoke templates when they are compromised (38). To this end, we consider in this chapter two strategies to improve biometric template protection, namely, (1) using multiple privacy schemes and (2) using multiple matching algorithms. While multiple privacy schemes can improve the security of a biometric system by protecting its template, using multiple matching algorithms or, similarly, multiple biometric traits along with their respective matching algorithms, can improve the system performance due to reduced intra-class variability (68). These two strategies lead to a novel ensemble system that is derived from multiple privacy schemes. We call this ensemble system Multi-Privacy Protection Schemes (MPPS).

In a biometric system enhanced with a privacy scheme, a stolen template cannot be introduced into another system or be directly associated with the user. Each biometric template protected by a privacy scheme is usable only by the intended biometric system. An important class of privacy schemes deployed for biometric systems is known as "cancelable biometrics", which transforms or intentionally distorts the Original biometric samples to protect the user's privacy.

Although a number of methods exist to combine information sources, as described in (36), few studies systematically explore the effectiveness of multiple cancelable functions or protection schemes in the context of information fusion. The literature on information fusion considers the following classes of systems: (i) multimodal system, which combines multiple biometric traits collected by different sensors; (ii) multi-algorithmic system, which uses different features of the same biometric modality; (iii) multi-sample system, which combines several samples or instances of the same biometric modality; (iv) multi-sensor system, which recognises a user based on one biometric modality through the combination of different sensors (of the same modality). A special case of multi-algorithmic biometric system is to use several biometric feature representations, leading to a multi-feature biometric system. Although some progress on biometric information fusion includes the combination of auxiliary, non-discriminatory information such as user-specific characteristics and biometric sample quality (67), as well as combining multiple biometrics in the context of template protection (74), the use of multiple protection schemes is not often systematically evaluated, at least not in the context of information fusion.

In addition, nearly all the studies on multibiometric template protection focus on physiological biometrics such as face, fingerprint, and iris (74). The level of protection for behavioural biometrics, where the sample has much more variability, is arguably less unique to an individual, and may change as behaviour changes over time, merits an investigation in its own right.

The closest work on multiple protection schemes applied to behavioural biometrics is (16), which suggests that an ensemble system generated with one cancelable function (Interpolation or Double Sum) has statistically similar performance to the baseline ensemble system without any cancelable function. In addition, the authors showed that applying BioConvolving to the ensemble method can significantly improve the performance of the ensemble compared with the baseline system.

Apart from adding cancelable functions, we also systematically evaluate the performance of the ensemble using all possible combinations of the constituent cancelable functions. Moreover, we compare each of the possible ensemble combinations with their respective baseline systems (without the cancelable function) as well as with a single privacy protection scheme.

In summary, the contributions of this chapter are four-fold: (1) proposal of a Multi-Privacy Protection Scheme applied to biometrics; (2) performance evaluation of a multi-biometric system using subsets of protected biometric traits; (3) understanding the system performance in the context of decision fusion, based on the ensemble method; and (4) applying the system to behavioural biometrics.

6.2 Methodology

The methodology followed in this chapter is similar to that of the previous chapter. The data transformation and data processing steps are the same as those performed in Section 5.2. The difference in this chapter is the system architecture.

The system architecture is composed of three components: (1) a pool of cancelable biometric schemes, (2) an ensemble system, and (3) a decision fusion, based on the majority vote. If there are N cancelable schemes in the pool, then there are at most $\sum_{k=2}^{N} C^{N}_{k}$ possible ways to construct the ensembles. In our case, there are 11 possible ensemble systems derived from four cancelable schemes (N = 4). Figure 12 shows the architecture of the experimental system along with the decision module based on the voting scheme.

We consider three experimental scenarios, each one with a different number of cancelable schemes, namely:

1. the scenario with two cancelable schemes: $C^{4}_{2} = (4 \times 3)/2 = 6$ combinations;

2. the scenario with three cancelable schemes: $C^{4}_{3} = 4$ combinations; and finally

3. the scenario in which all four schemes are used: $C^{4}_{4} = 1$ combination.

Each one of the 11 combinations is referred to by the concatenation of the first four letters of each constituent cancelable scheme, namely, Inte for Interpolation, BioH for BioHashing, BioC for BioConvolving, and Doub for DoubleSum. For example, the scenario 'InteDoub' refers to the ensemble consisting of the Interpolation and the Double Sum schemes.
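The enumeration of the 11 scenarios can be sketched as follows. The ordering of the abbreviations (Inte, Doub, BioC, BioH) is an assumption inferred from the scenario labels used in the figures of this chapter, and the function name is a hypothetical helper rather than part of the original experimental code.

```python
from itertools import combinations
from math import comb

# Abbreviations of the four cancelable schemes; the order is an assumption
# chosen so that the generated names match labels such as 'InteDoub'.
SCHEMES = ["Inte", "Doub", "BioC", "BioH"]


def mpps_scenarios(schemes):
    """Return the names of all subsets with at least two schemes."""
    names = []
    for k in range(2, len(schemes) + 1):
        for subset in combinations(schemes, k):
            names.append("".join(subset))
    return names


scenarios = mpps_scenarios(SCHEMES)
# C(4,2) + C(4,3) + C(4,4) = 6 + 4 + 1 = 11
assert len(scenarios) == sum(comb(4, k) for k in range(2, 5)) == 11
print(scenarios)
```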

Figure 12 – Flowchart of the system architecture

The ensemble system is composed of k-NN, SVM, Naive Bayes and MultiLayer Perceptron classifiers. These classifiers were selected because they provide different learning approaches and error biases. These characteristics increase the ensemble diversity, which usually leads to an improvement in performance.
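A minimal sketch of decision fusion by majority vote over the component decisions is given below. The label strings and the tie-handling rule (falling back to a rejection when the two most frequent decisions are tied) are assumptions made for illustration; the text does not specify how ties are resolved.

```python
from collections import Counter


def majority_vote(decisions, tie_value="reject"):
    """Return the most frequent decision; fall back to tie_value on a tie."""
    if not decisions:
        raise ValueError("at least one decision is required")
    counts = Counter(decisions).most_common()
    # A tie occurs when the two most frequent decisions have the same count.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_value
    return counts[0][0]


# Hypothetical decisions from four component classifiers for one probe sample.
print(majority_vote(["accept", "accept", "reject", "accept"]))  # accept
```

Rejecting on a tie is a conservative choice for a verification system, since it favours a false rejection over a false acceptance.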

The classification algorithms of this investigation were taken from the WEKA¹ package. In addition, an exhaustive investigation using a cross-validation approach defined the parameter values of the supervised learning algorithms. Consequently, the algorithms were executed with the following parameters: k-NN with k = 5; SVM with polynomial kernel (grid-search), pruned decision tree, MLP (one hidden layer) and Naive Bayes with standard WEKA settings.

¹ http://www.cs.waikato.ac.nz/ml/WEKA

Our empirical study evaluates the biometric system performance using a modified 10-fold cross-validation method. Our modified 10-fold cross-validation ensures that the training set does not contain any biometric sample of subjects present in the test set. Therefore, we test our system with samples of subjects that have not been processed in the training phase. All results presented in this chapter refer to the mean over all balanced user-specific datasets using the 10-fold cross-validation technique.
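One way to obtain folds in which no subject contributes samples to both the training and the test set is to assign whole subjects, rather than individual samples, to folds. The sketch below illustrates this idea under that assumption; it is not the procedure used in the experiments (which relied on WEKA), and the function name, the seed and the toy data are hypothetical.

```python
import random


def subject_disjoint_folds(subject_ids, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) with no subject shared between them."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    # Assign whole subjects to folds in round-robin order.
    fold_of = {subject: i % n_folds for i, subject in enumerate(subjects)}
    for fold in range(n_folds):
        test = [i for i, s in enumerate(subject_ids) if fold_of[s] == fold]
        train = [i for i, s in enumerate(subject_ids) if fold_of[s] != fold]
        yield train, test


# Toy data: 41 subjects with 3 samples each (the sample count is arbitrary).
subject_ids = [subject for subject in range(41) for _ in range(3)]
for train, test in subject_disjoint_folds(subject_ids):
    train_subjects = {subject_ids[i] for i in train}
    test_subjects = {subject_ids[i] for i in test}
    assert train_subjects.isdisjoint(test_subjects)
```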

The biometric system performance is evaluated using boxplots. The boxplots show the EER achieved across the 41 users for each combination of cancelable functions. In addition, we present and discuss the relative change of the EER values for each combination experiment. The relative change metric shows how the new variable influences the performance. The relative change of the experimental combination scenarios is obtained by:

$$\text{Relative Change} = \frac{EER_{\text{CombinedScenario}} - EER_{\text{Original}}}{EER_{\text{Original}}} \times 100\% \qquad (6.1)$$

The next section presents and discusses the relative change of the EER values for each combination experiment.
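Equation 6.1 can be expressed as a small function, as sketched below. The example inputs are the mean Voting EER values for horizontal strokes reported for single schemes in Table 7; they are used here only as convenient numbers to exercise the formula, not as results of a combined scenario. The function name is a hypothetical helper.

```python
def relative_change(eer_scenario, eer_original):
    """Relative change of EER (in %) with respect to the Original dataset (Eq. 6.1)."""
    if eer_original == 0:
        raise ValueError("the Original EER must be non-zero")
    return (eer_scenario - eer_original) / eer_original * 100.0


# Mean EER (%) values taken from Table 7, horizontal strokes.
eer_original = 9.7
print(round(relative_change(13.7, eer_original), 1))  # Interpolation
print(round(relative_change(0.2, eer_original), 1))   # BioConvolving
```

A negative value indicates that the scenario has a lower EER than the Original dataset, and a positive value indicates a higher EER.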

6.3 Results and Discussion

This section presents two different analyses:

1. Comparison of the Multi-Privacy Protection Scheme against a single privacy protection scheme; and

2. The relative change of EER (%) of these multi-privacy scenarios compared with the baseline dataset (Original dataset), where the Original dataset is a set of non-modified biometric samples.

The Voting results available in (16) are used to compare the results of the multi-privacy scheme against the single privacy protection scheme. Table 7 presents the mean EER and standard deviation of the voting decision fusion using only one protected biometric sample (16).

Figures 13 and 14 exhibit the boxplots (EER) by scenario achieved in the MPPS experiment, using horizontal and scrolling strokes, respectively. For better comparison and readability, the boxplots represent the average EER values and they are organized in ascending order.

The purpose of the analysis of multi-privacy against single-privacy schemes is to check whether the multi-privacy schemes can outperform the second and third best mean EER (Double Sum and Interpolation) achieved by classifiers using only one privacy protection method (Table 7). The best case (BioConvolving) is not compared because its result in the single protected scenario is very good and hard to improve: 0.2% and 3.6% using horizontal and scrolling strokes, respectively.


Table 7 – Voting - EER - Percentage. Source: (16)

               Horizontal               Scrolling
Scenario       Mean   Std. Deviation    Mean   Std. Deviation
Original       9.7    5.8               8.9    6.4
Interpolation  13.7   5.8               10.9   6.7
BioHashing     33.5   9.7               32.6   12.6
BioConvolving  0.2    0.4               3.6    11
DoubleSum      11.9   6.9               11.4   7.5

[Boxplot panel "EER by Scenario − Horizontal"; axis: EER, from 0.0 to 0.4; one box per scenario: BioCBioH, DoubBioC, DoubBioCBioH, DoubBioH, InteBioC, InteBioCBioH, InteBioH, InteDoub, InteDoubBioC, InteDoubBioCBioH, InteDoubBioH, Original.]

Figure 13 – BoxPlot of MPPS scenarios using horizontal strokes

Using horizontal strokes, multi-privacy methods increase the performance of Interpolation and Double Sum compared to single-privacy scenarios in five out of seven cases (71.42%). Using scrolling strokes, four cases (57.14%) are better.

The performance of the Multi-Privacy Protection Scheme using the BioHashing dataset together with at least one different protected sample increases in 100% of cases when compared with the single privacy scheme using BioHashing samples (compare Table 7 with the BioHashing boxplots in Figures 13 and 14). This is an important finding because a cancelable biometric trait with poor performance (BioHashing) can still increase the performance of a biometric system when it is combined with a different protected sample.

[Boxplot panel "EER by Scenario − Scrolling"; axis: EER, from 0.0 to 0.5; one box per scenario: BioCBioH, DoubBioC, DoubBioCBioH, DoubBioH, InteBioC, InteBioCBioH, InteBioH, InteDoub, InteDoubBioC, InteDoubBioCBioH, InteDoubBioH, Original.]

Figure 14 – BoxPlot of MPPS scenarios using scrolling strokes

In both scenarios (horizontal and scrolling strokes), the multi-privacy scenarios with BioConvolving achieve a lower mean EER than the mean of the single-privacy scenarios, i.e., the use of BioConvolving samples increases the performance of the multi-privacy scheme compared with single-privacy schemes. We can observe that all multi-privacy scenario boxplots using BioConvolving have mean values (Figures 13 and 14) below all mean values of the single scenarios (Table 7), except BioConvolving itself.

When comparing both directions, as in (17), the results using horizontal strokes were better than those using scrolling strokes. Using the accumulated EER as the metric, we have an accumulated EER of 90.9 and 140.6 for the horizontal and scrolling datasets, respectively. The difference in accumulated EER between horizontal and scrolling strokes is 49.7 units.

The EER of the scenario using all protected biometric samples is among the top 5 results (horizontal = 5th, scrolling = 3rd), in both cases lower than that of the Original dataset. This result shows that the use of all protected samples increases the performance of the biometric system compared to its use with Original data. In addition, it minimizes all the disadvantages related to the use of a single non-protected scheme.

In the second phase of the investigation, we analyse the different multi-privacy scenarios considering the relative change of EER (%). The relative change compares the multi-privacy scenarios against the Original samples (non-modified biometric samples). Figures 15 and 16 exhibit the boxplots of the relative change of EER (%) using horizontal and scrolling strokes, respectively. Similarly to Figures 13 and 14, Figures 15 and 16 are organized in ascending order of the mean.

[Boxplot panel "Relative Change EER(%) by Scenario − Horizontal"; axis: relative change of EER (%), from −100 to 150; one box per scenario: BioCBioH, DoubBioC, DoubBioCBioH, DoubBioH, InteBioC, InteBioCBioH, InteBioH, InteDoub, InteDoubBioC, InteDoubBioCBioH, InteDoubBioH, Original.]

Figure 15 – BoxPlot of relative change of EER(%) using horizontal scenarios

Seven scenarios out of 11 (63.63%) are better than the Original dataset using horizontal strokes (Figure 15). Using scrolling strokes, 6 scenarios out of 11 (54.54%) are better than the Original dataset scenario (Figure 16).

The use of another privacy method together with BioHashing significantly decreases the EER of BioHashing using horizontal strokes, by 28.6% in the best case (BioHashing EER − BioCBioH EER). In addition, using scrolling strokes, the EER decreased by 24.9% in the best case (InteBioCBioH) when BioHashing is used with another protected sample.

[Boxplot panel "Relative Change EER(%) by Scenario − Scrolling"; axis: relative change of EER (%), from −100 to 400; one box per scenario: BioCBioH, DoubBioC, DoubBioCBioH, DoubBioH, InteBioC, InteBioCBioH, InteBioH, InteDoub, InteDoubBioC, InteDoubBioCBioH, InteDoubBioH, Original.]

Figure 16 – BoxPlot of relative change of EER(%) using scrolling scenarios

These findings show that the Multi-Privacy Protection Scheme using the voting decision function does not deteriorate the performance of the biometric-based authentication system compared with the use of non-modified biometric samples, in some cases using only two protection functions. The multi-privacy protected schemes perform better in at least 54.54% of cases when compared with the Original dataset.

6.4 Chapter Conclusion

In this chapter, we propose and evaluate a Multi-Privacy Protection Scheme (MPPS) using ensemble systems. The multi-privacy scheme combines multiple algorithms and multiple protected biometric samples. The MPPS combines the performance provided by ensemble systems and the security of cancelable biometric functions. The Multi-Privacy Protection Scheme was applied to analyse the performance and security of a biometric system using ensemble systems and cancelable biometric samples.


The Multi-Privacy Protection Scheme achieved better performance compared with the Original dataset. Of the 11 scenarios, 7 were better using horizontal strokes and 6 were better using scrolling strokes.

The Multi-Privacy Protection Scheme has better performance than a single protection scheme. A combination of protected biometric samples in a multi-privacy scheme increases the performance compared with the single scheme using the same protected biometric sample. In our case, the use of another privacy method together with BioHashing significantly decreases the EER of BioHashing. Considering both stroke directions, the EER of the multi-privacy scheme using BioHashing dropped by 28.6% (BioCBioH, horizontal) and 24.9% (InteBioCBioH, scrolling) in the best cases.

The Multi-Privacy Protection Scheme increases the performance and security of biometric systems. Using the relative change of EER (%) of the multi-protection scheme, the scenario using all the cancelable functions is in 5th and 6th place using horizontal and scrolling data, respectively. This result shows that the multi-protection scheme increases the performance of a biometric system compared with the use of non-protected biometric samples (Original dataset). Beyond this, the use of multiple protected biometric samples provides more security because the success rate of attacks such as shoulder surfing, forgeries or brute force decreases.

The Multi-Privacy Protection Scheme encourages the use of safe biometric authentication systems. From our results, we see that the use of only two different protected datasets yields low EER values for both stroke directions. Based on our work, we propose the following combinations for any stroke direction: Double Sum+BioConvolving (DoubBioC), Interpolation+BioConvolving (InteBioC) and Interpolation+Double Sum+BioConvolving (InteDoubBioC).

The Multi-Privacy Protection Scheme (MPPS) increases user inconvenience. In MPPS, an individual needs to present more than one key to encode his biometric sample during user verification. In addition, the processing time of the multi-privacy scheme is higher than when only a non-protected biometric sample is used. However, the increase in performance and security can outweigh this limitation. There is no guarantee that MPPS, which is a specialized ensemble method, will be better than every single protected biometric sample. In our case, no multi-privacy scenario was better than BioConvolving (Table 7).


7 Impact of Key Storage in Cancelable Functions

7.1 Introduction

A common solution to protect a biometric template is to transform it through a cancelable function (72). Due to its one-way transformation property, unless proven otherwise, an attacker may find it hard to obtain the original biometric template after the use of the cancelable function. However, cancelable functions often inadvertently introduce noise artefacts that could diminish the discrimination power to recognise the enrolled user (38). Furthermore, cancelable functions require an additional key to encode the biometric sample (73). Therefore, methods to deal with noise and an understanding of the role of the key in cancelable functions are needed.

We empirically evaluate four commonly used cancelable functions to understand the impact of the key in cancelable functions, namely Interpolation, BioHashing, BioConvolving and DoubleSum. In particular, we address the practical issues of 1) key storage, i.e., token or sensor; 2) how the key is generated, i.e., from a token, a password, or a combination of the two; 3) the effect of key length; and 4) whether a common key can be applied system-wide or each authorised user should carry his/her own key, i.e., Homogeneous versus Heterogeneous Key scenarios.

This study is motivated by the following three observations.

1. Lack of a systematic study or discussion on the role of the key and the attacker's knowledge of the key in an attack. A cancelable function protects a biometric template by transforming it with the aid of a user-specific key. Although there are many studies proposing new cancelable functions (39, 47, 66), few studies look at the practical issue of how the key should be generated, stored, or shared across devices within the same application. Very often, the need to store the key by carrying a token diminishes one important usability aspect of a biometric system, i.e., the property that biometrics offer the possibility of not carrying any object (such as a key) to identify oneself. Therefore, the key storage issue must always be discussed in light of any cancelable function.

Furthermore, although cancelable functions have been shown to improve the biometric recognition performance in terms of reduced false acceptance rate, these claims often implicitly assume that the attacker has not made any effort to steal the key. In fact, the key issue was often not discussed until recently. Despite the claims of novel cancelable functions, once the key is compromised, e.g., stolen by an attacker who then applies the same cancelable key to another biometric sample, the biometric performance due to the cancelable function turns out to be worse than that of the baseline system using the raw biometric template. Is this characteristic universal for all cancelable functions? What is the degree of robustness of a cancelable function when its key is compromised? Is there a candidate cancelable function that is more robust than others under this known-key attack? To the best of our knowledge, there is no literature that compares cancelable functions in this way.

2. Lack of a discussion of the evaluation metrics in light of classifier-based biometric systems. Nagar et al. (55) define some metrics to evaluate the performance of a biometric system under various attack scenarios in which the key is compromised, or when the cancelable function fails to satisfy its one-way transform property. This leads to a novel way of characterising biometric system performance in terms of "false cross match rate" and its variants (which we will discuss later on). However, this definition of error rate is only applicable to a simple biometric comparison function that compares two biometric samples, one taken as a template and another as a query sample.

Increasingly, the more complex biometric systems, especially those dealing with behavioural biometrics, e.g., in speaker verification (77), hand signature (78) and Touchalytics (25) (as used in this work), do not exactly use a template as a reference. On the contrary, the system builds a statistical model or a classifier. Effectively, the output of the biometric enrolment procedure is a set of model parameters, which are then stored as a reference. As a result, the false cross match rate is no longer applicable. Indeed, there is scant literature suggesting how a classifier-based biometric system should be evaluated when the key is compromised.

We have, therefore, set ourselves four objectives in this chapter, namely,

• to systematically evaluate and compare four cancelable functions when the key is compromised;

• to evaluate and compare two strategies to store the key of a cancelable function, namely Homogeneous and Heterogeneous Key storage (to be further described);

• to establish an evaluation methodology for classifier-based biometric systems.

3. Homogeneous versus Heterogeneous Key storage. In the second objective above, we have introduced the Homogeneous Key storage strategy and its heterogeneous counterpart. In the former, a common key is used and shared by all the users of the biometric device. In some applications, this key is stored inside the biometric device, through a hardware or a software solution. For example, Apple defines a secure enclave where the encrypted template and the encryption key are stored inside the processor (1). The advantage of this hardware solution is that the key is hidden from the operating system; hence, it remains inaccessible to any software application installed thereafter.

In the Heterogeneous Key storage strategy, every user has to carry her own key, even though users may be sharing the same device. This is the default scenario. If the user carries her key, say, stored in a smart card, the key has to be transmitted to the biometric device via a trusted communication channel. It is obvious that the key has to be released and, in some cases, needs to be protected. In the worst-case scenario, Nagar et al. (56) show that the template can be approximately reconstructed when a cancelable function such as BioHashing has its key compromised. In other words, having the user carry the key, which is the Heterogeneous Key storage strategy, bears a potentially significant risk when the attacker has the required knowledge and skill. For this reason, it is reasonable to examine the Homogeneous Key storage strategy and the merit of this approach.
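The difference between the two key storage strategies can be illustrated with a small sketch. The class below is a hypothetical model written for this purpose: in the homogeneous case a single key lives on the device and is shared by every user, while in the heterogeneous case each enrolled user is given a distinct key, which in practice would be carried on a token rather than kept in a dictionary. The 16-byte key length is an arbitrary assumption.

```python
import secrets


class KeyStore:
    """Toy model of Homogeneous versus Heterogeneous Key storage."""

    def __init__(self, strategy):
        if strategy not in ("homogeneous", "heterogeneous"):
            raise ValueError("unknown strategy: " + strategy)
        self.strategy = strategy
        self._device_key = secrets.token_bytes(16)  # shared, device-wide key
        self._user_keys = {}  # stands in for the tokens carried by users

    def enrol(self, user_id):
        """Create a user-specific key only in the heterogeneous case."""
        if self.strategy == "heterogeneous":
            self._user_keys[user_id] = secrets.token_bytes(16)

    def key_for(self, user_id):
        """Return the key used to transform this user's biometric samples."""
        if self.strategy == "homogeneous":
            return self._device_key
        return self._user_keys[user_id]


shared = KeyStore("homogeneous")
personal = KeyStore("heterogeneous")
for user in ("alice", "bob"):
    shared.enrol(user)
    personal.enrol(user)
print(shared.key_for("alice") == shared.key_for("bob"))      # True
print(personal.key_for("alice") == personal.key_for("bob"))  # False (random keys)
```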

7.1.1 Contributions and Contrast with the Literature

In summary, although much work has been done to develop novel cancelable functions (7, 39, 47, 66), with applications to different biometric modalities (15, 62, 65, 83), the properties of cancelable functions when used with biometrics have not been systematically investigated. In order to undertake this study, we have systematically evaluated the characteristics of four cancelable functions using a behavioural biometric, namely Touchalytics (25). Our main contributions can, therefore, be summarised as follows:

1. Examine the behaviour of the four privacy-preserving schemes (or cancelable functions) under stolen (known) key attacks as well as unknown-key attacks;

2. Propose a novel key storage strategy, namely Homogeneous Key storage, as an alternative to the widely used Heterogeneous Key storage strategy.

7.1.2 Our findings

Our experiments on touch-screen biometrics provide a number of useful insights into our conjectures. First, we found that Homogeneous Key storage has the same level of security as Heterogeneous Key storage. This finding is significant because Heterogeneous Key storage has some disadvantages: the user key has to be stored in a different device or remembered by the user, which can be inconvenient. Moreover, the key can be shared, lost, or stolen easily. We found that both key storage scenarios have a security level only slightly worse than that of a biometric system that works with unprotected templates.

Second, we found that whether or not the attacker has knowledge of the user key has a significant impact on the system performance (46). We discuss two attack scenarios for each of the analysed cancelable functions. Under the known-key attack scenario, the attacker who possesses the genuine user's key applies the cancelable function to someone else's biometric template in order to impersonate the user. Under the unknown-key attack scenario, on the other hand, the attacker does not possess the genuine user's key. We observed that the recognition performance decreases under the known-key attack scenario compared to the unknown-key scenario, and this behaviour is systematic across the four cancelable functions we have studied. When compared to the baseline biometric system (using unprotected templates), we found that known-key attacks yield similar performance. Therefore, the user key has an impact on the security performance, but in the worst case, i.e., under the known-key attack scenario, the system performance is somewhat comparable to that of the baseline system (with unprotected templates).

In summary, our empirical analysis provides a better understanding of the impact of the key (key generation and key storage) in salting and non-invertible functions. The experiments show that knowledge of the key increases the success rate of the impostor compared to the case when the impostor does not know the user key. We observed that key storage is not a crucial issue because the Homogeneous and Heterogeneous Key scenarios have similar performance under both key knowledge attacks.

7.2 Notation and Methodology

This section presents the notation and the metrics used in the evaluation of our experiments. Both were inspired by (56). The main difference between the notation defined here and that of (56) is the use of classifiers instead of distances to evaluate the recognition performance.

7.2.1 Original Domain

The enrolled template and the query template of user $j$ are represented by $x_j$ and $x'_j$ respectively, for $x_j, x'_j \in \mathcal{X}_j$ and $j \in \{1, \ldots, J\}$ indexing the $J$ users.

Here $x \in \mathcal{X}_j$ denotes a particular observation and the notation $\{x\}$ denotes a set. In addition, $\{x \mid \text{criteria}\}$ expresses that the feature $x$ is conditioned upon some criteria. Let $\mathcal{X}_{-j} \equiv \{\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_J\} - \mathcal{X}_j$ be the set of all the biometric samples without the features of the user $j$.

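The training step in the original domain is defined by:

$$\text{Train}: \left(\{\mathcal{X}_j^{\text{train}}\} \times \{\mathcal{X}_{-j}^{\text{train}}\}\right) \rightarrow \text{classifier}(\theta_j) \tag{7.1}$$

where $\mathcal{X}_j^{\text{train}}$ and $\mathcal{X}_{-j}^{\text{train}}$ are the training sets of positive and negative samples respectively.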

Once the classifier has been trained, we obtain the classifier parameters $\theta_j$. The classifier is then tested with positive samples, thus producing the match or positive score set in the original domain. Using the target user $j$ as a reference, the positive score set can formally be written as:

$$\mathcal{Y}_j^{+O} \equiv \{\text{classify}(x_j, \theta_j) \mid x_j \in \mathcal{X}_j^{\text{test}}\} \tag{7.2}$$

Similarly, the negative score set is generated using non-match features from other users:

$$\mathcal{Y}_j^{-O} \equiv \{\text{classify}(x_i, \theta_j) \mid x_i \in \mathcal{X}_{-j}^{\text{test}}\} \tag{7.3}$$

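The system performance is evaluated in the original domain using the sets of positive and negative scores. Thus a DET/ROC curve is plotted using the pair of score sets $\{\mathcal{Y}^{+O}, \mathcal{Y}^{-O}\}$:

$$\mathcal{Y}^{+O} \equiv \bigcup_{j=1}^{J} \mathcal{Y}_j^{+O} \tag{7.4}$$

$$\mathcal{Y}^{-O} \equiv \bigcup_{j=1}^{J} \mathcal{Y}_j^{-O} \tag{7.5}$$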

The usability of the system in the original domain is measured using the False Rejection Rate ($\mathrm{FRR}^O(\varepsilon)$) and the False Acceptance Rate ($\mathrm{FAR}^O(\varepsilon)$). The False Rejection Error gives the rate at which genuine users are denied access. The False Rejection Error in the original domain, $\mathrm{FRR}_j^O(\varepsilon)$, for a specific user $j$ is given by

$$\mathrm{FRR}_j^O(\varepsilon) = \frac{|\{y_j^+ \in \mathcal{Y}_j^{+O} \mid y_j^+ < \varepsilon\}|}{|\mathcal{Y}_j^{+O}|} \tag{7.6}$$

where $|\bullet|$ counts the number of elements of $\bullet$.

Intrusion threats are attacks in which the impostor presents his own biometric sample in order to be authenticated. The False Acceptance Error measures the probability of these attacks based on the domain and the knowledge of the user key. The False Acceptance Error $\mathrm{FAR}_j^O(\varepsilon)$ in the original domain is

$$\mathrm{FAR}_j^O(\varepsilon) = \frac{|\{y_j^- \in \mathcal{Y}_j^{-O} \mid y_j^- > \varepsilon\}|}{|\mathcal{Y}_j^{-O}|} \tag{7.7}$$


The EER is a standard metric used to evaluate the performance of biometric systems. The EER is the operating point at which the False Acceptance Error (FAR) and the False Rejection Error (FRR) are equal. In other words, it is the point at which the frequency of fraudulent accesses (FAR) and the frequency of rejection of genuine users (FRR) assume the same value.
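The error rates of Equations 7.6 and 7.7 and the EER can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Matlab code used in the experiments: the function names, the choice of the observed scores as candidate thresholds and the toy score values are all hypothetical.

```python
from typing import Iterable, Tuple


def frr(positive_scores: Iterable[float], threshold: float) -> float:
    """Fraction of genuine scores strictly below the threshold (Eq. 7.6)."""
    scores = list(positive_scores)
    return sum(1 for y in scores if y < threshold) / len(scores)


def far(negative_scores: Iterable[float], threshold: float) -> float:
    """Fraction of impostor scores strictly above the threshold (Eq. 7.7)."""
    scores = list(negative_scores)
    return sum(1 for y in scores if y > threshold) / len(scores)


def approximate_eer(positive_scores: Iterable[float],
                    negative_scores: Iterable[float]) -> Tuple[float, float]:
    """Return (threshold, eer) where |FAR - FRR| is smallest.

    Candidate thresholds are the observed scores. On finite data FAR and FRR
    rarely coincide exactly, so the EER is reported as their mean at the
    closest point (an assumption of this sketch).
    """
    pos, neg = list(positive_scores), list(negative_scores)
    best = None
    for t in sorted(set(pos) | set(neg)):
        gap = abs(far(neg, t) - frr(pos, t))
        if best is None or gap < best[0]:
            best = (gap, t, (far(neg, t) + frr(pos, t)) / 2)
    return best[1], best[2]


# Toy score sets, made up for illustration (not thesis data).
genuine = [0.9, 0.8, 0.7, 0.6, 0.4]
impostor = [0.1, 0.2, 0.3, 0.5, 0.65]
threshold, eer = approximate_eer(genuine, impostor)
print(threshold, eer)
```

Sweeping the threshold over the same candidates and plotting FAR against FRR would give the points of a DET curve.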

7.2.2 Transformed Domain

Let $f$ be the transformation function, $k_j$ be a set of transformation parameters (user key) for user $j$ and $k'_j$ a different key for the same user $j$. Here $k'_j \neq k_j$ is another legitimate key assigned to the user $j$ in another application.

Let $f(x_j, k_j)$ be a positive cancelable sample and $f(x_i, k_i) \mid i \neq j$ be a corresponding negative sample. Let $\theta$ be the classification parameters after the training step. As with the training step performed in the original domain, the training step in the transformed domain is defined by:

$$\text{Train}: \{f(\mathcal{X}_j^{\text{train}}, k_j)\} \times \{f(\mathcal{X}_{-j}^{\text{train}}, k_i)\} \rightarrow \text{classifier}(\theta_j) \tag{7.8}$$

where $k_i \neq k_j$, and $\mathcal{X}_j^{\text{train}}$ and $\mathcal{X}_{-j}^{\text{train}}$ are the training sets of positive and negative samples respectively.

In the transformed domain, the classifier is tested with protected positive samples, thus producing the match or positive score set. Using the target user $j$ as a reference, the positive score set in the transformed domain can be written as:

$$\mathcal{Y}_j^{+T} \equiv \{\text{classify}(f(x_j, k_j), \theta_j) \mid x_j \in \mathcal{X}_j^{\text{test}}\} \tag{7.9}$$

Similarly, the negative score set in the transformed domain is generated using non-match features from other users with a known key $k_j$ or an unknown key $k'$. These two sets are defined by

$$\mathcal{Y}_j^{-kk} \equiv \{\text{classify}(f(x_i, k_j), \theta_j) \mid x_i \in \mathcal{X}_{-j}^{\text{test}}\} \tag{7.10}$$

$$\mathcal{Y}_j^{-uk} \equiv \{\text{classify}(f(x_i, k'), \theta_j) \mid x_i \in \mathcal{X}_{-j}^{\text{test}},\ k' \neq k_j\}, \tag{7.11}$$

respectively. Here, the superscript "kk" in $\mathcal{Y}_j^{-kk}$ stands for the set of impostor matching scores for attacks with a known key, and the superscript "uk" in $\mathcal{Y}_j^{-uk}$ represents the set of impostor scores for attacks with an unknown key.
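To make the difference between the known-key and unknown-key impostor score sets concrete, the following Python sketch builds both sets for one target user. Everything in it is an assumption made for illustration: `biohash_like` is a simplified keyed random-projection transform (not the BioHashing implementation evaluated in this chapter), `classify` is a bit-agreement scorer standing in for the trained classifier with parameters $\theta_j$, and the feature vectors are synthetic.

```python
import random
from typing import List, Sequence


def biohash_like(x: Sequence[float], key: int, n_bits: int = 128) -> List[int]:
    """Toy keyed transform f(x, k): sign bits of key-seeded random projections.

    Simplified stand-in for BioHashing (no orthonormalisation of the basis).
    """
    rng = random.Random(key)  # the key seeds the projection directions
    bits = []
    for _ in range(n_bits):
        direction = [rng.gauss(0.0, 1.0) for _ in x]
        bits.append(1 if sum(d * v for d, v in zip(direction, x)) > 0 else 0)
    return bits


def classify(query_bits: Sequence[int], enrolled_bits: Sequence[int]) -> float:
    """Stand-in scorer (assumption): fraction of bits equal to the enrolled template."""
    return sum(q == e for q, e in zip(query_bits, enrolled_bits)) / len(enrolled_bits)


data_rng = random.Random(0)
population = [data_rng.gauss(0.0, 1.0) for _ in range(20)]  # component shared by all users


def sample_user() -> List[float]:
    """Synthetic feature vector: shared component plus user-specific noise."""
    return [p + 0.5 * data_rng.gauss(0.0, 1.0) for p in population]


k_j = 42  # hypothetical genuine key of target user j
enrolled = biohash_like(sample_user(), k_j)
impostors = [sample_user() for _ in range(50)]

# Eq. 7.10: impostor samples encoded with the stolen key k_j.
scores_kk = [classify(biohash_like(x, k_j), enrolled) for x in impostors]
# Eq. 7.11: impostor samples encoded with guessed keys k' != k_j.
scores_uk = [classify(biohash_like(x, 1000 + i), enrolled)
             for i, x in enumerate(impostors)]

mean_kk = sum(scores_kk) / len(scores_kk)
mean_uk = sum(scores_uk) / len(scores_uk)
print(f"mean impostor score, known key:   {mean_kk:.3f}")
print(f"mean impostor score, unknown key: {mean_uk:.3f}")
```

In this toy setting the known-key impostor scores sit closer to the genuine range because the impostor's features are projected onto the same key-dependent directions as the enrolled template, whereas a guessed key yields bits that agree with the template only by chance.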

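Afterwards, we evaluate the performance using the pair of score sets $\mathcal{Y}_j^{+}$ and $\mathcal{Y}_j^{-}$ for all the users, i.e.,

$$\mathcal{Y}^{+T} \equiv \bigcup_{j=1}^{J} \mathcal{Y}_j^{+} \tag{7.12}$$

$$\mathcal{Y}^{-kk} \equiv \bigcup_{j=1}^{J} \mathcal{Y}_j^{-kk} \tag{7.13}$$

$$\mathcal{Y}^{-uk} \equiv \bigcup_{j=1}^{J} \mathcal{Y}_j^{-uk} \tag{7.14}$$

The score sets $\{\mathcal{Y}^{+}, \mathcal{Y}^{-kk}\}$ and $\{\mathcal{Y}^{+}, \mathcal{Y}^{-uk}\}$ (now without the subscript $j$) are used to plot the ROC/DET curves for known-key and unknown-key attacks respectively.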

As in the original domain, usability and attacks are measured in the transformed domain using the False Rejection Rate ($\mathrm{FRR}^T$) and the False Acceptance Error ($\mathrm{FAR}^T$). The False Rejection Error for a specific user $j$ in the transformed domain, $\mathrm{FRR}_j^T(\varepsilon)$, is

$$\mathrm{FRR}_j^T(\varepsilon) = \frac{|\{y_j^+ \in \mathcal{Y}_j^{+T} \mid y_j^+ \leq \varepsilon\}|}{|\mathcal{Y}_j^{+T}|} \tag{7.15}$$

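In the transformed domain, the False Acceptance is split according to the impostor's knowledge of the key. Thus, the False Accept Rate using an unknown key $k'$ for a specific user $j$ is given by

$$\mathrm{FAR}_j^{TUK}(\varepsilon) = \frac{|\{y_j^{-uk} \in \mathcal{Y}_j^{-uk} \mid y_j^{-uk} > \varepsilon\}|}{|\mathcal{Y}_j^{-uk}|} \tag{7.16}$$

The False Accept Rate for a specific user using a known key $k_j$ is given by

$$\mathrm{FAR}_j^{TKK}(\varepsilon) = \frac{|\{y_j^{-kk} \in \mathcal{Y}_j^{-kk} \mid y_j^{-kk} > \varepsilon\}|}{|\mathcal{Y}_j^{-kk}|} \tag{7.17}$$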

Beyond usability and intrusion threats, Nagar et al. (56) propose metrics to evaluate vulnerability to linkage threats. Linkage threats are vulnerabilities generated by common user information present in different systems. In our case, the impostor may know the different user keys $k_j$ and $k'_j$ used in two distinct biometric systems.

The Cross Match Rate in the transformed domain ($\mathrm{CMR}^T$) and in the original domain ($\mathrm{CMR}^O$) defines how often it is possible to link two different templates generated from the same biometric trait $x_j$ using different keys $k_j^1$ and $k_j^2$, in the transformed and original domains respectively. Two cross match attack scenarios need to be defined in order to formulate the Cross Match Rate equation. We use two scenarios to simplify the illustration, but the idea can be extended to $n$ scenarios. The two cross match attack scenarios using two applications are illustrated in Figure 17.

Let $y_j^1 = \text{classify}(f(x_j, k_j^2), \theta_j^1)$, $y_j^1 \in \mathcal{Y}_j^1$, be the classification score of the template encoded with the key $k_j^2$ of application 2 using the classifier parameters $\theta_j^1$. Let $y_j^2 = \text{classify}(f(x_j, k_j^1), \theta_j^2)$, $y_j^2 \in \mathcal{Y}_j^2$, be the classification score of the protected template $f(x_j, k_j^1)$ encoded with $k_j^1$ using the application parameters $\theta_j^2$, and let $\mathcal{Y}_{cm}^{a} = \bigcup_{j=1}^{J} \mathcal{Y}_j^{a}$, $a \in \{1, 2, \ldots, n\}$.

[Figure 17: diagram with panels a) and b) showing App 1 and App 2, their keys and parameters, and the cross match scenario.]

Figure 17 – Cross Match Rate Attack scenarios illustration. a) The attacker inserts a protected template $f(x_j, k_j^2)$ in order to access application 1. b) The attacker inserts a protected template $f(x_j, k_j^1)$ in order to access application 2.

Thus,

$$\mathrm{CMR}_a^T(\varepsilon) \equiv \frac{|\{y \in \mathcal{Y}_{cm}^{a} \mid y > \varepsilon\}|}{|\mathcal{Y}_{cm}^{a}|} \tag{7.18}$$

In addition, let $f_\beta^{-1}$ be the inversion function that recovers a fraction $\beta$ of the unprotected template $x_j$. Let $y_j^{-1} = \text{classify}(f_\beta^{-1}(f(x_j, k_j^2), k_j^2), \theta_j^1)$, $y_j^{-1} \in \mathcal{Y}_j^{-1}$, be the classification score of the unprotected template $x_j$ decoded using $f_\beta^{-1}(f(x_j, k_j^2), k_j^2)$ with the key $k_j^2$ of application 2 using the classifier parameters $\theta_j^1$. Let $y_j^{-2} = \text{classify}(f_\beta^{-2}(f(x_j, k_j^1), k_j^1), \theta_j^2)$, $y_j^{-2} \in \mathcal{Y}_j^{-2}$, be the classification score of the unprotected template $f_\beta^{-2}(f(x_j, k_j^1), k_j^1)$ decoded with $k_j^1$ using the application parameters $\theta_j^2$, and let $\mathcal{Y}_{cm}^{-a} = \bigcup_{j=1}^{J} \mathcal{Y}_j^{-a}$, $a \in \{1, 2, \ldots, n\}$. The attack with two scenarios is illustrated in Figure 17.

$$\mathrm{CMR}_a^O(\varepsilon) \equiv \frac{|\{y^{-1} \in \mathcal{Y}_{cm}^{-a} \mid y^{-1} > \varepsilon\}|}{|\mathcal{Y}_{cm}^{-a}|} \tag{7.19}$$

Evaluating the Cross Match Rate in the transformed and original domains is beyond the scope of this thesis. The contribution of this thesis is to define the cross match rate using classifiers in mathematical notation. In the next section we focus on template protection. Specifically, we discuss the use of the key in template protection methods.

7.3 Homogeneous and Heterogeneous Key

Feature transformation methods require a user key to protect biometric samples. Therefore, the user key has an important role in these methods because it is user-dependent and has a strong impact on the template protection (49). One of our hypotheses is that a template generated using a provided user key is biased towards the key rather than the biometric features. Consequently, the biometric system will be specialized to recognize the user key instead of the biometric features.

Commonly, each user uses his own key to protect his data (46, 49). However, no studies have evaluated the idea of using a single key for all users. Thus, this thesis evaluates the performance of cancelable functions using a single key for all the users. The use of a single key for a cancelable function increases system usability and supports the use of Multi-Privacy Protection Schemes (see Chapter 6).

The security analysis of protected template transformations is an area of biometrics that has not been well analysed. Most works only evaluate the biometric system performance because the authors assume that the template protection is free of decoding and linkage attacks. However, the vulnerabilities of a template transformation scheme to intrusion and linkage attacks are analysed in (56). This thesis defined some metrics related to usability and intrusion threats to evaluate the system vulnerabilities.

Although much work has been done to propose cancelable functions (7, 39, 47, 66) and their applications (15, 62, 65, 83), there is still significant room for improvement of cancelable functions. The main contributions of this chapter are:

1. Examine the behaviour of the four privacy-preserving schemes under stolen key attacksas well as Unknown Key Attacks;

2. Consider two key-storage scenarios:

a) Homogeneous Key storage where all users of the device share the same key;

b) Heterogeneous Key storage where every user has his/her own key.


7.4 Experimental Scenarios

This section presents four different scenarios based on the use of the same (homogeneous) or different (heterogeneous) keys under key knowledge attacks. The Heterogeneous scenarios model the standard evaluation performed in the literature, in which each user has his own key, whereas the Homogeneous Key scenarios model our proposal, the use of a single key for all users.

The Heterogeneous and Homogeneous scenarios are used to design the Key Knowledge attack. For each key storage scenario, the biometric system is evaluated in the cases when the attacker knows (Known Key) and does not know (Unknown Key) the genuine user key. Thus, we fill a gap in the literature, which generally evaluates attacks only when the impostor does not know the key. This assumption can be expanded to cases when the impostor has access to the genuine key.

Authors of new cancelable methods need to evaluate an impostor attack in which the key is known. A Known Key Attack can be executed because a user can give the key to another individual or a hacker can retrieve all the stored user keys. After an extensive search, the only studies we found that evaluate this kind of attack are (46, 56). These studies show that the recognition performance of cancelable methods is affected under Known Key Attack.

Known Key scenarios do not affect user privacy. Our scenarios use non-invertible functions, i.e., even in possession of the key, an impostor is not able to decode the user biometric features in a feasible time. Thus, the four designed scenarios do not suffer from privacy problems, owing to the non-invertibility property.

7.4.1 Scenario 1 - Heterogeneous Key - Unknown Key

This scenario evaluates the recognition performance of a biometric system in which each user $j$ has a different key $k_j$, $j \in \{1, \ldots, J\}$, and an impostor does not know the user key $k_j$. Formally, the Heterogeneous Key scenario under unknown-key attack can be written as $\text{classify}(f(x_i, k'_j), \theta_j) \mid x_i \in \mathcal{X}_{-j}^{\text{test}}$, where $k'_j$ is a guessed key for user $j$.

7.4.2 Scenario 2 - Heterogeneous Key - Known Key

In contrast, Scenario 2 models the case in which all the $J$ users use different keys $k_j$, $j \in \{1, \ldots, J\}$, but an impostor knows the user-specific key $k_j$. This known-key attack increases the penetration rate because the score $\text{classify}(f(x_j, k_j), \theta_j) \approx \text{classify}(f(x_i, k_j), \theta_j) \mid x_i \in \mathcal{X}_{-j}^{\text{test}}$ (56).

Scenarios 1 and 2 model the common assumption of the biometric template protection community, i.e., each user has his own key $k_j$ and this key must be kept secret, even when the biometric system uses non-invertible functions. The key $k_j$ must be secret because, in possession of this key, the classifier score satisfies $\text{classify}(f(x_i, k_j), \theta_j) > \text{classify}(f(x_i, k'_j), \theta_j) \mid x_i \in \mathcal{X}_{-j}^{\text{test}}$. Therefore, the user key impacts the recognition performance of the biometric system.

7.4.3 Scenario 3 - Homogeneous Key - Unknown Key

Scenario 3 is modelled similarly to Scenario 1 (see subsection 7.4.1). In Scenario 3, the biometric system uses a single key $k_s$ for all users and an impostor does not know $k_s$. Thus, an impostor attack $\text{classify}(f(x_i, k'_s), \theta_j)$ aims to gain access to the system as user $j$ using a guessed key $k'_s$ and an impostor sample $x_i \in \mathcal{X}_{-j}^{\text{test}}$.

7.4.4 Scenario 4 - Homogeneous Key - Known Key

In contrast, Scenario 4 uses a system key $k_s$ for all users, but the impostor knows $k_s$. This is the worst case scenario because the impostor knows a piece of information ($k_s$) common to all users. Thus, an impostor with a sample $x_i \in \mathcal{X}_{-j}^{\text{test}}$ can attack a user $j$ through $\text{classify}(f(x_i, k_s), \theta_j)$. Our hypothesis is that the Homogeneous Key scenarios (Scenarios 3 and 4) have security performance similar to that of the Heterogeneous Key scenarios (Scenarios 1 and 2).

Table 8 – Mathematical Formulation of Key Knowledge Attack in Homogeneous and Heterogeneous scenarios

Key storage         Unknown Key Attack                              Known Key Attack
Heterogeneous Key   $\text{classify}(f(x_i, k'_j), \theta_j)$       $\text{classify}(f(x_i, k_j), \theta_j)$
Homogeneous Key     $\text{classify}(f(x_i, k'_s), \theta_j)$       $\text{classify}(f(x_i, k_s), \theta_j)$
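The four cells of Table 8 differ only in which key protects the enrolled samples and which key the impostor applies. A minimal Python sketch of that selection logic follows; the function names, the user names and the integer key values are hypothetical and serve only to make the four scenarios explicit.

```python
from typing import Dict


def enrolment_key(scenario: int, user: str,
                  user_keys: Dict[str, int], system_key: int) -> int:
    """Key used to protect a user's enrolled samples in the given scenario."""
    if scenario in (1, 2):        # Heterogeneous Key: one key k_j per user
        return user_keys[user]
    if scenario in (3, 4):        # Homogeneous Key: one key k_s for everybody
        return system_key
    raise ValueError("scenario must be 1, 2, 3 or 4")


def attack_key(scenario: int, target: str, user_keys: Dict[str, int],
               system_key: int, guessed_key: int) -> int:
    """Key the impostor uses to encode x_i when attacking `target` (Table 8)."""
    genuine = enrolment_key(scenario, target, user_keys, system_key)
    if scenario in (2, 4):        # Known Key Attack: impostor holds the genuine key
        return genuine
    if guessed_key == genuine:    # Unknown Key Attack requires k' != k
        raise ValueError("guessed key must differ from the genuine key")
    return guessed_key


user_keys = {"alice": 11, "bob": 22}   # hypothetical per-user keys k_j
system_key = 99                        # hypothetical shared key k_s
for scenario in (1, 2, 3, 4):
    print(scenario, attack_key(scenario, "alice", user_keys, system_key, guessed_key=7))
```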

Section 7.6 presents and discusses the effort necessary to cross match a biometric system using the Homogeneous and Heterogeneous approaches.

7.5 Methodology

The data processing step follows the same methodology defined in Chapter 5. In other words, the touch screen dataset is divided by user. Afterwards, each user-specific dataset is balanced.

To evaluate each experimental scenario, a training set and a test set are generated following the requirements of that scenario. Thus, a test set with positive and negative samples not used in the training phase is separated after the data processing step. The positive and negative samples are encoded following the requirements of the four different scenarios.

Consequently, for Scenario 1, the protected training templates are composed of the user data encoded with a different key for each user. In addition, the test data is generated using different keys. This methodology is in accordance with the Heterogeneous Key scenario under Unknown Key Attack. In other words, the negative samples are encoded with a different key, because an impostor does not know the key used by the genuine user.

We use a similar methodology in Scenario 2 (Heterogeneous Key under Known Key Attack). The difference is that the impostor samples in the test dataset are generated using the genuine user key. Therefore, all the impostor data present in the test set are encoded using the genuine user key.

Scenario 3 (Homogeneous Key under Unknown Key Attack) is analogous to Scenario 1. The difference is that all the users are encoded using the same key. However, the test set generation is different: the impostor data present in the test set are encoded using a different key, as required by an Unknown Key Attack scenario.

In the last scenario (Scenario 4, Homogeneous Key under Known Key Attack), the training and test sets are generated using the same key. As described before, this is the worst case scenario. In Scenario 4, all the user samples (genuine and impostor) present in the training and test data are encoded using the same key.

The second phase is to train the user-specific classifier. Each target user has his own dedicated classifier in the methodology design. The training phase consists of presenting to the classifier positive stroke samples, i.e., samples belonging to the target user (or client/match samples), and negative stroke samples.

We evaluated the performance of the system with logistic regression, kNN and SVM classifiers in order to determine which classifier is most appropriate to the problem. This preliminary analysis showed that kNN provided slightly better results than SVM. Unfortunately, the kNN scores hinder the evaluation of security performance (FAR/FRR/DET curves) because the produced scores are discrete. Consequently, we chose the SVM classifier because it provides better results than logistic regression and produces continuous score values.

We used the Matlab suite (http://uk.mathworks.com/) in this investigation. In addition, an exhaustive search using a cross-validation approach defined the values of the parameters of the SVM with a Radial Basis Function kernel.

Our empirical study evaluates the performance of the scenarios using a 3-fold validation method. Our 3-fold validation ensures that the training set does not contain any biometric sample of individuals who are present in the validation/test set. Therefore, we test our system with biometric samples that have not been processed in the training/validation phase. All the results presented in this thesis refer to the mean over all balanced user-specific datasets using the 3-fold validation technique.
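A split by individual, as opposed to a split by sample, can be sketched as follows. This is an assumption-laden illustration rather than the procedure used in the experiments: the partition is made over user identifiers in a simple round-robin fashion, the identifiers are hypothetical, and how the target user's own samples are divided between training and test is not covered here.

```python
from typing import List, Tuple


def user_disjoint_folds(user_ids: List[str],
                        n_folds: int = 3) -> List[Tuple[List[str], List[str]]]:
    """Return (train_users, test_users) pairs; no user appears on both sides."""
    folds = [user_ids[i::n_folds] for i in range(n_folds)]  # round-robin partition
    splits = []
    for held_out, test_users in enumerate(folds):
        train_users = [u for k, fold in enumerate(folds) if k != held_out for u in fold]
        splits.append((train_users, test_users))
    return splits


users = [f"user{i}" for i in range(1, 7)]  # hypothetical identifiers
for train_users, test_users in user_disjoint_folds(users):
    # The defining property: the two sides share no individual.
    assert not set(train_users) & set(test_users)
    print("train:", train_users, "test:", test_users)
```

Reported figures would then be averaged over the three splits, matching the "mean over 3 folds" convention described above.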

The last module, visualization, plots the DET curves used to analyse the performance of the biometric system under the different scenarios. The DET curves plot FAR against FRR across different thresholds. Thus, it is possible to get an overall idea of the system performance. In addition, the EER value can be read from the DET curve: it is only necessary to find the point where FAR = FRR.

Sections 7.6, 7.7 and 7.8 evaluate the recognition performance of a biometric system protected with cancelable methods considering, respectively, the cross match attack effort, the key knowledge attack and the use of a Homogeneous or Heterogeneous key.

7.6 Analysis of Cross Match Attack Effort

This section describes the effort necessary to perform a Cross Match Attack. A Cross Match Attack is an attack in which impostors try to authenticate themselves in another biometric system using a stolen biometric template and the genuine user keys used in both biometric systems. This attack takes advantage of the possession of the biometric template and keys to try to invade a different biometric system where a vulnerable user is enrolled. The objective of this section is to provide an understanding of the effort required to execute a cross match attack in a Homogeneous Key or Heterogeneous Key system.

The Cross Match Attack Effort experiment analyses the effort an impostor needs to attack another system using a stolen protected template and its respective key, i.e., a cross match attack. This attack is based on invertible protection methods such as cryptosystems or salting methods. When non-invertible functions are used, the attack is usually a template injection. Assume Systems A and B use the same multi-privacy biometric protection scheme using $n$ cancelable functions (see Chapter 6). Thus, we can predict the effort necessary to attack another system given possession of the $n$ protected templates and their respective keys.

Table 9 shows the effort, in points, that an attacker needs to spend to perform a cross match attack. We can infer that the total of effort points to perform a cross match attack on a Heterogeneous Key system is higher than on a Homogeneous Key system. However, the difference is small when few ($n$) cancelable methods are used ($n \times 4$ against $n \times 5$). We can infer that, in general terms, this assumption holds because the number of cancelable methods in the literature is low.

Table 9 – Cross Match Effort Points: the effort, in points, that an attacker needs to spend to invade a biometric system protected by MPPS

Action                                         Homogeneous Key   Heterogeneous Key
Steal the template from System A               1                 1
Steal the key from System A                    1                 1.5
Reconstruct the template based on stolen key   1                 1
Steal the key used in System B                 1                 1.5
Total of Effort Points                         n × 4             n × 5

The difference in effort points between the Heterogeneous and Homogeneous scenarios is small. It arises because, in the Heterogeneous case, the attacker needs to discover the user id of the victim to obtain the right key. Thus, for the Heterogeneous case, each key-stealing action costs an additional 0.5 effort points.
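The totals in Table 9 follow from summing the per-action points and multiplying by the number $n$ of cancelable functions. The short Python sketch below reproduces that arithmetic; the dictionary layout and the function name are illustrative choices, while the point values are those of Table 9.

```python
from typing import Dict, Tuple

# (Homogeneous Key, Heterogeneous Key) effort points per action, from Table 9.
EFFORT_POINTS: Dict[str, Tuple[float, float]] = {
    "steal the template from System A": (1.0, 1.0),
    "steal the key from System A": (1.0, 1.5),
    "reconstruct the template based on stolen key": (1.0, 1.0),
    "steal the key used in System B": (1.0, 1.5),
}


def total_effort(n_functions: int) -> Tuple[float, float]:
    """Total effort for n cancelable functions: (homogeneous, heterogeneous)."""
    homogeneous = sum(h for h, _ in EFFORT_POINTS.values())    # 4 points per function
    heterogeneous = sum(t for _, t in EFFORT_POINTS.values())  # 5 points per function
    return n_functions * homogeneous, n_functions * heterogeneous


for n in (1, 2, 4):
    homo, hete = total_effort(n)
    print(f"n={n}: homogeneous={homo}, heterogeneous={hete}, gap={hete - homo}")
```

The gap between the two totals grows linearly with $n$ (one point per cancelable function), which is why it stays small when only a few functions are combined.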

A Cross Match Attack is possible when the biometric features can be reconstructed from a protected template. In summary, MPPS using a Homogeneous or Heterogeneous Key is not susceptible to cross match attack because these schemes use non-invertible functions as template protection methods. Moreover, based on Table 9, we can infer that the effort to perform a cross match attack does not differ much between the two key scenarios in MPPS. Therefore, our proposal to use a single key in Multi-Protection Schemes offers almost the same security against cross match attacks as MPPS using the Heterogeneous scenario.

The next section evaluates the key knowledge scenarios and compares them with the baseline (unprotected templates). The Equal Error Rate (EER), FAR and FRR values and DET curves are used to analyse the security of the biometric system under the different scenarios.

7.7 Baseline Performance (unprotected samples) vs. Attack Scenarios

This section presents a comparison of the SVM classifier using the baseline (unprotected samples) and the four different attack scenarios. The purpose of this section is to discuss the performance of biometric systems using cancelable templates under Known and Unknown Key Attacks and to compare it with the performance achieved by the same system using unprotected biometric templates.

The findings are discussed based on DET curves and the Equal Error Rate (EER) observed in these curves. DET curves show the classifier performance across different acceptance thresholds and therefore allow the performance of the evaluated scenarios to be visualised.

Similarity or dissimilarity was assessed with the two-sample Kolmogorov-Smirnov statistical test. This test evaluates whether two sets belong to the same continuous distribution. Therefore, all statements that two sets of values are similar, better or worse are based on the Kolmogorov-Smirnov test at the 5% significance level.
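For readers unfamiliar with the test, the sketch below shows the two-sample Kolmogorov-Smirnov statistic and the usual large-sample decision rule in plain Python. It is an illustration only: the experiments relied on Matlab, the function names are hypothetical, the critical value uses the asymptotic approximation $c(\alpha) = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$, and the integer samples are made up.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute gap between the empirical CDFs of the two samples."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions, so the gap peaks at a data point.
    for v in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, v) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def rejects_same_distribution(a: Sequence[float], b: Sequence[float],
                              alpha: float = 0.05) -> bool:
    """Large-sample two-sided decision: True if the samples differ at level alpha."""
    n, m = len(a), len(b)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))  # about 1.358 for alpha = 0.05
    return ks_statistic(a, b) > c_alpha * math.sqrt((n + m) / (n * m))


# Made-up samples: the second is the first shifted by half its range.
sample_a = list(range(20))
sample_b = list(range(10, 30))
print(ks_statistic(sample_a, sample_b))
print(rejects_same_distribution(sample_a, sample_b))
```

A rejection only says that the two sets differ; deciding which of them is "better" still requires looking at the direction of the difference, for instance by comparing the EER values.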

Tables 10 and 11 present EER values for scrolling and horizontal strokes respectively. The rows present the four cancelable methods used, while the columns show the Baseline and the four different scenarios: Homogeneous Known Key (Homo_K_Key), Heterogeneous Known Key (Hete_K_Key), Homogeneous Unknown Key (Homo_U_Key) and Heterogeneous Unknown Key (Hete_U_Key). The light gray and gray cells represent, respectively, results statistically better and worse than the baseline.

Table 10 – EER by scenario and Template Protection Method using Scrolling Strokes classified by SVM

Method          Baseline   Homo_K_Key   Hete_K_Key   Homo_U_Key   Hete_U_Key
Interpolation   0.1310     0.1340       0.1423       0.0573       0.0637
BioHashing      0.1310     0.1553       0.1460       0.0647       0.0677
BioConvolving   0.1310     0.2104       0.2120       0.1471       0.1393
DoubleSum       0.1310     0.1457       0.1494       0.0686       0.0707

Table 11 – EER by scenario and Template Protection Method using Horizontal Strokes classified by SVM

Method          Baseline   Homo_K_Key   Hete_K_Key   Homo_U_Key   Hete_U_Key
Interpolation   0.1345     0.1460       0.1248       0.0391       0.0435
BioHashing      0.1345     0.1572       0.1487       0.0774       0.0700
BioConvolving   0.1345     0.1661       0.1814       0.1207       0.1154
DoubleSum       0.1345     0.1480       0.1499       0.0568       0.0632

An SVM classifier using horizontal strokes under Unknown Key Attacks presents statistically better EER values for all evaluated cancelable methods. Using scrolling strokes, however, only Interpolation, BioHashing and Double Sum are statistically better (see light grey cells in Tables 10 and 11). The EER of BioConvolving under Unknown Key Attacks is statistically worse using scrolling strokes (gray cell), but using horizontal strokes the BioConvolving performance is statistically better. These results suggest that a template protection system under Unknown Key Attack has statistically better results than a biometric system without template protection in 85% of the cases. Therefore, a cancelable template increases security and user privacy.

When cancelable templates are used and the impostor template is encoded with the key of a genuine user, the SVM classifier has statistically worse or better security performance than the baseline system. Under Known Key Attacks, the SVM classifier presents statistically higher EER values than the baseline, except for Interpolation in Scenario 2 (light grey cell in the Hete_K_Key column of Table 11). The gray cells highlight these values in the Homo_K_Key and Hete_K_Key columns of Tables 10 and 11.

Considering a single point (the EER), Known Key performance is similar to baseline performance. The statistical test indicates whether two sets of FAR and FRR values belong to the same continuous distribution; if they do not come from the same distribution, they are statistically different. However, the absolute difference between the baseline and Known Key EER is not large. Thus, it is feasible to use a protected template, because an SVM can classify an impostor at almost the same rate as when using unprotected templates.

A biometric system whose templates are protected using cancelable functions provides all the security advantages of template protection methods without much performance loss. Under Unknown Key Attacks, we observed better performance than the baseline. However, based on the EER, the performance under Known Key Attacks is similar to the baseline. In summary, the performance of a biometric system is not strongly affected when an attacker knows the user key, and improves when an attacker does not know it.

The second part of this section compares the different cancelable functions under the key knowledge attacks using DET curves across different acceptance thresholds. Figures 18a, 19a, 20a and 21a show the DET curves of the baseline and of the Known and Unknown Key Attacks using scrolling strokes protected by the Interpolation, BioHashing, BioConvolving and DoubleSum cancelable methods respectively. Figures 18b, 19b, 20b and 21b show the same DET curves using horizontal strokes.

7.7.1 Interpolation

Data protected by Interpolation has better performance when the impostor does not know the user key (the standard case) in both the Heterogeneous and Homogeneous scenarios (Figures 18a and 18b). However, the performance of Interpolation when the impostor knows the user key is statistically worse than the baseline (unprotected data), except when using the Heterogeneous Known Key. In other words, an impostor has a slightly better chance of being authenticated improperly with template protection. This evidence shows that a template protected by the Interpolation method provides higher (Unknown Key scenario) or almost similar (Known Key scenario) performance compared with unprotected samples (baseline) in all key knowledge attack scenarios using both stroke directions (horizontal and scrolling).

Figure 18 – DET curve of all scenarios protected with the Interpolation transformation function. (a) Scrolling Strokes; (b) Horizontal Strokes.


7.7.2 BioHashing

Figures 19a and 19b show the SVM performance using scrolling and horizontal BioHashing strokes, respectively. Both plots show that the SVM classifier provides statistically better performance than the baseline when the impostor does not know the key.

A biometric system that uses BioHashing templates under a Known Key Attack shows statistically lower performance than the baseline. In other words, an impostor with a user key has a slightly higher chance of entering the system improperly. However, a system that uses template protection is preferable. The low performance of BioHashing when the impostor knows the key has been reported in (42). Therefore, our experiments are consistent with the literature.

7.7.3 BioConvolving

SVM performance using BioConvolving protected data is represented in the Figures 20aand 20b. When the FAR is under 5%, the SVM classifier provides better performance thanbaseline in the Unknown Key Attacks in scrolling strokes (Figure 20a). Over this value thesecurity performance becomes little worst or similar with baseline performance. Therefore,BioConvolving protected template under Unknown Key Attack offer better/similar or worstperformance in some FAR points than baseline performance using scrolling strokes. However,using horizontal strokes we see that BioConvolving has better performance than baseline forall FAR/FRR values (Figure 20b).

In contrast, all the points of the unknown curves generated using the BioConvolvinghorizontal strokes show less FRR than baseline (see Figure 20b). Therefore, the performanceof biometric system using horizontal BioConvolving data under standard invasion attack(Unknown Key attack) presents better security/performance than a biometric system withouttemplate protection.

For all the analysed cancelable methods, the Unknown Key attack scenarios show better performance than the baseline. However, this behaviour is not seen in BioConvolving using scrolling strokes. The baseline curve decreases, touches the Unknown Key curves at 5% FAR, crosses them at 7% and stays under them from then on. This behaviour means that the baseline curve has inferior performance to the Unknown Key Attack curves at FAR values under 5%. This finding shows that the designer can protect the template without loss of performance when low FAR values are desirable. However, at higher FAR values the baseline performs better.

Figure 19 – DET curves of all scenarios protected with the BioHashing transformation function. (a) Scrolling Strokes; (b) Horizontal Strokes.

In Known Key attacks, the system shows worse security performance than the baseline for both scrolling and horizontal strokes (Figures 20a and 20b). The EER of BioConvolving using scrolling strokes under Known Key Attack is around 20%, whereas the baseline is 15% (see Figure 20a). In other words, an impostor with a user key has, in general, about 5 percentage points more chance of improperly entering the protected system than a system without template protection. However, a biometric system must protect the user privacy. In summary, a biometric system using templates protected by the BioConvolving method provides performance slightly worse or better than the baseline across the two key knowledge attack scenarios.
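The EER values compared above correspond to the operating point where FAR and FRR coincide. A minimal sketch of how such a value can be approximated from two lists of classifier scores is given below. The function names, the convention that a higher score means "more likely genuine", and the toy scores are assumptions made for illustration; they are not the scores or the evaluation code used in our experiments.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted.

    FAR: fraction of impostor scores that are accepted.
    FRR: fraction of genuine scores that are rejected.
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by scanning every observed score as a threshold
    and keeping the point where |FAR - FRR| is smallest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (hypothetical): one genuine score is low, one impostor score is high.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
```

Sweeping the threshold over all observed scores and plotting FRR against FAR gives the points of a DET curve; the EER is the point of that curve closest to the diagonal.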


Figure 20 – DET curves of all scenarios protected with the BioConvolving transformation function. (a) Scrolling Strokes; (b) Horizontal Strokes.

7.7.4 Double Sum

The Double Sum performance is illustrated in Figures 21a and 21b. From these figures we can infer that the Known Key Attack has performance similar to or worse than the baseline. In addition, all Unknown Key scenarios have better performance than the baseline.

Double Sum under Unknown Key Attack scenarios has the second best result; Interpolation has the best. Thus, a simple method can provide non-invertible and revocable templates and good security against invasion attacks. Biometric systems that use Double Sum protected templates show performance similar to that found for the previously analysed cancelable methods (Interpolation, BioHashing and BioConvolving). In summary, we observed an increase in performance using Double Sum under Unknown Key attacks and performance similar to the baseline under Known Key attacks.

Figure 21 – DET curves of all scenarios protected with the DoubleSum transformation function. (a) Scrolling Strokes; (b) Horizontal Strokes.


7.8 Comparison of the Security/Performance of the Biometric System using Homogeneous and Heterogeneous Key Scenario

A non-invertible protected template does not reveal the user identity, partially or fully, when an impostor has the protected template and the user key. This feature is important for user privacy because it protects the user identity and specific biometric features even if the security of a biometric system is compromised.

Cancelable templates are revocable, i.e., the system can generate a new template based on a new user key. Thus, a cancelable function generates a new protected template from the same biometric features using a different user key. Good cancelable functions provide unrelated protected templates when two different user keys are used.

Most cancelable biometrics assume the use of a single key for each user. As discussed before, a user-specific key brings some disadvantages, such as a matching algorithm biased towards the user key and the need to submit, store and remember the key. Therefore, user-specific keys bring usability and classification disadvantages.

The MPPS approach decreases biometric system usability because it requires that a user provides n keys for n cancelable functions (see Chapter 6). Consequently, the user needs to remember or carry n different keys or tokens/smartcards. Thus, an interesting feature for MPPS is to allow a user to enrol or match his template without a key. One way to achieve this is to use, for each cancelable method, a single key for all the users. Therefore, the use of a single key is a crucial usability feature of MPPS.

The security of a unique system key must be evaluated. We showed in Section 7.7 that the user key has a crucial role in the security performance of cancelable functions. Unfortunately, Known Key Attacks impact the recognition performance. Thus, the use of a single key in MPPS must be discussed in order to provide enough arguments for its adoption.

This section analyses and discusses the use of a unique key for each cancelable method against the use of different keys. The evaluation is based on the similarity between Homogeneous and Heterogeneous matching scores. The Kolmogorov-Smirnov statistical test evaluates the similarity of the different matching scores. Therefore, the statistical test checks whether the two sets of matching scores are statistically the same.
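To make the comparison concrete, the two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical distribution functions of the two score sets. The sketch below computes it in plain Python and compares it with the usual large-sample critical value. The function names and the example scores are hypothetical, and the sketch is not the statistical software used in our experiments.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum absolute difference between the
    empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m)),
    with c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))


# Hypothetical matching scores for the two key types.
homogeneous_scores = [0.61, 0.64, 0.70, 0.72, 0.75, 0.80]
heterogeneous_scores = [0.60, 0.66, 0.69, 0.73, 0.77, 0.79]
d = ks_statistic(homogeneous_scores, heterogeneous_scores)
differ = d > ks_critical_value(len(homogeneous_scores), len(heterogeneous_scores))
```

When `d` exceeds the critical value, the hypothesis that both score sets come from the same distribution is rejected at level `alpha`. The large-sample approximation is rough for samples as small as this toy example.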

Tables 12 and 13 show the statistical test results for Known Key Attack and Unknown Key Attack, respectively. The rows represent the stroke direction and the columns the cancelable function. Each cell indicates which key type has the better statistical result. In Known Key Attacks, the Homogeneous Key has the better statistical result in 5 cases and the Heterogeneous Key in 3 (Table 12). Unknown Key Attacks present the same proportion: 5 better statistical results for Homogeneous scenarios and 3 for Heterogeneous scenarios (Table 13). In summary, the Homogeneous Key has the better statistical result in 10 of the 16 evaluated cases.

Table 12 – Heterogeneous vs. Homogeneous Statistical Test using Known Key Attack Scores

Stroke      Interpolation   BioHashing      BioConvolving   DoubleSum
Horizontal  Heterogeneous   Heterogeneous   Homogeneous     Homogeneous
Scrolling   Homogeneous     Heterogeneous   Homogeneous     Homogeneous

Table 13 – Heterogeneous vs. Homogeneous Statistical Test using Unknown Key Attack Scores

Stroke      Interpolation   BioHashing      BioConvolving   DoubleSum
Horizontal  Homogeneous     Heterogeneous   Heterogeneous   Homogeneous
Scrolling   Homogeneous     Homogeneous     Heterogeneous   Homogeneous
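The counts quoted for Tables 12 and 13 can be checked with a short tally. The cell values below are copied from the two tables; the variable and function names are only illustrative.

```python
from collections import Counter

# Rows: Horizontal, Scrolling.
# Columns: Interpolation, BioHashing, BioConvolving, DoubleSum.
KNOWN_KEY = [
    ["Heterogeneous", "Heterogeneous", "Homogeneous", "Homogeneous"],
    ["Homogeneous", "Heterogeneous", "Homogeneous", "Homogeneous"],
]
UNKNOWN_KEY = [
    ["Homogeneous", "Heterogeneous", "Heterogeneous", "Homogeneous"],
    ["Homogeneous", "Homogeneous", "Heterogeneous", "Homogeneous"],
]


def tally(table):
    """Count how many cells each key type wins."""
    return Counter(cell for row in table for cell in row)


known = tally(KNOWN_KEY)      # wins per key type under Known Key Attack
unknown = tally(UNKNOWN_KEY)  # wins per key type under Unknown Key Attack
total = known + unknown       # wins over both tables
```

Each table contributes 5 Homogeneous and 3 Heterogeneous cells, which gives 10 against 6 over the 16 cells.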

7.8.1 Interpolation

Homogeneous and Heterogeneous scenarios have similar curve behaviour using interpolated data (Figures 18a and 18b). The curves of the Homogeneous and Heterogeneous Key under Known Key Attack keep a similar difference along the curve using scrolling strokes (see Figure 18a). However, using horizontal strokes (Figure 18b), the difference is higher at low FAR but tends to decrease and stabilise as FAR increases. Therefore, the distance between the Homogeneous and Heterogeneous curves under Known Key Attacks is small for both stroke directions.

The statistical test shows that the Homogeneous scenario has statistically better performance using scrolling strokes. However, the Heterogeneous scenario has the better statistical result using horizontal strokes. Therefore, the security performance of the Homogeneous and Heterogeneous Key under Known Key Attacks varies with the stroke direction.

The Known Key Attack using horizontal strokes showed a better result in the Heterogeneous scenario. However, the difference between Heterogeneous and Homogeneous performance is small. Thus, a biometric system developer must deal with the usability/performance trade-off when implementing the Interpolation method.

Homogeneous and Heterogeneous DET curves have similar behaviour under Unknown Key attack (Figures 18a and 18b). However, the Homogeneous curves are under the Heterogeneous curves most of the time, in both stroke directions. Therefore, the Homogeneous curve shows slightly better performance than the Heterogeneous curve.

The statistical test shows that Homogeneous scenarios have statistically better performance than Heterogeneous scenarios using both stroke directions. Therefore, it is possible to use a unique system key in Unknown Key scenarios.

In summary, the performance is not highly affected when an impostor has a user key available. Consequently, the key can be unique and stored in plain form. Therefore, considering system usability, we can infer that biometric systems that implement the Interpolation method can use a Homogeneous key.

7.8.2 BioHashing

Figures 19a and 19b show the DET curves of the BioHashing scenarios. The Known Key attack curves (Homogeneous and Heterogeneous Key) using scrolling strokes have stable performance along the different FAR values (see Figure 19a). However, using horizontal strokes, the difference between the Homogeneous and Heterogeneous Key increases considerably at low and high FAR values (see Figure 19b). Both Known Key Attack curves are above the baseline curve.

The statistical test showed that the Heterogeneous scenarios are statistically better than the Homogeneous scenarios. In summary, in both stroke directions the Heterogeneous Key under Known Key attack has statistically better performance than the Homogeneous scenario.

In contrast, the difference between the Homogeneous and Heterogeneous Key curves under Unknown Key scenarios is small. For both stroke directions, the curves are almost the same throughout (see Figures 19a and 19b). However, the statistical test showed that the sets of scores are not similar. The Heterogeneous Key scenario has better statistical performance using horizontal strokes and the Homogeneous Key scenario using scrolling strokes.

Considering the DET curves, the BioHashing performance is similar to the baseline, but the statistical test showed the contrary. In other words, the biometric system designer must choose which key type scenario to use in BioHashing considering usability vs. security performance. Fortunately, the security performance of the two key types is not very different.

7.8.3 BioConvolving

BioConvolving does not present EER values as good as those of Interpolation and BioHashing, but it presents some interesting DET curves when the key knowledge attack (Known Key vs. Unknown Key) is considered. Using scrolling strokes, the Known Key curves (Homogeneous and Heterogeneous) are very similar (see Figure 20a). However, the horizontal stroke curves do not show this similarity (Figure 20b).

The statistical test showed that the Homogeneous Key is statistically better than the Heterogeneous Key in both stroke directions. Thus, under Known Key Attack, it is possible to use a unique key in biometric systems using the BioConvolving method.

In Unknown Key Attack, the performance of the Homogeneous Key, Heterogeneous Key and baseline cases is graphically similar. Using scrolling strokes, similar performance is observed along the Homogeneous and Heterogeneous curves under Unknown Key attack (Figure 20a). However, the statistical test showed that the Heterogeneous scenario is statistically better than the Homogeneous one in both stroke directions. Therefore, the system developer needs to balance the trade-off of usability vs. security performance.

In conclusion, it is possible to use a unique key in the BioConvolving cancelable method. Under Known Key Attack, the Homogeneous Key scenarios show a better statistical result. On the other hand, under Unknown Key Attacks, the Heterogeneous Key shows better performance.

However, the difference between the Homogeneous and Heterogeneous curves under Known Key Attacks is not large. Therefore, we can consider the use of a unique key in the BioConvolving method a good solution for the usability issues presented in MPPS, because it presents slightly better performance than the Heterogeneous case. Again, the designer needs to balance the trade-off of usability vs. performance.

7.8.4 DoubleSum

Double Sum presents the same characteristics as the other analysed cancelable functions: the Known Key attack curves lie around the baseline curve and the Unknown Key Attack curves lie under it. Thus, the DET curves of all evaluated scenarios have the same behaviour.

The Double Sum curves under Known Key attack using both stroke directions present one of the best results among the analysed cancelable functions (see Figures 21a and 21b). The Homogeneous and Heterogeneous curves are almost together along their whole length. Thus, the Homogeneous and Heterogeneous Key under Known Key scenarios have similar performance.

The same conclusion can be inferred using the Unknown Key attack curves (see Figures 21a and 21b). FAR/FRR values are the same in most curve segments. Therefore, both key types (Homogeneous or Heterogeneous) have similar performance. The curves using scrolling strokes have the same values between 3% and 40% FAR (Figure 21a). Using horizontal strokes, we see similar performance at FAR values under 2% and between 14% and 30% (Figure 21b).

The statistical test showed that Homogeneous Key scenarios have better statistical results in all the key knowledge attacks using both stroke directions. Consequently, it is possible to use a unique key in a biometric system using the Double Sum method as template protection without performance loss compared with the Heterogeneous Key scenario. Therefore, a biometric system can use a single key for all the users when using the Double Sum cancelable method.

7.9 MPPS and the use of a single key

The Multi-Privacy Protection Scheme (MPPS) requires the use of a particular key for each cancelable function. As discussed before, this requirement decreases the system usability. Thus, the use of a single key by the system helps the usability of large-scale use of MPPS.

The performance observed with the Homogeneous Key under Known and Unknown Key Attacks shows that it is possible to use a single key for all users. Even in the cases where the Heterogeneous Key showed a better statistical result, the DET curves showed that the difference in security performance between Homogeneous and Heterogeneous scenarios is small. As a consequence, a user does not need to present a key to the system, because the system can generate and store the user key during the user enrolment.

Based on these experiments, we can observe that the key storage method (Homogeneous / Heterogeneous) does not have a strong role in the security performance. As observed, the Homogeneous and Heterogeneous Key presented similar performance under key knowledge attacks. Consequently, it is possible to implement an MPPS system where a user does not need to remember different keys or carry different devices to be authenticated.


In conclusion, biometric systems under Known Key Attack using horizontal and scrolling strokes show performance statistically worse than or similar to the baseline. However, Interpolation using horizontal strokes under Heterogeneous Key attacks shows better performance than the baseline. Furthermore, Interpolation, BioHashing and DoubleSum performed statistically better than the baseline under Unknown Key Attacks. However, BioConvolving shows worse performance than the baseline using scrolling strokes. Therefore, considering only the EER, all the analysed cancelable methods have performance around or better than the baseline.


The user key has a relevant impact on the similarity of two different user templates. We see that an impostor encoding a biometric sample with a leaked user key yields performance similar to or worse than the baseline (without template protection). In other words, an impostor biometric sample is more similar to a genuine user template when the impostor uses the user key than when a random key is used. Therefore, the key influences the recognition power of the biometric matching process.

The recognition performance was not so low in scenarios under Known Key attack. We expected worse performance due to distortions and the impact of the known key, but we observed performance only a little lower than the baseline. Therefore, using template protection with a non-invertible function is almost as accurate as using a biometric system without template protection.

In general, the literature reports higher Equal Error Rates when a biometric system uses template protection than when it uses non-protected biometric samples (48, 66). In other words, the cancelable protection methods decrease the system security due to the undesirable distortions introduced by non-invertible functions. In contrast, the performance of our Unknown Key scenarios is better than the baseline performance. A possible reason for the worse results in the literature is that the system was evaluated at the decision level, whereas our experiments use the classifier scores to label an individual. Therefore, cancelable functions protect the user identity and increase the security performance under standard invasion cases (Unknown Key scenarios).

Our experiments based on touch screen biometrics entail a number of important findings. First, the Homogeneous Key storage has the same level of security as the Heterogeneous Key storage. This finding is significant because, in the Heterogeneous Key storage, the key has to be stored in a different device or provided by the user. Thus, the key can be shared, lost or stolen easily. However, using Homogeneous Key storage, the key can be securely stored in the mobile device. Consequently, if the device is stolen, the system has a security level slightly worse than that of a biometric system which works with unprotected templates. In addition, in the case of a stolen device, the template protected by cancelable functions is not compromised. Therefore, the user privacy is maintained.

Second, the key knowledge has a significant role in the system performance (46, 56). We discussed two key knowledge attacks for each evaluated cancelable function. The Known Key experiments illustrate the case when an attacker knows the user key and tries to enter the system using his biometric template protected by the key of the genuine user. The Unknown Key experiments evaluate the recognition performance when an attacker tries to encode his biometric template using a guessed key. We observed that the recognition performance decreases when the attacker knows the key. However, the experiments showed that the performance under Known Key attacks is comparable with the performance of a biometric system using unprotected templates. Therefore, the key has an impact on the security performance, but in the worst case (Known Key Attack) the system has almost the same security level as a biometric system using unprotected templates.

The experiments show that key knowledge increases the success rate of the impostor. We observed that the key storage is not a crucial issue because the Homogeneous and Heterogeneous Key scenarios have similar performance under both key knowledge attacks.

In summary, the Homogeneous Key experiments showed that it is possible to use a unique key for all the users. In some cases the Heterogeneous Key scenarios showed better results, but these were similar to the Homogeneous Key performance. In addition, we assume that the classifier in the Homogeneous Key experiments is not biased towards the key but towards the user biometric features. This finding is relevant because the biometric matching needs to consider biometric feature information instead of additional data (user key).


8 Impact of Key Length in Cancelable Functions

8.1 Introduction

An impostor with a genuine key increases the invasion rate compared with an attack without a valid user key. However, the performance under Known Key Attack is comparable to the performance of a biometric system using unprotected biometric samples. Therefore, a compromised user key affects the security and the performance of biometric systems.

The BioConvolving authors showed that the performance decreases when the number of segments W (the number of divisions of the signature data) increases (49, 50). The authors explain this phenomenon by an alignment problem: sequences extracted from different signatures of the same user typically have different lengths. Therefore, increasing the length of the user key affects BioConvolving performance.

BioHashing is able to achieve an EER of 0% for dimensions above 100 (42). The performance of BioHashing is directly related to the dimension of the biometric sample (42). In other words, BioHashing performance increases with the number of template dimensions.

BioHashing performance depends on the template dimension. A set of solutions to overcome this limitation is proposed in (46): (1) Normalization: normalize the biometric features before applying the BioHashing procedure; (2) Threshold Variation: use various thresholds to binarize the BioHashing code; (3) Space Augmentation: iterate the BioHashing method k times on the same biometric sample in order to obtain k bit-vectors b_i; (4) Feature Permutation: use q permutations of the biometric sample in order to generate q bit-vectors. Space Augmentation and Feature Permutation fuse the matching scores using a SUM rule. These BioHashing procedures become computationally longer and more complex. Although such methods create new bit-vectors in order to increase the BioHashing key, our method increases the key length by creating vectors with more bits than the original BioHashing method.

The key length brings good (BioHashing) and bad (BioConvolving) consequences. Increasing the length of the user key does not guarantee a performance increase. Therefore, it is important to evaluate the key length in cancelable functions because the user key length influences the performance.

Unfortunately, the literature does not present solutions to increase the key length of Interpolation and Double Sum (66). Therefore, one of our contributions is to define a new method to increase the key length of Interpolation and Double Sum and to evaluate the consequences of increasing the user key.

Another rarely addressed issue relates to how to increase the key length. For some cancelable functions, the key is actually a seed given to a pseudo-random number generator which, in turn, produces a set of numbers. Based on the seed, the length can be specified. When the key is long enough and is bound to the biometric sample via its cancelable function, it can be argued that the resultant protected biometric template is dominated by the key rather than by the biometric sample. That is to say, if one were able to measure the information content (via information theory), it could reasonably be postulated that the information of the protected biometric template would eventually be dominated by the key rather than by the biometric template. A possible consequence is that, when the key is compromised, a cancelable function that relies on a longer key would suffer more severe consequences than one with a shorter key. However, the issues of key length are rarely discussed and evaluated in the literature, especially under the compromised key scenario. It is also unknown whether this behaviour is systematic across a number of cancelable functions. In short, the literature does not fully explain the impact of key length on cancelable functions when the user key is or is not compromised.

One of the goals of this chapter is to analyse the performance of the classifier under key knowledge attacks (Heterogeneous and Homogeneous Known Key Attack) using different key lengths. Unfortunately, the Known Key Attacks showed worse statistical results than the baseline. Thus, we hope that increasing the length of the key improves the performance of the classifier under Known Key Attacks.

Increasing the length of the user key brings consequences. The consequences can be a classifier biased towards the key, a protected template dominated by the key or an increase in security performance. Thus, this chapter tries to increase the understanding of this important approach.

8.1.1 Findings

The length of the user key in a cancelable function has a significant impact on the recognition performance. Since not all existing cancelable functions can increase their key length, we have introduced some modifications to these functions in order to be able to do so. The new key generation methods allow a cancelable function to increase its key length ad infinitum.

Our experimental results suggest that Interpolation, BioHashing and Double Sum benefit from the increased key length under the Known Key attack scenario. BioHashing considerably increases the recognition performance under all scenarios. Unfortunately, increasing the key length decreases the recognition performance of biometric systems using BioConvolving (47). All these findings show that the key length has a significant impact on biometric systems using cancelable functions.

8.2 Methods to Increase the Key Length

In this section we present the methods to increase the length of the user key in Interpolation, BioHashing, BioConvolving and Double Sum. Each cancelable function has its own method due to its specific characteristics.

The methods to increase the key of Interpolation, BioHashing and Double Sum are based on the addition of random values to the user key. Thus, a user key can have an arbitrarily large length. On the contrary, the length of the key in BioConvolving is limited by the dimension of the biometric sample.

8.2.1 Interpolation

The Interpolation method is based on the polynomial interpolation of the biometric feature values. Thus, a polynomial p summarises the biometric sample. Consequently, a similar but not equal protected biometric sample is obtained by evaluating the polynomial p at a vector (key) k = [1, ..., n], i.e. p(k).

The Interpolation key length can be increased simply by augmenting the number of values in the key vector k. Thus, its length can be arbitrarily large. The standard length of k is equal to the number of features of the biometric sample. The standard length produces a protected template with the same length as the biometric sample, |p(k)| = |x_i|. The method to increase the length of the key is described in Algorithm 1.

input : seed, lengthOfKey
output: InterpolationKey with the length specified by lengthOfKey

InterpolationKey = randomVector(seed, lengthOfKey);

Algorithm 1: Algorithm to Increase the Length of the User Key using Interpolation Function.

Increasing the length of the Interpolation key increases the number of features in the protected biometric template. Thus, the length of the key is increased by adding new random numbers to the vector k.
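A minimal Python sketch of Algorithm 1 and of its effect on the protected template is given below. It is an illustration under stated assumptions, not the implementation used in our experiments: the key entries are assumed to be drawn uniformly from [1, n], the polynomial is assumed to pass through the points (1, x_1), ..., (n, x_n), and it is evaluated with the Lagrange formula. All function names are hypothetical.

```python
import random


def interpolation_key(seed, length_of_key, n_features):
    """Algorithm 1: a seeded random vector with length_of_key entries.
    Assumption: entries are drawn uniformly from [1, n_features]."""
    rng = random.Random(seed)
    return [rng.uniform(1, n_features) for _ in range(length_of_key)]


def lagrange_eval(features, t):
    """Evaluate at t the polynomial through (1, x_1), ..., (n, x_n)."""
    n = len(features)
    total = 0.0
    for i in range(n):
        term = features[i]
        for j in range(n):
            if j != i:
                # Lagrange basis factor for nodes i + 1 and j + 1
                term *= (t - (j + 1)) / ((i + 1) - (j + 1))
        total += term
    return total


def protect(features, key):
    """Protected template: the polynomial evaluated at every key value."""
    return [lagrange_eval(features, t) for t in key]


features = [2.0, 4.0, 6.0, 8.0, 10.0]                  # toy sample, n = 5
key = interpolation_key(seed=7, length_of_key=10, n_features=5)
template = protect(features, key)                       # 10 protected features
```

The protected template has as many values as the key, so doubling the key doubles the template length while the biometric sample is unchanged. High-degree interpolating polynomials can be numerically unstable, so a real implementation would need a more careful evaluation scheme than this sketch.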


8.2.2 BioHashing

The BioHashing template is generated by the inner product between the biometric template and a set of m orthonormal pseudo-random vectors. In this case, the BioHashing key is the set of m orthonormal pseudo-random vectors. In the standard case, each orthonormal vector has the same number of features as the biometric sample.

Lumini and Nanni (46) propose to use more projection spaces and feature permutations to increase the hash code. Consequently, the authors create more bit-vectors than the standard method. Thus, the method proposed in (46) creates several bit-vectors per biometric sample instead of one. On the contrary, our method generates more bits in the vector instead of creating more bit-vectors.

Our method increases the number of values of the m orthonormal random vectors in order to create more feature values. The feature values are created by the inner product between the biometric sample and the random vectors. We still keep the proportion of one bit-vector per biometric sample, but the resultant bit-vector has more bits than in the original BioHashing method. The method to increase the user key is represented by Algorithm 2.

input : LengthofKey
output: BioHashingKey, m orthonormal vectors with dimension LengthofKey

randomMatrix stores m random vectors with dimension LengthofKey;
for i ← 1 to m do
    randomMatrix[i,] = randomVector(LengthofKey);
end
BioHashingKey = generatesOrthonormalVectors(randomMatrix);

Algorithm 2: Algorithm to Increase the Length of the User Key using BioHashing Function.

The length of the BioHashing key is unbounded in our proposed method. As described, the number of features in the orthonormal vectors can be increased without affecting the biometric sample. Therefore, the protected template can have an arbitrarily large number of features based on a finite number of biometric features.
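The key generation of Algorithm 2 can be sketched as follows. The sketch assumes that the orthonormalisation step is a Gram-Schmidt process and that the random vectors are seeded Gaussian vectors; both choices, the seed parameter and the function names are assumptions made for illustration. Only the key generation is shown, not the projection of the biometric sample.

```python
import random


def gram_schmidt(vectors):
    """Orthonormalise a list of vectors (assumed orthonormalisation step)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            # remove the component of w along the basis vector b
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm < 1e-12:
            raise ValueError("vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


def biohashing_key(seed, m, length_of_key):
    """Algorithm 2: m orthonormal vectors of dimension length_of_key."""
    if m > length_of_key:
        raise ValueError("m orthonormal vectors need length_of_key >= m")
    rng = random.Random(seed)
    random_matrix = [
        [rng.gauss(0.0, 1.0) for _ in range(length_of_key)] for _ in range(m)
    ]
    return gram_schmidt(random_matrix)


key = biohashing_key(seed=1, m=3, length_of_key=8)
```

Note that m vectors can only be mutually orthonormal when their dimension is at least m, which is why the sketch rejects `m > length_of_key`.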

8.2.3 BioConvolving

The key in BioConvolving selects the positions where the biometric data is sliced. Afterwards, the convolution process is applied to the different data segments. Hence, the maximum number of segments into which a biometric sample can be divided is the dimension of the biometric template. Therefore, the length of the key determines the number of data segments.


The length of the BioConvolving key is finite, because the number of biometric features limits the number of data segments. Therefore, the maximum length of the BioConvolving key is the number of biometric features minus one.
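The limit on the key length can be seen in a small sketch of the slicing and convolution steps. The sketch assumes that the key is a list of distinct integer cut positions and that the segments are combined by plain linear convolution; it omits any normalisation used by the original BioConvolving method, and the function names are hypothetical.

```python
def convolve(a, b):
    """Linear (full) convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def bioconvolve(features, cut_points):
    """Slice features at the key positions and convolve the segments.

    cut_points: distinct integers in 1..n-1 (the key), so at most n - 1
    cuts are possible and the sample is split into at most n segments.
    """
    n = len(features)
    cuts = sorted(cut_points)
    if len(set(cuts)) != len(cuts) or any(c < 1 or c > n - 1 for c in cuts):
        raise ValueError("cut points must be distinct integers in 1..n-1")
    bounds = [0] + cuts + [n]
    segments = [features[s:e] for s, e in zip(bounds, bounds[1:])]
    result = segments[0]
    for segment in segments[1:]:
        result = convolve(result, segment)
    return result


protected = bioconvolve([1, 2, 3, 4], [2])       # segments [1, 2] and [3, 4]
shortest = bioconvolve([1, 2, 3, 4], [1, 2, 3])  # maximum key length n - 1 = 3
```

With k cut points the output has n - k values, so the longest possible key (k = n - 1) collapses the sample into a single value, which is consistent with the loss of performance discussed for long BioConvolving keys.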

8.2.4 Double Sum

The key in DoubleSum selects which feature values will be used to generate the new protected feature values. As discussed before, the DoubleSum key is composed of two random vectors with the same number of features as the biometric sample. Thus, the standard length of the key in DoubleSum is the number of biometric features.

A way to increase the Double Sum key is to increase the number of elements of the two random vectors. As described, the random vectors select the feature values used in the feature value creation. Increasing the length of the key creates new feature values based on the same biometric sample. The length of the Double Sum key can be arbitrarily large because each position of the key just selects two biometric feature values. The Double Sum function then sums these two feature values in order to create a new feature value of the protected biometric template. Algorithm 3 represents our method to create a Double Sum key with an arbitrary number of values.

input : LengthofKey
output: DoubleSumKey, with dimension LengthofKey

RandomMatrix receives 2 random vectors with dimension LengthofKey;
for i ← 1 to 2 do
    RandomMatrix[i,] = randomVector(LengthofKey);
end
DoubleSumKey = RandomMatrix;

Algorithm 3: Algorithm to Increase the Length of the User Key using Double Sum Function.
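Algorithm 3 and the sum step described above can be sketched in a few lines of Python. The sketch assumes that each key entry is a random index into the biometric sample and that a seed is supplied to make the key reproducible; the seed parameter and the function names are illustrative assumptions.

```python
import random


def double_sum_key(seed, length_of_key, n_features):
    """Algorithm 3: two random vectors with length_of_key entries each.
    Assumption: every entry is an index into the biometric sample."""
    rng = random.Random(seed)
    return [
        [rng.randrange(n_features) for _ in range(length_of_key)]
        for _ in range(2)
    ]


def double_sum(features, key):
    """Each protected value is the sum of the two selected feature values."""
    first, second = key
    return [features[i] + features[j] for i, j in zip(first, second)]


features = [10, 20, 30]
fixed = double_sum(features, [[0, 1, 2], [2, 2, 0]])     # hand-written key
longer = double_sum(features, double_sum_key(3, 12, 3))  # key 4x the sample
```

Because each key position only selects two existing feature values, the key (and the protected template) can be made longer than the biometric sample without requiring more biometric data.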

8.3 Results and Discussion

The experiments are divided by cancelable method and attack scenario. The evaluation of the experiments uses DET curves to illustrate the impact of the key length. The DET curves for each attack scenario and cancelable method are drawn by key length. Each scenario/cancelable method DET plot presents 6 different key lengths plus the baseline (standard key length), so the difference in performance among the key lengths can be observed. Figures 22, 24 and 28 illustrate the DET curves for different key lengths under the Known Key attack, and Figures 23, 25 and 29 illustrate the same curves under the Unknown Key Attack, for the evaluated cancelable functions (Interpolation, BioHashing, BioConvolving and Double Sum).

8.3.1 Interpolation

Figure 22 shows the DET curves using Interpolation data under Known Key attack. Increasing the key in the Heterogeneous Key scenario under Known Key Attack does not considerably increase the recognition performance (Figures 22a and 22c). However, using the Homogeneous Key we see a small improvement when the key increases (see Figures 22b and 22d). In addition, when the length of the Interpolation key increases, the system using horizontal strokes shows a larger improvement in performance than the system using scrolling strokes.

With the Heterogeneous Key and horizontal strokes, the recognition performance with a key length of 400 is comparable with that of the standard key length (key length=25) (Figure 22c). However, the performance of the classifier under the Homogeneous Key attack surpasses the baseline with key length=200 (Figure 22d). Therefore, the key length has an impact on the recognition performance, and this impact depends on the biometric data.

The Figure 23 shows the Interpolation DET curves using scrolling and horizontalstrokes varying the key length under Unknown Key attack. Unfortunately, we can not see aslight difference in the recognition performance when the key increases over 50 in Unknown KeyAttack scenarios (Figure 23). In addition, we see that any DET curve with key length greaterthan 25 surpass the standard length (key length=25) in FAR values under 3%. Therefore,increasing the key length over 50 does not have a high effect in the Interpolation performanceunder Unknown Key attack. However, we suggest to apply the length=50 because it shows aslight improvement in low FAR values.

In conclusion, we suggest that Interpolation method use the key length=50 due toeffective improvement in low FAR values in Unknown Key Attacks. In addition, under KnownKey attack scenarios, this length presents a comparable performance with the standard keylength. Therefore, in our experiments the interpolation key length=50 has showed ideal forTouchalytics dataset. In other words, the ideal length found is 2 times bigger than standardlength.

8.3.2 BioHashing

Figure 24 shows the BioHashing experiments with varying user key length under the Known Key attack. Increasing the user key length in the Heterogeneous Key scenario under the Known Key Attack has a good effect on the recognition performance. When the key length is merely doubled from 25 to 50, the difference is perceptible for horizontal strokes (Figures 24c and 24d)


Figure 22 – DET curves for Interpolation - Known Key Attack. (a) Heterogeneous - Scrolling Strokes; (b) Homogeneous - Scrolling Strokes; (c) Heterogeneous - Horizontal Strokes; (d) Homogeneous - Horizontal Strokes.

and for the Homogeneous Key using scrolling strokes (Figure 24b). Therefore, the key length experiments show that increasing the BioHashing key length improves the performance of the classifier.

In Unknown Key Attack scenarios, the improvement in the recognition performance is perceptible. Figure 25 shows the DET plots of the Unknown Key scenarios using Homogeneous and Heterogeneous Keys for both stroke directions. It is possible to see that the performance improves as the key length increases. Therefore, increasing the length of the BioHashing key has a positive impact on the recognition performance.

In summary, our experiments suggest that our proposal to increase the BioHashing key length has good consequences for the classifier performance under both key knowledge attacks. We suggest the use of length = 100 because increasing the length of the key has an impact on the


Figure 23 – DET curves for Interpolation - Unknown Key Attack. (a) Heterogeneous - Scrolling Strokes; (b) Homogeneous - Scrolling Strokes; (c) Heterogeneous - Horizontal Strokes; (d) Homogeneous - Horizontal Strokes.

processing time. As we saw, length=100 surpasses the baseline performance under Known Key Attacks and still provides a reasonable performance under Unknown Key Attacks. Therefore, we suggest a key length 4 times the standard length.

Lumini and Nanni (46) also show that the performance increases as the hash code (BioHashing template) grows. Unfortunately, it is not possible to compare our results with theirs because Lumini and Nanni (46) use a different biometric modality.

8.3.3 BioConvolving

The BioConvolving recognition performance for different key lengths under Known and Unknown Key Attacks is illustrated in Figures 26 and 27, respectively. From these figures it is possible to see that the recognition performance decreases when the number of segments w increases.


Figure 24 – DET curves for BioHashing - Known Key Attack. (a) Heterogeneous - Scrolling Strokes; (b) Homogeneous - Scrolling Strokes; (c) Heterogeneous - Horizontal Strokes; (d) Homogeneous - Horizontal Strokes.

Our findings are consistent with the literature. Maiorana et al. (49) show the same result for the Heterogeneous Key under Unknown Key Attacks for signature recognition.

In summary, we suggest setting the number of segments w to 2 for the BioConvolving method. This key length, i.e., number of segments, provides the best result under all the Known and Unknown Key Attacks. Therefore, as reported in the literature, the performance of BioConvolving does not improve as the key length grows.

8.3.4 DoubleSum

Under the Known Key Attack, the Double Sum biometric system improves or keeps its performance around that of the standard length (key length=25) when the key length increases (Figure 28). However, it is not possible to identify a relation between the recognition performance and the


Figure 25 – DET curves for BioHashing - Unknown Key Attack. (a) Heterogeneous - Scrolling Strokes; (b) Homogeneous - Scrolling Strokes; (c) Heterogeneous - Horizontal Strokes; (d) Homogeneous - Horizontal Strokes.

key length (Figure 28c and 28c).

Homogeneous and Heterogeneous scenarios present good recognition performance when the key length equals 75 or 100 (see Figures 28a and 29a). However, when the length of the key increases to 200 or 400, the performance decreases and becomes the worst result (see Figures 28c and 29c). Therefore, the key length does not have a clear relation with performance, as observed for key lengths 200 and 400.

Figure 29 shows the DoubleSum experiments with varying key length under the Unknown Key attack. The plots of the Heterogeneous (Figures 29a and 29c) and Homogeneous Key scenarios (Figures 29b and 29d) show that increasing the length of the key increases or keeps the performance around that of the standard key length, key length=25. Similarly to Known Key Attacks, we cannot observe a clear relation between the length of the key and the performance


Figure 26 – DET curves for BioConvolving - Known Key Attack. (a) Heterogeneous - Scrolling Strokes; (b) Homogeneous - Scrolling Strokes; (c) Heterogeneous - Horizontal Strokes; (d) Homogeneous - Horizontal Strokes.

of Unknown Key scenarios.

One possible reason for the performance decrease is the high dimensionality of the biometric template. Some machine learning algorithms lose recognition performance and need more processing time when the data has a high dimensionality (20). Therefore, the unstable performance among the analysed key lengths can be explained by this phenomenon.

Some classifiers have problems when the data features are correlated with each other (51). Our method creates new feature values based on the biometric feature values. Thus, in a sample with 400 feature values, most of them have similar values because they were generated from 25 different values.

We suggest a key length equal to 3 times the number of biometric features. As observed for Double Sum, the best result was obtained when the key length equals 75


Figure 27 – DET curves for BioConvolving - Unknown Key Attack. (a) Heterogeneous - Scrolling Strokes; (b) Homogeneous - Scrolling Strokes; (c) Heterogeneous - Horizontal Strokes; (d) Homogeneous - Horizontal Strokes.

(3 × 25) for both Known and Unknown Key Attacks. For both Key Knowledge Attacks, an ideal key length for Double Sum is 3 times the dimensionality of the biometric template.


One of the goals of this thesis is to verify whether the performance of the cancelable methods under the Known Key Attack improves when the key length is increased. We observed a continuous improvement as the key length increases for Interpolation using the Homogeneous Key and for BioHashing in all Key Type scenarios. In Interpolation, BioHashing and Double Sum we observed some minor improvements but, most importantly, the performance does not decrease when the key length increases.


Figure 28 – DET curves for DoubleSum - Known Key Attack. (a) Heterogeneous - Scrolling Strokes; (b) Homogeneous - Scrolling Strokes; (c) Heterogeneous - Horizontal Strokes; (d) Homogeneous - Horizontal Strokes.

No relation between the key length and the Double Sum performance was found under either key knowledge attack. We observed some improvements at key lengths equal to 75 and 100, but the performance decreases when the key length goes to 200 and 400. Therefore, we suggest a key length between 75 and 100.

Another hypothesis of this thesis is that the performance of the biometric system also increases under Unknown Key Attacks. We observed a clear improvement for the BioHashing and Interpolation methods at low FAR values. Unfortunately, the performance decreases for BioConvolving, and the performance of Double Sum does not increase when the length of the key increases substantially.

In conclusion, the length of the key has a direct impact on the performance of cancelable methods. This performance also depends on the cancelable method and the key knowledge


Figure 29 – DET curves for DoubleSum - Unknown Key Attack. (a) Heterogeneous - Scrolling Strokes; (b) Homogeneous - Scrolling Strokes; (c) Heterogeneous - Horizontal Strokes; (d) Homogeneous - Horizontal Strokes.

attack. Therefore, we can argue that the length of the key has a good (BioHashing), bad (BioConvolving) or no evident direct impact (Interpolation and Double Sum) on the evaluated cancelable methods.


9 Conclusions

This thesis proposes and investigates the use of cancelable functions to protect biometric data, in particular a behavioural modality. In addition, this thesis reports a study of the impact of the storage and length of cancelable keys on the accuracy and security of biometric systems. Moreover, we show the implications of our findings for user authentication on smartphones. Our thesis is guided by experimentation. Thus, we use machine learning to verify users using biometric data protected by cancelable functions. Our experiments use single classifiers and ensemble systems to verify enrolled users (genuine and impostor users) using a touch screen dataset. Specifically, each machine learning structure aims to answer specific research goals:

• Comparison of the accuracy and security of authentication systems using unprotected and protected data.

• The use of an authentication system composed of multiple algorithms (ensemble system) using a multi-protected biometric modality. With this solution we aim to increase the security and the accuracy of user authentication systems.

• Finally, a study of the impact of the storage method and the length of the key on the accuracy of authentication systems.

9.1 Use of Single and Ensemble System

Throughout this thesis, we have demonstrated that the use of transformation functions usually provides similar or better performance than the Original dataset, except for the BioHashing function. We carried out extensive experimentation in order to achieve the specified goals. The experiments compared the accuracy of the authentication system using unprotected and protected biometric data using classifiers. We observed that it is possible to use protected rather than unprotected data in authentication systems that use biometric data.

The investigation of ensemble systems showed that, in general, these systems provide lower EER values than single classifiers. In the analysis we observed that the accuracy of ensemble systems was superior to that of single classifiers. Our experiments thus reinforce the literature: ensemble systems provide better performance than single classifiers. Hence, the low performance of BioHashing was improved by using ensemble systems.


9.2 Multi-Privacy Protection Schemes

The use of multiple protected samples outperformed the previous results (single classifiers and ensemble systems). The proposed method uses different versions of cancelable data and an ensemble system to verify enrolled users. We name this method Multi-Privacy Protection Schemes (MPPS). MPPS achieved better performance than a single protection scheme, even when the single protection scheme uses ensemble systems. In summary, the use of multiple versions of protected biometric data increases the performance and security of biometric systems.

Unfortunately, MPPS increases user inconvenience because the individual needs to present more than one key to encode his/her biometric sample during user verification. In addition, the processing time of MPPS is higher than that of a single protected version because each biometric sample must be encoded with all the cancelable functions used. However, the gains in performance and security can outweigh these limitations.
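A minimal Python sketch of an MPPS-style verification flow is given below, assuming one classifier per cancelable function and majority voting over their accept/reject decisions (Section 9.7 notes that only voting decision fusion was used). The dictionaries `transforms`, `keys` and `classifiers` and the tie-breaking rule (reject on a tie) are hypothetical choices made for illustration. The loop also shows why the processing time grows with the number of cancelable functions: every sample is encoded once per function.

```python
def majority_vote(decisions):
    """Accept only if strictly more than half of the decisions are accepts."""
    accepts = sum(1 for d in decisions if d)
    return accepts * 2 > len(decisions)


def mpps_verify(sample, keys, transforms, classifiers):
    """Encode the sample with every cancelable function, classify each version, then vote."""
    decisions = []
    for name, transform in transforms.items():
        protected = transform(sample, keys[name])      # one encoding per function
        decisions.append(bool(classifiers[name](protected)))
    return majority_vote(decisions)
```

With a Homogeneous Key, `keys` would hold one system-wide key per cancelable function, so the user would not need to supply any of them.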

9.3 Attacks Based on Key Knowledge

This thesis proposes and evaluates four scenarios. The scenarios differ in the impostor's knowledge of the key (Known and Unknown Key Attack) and in the use of a single key (Homogeneous Key) or different keys (Heterogeneous Key) for the users. We compare the recognition performance of the biometric system under a standard template attack (Unknown Key Attack) and under a Known Key Attack, i.e., when the impostor knows the key, using the two proposed storage solutions (Homogeneous and Heterogeneous Key).

A biometric system which uses cancelable templates preserves the user privacy. In addition, it provides lower False Acceptance and False Rejection errors under Unknown Key attacks. The EER values under Unknown Key attacks, using Homogeneous and Heterogeneous Keys, are lower than the baseline for Interpolation, BioHashing and Double Sum in both stroke directions. Additionally, the BioConvolving EER values under Unknown Key attacks are similar to those of unprotected samples. Using scrolling strokes, BioConvolving presents a statistically worse result than the baseline. However, the EER difference is small. All the remaining cases show a statistically better performance using protected templates. Therefore, it is possible to say that cancelable methods increase the security performance of a biometric system compared with an unprotected template system.

The performance of an SVM classifier under Known Key attacks has an EER similar to the baseline EER, except for BioConvolving. The DET curves of the evaluated cancelable


methods under the Known Key attack are close to the baseline curves. Unfortunately, the statistical test showed that the matching scores of the cancelable methods and the baseline come from different continuous distributions. Thus, the evaluated cancelable methods present a statistically worse result than the baseline. However, based on the DET curves, the difference between the cancelable performance and the baseline is small. Thus, an SVM classifier classifies an impostor with almost the baseline accuracy when template protection is used under a Known Key attack.

The availability of the user key does not affect the user privacy and security of the system. An impostor with the user key is not able to access the original biometric features because the cancelable functions used are non-invertible. Thus, the protected templates are unbreakable in feasible time. In addition, even when the impostor possesses the user key, the performance of the biometric system using cancelable templates is comparable to that obtained with unprotected templates. Therefore, the user key can be available in the system because attacks using it do not affect the user privacy and do not drastically decrease the system security.

Based on these findings, we propose the use of a unique key for cancelable methods to protect biometric templates. The experiments show that the use of a unique key slightly decreases the security performance compared with the use of different keys and unprotected biometric samples. We showed that the homogeneous key has better statistical results than unprotected samples in 10 out of 16 cases. Therefore, the use of a homogeneous key is better in 62.5% of cases.

The use of a homogeneous key is appropriate for the Multi-Privacy Protection System (MPPS). MPPS requires a user-specific key for each cancelable function. However, the performance observed with the Homogeneous Key under Known and Unknown Key attacks shows that it is possible to use a single key for all users. As a consequence, the Homogeneous Key increases the usability of MPPS: the user no longer has to provide a different key for each cancelable function used in MPPS.

9.4 Key Length

The thesis analyses the impact of key length on the analysed cancelable methods. We also evaluate the effects of the length under Key Knowledge attacks. We showed the improvements and deterioration in the recognition performance caused by increasing the key length.

A perceptible, continuous improvement is observed as the key length increases for the Interpolation and BioHashing methods under both Key Knowledge attacks. BioHashing presents a considerable improvement in performance when the key length increases. In contrast, Interpolation and Double Sum show minor improvements but, most importantly, their performance does not decrease when the key length increases.

Unfortunately, the key length and the performance are not directly related in BioConvolving for Known and Unknown Key attacks: the BioConvolving performance decreases when the key length increases. The BioConvolving authors previously reported that the number of segments (key length) negatively affects the recognition performance. We observed the same result using a behavioural modality, which is more complex than a physical modality.

No relation between the key length and the performance of Double Sum was found under either key knowledge attack. We observed some improvements at key lengths equal to 75 and 100, but the performance deteriorated at higher values (key length=200 and 400).

One possible explanation for this result is that, at high key lengths, the feature values are similar due to the key information used in the Double Sum procedure. Double Sum is based on the sum of two random feature values, selected by the key. A long key probably selects the same pair of features more than once in a sample with low dimension.
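A short counting argument makes this explanation concrete. It assumes that each key entry indexes one of the n = 25 original features, that a feature may be paired with itself, and that the order of the pair does not matter because the sum is commutative. These assumptions are ours and are not stated in the thesis.

```latex
\[
\#\{\text{distinct feature pairs}\} \;\le\; \binom{n}{2} + n \;=\; \frac{n(n+1)}{2},
\qquad
n = 25 \;\Rightarrow\; \frac{25 \cdot 26}{2} = 325 .
\]
\[
\text{key length } 400:\quad 400 - 325 = 75 \text{ protected features (at least) repeat an earlier pair.}
\]
```

Under these assumptions, a key of length 400 necessarily produces at least 75 exact duplicates, whereas keys of length 75 or 100 may repeat a pair by chance but are not forced to.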

We can argue that the length of the key has a good (BioHashing), bad (BioConvolving) or no evident impact (Interpolation and Double Sum) on the recognition performance under key knowledge attacks. Thus, the length of the key affects each cancelable method differently. Therefore, we suggest that the authors of new cancelable functions report the consequences of the key length.

9.5 Implication of Our Findings

Based on our findings, we propose the use of a user key generated by the system to authenticate users on mobile devices using cancelable biometric samples. Smartphones can generate a cancelable template thanks to their computational power, and most key generation is based on a random seed. All the disadvantages of on-device key generation are related to random vector generation issues.

We also propose to store a unique key inside the mobile device. The unique key can be stored without protection: it is not possible to decode a protected template using a genuine key due to the non-invertible property of cancelable methods. Therefore, a smartphone security system designed with our solution can generate and store the mobile owner's key in an unprotected way.


9.5.1 Recommended Biometric System Configuration

Smartphones have become as essential to us as everyday objects such as a wallet or the key of our house (82). Smartphones are considered multi-use gadgets because they can take pictures, send messages and access email or bank statements. Thus, it is important that smartphone developers implement security features to protect this sensitive user information. Therefore, smartphones must provide a security layer which allows only the owner to have access to such sensitive information (54).

Smartphones offer some solutions to authenticate the owner, such as PINs, drawing patterns, and facial and fingerprint recognition (82). Unfortunately, all the aforementioned solutions present some problems, such as the sharing of the secret PIN or the drawing pattern. Other problems arise in biometric recognition: environmental changes affect the performance, such as a poor light source in facial recognition or moist fingers during fingerprint authentication (52). Therefore, solutions which are not subject to information sharing or physical/environmental issues are desirable.

The analysis performed in this thesis provides only single-layer security. Once an impostor has been authenticated, he/she has total access to the user's smartphone. In other words, the impostor can access the user's private data on the smartphone after the authentication. Therefore, solutions that can detect strange behaviour, i.e., continuous authentication, must be implemented.

A biometric system which uses a Homogeneous Key can be implemented in smartphones. As discussed before, biometric templates must be protected, and cancelable functions provide interesting characteristics to protect biometric data, such as revocability and the high complexity of decoding a protected template. Sections 7.4 and 7.7 showed that the use of the Homogeneous Key under Known and Unknown Key attacks provides performance similar to that of the Heterogeneous Key under the same attacks. This finding shows that it is possible to design a biometric system able to use a unique key for each cancelable function.

We discuss how to manage the user key in mobile phones when cancelable functions are used to protect biometric templates using a homogeneous key. Thus, we try to understand and to provide some solutions related to key generation and key storage.

9.5.2 Key Management

In the original BioHashing study (39), the authors propose the use of a token for key generation. In other words, the user must carry at all times a device which provides a key. The authors of the other cancelable methods used here do not discuss the consequences of the storage and generation of the user key. Therefore, this study expands the discussion about the design of a key generation process in non-invertible cancelable methods. In addition, we discuss how to store a user key used by non-invertible cancelable methods.

The user key can be generated in the device. The performance of the cancelable methods using a homogeneous key is similar to that of the same methods using a heterogeneous key (see Sections 7.7 and 7.8). Thus, the device can generate a unique key for each cancelable method. Hence, we propose the use of a system-generated key. This key is generated in the device during the creation of the protected template, i.e., only once during the user template enrolment.

The device must use the key length which gives the best performance under both key knowledge attacks. In Chapter 8 we discussed the best length for the keys used in the evaluated cancelable methods. We found that the choice between the standard key length and a longer key affects the recognition performance. Thus, a smartphone security system must take into account the proposed length for each method in order to achieve the best performance.

If a smartphone is lost or stolen, the biometric templates remain secure, because a biometric template protected by non-invertible functions cannot be decoded. Unfortunately, personal data such as pictures and contacts are still present in the device. Thus, a smartphone security system must protect the user identity and provide a method to wipe the user template, the key and all the user data (94).

The user key can be stored in smartphones in an unprotected format due to the non-invertible property of cancelable methods. As discussed, attackers cannot decode a protected template with an exposed key. Therefore, the device need not be concerned with how to protect the user key, because exposing the user key does not have any impact on the user privacy.

In summary, smartphones can generate and store the required keys for each cancelable system used. Consequently, the user does not need to carry a second device or remember complex passwords in order to use the phone in a secure way. In addition, in the case of device compromise, the user generates a new key using the phone. Thus, a new protected template is created using new strokes and a new key. Therefore, cancelable templates are recommended for secure authentication on smartphones.
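The key life cycle described above (generate on the device at enrolment, store as-is, replace after a compromise) could look roughly like the following Python sketch. `DeviceKeyStore`, `key_from_seed` and the idea of storing a random seed from which a key vector of the recommended length is derived are hypothetical names and design choices for this illustration. They are not part of the thesis.

```python
import random
import secrets


def key_from_seed(seed, key_length, n_features):
    """Derive a reproducible key vector of feature indices from a stored seed."""
    rng = random.Random(seed)
    return [rng.randrange(n_features) for _ in range(key_length)]


class DeviceKeyStore:
    """Keeps one system-generated seed per cancelable method on the device."""

    def __init__(self):
        self.seeds = {}

    def enrol(self, method):
        """Generate the seed once, when the protected template is created."""
        self.seeds[method] = secrets.randbits(128)
        return self.seeds[method]

    def revoke(self, method):
        """Replace a compromised seed; the template must then be re-enrolled."""
        old = self.seeds[method]
        new = secrets.randbits(128)
        while new == old:  # practically never loops, but guarantees a change
            new = secrets.randbits(128)
        self.seeds[method] = new
        return new
```

After `revoke`, the old protected template no longer matches the stored key, so a new template has to be built from fresh strokes, which is the revocability property the section relies on.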

9.6 Thesis Limitation

In this thesis we used only one dataset, with few users and samples. Thus, our study is based only on this dataset. Due to the data characteristics, the results obtained with another dataset may be slightly different.

Our solution is based on a single-layer authentication method. Hence, continuous authentication must be taken into account due to its advantages. Continuous authentication is an ideal solution for mobile phones because it allows the user to use the phone without performing an explicit authentication step, i.e., the system blocks the phone in the case of suspicious use (25).

The countermeasures to be taken in the case of device loss are out of scope. We suggest that a remote wipe procedure be carried out in order to prevent access to the user's private data and to avoid template replay attacks. In the next section we discuss some alternatives that address the limitations found.

9.7 Future Work

Our work only modelled the solution based on protected and unprotected biometric representations. Hence, we need to take into account user-specific parameters, because user specificities influence the biometric system performance. The use of user-specific parameters has been shown to be a promising solution (69).

The classifiers and ensemble systems used present some limitations that are reflected in the experiments. Thus, these inherited limitations have a negative influence on the recognition performance. A way to resolve them is to use supporting classifier methods such as parameter tuning, meta learning or classifier selection.

Our results support the literature when it states that ensemble systems are more powerful than single classifiers. As future work, solutions like parameter optimization can improve the results achieved with the BioHashing dataset. Another possible solution is to implement an alternative BioHashing procedure (8).

In Multi-Privacy Protection Schemes (MPPS) only voting decision fusion was used. Possible solutions are to use other ensemble architectures, such as boosting and bagging, as we did in Section 5.4. In addition, an analysis of the performance of MPPS using score fusion and user-specific characteristics should be carried out.

Scrolling scenarios show worse results than horizontal strokes. Thus, it is important to figure out which characteristics influence this outcome. Moreover, BioConvolving presented low EER values among the analysed cancelable functions. Thus, new convolution methods and the optimization of division points can be a promising way to improve the biometric system performance using BioConvolving.

As analysed, the homogeneous key scenario seems ideal for MPPS. We observed that MPPS has a superior performance because this approach increases the security performance compared with the use of a single cancelable method (18). As future work, an analysis of the performance of multi-privacy protection using a single key would be an interesting contribution.

Practical authentication systems must have low rates of false rejection and false acceptance. We observed in our experiments that there is enough room to use protected templates in widely used authentication systems. Thus, an analysis of soft and multi-modal behavioural biometrics can be developed in order to increase the performance of biometric-based systems.

Continuous authentication (25) minimizes the problems related to standard methods for smartphone authentication such as PIN or fingerprint. Continuous authentication is a method which monitors the user behaviour during device usage in order to verify whether the current user is genuine. However, continuous authentication is out of the scope of this study. The user strokes performed on the touch screen can be used as input for a continuous authentication method. Thus, a smartphone security system can use a secure behavioural modality to verify whether the phone is being used by the genuine user all the time.

9.8 Acknowledgements

This work was partially supported by CNPq - Conselho Nacional de Desenvolvimento Científico e Tecnológico under the programme Science Without Borders. The author spent a year in Surrey, England. During this time, the author developed the ideas and results reported in Chapters 6, 7 and 8.


References

[1] Apple. About touch id security on iphone and ipad. digital, September 2015. URL https://support.apple.com/en-gb/HT204587. Visited in September/2015. Cited on page 101.

[2] C. Apté and S. Weiss. Data mining with decision trees and decision rules. Future generation computer systems, 13(2):197–210, 1997. Cited 2 times on pages 15 and 63.

[3] T. Boult. Robust distance measures for face-recognition supporting revocable biometrics token. In 7th International Conference on Automatic Face and Gesture Recognition (FGR06), pages 560–566, 2006. Cited on page 70.

[4] L. Breiman. Stacked regressions. Machine Learning, 24:49–64, 1996. Cited on page 69.

[5] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen. Classification and regression trees. CRC press, 1984. Cited on page 63.

[6] A. M. P. Canuto, M. Abreu, L. Oliveira, J. C. Xavier Junior, and A. Santos. Investigating the influence of the choice of the ensemble members in accuracy and diversity of selection-based and fusion-based methods for ensembles. Pattern Recognition Letters, 28(4):472–486, 2007. Cited on page 68.

[7] A. M. P. Canuto, F. Pintro, and J. C. Xavier Junior. Investigating fusion approaches in multi-biometric cancellable recognition. Expert Systems with Applications, 40(6):1971–1980, 2013. Cited 8 times on pages 45, 46, 47, 55, 70, 79, 101, and 107.

[8] A. M. P. Canuto, M. C. Fairhurst, F. Pintro, J. C. Xavier Junior, A. F. Neto, and L. M. G. Gonçalves. Classifier ensembles and optimization techniques to improve the performance of cancellable fingerprint. International Journal of Hybrid Intelligent Systems, 8(3):143–154, 2011. Cited 2 times on pages 70 and 147.

[9] R. Cappelli, M. Ferrara, D. Maltoni, and F. Turroni. Fingerprint verification competition at IJCB 2011. In 2011 International Joint Conference on Biometrics (IJCB), pages 1–6. IEEE, 2011. Cited 2 times on pages 34 and 42.

[10] B. Cestnik, I. Kononenko, and I. Bratko. ASSISTANT 86: A knowledge-elicitation tool for sophisticated users. In EWSL, pages 31–45, 1987. Cited on page 63.


[11] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011. Cited on page 76.

[12] Y.-L. Chen, W.-C. Ku, Y.-C. Yeh, and D.-M. Liao. A simple text-based shoulder surfing resistant graphical password scheme. In 2013 IEEE International Symposium on Next-Generation Electronics (ISNE), pages 161–164. IEEE, 2013. Cited on page 71.

[13] G. Chetty and M. Wagner. Multi-level liveness verification for face-voice biometric authentication. In 2006 Biometrics Symposium: Special Session on Research at the Biometric Consortium Conference, pages 1–6. IEEE, 2006. Cited on page 39.

[14] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines And Other Kernel-based Learning Methods. Cambridge University Press, New York, NY, USA, 2000. Cited on page 67.

[15] M. Damasceno and A. M. P. Canuto. An Empirical Analysis of Cancellable Transformations in a Behavioral Biometric Modality. In 13th Conference on Hybrid Intelligent Systems, Tunis, Tunisia, December 2013. IEEE. Cited 5 times on pages 31, 46, 83, 101, and 107.

[16] M. Damasceno and A. M. P. Canuto. An empirical analysis of ensemble systems for revocable behavioural biometric verification. Journal of Information Assurance & Security, 9(4):186–196, 2014. Cited 6 times on pages 17, 31, 46, 90, 92, and 93.

[17] M. Damasceno and A. M. P. Canuto. An empirical analysis of ensemble systems in cancellable behavioural biometrics: a touch screen dataset. In 2014 International Joint Conference on Neural Networks (IJCNN 2014), Beijing, China, July 2014. IEEE. Cited 4 times on pages 31, 46, 55, and 94.

[18] M. Damasceno, A. M. P. Canuto, and N. Poh. Multi-privacy biometric protection scheme using ensemble systems. In 2015 International Joint Conference on Neural Networks (IJCNN 2015), 2015. Cited 4 times on pages 31, 45, 46, and 147.

[19] T. Dietterich. Ensemble methods in machine learning. In Multiple Classifier Systems, volume 1857 of Lecture Notes in Computer Science, pages 1–15. Springer, 2000. Cited on page 69.

[20] P. Domingos. A few useful things to know about machine learning. Communications of the ACM, 55(10):78–87, 2012. Cited on page 137.

[21] F. Esposito, D. Malerba, G. Semeraro, and J. Kay. A comparative analysis of methods for pruning decision trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5):476–491, 1997. Cited on page 64.

[22] K. Faceli, A. C. Lorena, J. Gama, and A. C. P. de Leon Ferreira de Carvalho. Inteligência Artificial: Uma Abordagem de Aprendizado de Máquina. LTC, 2011. Cited 5 times on pages 62, 64, 65, 66, and 67.

[23] J. Feng and A. K. Jain. Fingerprint reconstruction: From minutiae to phase. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2):209–223, Feb 2011. Cited on page 89.

[24] Y. C. Feng, P. C. Yuen, and A. K. Jain. A hybrid approach for generating secure and discriminating face template. IEEE Transactions on Information Forensics and Security, 5(1):103–117, 2010. Cited on page 44.

[25] M. Frank, R. Biedert, E. Ma, I. Martinovic, and D. Song. Touchalytics: On the applicability of touchscreen input as a behavioral biometric for continuous authentication. In IEEE Transactions on Information Forensics and Security, volume 8, pages 136–148, 2013. Cited 8 times on pages 25, 71, 72, 73, 100, 101, 147, and 148.

[26] Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm, 1996. Cited on page 69.

[27] D. Gafurov, P. Bours, B. Yang, and C. Busch. Independent performance evaluation of pseudonymous identifier fingerprint verification algorithms. In Image Analysis and Recognition, pages 63–71. Springer, 2013. Cited on page 42.

[28] H. Gao, Z. Ren, X. Chang, X. Liu, and U. Aickelin. A new graphical password scheme resistant to shoulder-surfing. In 2010 International Conference on Cyberworlds (CW), pages 194–199. IEEE, 2010. Cited on page 71.

[29] S. Garcia-Salicetti, C. Beumier, G. Chollet, B. Dorizzi, J. L. les Jardins, J. Lunter, Y. Ni, and D. Petrovska-Delacrétaz. BIOMET: a multimodal person authentication database including face, voice, fingerprint, hand and signature modalities. In Audio and Video-Based Biometric Person Authentication, pages 1056–1056. Springer, 2003. Cited on page 38.

[30] J. Han, M. Kamber, and J. Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann, 3 edition, 2011. Cited on page 64.

[31] T. J. Hastie, R. J. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2 edition, 2009. Cited on page 60.

[32] S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 1994. Cited 2 times on pages 65 and 66.

[33] R. Herbrich. Learning Kernel Classifiers: Theory and Algorithms. MIT Press, Cambridge, MA, USA, 2001. Cited on page 68.

[34] C.-W. Hsu, C.-C. Chang, C.-J. Lin, et al. A practical guide to support vector classification, 2003. Cited on page 76.

[35] W. Jackson. Antisec hackers claim theft of military e-mails from Booz Allen. Internet, July 2011. URL http://gcn.com/articles/2011/07/11/antisec-booz-allen-hack-military-emails.aspx. Accessed in November 2011. Cited on page 25.

[36] A. K. Jain and A. Ross. Multibiometric systems. Communications of the ACM, 47(1):34–40, January 2004. Cited 3 times on pages 26, 38, and 89.

[37] A. K. Jain, A. Ross, and S. Prabhakar. An introduction to biometric recognition. IEEE Transactions on Circuits and Systems for Video Technology, 14(1):4–20, January 2004. Cited 4 times on pages 15, 33, 35, and 36.

[38] A. K. Jain, K. Nandakumar, and A. Nagar. Biometric template security. EURASIP Journal on Advances in Signal Processing, 2008:113:1–113:17, January 2008. Cited 9 times on pages 25, 26, 41, 42, 44, 45, 52, 89, and 99.

[39] A. T. B. Jin, D. N. C. Ling, and A. Goh. BioHashing: two factor authentication featuring fingerprint data and tokenised random number. Pattern Recognition, 37(11):2245–2255, 2004. Cited 7 times on pages 28, 48, 51, 99, 101, 107, and 145.

[40] S. Kanade, D. Petrovska-Delacretaz, and B. Dorizzi. Cancelable iris biometrics and using error correcting codes to reduce variability in biometric data. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), pages 120–127, 2009. Cited on page 70.

[41] E. J. C. Kelkboom, X. Zhou, J. Breebaart, R. N. J. Veldhuis, and C. Busch. Multi-algorithm fusion with template protection. In IEEE 3rd International Conference on Biometrics: Theory, Applications, and Systems, pages 1–8. IEEE, 2009. Cited 2 times on pages 39 and 55.

[42] A. Kong, K.-H. Cheung, D. Zhang, M. Kamel, and J. You. An analysis of biohashing and its variants. Pattern Recognition, 39(7):1359–1368, 2006. Cited 4 times on pages 30, 54, 115, and 127.

[43] L. I. Kuncheva. Combining Pattern Classifiers: Methods and Algorithms. John Wiley & Sons, Inc, 2004. Cited 5 times on pages 59, 60, 62, 68, and 69.

[44] C. Lee and J. Kim. Cancelable fingerprint templates using minutiae-based bit-strings. Journal of Network and Computer Applications, 33(3):236–246, 2010. Cited 2 times on pages 26 and 70.

[45] P. Li, X. Yang, K. Cao, X. Tao, R. Wang, and J. Tian. An alignment-free fingerprint cryptosystem based on fuzzy vault scheme. Journal of Network and Computer Applications, 33(3):207–220, 2010. Cited on page 43.

[46] A. Lumini and L. Nanni. An improved biohashing for human authentication. Pattern Recognition, 40(3):1057–1065, 2007. Cited 12 times on pages 26, 28, 30, 47, 54, 102, 107, 108, 126, 127, 130, and 134.

[47] E. Maiorana, M. Martinez-Diaz, P. Campisi, J. Ortega-Garcia, and A. Neri. Template protection for HMM-based on-line signature authentication. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1–6, 2008. Cited 8 times on pages 15, 30, 48, 49, 99, 101, 107, and 129.

[48] E. Maiorana, P. Campisi, and A. Neri. Template protection for dynamic time warping based biometric signature authentication. In Int conf on Digital Signal Processing, DSP’09, pages 526–531. IEEE Press, 2009. Cited 2 times on pages 70 and 125.

[49] E. Maiorana, P. Campisi, J. Fierrez, J. Ortega-Garcia, and A. Neri. Cancelable templates for sequence-based biometrics with application to on-line signature recognition. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 40(3):525–538, 2010. Cited 6 times on pages 28, 30, 53, 107, 127, and 135.

[50] E. Maiorana, P. Campisi, and A. Neri. Bioconvolving: Cancelable templates for a multi-biometrics signature recognition system. In 2011 IEEE International on Systems Conference (SysCon), pages 495–500. IEEE, 2011. Cited 2 times on pages 53 and 127.

[51] T. M. Mitchell. Machine Learning. McGraw-Hill, Inc., New York, NY, USA, 1 edition, 1997. Cited 3 times on pages 62, 65, and 137.

[52] A. C. Morris, S. Jassim, H. Sellahewa, L. Allano, J. Ehlers, D. Wu, J. Koreman, S. Garcia-Salicetti, B. Ly-Van, and B. Dorizzi. Multimodal person authentication on a smartphone under realistic conditions. In Defense and Security Symposium, pages 6250–6262. International Society for Optics and Photonics, 2006. Cited 2 times on pages 39 and 145.

[53] R. Moskovitch, C. Feher, A. Messerman, N. Kirschnick, T. Mustafić, A. Camtepe, B. Lohlein, U. Heister, S. Möller, L. Rokach, and Y. Elovici. Identity theft, computers and behavioral biometrics. In IEEE International Conference on Intelligence and Security Informatics (ISI’09), pages 155–160. IEEE, 2009. Cited on page 25.

[54] I. Muslukhov, Y. Boshmaf, C. Kuo, J. Lester, and K. Beznosov. Know your enemy: The risk of unauthorized access in smartphones by insiders. In Proceedings of the 15th International Conference on Human-computer Interaction with Mobile Devices and Services, MobileHCI ’13, pages 271–280, New York, NY, USA, 2013. ACM. Cited on page 145.

[55] A. Nagar, K. Nandakumar, and A. K. Jain. A hybrid biometric cryptosystem for securing fingerprint minutiae templates. Pattern Recognition Letters, 31:733–741, 2010. Cited 3 times on pages 26, 70, and 100.

[56] A. Nagar, K. Nandakumar, and A. K. Jain. Biometric template transformation: A security analysis. In Proceedings of the SPIE Media Forensics and Security, Electronic Imaging, pages 75410–75425, 2010. Cited 8 times on pages 28, 42, 101, 102, 105, 107, 108, and 126.

[57] A. Nagar, K. Nandakumar, and A. K. Jain. Multibiometric cryptosystems based on feature-level fusion. IEEE Transactions on Information Forensics and Security, 7(1):255–268, 2012. Cited on page 42.

[58] S. K. R. Nair, B. Bhanu, S. Ghosh, and N. S. Thakoor. Predictive models for multibiometric systems. Pattern Recognition, 47(12):3779–3792, 2014. Cited on page 38.

[59] K. Nandakumar. Multibiometric Systems: Fusion Strategies and Template Security. PhD thesis, Michigan State University, 2008. Cited on page 39.

[60] K. Nandakumar and A. K. Jain. Multibiometric template security using fuzzy vault. In 2nd IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS 2008), pages 1–6. IEEE, 2008. Cited on page 43.

[61] K. Nandakumar and A. K. Jain. Biometric template protection: Bridging the performance gap between theory and practice. Signal Processing Magazine, 32(5):88–100, 2015. Cited 2 times on pages 41 and 42.

[62] L. Nanni and A. Lumini. Empirical tests on biohashing. Neurocomputing, 69(16-18):2390–2395, 2006. Cited 3 times on pages 26, 101, and 107.

[63] L. Nanni, E. Maiorana, A. Lumini, and P. Campisi. Combining local, regional and global matchers for a template protected on-line signature verification system. Expert Systems with Applications, 37(5):3676–3684, 2010. Cited on page 70.

[64] V. M. Patel, N. K. Ratha, and R. Chellappa. Cancelable biometrics: A review. Signal Processing Magazine, 32(5):54–65, 2015. Cited 2 times on pages 26 and 42.

[65] P. P. Paul, M. Gavrilova, and S. Klimenko. Situation awareness of cancelable biometric system. The Visual Computer, 30(9):1059–1067, 2014. Cited 2 times on pages 101 and 107.

[66] F. Pintro and A. Canuto. An experimental study of ensemble systems on cancellable iris. In 2012 Brazilian Symposium on Neural Networks, 2012. Cited 8 times on pages 49, 50, 54, 99, 101, 107, 125, and 128.

[67] N. Poh, A. Merati, and J. Kittler. Heterogeneous information fusion: A novel fusion paradigm for biometric systems. In 2011 International Joint Conference on Biometrics (IJCB), pages 1–8, Oct 2011. Cited 2 times on pages 27 and 90.

[68] N. Poh and S. Bengio. How do correlation and variance of base-experts affect fusion in biometric authentication tasks? IEEE Transactions on Signal Processing, 53(11):4384–4396, 2005. Cited on page 89.

[69] N. Poh, A. Ross, W. Lee, and J. Kittler. A user-specific and selective multimodal biometric fusion strategy by ranking subjects. Pattern Recognition, 46:3341–3357, 2013. Cited on page 147.

[70] J. R. Quinlan et al. Discovering rules by induction from large collections of examples. Expert systems in the micro electronic age. Edinburgh University Press, 1979. Cited on page 63.

[71] J. R. Quinlan. C4.5: programs for machine learning, volume 1. Morgan Kaufmann, 1993. Cited on page 63.

[72] S. Rane, Y. Wang, S. C. Draper, and P. Ishwar. Secure biometrics: Concepts, authentication architectures, and challenges. IEEE Signal Processing Magazine, 30(5):51–64, September 2013. Cited 5 times on pages 25, 26, 27, 41, and 99.

[73] N. K. Ratha, S. Chikkerur, J. H. Connell, and R. M. Bolle. Generating cancelable fingerprint templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(4):561–572, 2007. Cited 2 times on pages 26 and 99.

[74] C. Rathgeb and C. Busch. Multi-biometric template protection: Issues and challenges. In D. J. Yang, editor, New Trends and Developments in Biometrics. INTECH Open Access Publisher, 2012. Cited 3 times on pages 27, 55, and 90.

[75] C. Rathgeb and A. Uhl. A survey on biometric cryptosystems and cancelable biometrics. EURASIP Journal on Information Security, 3(1):1–25, 2011. Cited 2 times on pages 41 and 45.

[76] K. Revett. Behavioral Biometrics: A Remote Access Approach. John Wiley & Sons, Ltd, 2008. Cited on page 25.

[77] D. A. Reynolds, T. F. Quatieri, and R. B. Dunn. Speaker Verification Using Adapted Gaussian Mixture Models. Digital Signal Processing, 10(1):19–41, 2000. Cited on page 100.

[78] J. Richiardi and A. Drygajlo. Gaussian mixture models for on-line signature verification. In Proceedings of the 2003 ACM SIGMM workshop on Biometrics methods and applications, pages 115–122. ACM, 2003. Cited on page 100.

[79] C. Roberts. Biometric attack vectors and defences. Computers & Security, 26(1):14–25, 2007. Cited on page 27.

[80] A. Ross. Encyclopedia of Biometrics, chapter Multibiometrics, pages 1127–1133. Springer US, Boston, MA, 2015. Cited on page 38.

[81] S. R. Safavian and D. Landgrebe. A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man and Cybernetics, 21(3):660–674, 1991. Cited on page 64.

[82] E. Shi, Y. Niu, M. Jakobsson, and R. Chow. Implicit authentication through learning user behavior. In Information Security, pages 99–113. Springer, 2011. Cited on page 145.

[83] A. B. Teoh, D. C. Ngo, and A. Goh. Personalised cryptographic key generation based on facehashing. Computers & Security, 23:606–614, 2004. Cited 2 times on pages 101 and 107.

[84] A. B. Teoh, Y. W. Kuan, and S. Lee. Cancellable biometrics and annotations on BioHash. Pattern Recognition, 41(6):2034–2044, 2008. Cited on page 47.

[85] A. Tiwari, S. Sanyal, A. Abraham, S. J. Knapskog, and S. Sanyal. A Multi-Factor Security Protocol for Wireless Payment - Secure Web Authentication using Mobile Devices. In Proceedings of the IADIS International Conference on Applied Computing, 2007. Cited on page 30.

[86] V. N. Vapnik. An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5):988–999, 1999. Cited on page 59.

[87] L. Wang and X. Geng. Behavioral Biometrics for Human Identification: Intelligent Applications. IGI Global, 2010. Cited on page 26.

[88] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. The Morgan Kaufmann Series in Data Management Systems. Elsevier, 2 edition, 2005. Cited 2 times on pages 65 and 66.

[89] D. H. Wolpert. Stacked generalization. Neural Networks, 5(2), 1992. Cited on page 69.

[90] L. Wu, X. Liu, S. Yuan, and P. Xiao. A novel key generation cryptosystem based on face features. In 10th International Conference on Signal Processing (ICSP), pages 1675–1678. IEEE, 2010. Cited on page 43.

[91] X. Wu, N. Qi, K. Wang, and D. Zhang. A novel cryptosystem based on iris key generation. In Natural Computation, 2008. ICNC’08. Fourth International Conference on, volume 4, pages 53–56. IEEE, 2008. Cited on page 43.

[92] W. Xu and M. Cheng. Cancelable voiceprint template based on chaff-points-mixture method. In Computational Intelligence and Security, 2008. CIS ’08. International Conference on, volume 2, pages 263–266, 2008. Cited on page 70.

[93] R. V. Yampolskiy and V. Govindaraju. Behavioural biometrics: a survey and classification. International Journal of Biometrics, 1(1):81–113, 2008. Cited 4 times on pages 25, 26, 27, and 34.

[94] X. Yu, Z. Wang, K. Sun, W. T. Zhu, N. Gao, and J. Jing. Remotely wiping sensitive data on stolen smartphones. In Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security, pages 537–542. ACM, 2014. Cited on page 146.
