Robert S. Zack May 8, 2010 METHODS OF DERIVING BIOMETRIC ROC CURVES FROM THE k-NN CLASSIFIER.


Agenda

Introduction to ROC Curves
Classification
Multi-Class Issues and Solutions
New Derivation Methods
Weak and Strong System Training
Use Cases
Search for a Topic
Publications
Dissertation Status
Questions

Introduction to ROC Curves

Used for binary decisions
Signal detection – signal / no signal
Medical diagnosis – disease / no disease
Biometric authentication – you are the person you claim to be / you are not

In biometrics the ROC curve varies from FAR=1 & FRR=0 at one end to FAR=0 & FRR=1 at the other
FAR = False Accept Rate – the rate at which an impostor is falsely accepted
FRR = False Reject Rate – the rate at which the correct person is falsely rejected

ROC charts are expressed in terms of percentages (0-100%) or probabilities (0-1). These are used interchangeably.
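As a rough illustration of the FAR and FRR definitions above, the following sketch counts false accepts and false rejects at a given threshold. The scores, the "higher score means more similar" convention, and the function name `far_frr` are assumptions made for this example only.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR), assuming a score >= threshold means 'accept'."""
    false_accepts = sum(s >= threshold for s in impostor_scores)
    false_rejects = sum(s < threshold for s in genuine_scores)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr


# Hypothetical match scores, invented purely for illustration.
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.7, 0.5, 0.3, 0.2, 0.1]

# Sweeping the threshold moves the operating point along the ROC curve.
for t in (0.0, 0.45, 0.65, 1.01):
    far, frr = far_frr(genuine, impostor, t)
    print(f"threshold={t:.2f}  FAR={far:.2f}  FRR={frr:.2f}")
```

With these made-up scores, the lowest threshold accepts everyone (FAR=1, FRR=0) and the highest rejects everyone (FAR=0, FRR=1), matching the two ends of the curve described above.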

Authentication Analogy

Supreme Court – nine judges
Usual procedure – majority required to make decision
Like 9NN needing majority to authenticate a user

ROC Curve – creates many potential procedures
Need 9 votes to make decision (very conservative)
Need 8, 7, 6 votes to make decision (conservative)
Need 5 votes to make decision (majority)
Need 4, 3, 2 votes to make decision (liberal)
Need 1 or even 0 votes to make decision (very liberal)

Anatomy of a Biometric ROC Curve

Conservative is too restrictive: positive classification requires strong evidence.

Liberal is too open: positive classification requires only weak evidence.

Parametric Procedures

Parametric techniques are well studied.

Data are assumed to follow a normal (Gaussian) distribution.

Vary a threshold to obtain the tradeoff between FAR/FRR.

Probability density functions can be calculated without estimation.
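A minimal sketch of the parametric case, assuming genuine and impostor match scores are each normally distributed and that a score at or above the threshold means "accept". The means, standard deviations, and function names are hypothetical and do not come from the study.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def parametric_roc(mu_genuine, sd_genuine, mu_impostor, sd_impostor, thresholds):
    """Return (threshold, FAR, FRR) triples computed from the two Gaussians."""
    points = []
    for t in thresholds:
        far = 1.0 - normal_cdf(t, mu_impostor, sd_impostor)  # impostors scoring >= t
        frr = normal_cdf(t, mu_genuine, sd_genuine)          # genuine users scoring < t
        points.append((t, far, frr))
    return points


# Assumed parameters: impostor scores ~ N(0, 1), genuine scores ~ N(2, 1).
thresholds = [i * 0.5 - 3.0 for i in range(17)]  # -3.0 to 5.0 in steps of 0.5
for t, far, frr in parametric_roc(2.0, 1.0, 0.0, 1.0, thresholds):
    print(f"t={t:+.1f}  FAR={far:.4f}  FRR={frr:.4f}")
```

Because the distributions are given in closed form, each FAR/FRR pair follows directly from the threshold, with no counting of samples.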

Parametric ROC Derivation

Classification

1. The k-NN classifier is well studied.

2. Biometric classification problems can have many classes.

3. It is easier to work with a large or unknown population if the data is converted from a multi-class to a two-class decision.

4. Cha Dichotomy Model.

k-NN Nonparametric Classifier

k-NN is nonparametric.

A vector-difference model is used to convert a many-class problem into a two-class, binary problem.

Uses Euclidean distance.

k-NN Classification Procedure for k=5, Adapted from Pattern Classification, Duda, et al.
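The k-NN procedure with Euclidean distance can be sketched as follows. This is a generic illustration with invented two-dimensional data, not the classifier used in the study.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, training, k=5):
    """training: list of (feature_vector, label). Majority label of the k nearest."""
    neighbors = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training data for illustration.
training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((3.0, 3.0), "B"), ((3.2, 2.9), "B"), ((2.8, 3.1), "B"), ((2.0, 2.1), "B"),
]
print(knn_classify((1.5, 1.5), training, k=5))
```

For the query (1.5, 1.5), three of the five nearest neighbors carry label A, so the majority vote returns A.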

Cha Dichotomy Model

Simplifies complexity

Transforms a feature space into a distance vector space.

Uses distance measures.

Multi-Class to Two-Class Transformation Process, adapted from Yoon et al. (2005)
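One way to picture the transformation is sketched below: every pair of samples becomes one difference vector, labeled within-person (W) or between-person (B). Taking element-wise absolute differences is an assumption of this sketch, and the names and feature values are invented.

```python
from itertools import combinations


def to_difference_vectors(samples):
    """samples: list of (person_id, feature_vector).

    Returns a list of (difference_vector, label), where label is 'W' when both
    samples come from the same person and 'B' otherwise.
    """
    out = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(samples, 2):
        diff = tuple(abs(x - y) for x, y in zip(vec_a, vec_b))
        out.append((diff, "W" if id_a == id_b else "B"))
    return out


# Hypothetical data: three people, two samples each.
samples = [
    ("alice", (1.0, 2.0)), ("alice", (1.1, 2.2)),
    ("bob", (3.0, 0.5)), ("bob", (2.9, 0.4)),
    ("carol", (0.2, 4.0)), ("carol", (0.3, 4.1)),
]
pairs = to_difference_vectors(samples)
print(len(pairs), sum(label == "W" for _, label in pairs))
```

However many people are enrolled, the classifier only ever sees two classes, W and B, which is what makes a large or unknown population easier to handle.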

m-kNN Method

Pure rank method.

Evaluate the top 7 NN.

Q is authenticated if the number of within-class matches is >= the decision threshold of 4 NN.

Unweighted: all W’s are equal in weight.
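A minimal sketch of the pure rank rule, assuming each trial is represented by the W/B labels of its k nearest neighbors. The trial data and function names are hypothetical.

```python
def m_knn_accept(neighbor_labels, m):
    """Accept when at least m of the k nearest neighbors are within-class ('W')."""
    return sum(label == "W" for label in neighbor_labels) >= m


def m_knn_roc(genuine_trials, impostor_trials, k):
    """Return one (m, FAR, FRR) point for each vote threshold m = 0..k."""
    points = []
    for m in range(k + 1):
        far = sum(m_knn_accept(t, m) for t in impostor_trials) / len(impostor_trials)
        frr = sum(not m_knn_accept(t, m) for t in genuine_trials) / len(genuine_trials)
        points.append((m, far, frr))
    return points


# Hypothetical neighbor labels for k = 7 (nearest neighbor first).
genuine_trials = [list("WWWWWBB"), list("WWWWBBB"), list("WWBBBBB")]
impostor_trials = [list("BBBBBBB"), list("WBBBBBB"), list("WWWWBBB")]

for m, far, frr in m_knn_roc(genuine_trials, impostor_trials, k=7):
    print(f"m={m}  FAR={far:.2f}  FRR={frr:.2f}")
```

A single k-NN run therefore yields k+1 operating points rather than one, which is what turns the classifier into an ROC curve.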

wm-kNN Method

Rank method weighted by rank order.

Authenticate if the weighted score of the W choices is >= the weighted match threshold (m).

Score varies from 0 to k(k+1)/2, e.g. 5+4+3+2+1 = 15 for k=5.

Each value of m yields one FAR/FRR pair, i.e. one ROC point.

If m=0, FAR=1, FRR=0: all users are accepted.

If m=15, FAR is small and FRR is large: few Q’s are accepted.
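The rank weighting can be sketched as below, assuming the nearest neighbor carries weight k, the next k-1, and so on down to 1, and that a query is accepted when its score reaches m. The function names and label strings are hypothetical.

```python
def wm_knn_score(neighbor_labels):
    """Sum of rank weights of the within-class ('W') neighbors.

    neighbor_labels is ordered nearest first; the nearest neighbor gets
    weight k, the farthest gets weight 1.
    """
    k = len(neighbor_labels)
    return sum(k - rank for rank, label in enumerate(neighbor_labels) if label == "W")


def wm_knn_accept(neighbor_labels, m):
    """Accept when the weighted within-class score is at least m."""
    return wm_knn_score(neighbor_labels) >= m


# For k = 5 the score ranges from 0 to 5+4+3+2+1 = 15.
print(wm_knn_score(list("WWWWW")))  # all five neighbors within-class
print(wm_knn_score(list("WWBBB")))  # two W's at the top ranks
print(wm_knn_score(list("BBBWW")))  # two W's at the bottom ranks
```

Unlike the unweighted count, two queries with the same number of W neighbors can receive different scores depending on where those neighbors rank.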

m-kNN and wm-kNN ROC’s

LapFree – Weak Training

m-kNN and wm-kNN ROC’s

DeskFree – Weak Training

t-kNN Method

A distance threshold method.

A positive vote is within a distance threshold from the user’s sample.

Uses feature vector space distances only.

At t=0, no distance vectors are authenticated: FAR=0%, FRR=100%. At t=100, all distance vectors are authenticated: FAR=100%, FRR=0%.
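A sketch of the distance-threshold idea, assuming distances are scaled to the range 0-100 and each neighbor casts a positive vote when its distance is at most t. The slide does not say how votes are combined, so `min_votes` is a hypothetical parameter, and all distances below are invented.

```python
def t_knn_accept(neighbor_distances, t, min_votes=1):
    """Accept when at least min_votes neighbors lie within distance t."""
    positive_votes = sum(d <= t for d in neighbor_distances)
    return positive_votes >= min_votes


def t_knn_roc(genuine_trials, impostor_trials, thresholds, min_votes=1):
    """Return one (t, FAR, FRR) point per distance threshold."""
    points = []
    for t in thresholds:
        far = sum(t_knn_accept(d, t, min_votes) for d in impostor_trials) / len(impostor_trials)
        frr = sum(not t_knn_accept(d, t, min_votes) for d in genuine_trials) / len(genuine_trials)
        points.append((t, far, frr))
    return points


# Hypothetical scaled distances (0-100) to the nearest neighbors of each trial.
genuine = [[5, 10, 20], [15, 30, 40]]
impostor = [[50, 60, 70], [25, 80, 90]]

for t, far, frr in t_knn_roc(genuine, impostor, [0, 20, 25, 100]):
    print(f"t={t}  FAR={far:.2f}  FRR={frr:.2f}")
```

Because t is continuous, this approach can produce as many ROC points as desired, whereas the rank methods are limited by k.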

t-kNN Method

DeskFree (left) and LapFree (right) Data

ht-kNN Method

Weighted vote based on distances to the kNN.

Hybrid of rank method and vector space distances.

For each test sample, the within-class weight (WCW) is calculated based on the distance vectors.

DeskFree (left) and LapFree (right) Data
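The slides do not give the formula for the within-class weight (WCW). The sketch below assumes inverse-distance weighting and defines WCW as the share of total weight held by within-class neighbors. Both choices, and the threshold used, are illustrative assumptions rather than the study's definition.

```python
def within_class_weight(neighbors, eps=1e-9):
    """neighbors: list of (distance, label) for the k nearest neighbors.

    Each neighbor is weighted by 1 / (distance + eps); the result is the
    fraction of total weight carried by 'W' neighbors, between 0 and 1.
    """
    weights = [(1.0 / (d + eps), label) for d, label in neighbors]
    total = sum(w for w, _ in weights)
    within = sum(w for w, label in weights if label == "W")
    return within / total


def ht_knn_accept(neighbors, threshold):
    """Accept when the within-class weight reaches the threshold."""
    return within_class_weight(neighbors) >= threshold


# Hypothetical neighbors: close W, mid-distance B, far W.
example = [(1.0, "W"), (2.0, "B"), (4.0, "W")]
print(round(within_class_weight(example), 3))
print(ht_knn_accept(example, 0.5))
```

Sweeping the threshold from 0 to 1 would then trace out FAR/FRR pairs in the same way as the other methods.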

New Nonparametric ROC Methods

1. Need m votes out of k for decision
• Pure rank method

2. Need wm votes for decision, but some judges get more than one vote (weighted method)
• Rank method weighted by rank order

3. A positive vote is within a distance threshold from the user’s sample
• Uses feature vector space distances only

4. Weighted vote based on distances to the kNN
• Hybrid of rank method and vector space distances

Weak & Strong Training

Weak Training
• People used in testing are not used in training
• Independent sets of users for testing and training

Strong Training
• People used in testing are also used in training
• Usually to augment the training data from other people
• But new difference vectors are used to authenticate
• For example, users provide 8 samples: 5 for training and 3 to match against for authentication
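The two training regimes can be sketched as data splits. The user IDs, sample names, and function names below are hypothetical; the 5/3 split follows the example above.

```python
def weak_split(samples_by_user, train_users):
    """Weak training: training users and test users are disjoint."""
    train = {u: s for u, s in samples_by_user.items() if u in train_users}
    test = {u: s for u, s in samples_by_user.items() if u not in train_users}
    return train, test


def strong_split(samples_by_user, n_train=5):
    """Strong training: every user appears in both sets, with different samples.

    The first n_train samples per user go to training; the rest are held out
    to match against at authentication time.
    """
    train = {u: s[:n_train] for u, s in samples_by_user.items()}
    test = {u: s[n_train:] for u, s in samples_by_user.items()}
    return train, test


# Hypothetical data: four users with eight samples each.
data = {u: [f"{u}_s{i}" for i in range(8)] for u in ("u1", "u2", "u3", "u4")}

weak_train, weak_test = weak_split(data, train_users={"u1", "u2"})
strong_train, strong_test = strong_split(data, n_train=5)
print(sorted(weak_train), sorted(weak_test))
print(len(strong_train["u1"]), len(strong_test["u1"]))
```

In the strong split no sample is shared between training and authentication, so the difference vectors used to authenticate are still new to the system.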

Weak & Strong Training

Chart: kNN Performance. x-axis: Nearest Neighbor (k = 1 to 21, odd values). y-axis: Percent Accuracy (90.00% to 100.00%). Series: DeskFree (WT), LapFree (WT), DeskFree (ST18), DeskFree (ST36), DeskFree (ST54).

Use Cases

On-line test taking – Authentication Application
Enroll students at the start of a class. Collect biometric samples.
Authenticate that users are who they should be, using off-line batch processing.

Corporate Compliance Training/Test Administration
Enroll employees at some point prior to the training or test administration. Collect biometric samples. Refresh them at designated intervals.
Authenticate that users are who they should be.

Future Work

Real-time authentication.
Accuracy improvements.
Error cost analysis.
Measurement error.

Initial Search for a Topic

Started program in Fall 2008.

Entered DPS with an idea to research a topic in the area of mobile computing. Quickly discarded the idea.

Continued to search for ideas by participating as a customer for IT691/CS691 projects. Became exposed to facial and keystroke biometrics.

Continued working with keystroke biometrics and eventually found a topic with the help of Dr. Tappert.

Idea Vetting

The first few presentations of the topic met with a lot of resistance. It took some time to develop the “so what”.

Every Research Seminar was recorded so that I could go back and listen to criticisms.

Participated as co-author on several papers on the subject. Some papers were peer-reviewed and submitted for publication.

Publications

[1] J. Abbazio, S. Perez, D. Silva, R. Tesoriero, F. Penna, and R. S. Zack, "Face Biometric Systems," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2009, pp. C1.1-C1.8.

[2] A. Amatya, J. Aliperti, T. Mariutto, A. Shah, M. Warren, R. S. Zack, and C. C. Tappert, "Keystroke Biometric Authentication System Experimentation," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2009, pp. C4.1-C4.8.

[3] A. C. Caicedo, K. Chan, D. A. Germosen, S. Indukuri, M. N. Malik, D. Tulasi, M. C. Wagner, R. S. Zack, and C. C. Tappert, "Keystroke Biometric: Data/Feature Experiments," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2010.

[4] K. Doller, S. Chebiyam, S. Ranjan, E. Little-Tores, and R. S. Zack, "Keystroke Biometric System Test Taker Setup and Data Collection," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2010.

[5] S. Janapala, S. Roy, J. John, L. Columbu, J. Carrozza, R. S. Zack, and C. C. Tappert, "Refactoring a Keystroke Biometric System," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2010, pp. B1.1-B1.8.

[6] M. Lam, U. Patel, M. Schepp, T. Taylor, and R. S. Zack, "Keystroke Biometric: Data Capture Resolution Accuracy," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2010.

[7] C. C. Tappert, S.-H. Cha, M. Villani, and R. S. Zack, "A Keystroke Biometric System for Long-Text Input," International Journal of Information Security and Privacy, Pending Publication, 2010.

[8] R. S. Zack, C. C. Tappert, S.-H. Cha, J. Aliperti, A. Amatya, T. Mariutto, A. Shah, and M. Warren, "Obtaining Biometric ROC Curves from a Non-Parametric Classifier in a Long-Text-Input Keystroke Authentication Study," vol. 268, Pace University, 2009.

Questions
