A Trade-off Between Number of Impressions and Number of Interaction Attempts
Jacob A. Hasselgren, Stephen J. Elliott, and Jue Gue, Member, IEEE
The 8th International Conference on Information Technology and Applications (ICITA 2013)
Abstract--The amount of time taken to enroll or collect data from a subject in a fingerprint recognition system is of paramount importance, as time taken directly affects cost. A trade-off between the number of impressions collected and the number of interaction attempts allowed to submit those impressions must be realized. In this experiment, data were collected using an optical fingerprint sensor. Each subject submitted six successful impressions within a maximum of 18 interaction attempts. The resulting images were analyzed using three methods: the number of interaction attempts per finger, quality differences from the first three impressions to the last three impressions, and matching performance from the first three impressions to the last three impressions. The right middle finger was the most problematic to collect, as it required the most interaction attempts. Initial analysis showed no significant differences in image quality or matching performance; however, further analysis revealed a steady improvement from Group A to Group B in both image quality and matching performance.
Index Terms--Biometrics, image quality, impression, interaction, matching performance
I. INTRODUCTION
There are many factors that impact the performance of a biometric system, including poor quality data arising from ridge-valley structure [1], skin conditions [2], human interaction with the sensor [3], and the metadata attached to biometric data [4]. Poor quality data, in this case fingerprint images, regardless of the source have a resulting impact on the performance of a biometric system [5–8], and can impact the operations of the system. Test protocol designers are faced with a series of challenges when collecting data and minimizing error, regardless of the cause. In [9], the development of the Human Biometric Sensor Interaction model is discussed, which examined four fundamental issues: how do users interact with the biometric device, what errors do the users make, are there any commonalities within these different errors, and what level of training should one expect to give the subject (if any at all) to successfully use a biometric device.
Test protocol designers can reference documents describing best practices for designing a test protocol (for example, [10]). While minimizing error is paramount in a test, so too are decisions relating to the number of test subjects and the time they spend in the test center. Determining the number of test subjects is an important task in developing the test protocol. Mansfield and Wayman note that the ideal test would have as many volunteers as is practically possible, each making a single transaction. They provide an example whereby an evaluation may have 200 subjects each enrolling and making three genuine transactions, with two further revisits, providing 1200 genuine attempts [10]. Test crews and the number of attempts vary depending on the nature of the test as well as the allowable expense related to test subject recruitment and administration of the test. In their guidance, Mansfield and Wayman state that the test population should be "as large as practically possible" [10].
Test protocols in the literature vary in the number of samples collected. One study examined image quality and performance on a single fingerprint sensor; fifty subjects participated, providing three samples of their index, middle, ring, and little fingers on both hands, resulting in 1200 images [11]. Another study examined the effects of scanner height on fingerprint capture, collecting fingerprints from 75 subjects at four different heights, with five attempts each [12]. As a further example, FVC 2000 collected 880 fingerprints in total, with 8 impressions per finger [13]. Each of these studies examined very different topics within fingerprint performance, but each test protocol designer had to determine the number of fingerprints to collect and the number of attempts each subject would complete.

J. Hasselgren is with the Technology, Leadership, and Innovation Department of Purdue University, West Lafayette, IN 47907 USA (telephone: 765-494-2311, e-mail: [email protected]).
S. Elliott is with the Technology, Leadership, and Innovation Department of Purdue University, West Lafayette, IN 47907 USA (telephone: 765-494-2311, e-mail: [email protected]).
J. Gue is a student in the Technology, Leadership, and Innovation Department of Purdue University, West Lafayette, IN 47907 USA (telephone: 765-494-2311, e-mail: [email protected]).
ISBN: 978-0-9803267-5-8
II. MOTIVATION
In an operational setting, there is an inherent trade-off between the number of samples collected, the number of interaction attempts needed to collect those samples, and the cost of the collection. For example, should the test personnel keep trying to collect from an individual with poor image quality, in the hope that the subject will provide better images as they become accustomed to the device and improve their presentation? Or is it better to stop after the first three attempts, because the time taken to acquire the additional images does not provide any additional value? The research questions are as follows: does quality improve with experience or familiarity with the device? Does performance change across different groups, such as the first three successfully acquired samples, the last three, the top three image quality samples, and, for reference, the bottom three? All of these questions are
applicable in determining the best enrollment policy and will impact the time that the subject is at the enrollment station.
III. METHODOLOGY
For the purposes of this study and the subsequent analysis, the following definitions are used. A successfully acquired sample (SAS) occurs when the fingerprint sensor acquires a sample. In these experiments, the sensor acquired samples subject to a modest image quality threshold, which required a minimum number of minutiae. The following fingers were collected from each subject: right index, right middle, left index, and left middle. Fig. 1 shows the hands used during this collection.
Fig. 1. Representation of fingers used for collection
Six impressions that were determined to be SASs were taken on each finger. Each SAS was given an impression number, which in this case was always a value between one and six. Whenever a subject presented to the sensor, regardless of whether a SAS occurred or whether the presentation was good or bad, the subject was considered to have committed an interaction attempt. Each subject was allowed a maximum of 18 interaction attempts. The sensor used was the commercially available Digital Persona U.are.U 4500. The data used in these analyses were taken from an on-going aging study in the BSPA Labs at Purdue University; four fingerprint sensors were used in the overall data collection, along with other modalities, and this particular sensor was the last one used in the fingerprint station.
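The collection rule above can be sketched in a few lines of Python. This is an illustrative model of the protocol, not the lab's collection software; the boolean presentation sequence stands in for the sensor's minutiae-based acquisition check.

```python
# Sketch of the collection rule: every presentation counts as an
# interaction attempt whether or not it yields a successfully acquired
# sample (SAS); collection stops at 6 SAS or 18 attempts, whichever
# comes first.

MAX_ATTEMPTS = 18
REQUIRED_SAS = 6

def collect_finger(presentations):
    """presentations: iterable of booleans, True = sensor acquired a sample.
    Returns (impressions, attempts_used), where impressions lists the
    1-based attempt numbers at which each SAS occurred."""
    impressions = []
    attempts_used = 0
    for acquired in presentations:
        if attempts_used >= MAX_ATTEMPTS or len(impressions) >= REQUIRED_SAS:
            break
        attempts_used += 1
        if acquired:
            impressions.append(attempts_used)
    return impressions, attempts_used

# Ideal subject: every attempt succeeds, so impressions 1-6 take 6 attempts.
ideal, used = collect_finger([True] * 18)
# A subject needing retries: SAS land on attempts 1, 3, 4, 7, 8, and 10.
retry, used2 = collect_finger([True, False, True, True, False, False,
                               True, True, False, True])
```

In the ideal case the impression numbers coincide with the attempt numbers, which is the baseline the analysis in Section IV compares against.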
The test protocol and the subsequent definitions are consistent with the Human Biometric Sensor Interaction model as outlined in [3]. The schematic of interaction attempts and impressions is shown in Fig. 2. Fig. 2 is only an example of the difference between impression numbers and interaction attempt numbers: Group A could consist of attempts higher in the order, and Group B could consist of attempts 7, 8, and 9, or even 7, 11, and 16.
Fig. 2. Schematic of interaction attempts and impressions
Four different groups were established for these analyses. Group A consisted of the first three successfully acquired samples for each subject for each finger. Group B consisted of the last three successfully acquired samples for each subject for each finger. Group C consisted of the three images with the lowest quality scores, while Group D consisted of the three images with the highest quality scores. Not all groups were used in every analysis.
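The four group definitions can be made concrete with a small helper. This is our own illustrative function (not from the study); each sample is modeled as an (impression number, quality score) pair, with six samples assumed per finger.

```python
# Hypothetical helper forming Groups A-D from one finger's six SAS.
# A/B are defined by acquisition order, C/D by quality score.

def make_groups(samples):
    by_order = sorted(samples, key=lambda s: s[0])    # acquisition order
    by_quality = sorted(samples, key=lambda s: s[1])  # ascending quality
    return {
        "A": by_order[:3],     # first three SAS
        "B": by_order[-3:],    # last three SAS
        "C": by_quality[:3],   # three lowest quality scores
        "D": by_quality[-3:],  # three highest quality scores
    }

samples = [(1, 72), (2, 68), (3, 80), (4, 75), (5, 66), (6, 83)]
groups = make_groups(samples)
```

Note that the same image can appear in more than one group; for instance, a high-quality sixth impression lands in both B and D.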
Four commercially available software packages were used: Neurotechnology MegaMatcher v4.3 for matching performance, the Aware WSQ1000 quality tool for image quality analysis, Oxford Wave graphing software to plot and calculate equal error rates, and Minitab 14 for statistical measures and results.
IV. RESULTS
The results of the experiment are divided into three sections. Table 1 provides a description of each analysis.
Table 1. Framework
Analysis | Description | Groupings
Number of interaction attempts | Differences in the number of interaction attempts based on finger location | Groups A and B
Image quality | Differences in image quality from the first three SAS to the last three SAS | Group A vs. Group B
Image quality | Differences in image quality from the lowest three quality-scoring SAS to the highest three quality-scoring SAS | Group C vs. Group D
Matching performance | Differences in matching performance from the first three SAS to the last three SAS | Group A vs. Group B
Matching performance | Differences in matching performance from the lowest three quality-scoring SAS to the highest three quality-scoring SAS | Group C vs. Group D
The test subject population consisted of 49 males, 53 females, and four subjects who did not disclose their demographic information.
A. Number of interaction attempts
The results include those subjects that presented six successfully acquired samples in 18 or fewer interaction attempts. The number of attempts is shown below for each finger collected (right index, left index, right middle, and left middle). There was no significant difference in interaction attempts between Group A and Group B for any given finger.
In an ideal data collection scenario, the impression numbers should match the interaction attempt numbers, as no additional attempts would have been necessary. Group A’s impression numbers were always one through three, but some subjects, particularly in the right middle finger, needed as many as 12 attempts just to submit three SAS.
The majority of individuals achieved their samples in six interaction attempts across all finger locations. However, there are some fingers, notably the right middle, where the distribution is more spread out. This is shown in Table 2.
Table 2. Variance of interaction attempts per group and finger
Finger Location | Group A Variance | Group B Variance
LI | 1.0830 | 2.0962
LM | 1.0320 | 1.3039
RI | 1.3047 | 2.2292
RM | 2.0180 | 2.8247
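The variances in Table 2 are presumably sample variances of the per-subject attempt counts. A minimal sketch, with illustrative attempt counts rather than the study's data:

```python
# Sample variance (n-1 denominator) of per-subject attempt counts,
# as presumably computed for Table 2. The counts below are made up
# for illustration.

def sample_variance(xs):
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Attempts six hypothetical subjects needed for their first three SAS:
attempts_group_a = [3, 3, 4, 3, 5, 6]
var_a = sample_variance(attempts_group_a)
```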
The right middle (RM) and right index (RI) fingers have a greater variance in Groups A and B than the other fingers. This difference in variance may be explained by the order in which the fingers were collected: right index, right middle, left index, and then left middle. The higher variation for the right index and right middle fingers could be a result of the subject becoming comfortable with the sensor; since these are the first two fingers presented, a habituation factor may be affecting the number of interaction attempts and their variance. This could also simply be a case of hand dominance; however, hand-dominance data were not available for this paper.

B. Image Quality
It is well understood that image quality impacts performance. In this section, we evaluate image quality across four groups: Groups A and B (the first three SAS and the last three SAS, respectively) and Groups C and D (the lowest three and highest three image quality SAS, respectively). The images were processed using a commercial quality scoring algorithm, the Aware WSQ1000, which provided an aggregate quality score from 0 to 100. The breakdown of these quality scores is as follows: good ranges from 85-100, adequate from 75-84, marginal from 60-74, and poor from 0-59. The distribution of image quality scores is shown in Fig. 3.
Fig. 3. Distribution of quality across Groups A and B by finger location.

Referring to Fig. 3, each finger's mean quality is between 70 and 76, or marginal quality.
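The score bands quoted above map directly to a lookup. The band edges come from the text; the function name is ours, and this is only a sketch of the banding, not the WSQ1000 algorithm itself.

```python
# Map an aggregate quality score (0-100) to the bands quoted in the
# text: good 85-100, adequate 75-84, marginal 60-74, poor 0-59.

def quality_band(score):
    if score >= 85:
        return "good"
    if score >= 75:
        return "adequate"
    if score >= 60:
        return "marginal"
    return "poor"

# The per-finger means of roughly 70-76 straddle the marginal/adequate edge:
bands = [quality_band(s) for s in (70, 74, 75, 76)]
```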
Fig. 4. Distribution of quality across Groups C and D by finger location.

Fig. 4 shows the quality distribution for Groups C and D, the lowest three quality-scoring SAS and the highest three quality-scoring SAS, respectively.
Table 3. Basic quality statistics for groups per finger
Finger Location | Group | Mean | Std. Dev. | Variance
LI | A | 71.604 | 9.397 | 88.313
LI | B | 72.911 | 9.545 | 91.115
LI | C | 68.785 | 9.442 | 89.149
LI | D | 75.729 | 8.182 | 66.946
LM | A | 73.327 | 9.992 | 99.843
LM | B | 74.871 | 9.322 | 86.907
LM | C | 70.459 | 10.059 | 101.176
LM | D | 77.739 | 7.758 | 60.180
RI | A | 72.139 | 10.253 | 105.131
RI | B | 73.289 | 9.327 | 86.984
RI | C | 69.014 | 10.104 | 102.082
RI | D | 76.415 | 7.951 | 63.213
RM | A | 74.683 | 9.296 | 86.418
RM | B | 75.237 | 9.042 | 81.767
RM | C | 71.720 | 9.501 | 90.276
RM | D | 78.200 | 7.550 | 56.997

The variances of Group A were larger than those of Group B for all fingers except the left index. The means of quality for Groups A and B of each finger were compared in a one-way ANOVA statistical test; there was no significant difference between Group A and Group B for any given finger.
The means of quality for Group C and Group D of each finger were compared in a one-way ANOVA statistical test. There was a significant difference for all fingers (p<.001).
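With two groups, the one-way ANOVA reduces to a single F test between the group means. A pure-Python sketch of the F statistic (the quality scores below are illustrative, not the study's data; the paper reports using Minitab for the actual tests):

```python
# One-way ANOVA F statistic for k groups: ratio of between-group to
# within-group mean squares. With k = 2 this is the comparison
# reportedly applied to the per-finger quality means.

def f_statistic(groups):
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)

group_a = [71.0, 73.0, 70.0, 72.0]  # hypothetical quality scores
group_b = [74.0, 76.0, 75.0, 77.0]
F = f_statistic([group_a, group_b])
```

The resulting F value would then be compared against the F distribution with (k-1, n-k) degrees of freedom to obtain the p-value.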
C. Performance
To observe the differences in matching performance, the SAS, in their respective groups, were enrolled into minutiae-based matching software, Megamatcher 4.3. The resulting equal error rates for these matching sequences are presented in Table 4.
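The equal error rate is the operating point where the false match rate (FMR) and false non-match rate (FNMR) are equal. A minimal sketch of reading an EER off genuine and impostor score lists, assuming higher score means better match (the scores are illustrative; the paper used Oxford Wave software for its actual EER calculations):

```python
# Sweep a decision threshold over the observed scores and return the
# average of FNMR and FMR at the threshold where they are closest.

def eer(genuine, impostor):
    best = (1.0, None)  # (smallest |FNMR - FMR| seen, EER estimate)
    for t in sorted(set(genuine) | set(impostor)):
        fnmr = sum(g < t for g in genuine) / len(genuine)   # genuine rejected
        fmr = sum(i >= t for i in impostor) / len(impostor)  # impostor accepted
        gap = abs(fnmr - fmr)
        if gap < best[0]:
            best = (gap, (fnmr + fmr) / 2)
    return best[1]

genuine = [0.9, 0.8, 0.85, 0.4]  # one weak genuine score
impostor = [0.1, 0.2, 0.3, 0.5]  # one strong impostor score
rate = eer(genuine, impostor)
```

An EER of 0.0000, as in most cells of Tables 4 and 5, simply means the genuine and impostor score distributions were perfectly separable at some threshold.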
Table 4. EER: Group A (first three) vs. Group B (last three)
Finger | A vs. A | B vs. B | A vs. B
LI | 0.0000 | 0.0000 | 0.0006
LM | 0.3322 | 0.0000 | 0.1282
RI | 0.0000 | 0.0000 | 0.0000
RM | 0.0000 | 0.0000 | 0.0000

No improvements in performance were observed for any finger except the left middle. When Group A of the left middle finger was matched to itself, an Equal Error Rate (EER) of 0.3322 was observed. When Group B of the same finger was matched to itself, the performance improved to 0.0000. The third matching procedure was an interoperable match, with Group A matched to Group B; at an EER of 0.1282, this was also an improvement over Group A matched to itself.
To also observe the effect quality has on performance, Groups C and D were matched to themselves and to each other. The matching rates of Groups C and D (the lowest three and highest three image quality scores, respectively) are shown below.
Table 5. EER: Group C (lowest three) vs. Group D (highest three)
Finger | C vs. C | D vs. D | C vs. D
LI | 0.0000 | 0.0000 | 0.0000
LM | 0.2816 | 0.0506 | 0.1766
RI | 0.0000 | 0.0000 | 0.0000
RM | 0.0000 | 0.0000 | 0.0000

The left middle finger was the only finger that produced an EER greater than 0.0000. When Group C of the left middle finger was matched to itself, an EER of 0.2816 was observed. In the second matching run, Group D was matched to itself and the performance improved to 0.0506; the three highest quality-scoring SAS thus improved the EER by 0.2310. The third matching run was an interoperable match, with Group C matched to Group D; at an EER of 0.1766, this was also an improvement over Group C matched to itself. These results point to the conclusion that quality does affect performance.
V. CONCLUSIONS AND RECOMMENDATIONS
It should be noted that the distribution of SAS does differ from finger to finger. Subsequent work would be to examine other sensors and draw conclusions from them. Furthermore, additional work is being conducted by O'Connor on the development of a metric to determine whether a subject is stable in their presentation; that is, it answers the question of whether to take additional measurements given prior knowledge of the individual's performance within a given dataset [14].
Further work could also identify test administrator error and provide an error-checking methodology for test administrators regarding the number of interaction attempts and impressions that are conducted.
While controlled laboratory-style testing may not be impacted by this preliminary work, these results provide guidance for operational data collections by answering the initial motivation of the study. In this study, we can conclude that test personnel would not benefit from collecting the additional fingerprints (4, 5, and 6) for the LI, RI, and RM fingers, but would benefit marginally from collecting all six images for the LM. Furthermore, the quality metric may provide an additional tool in answering this question. Recall that the LM had the lowest-quality group of images. Upon further analysis, these impressions came from subjects 60, 77, and 88. Perhaps these poor image quality metrics were caused by poor placement or age: the subjects' ages were 60, 66, and 23, respectively.
It should also be noted that, overall, the right index required more attempts to submit all six SASs. This is interesting, as it is assumed that the right index would be the more controllable finger for those with right-hand dominance; this needs additional research.
Additionally, this study will be furthered by observing these metrics over multiple visits in an attempt to measure habituation. Recall that both quality and performance improved from the first three impressions collected to the last three. This improvement could be an effect of using the device multiple times and becoming comfortable with it. The study from which these data were pulled is a multiple-visit study, so data will be available to observe this effect over multiple visits as well as multiple uses per visit.
REFERENCES
[1] T. P. Pang, J. Xirdong, and W. Y. Yao, "Fingerprint image quality analysis," in 2004 International Conference on Image Processing (ICIP '04), 2004, pp. 1253–1256.
[2] K. Ito, A. Morita, T. Aoki, T. Higuchi, H. Nakajima, and K. Kobayashi, “A fingerprint recognition algorithm using phase-based image matching for low-quality fingerprints,” in IEEE International Conference on Image Processing 2005, 2005, pp. 33–36.
[3] E. Kukula, S. Elliott, and V. Duffy, “The effects of human interaction on biometric system performance,” in First International Conference on Digital Human Modeling (ICDHM 2007), Held as Part of HCI International, 2007, pp. 904–914.
[4] A. Hicklin and R. Khanna, "The role of data quality in biometric systems," White Paper, Mitretek Systems, Feb. 2006.
[5] J. Fierrez-Aguilar, L. Munoz-Serrano, F. Alonso-Fernandez, and J. Ortega-Garcia, “On the effects of image quality degradation on minutiae- and ridge-based automatic fingerprint recognition,” in Proceedings 39th Annual 2005 International Carnahan Conference on Security Technology, 2005, pp. 79–82.
[6] S. K. Modi, S. J. Elliott, and H. Kim, “Statistical analysis of fingerprint sensor interoperability performance,” in 2009 IEEE 3rd International Conference on Biometrics: Theory, Applications, and Systems, 2009, pp. 1–6.
[7] C. Jin, H. Kim, X. Cui, E. Park, J. Kim, J. Hwang, and S. Elliott, “Comparative Assessment of Fingerprint Sample Quality Measures Based on Minutiae-Based Matching Performance,” in 2009 Second International Symposium on Electronic Commerce and Security, 2009, vol. 2, pp. 309–313.
[8] P. Grother and E. Tabassi, "Performance of biometric quality measures," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 531–543, Apr. 2007.
[9] S. J. Elliott and E. P. Kukula, “A definitional framework for the human/biometric sensor interaction model,” in Biometric Technology for Human Identification VII, 2010, vol. 7667, no. 1, p. 76670H–8.
[10] A. J. Mansfield and J. L. Wayman, “Best Practices in Testing and Reporting Performance of Biometric Devices ver 2.01,” Teddington, 2002.
[11] M. R. Young and S. J. Elliott, “Image Quality and Performance Based on Henry Classification and Finger Location,” in 2007 IEEE Workshop on Automatic Identification Advanced Technologies, 2007, pp. 51–56.
[12] M. Theofanos, S. Orandi, R. Micheals, B. Stanton, and N. Zhang, “Effects of Scanner Height on Fingerprint Capture.” National Institute of Standards and Technology, Gaithersburg, p. 58, 2006.
[13] R. Cappelli, D. Maio, D. Maltoni, J. L. Wayman, and A. K. Jain, "Performance evaluation of fingerprint verification systems," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 1, pp. 3–18, Jan. 2006.
[14] K.J. O'Connor, “Examination of stability in fingerprint recognition across force levels,” M.S. thesis, Dept. Tech., Lead., and Innov., Purdue Univ., West Lafayette, IN, 2013.