MAJOR PROJECT REPORT ON
AN EFFICIENT PARALLEL APPROACH FOR SCLERA VEIN
RECOGNITION
A dissertation work submitted in partial fulfilment of the requirement for the award of the degree of
BACHELOR OF TECHNOLOGY
IN
(ELECTRONICS AND COMMUNICATION ENGINEERING)
BY
ADLA KIRANMAYI
ANNABATHULA SRILATHA
MAYESHA MUBEEN
Under the guidance of
MS SYEDA SANA FATIMA
Assistant professor
DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING
SHADAN WOMEN'S COLLEGE OF ENGINEERING & TECHNOLOGY
(Affiliated To Jawaharlal Nehru Technological University Hyderabad)
2011-2015
CERTIFICATE
This is to certify that the project report entitled "AN EFFICIENT PARALLEL APPROACH FOR SCLERA VEIN RECOGNITION", being submitted by ADLA KIRANMAYI, ANNABATHULA SRILATHA, and MAYESHA MUBEEN to Jawaharlal Nehru Technological University Hyderabad for the award of the degree of Bachelor of Technology in Electronics and Communication Engineering, is a record of bona fide work carried out by them under my supervision and guidance.
The matter contained in this report has not been submitted to any other university or institute for the award of any degree or diploma.
MS SYEDA SANA FATIMA MS S. SUNEETA
INTERNAL GUIDE HEAD OF THE DEPARTMENT
EXTERNAL GUIDE
ACKNOWLEDGEMENT
This is a report giving details of our project work titled "AN EFFICIENT PARALLEL APPROACH FOR SCLERA VEIN RECOGNITION". Through this report, an attempt has been made to present the theoretical and practical aspects of our project to the fullest possible extent.
We take this opportunity to express our sincere appreciation to Professor Ms. S. Suneeta, Head of the Department, and the staff of the department for the invaluable suggestions and keen interest they have shown in the successful completion of this project.
We express our deep gratitude to our guide, Ms. Syeda Sana Fatima, whose invaluable references, suggestions, and encouragement have immensely helped in the successful completion of the project. This project will be an asset to our academic profile.
It is with a profound sense of gratitude that we acknowledge our project guide, Ms. Syeda Sana Fatima, for providing us with the live specification and her valuable suggestions, which encouraged us to complete the project successfully.
We are happy to express our gratitude to one and all who helped us in the successful fulfilment of the project.
We are thankful to our principal, Dr. MAZHER SALEEM, Shadan Women's College of Engineering and Technology, for encouraging us to do the project.
ADLA KIRANMAYI
ANNABATHULA SRILATHA
MAYESHA MUBEEN
DECLARATION
We hereby declare that the work presented in this project, entitled "AN EFFICIENT PARALLEL APPROACH FOR SCLERA VEIN RECOGNITION", submitted towards the partial fulfilment of the requirement for the award of the degree of Bachelor of Technology in "Electronics and Communication Engineering", is an authentic record of our work carried out under the supervision of Ms. Syeda Sana Fatima, Assistant Professor, and Ms. S. Suneeta, Head of the Department of Electronics and Communication Engineering, SHADAN WOMEN'S COLLEGE OF ENGINEERING AND TECHNOLOGY, affiliated to Jawaharlal Nehru Technological University Hyderabad.
The matter embodied in this report has not been submitted for the award of any other degree.
ADLA KIRANMAYI
ANNABATHULA SRILATHA
MAYESHA MUBEEN
INDEX
ABSTRACT .................................................................. i
CHAPTER 1 INTRODUCTION ................................................. 1-16
1.1 GENERAL
1.2 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
1.2.1 PREPROCESSING
1.2.2 IMAGE ENHANCEMENT
1.2.3 IMAGE RESTORATION
1.2.4 IMAGE COMPRESSION
1.2.5 SEGMENTATION
1.2.6 IMAGE RESTORATION
1.2.7 FUNDAMENTAL STEPS
1.3 A SIMPLE IMAGE MODEL
1.4 IMAGE FILE FORMATS
1.5 TYPE OF IMAGES
1.5.1 BINARY IMAGES
1.5.2 GRAY SCALE IMAGE
1.5.3 COLOR IMAGE
1.5.4 INDEXED IMAGE
1.6 APPLICATIONS OF IMAGE PROCESSING
1.7 EXISTING SYSTEM
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1.8 LITERATURE SURVEY
1.9 PROPOSED SYSTEM
1.9.1 ADVANTAGES
CHAPTER 2 PROJECT DESCRIPTION ........................................ 17-46
2.1 INTRODUCTION
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
2.2.2 SCLERA SEGMENTATION
2.2.3 IRIS AND EYELID REFINEMENT
2.2.4 OCULAR SURFACE VASCULATURE
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN
2.3 EVOLUTION OF GPU ARCHITECTURE
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
2.4.1 STAGE I: MATCHING WITH Y SHAPE DESCRIPTOR
2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR
2.5 MAPPING THE SUBTASKS TO CUDA
2.5.1 MAPPING ALGORITHM TO BLOCKS
2.5.2 MAPPING INSIDE BLOCK
2.5.3 MEMORY MANAGEMENT
2.6 HISTOGRAM OF ORIENTED GRADIENTS
CHAPTER 3 SOFTWARE SPECIFICATION ..................................... 47-53
3.1 GENERAL
3.2 SOFTWARE REQUIREMENTS
3.3 INTRODUCTION
3.4 FEATURES OF MATLAB
3.4.1 INTERFACING WITH OTHER LANGUAGES
3.5 THE MATLAB SYSTEM
3.5.1 DESKTOP TOOLS
3.5.2 ANALYZING AND ACCESSING DATA
3.5.3 PERFORMING NUMERIC COMPUTATION
CHAPTER 4 IMPLEMENTATION ............................................. 54-69
4.1 GENERAL
4.2 CODING IMPLEMENTATION
4.3 SNAPSHOTS
CHAPTER 5 ............................................................... 70
CHAPTER 6 CONCLUSION & FUTURE SCOPE .................................. 71-72
6.1 CONCLUSION
6.2 REFERENCES
APPLICATION
LIST OF FIGURES
FIG NO   FIG NAME                                                    PG NO
1.1   Fundamental blocks of digital image processing                   2
1.2   Gray scale image                                                 8
1.3   The additive model of RGB                                        9
1.4   The colors created by the subtractive model of CMYK              9
2.1   The diagram of a typical sclera vein recognition approach       19
2.2   Steps of segmentation                                           21
2.3   Glare area detection                                            21
2.4   Detection of the sclera area                                    22
2.5   Pattern of veins                                                23
2.6   Sclera region and its vein patterns                             25
2.7   Filtering can take place simultaneously on different
      parts of the iris image                                         25
2.8   The sketch of parameters of segment descriptor                  26
2.9   The weighting image                                             28
2.10  The module of sclera template matching                          28
2.11  The Y shape vessel branch in sclera                             28
2.12  The rotation and scale invariant character of Y shape
      vessel branch                                                   29
2.13  The line descriptor of the sclera vessel pattern                30
2.14  The key elements of descriptor vector                           31
2.15  Simplified sclera matching steps on GPU                         32
2.16  Two-stage matching scheme                                       35
2.17  Example image from the UBIRIS database                          42
2.18  Occupancy on various thread numbers per block                   43
2.19  The task assignment inside and outside the GPU                  44
2.20  HOG features                                                    46
4.1   Original sclera image                                           65
4.2   Binarised sclera image                                          65
4.3   Edge map subtracted image                                       66
4.4   Cropping ROI                                                    66
4.5   ROI mask                                                        67
4.6   ROI finger sclera image                                         67
4.7   Enhanced sclera image                                           68
4.8   Feature extracted sclera image                                  68
4.9   Matching with images in database                                69
4.10  Result                                                          69
ABSTRACT
Sclera vein recognition is shown to be a promising method for human identification. However, its matching speed is slow, which could limit its use in real-time applications. To improve the matching efficiency, we proposed a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching. First, we designed a rotation- and scale-invariant Y-shape-descriptor-based feature extraction method to efficiently eliminate most unlikely matches. Second, we developed a weighted polar line (WPL) sclera descriptor structure to incorporate mask information and reduce GPU memory cost. Third, we designed a coarse-to-fine two-stage matching method. Finally, we developed a mapping scheme to map the subtasks to GPU processing units. The experimental results show that our proposed method can achieve dramatic processing speed improvement without compromising recognition accuracy.
CHAPTER 1
INTRODUCTION
1.1 GENERAL
Digital image processing is the use of computer algorithms to perform image processing on digital images. The 2D continuous image is divided into N rows and M columns; the intersection of a row and a column is called a pixel. The image can also be a function of other variables, including depth, color, and time. An image given in the form of a transparency, slide, photograph, or X-ray is first digitized and stored as a matrix of binary digits in computer memory. This digitized image can then be processed and/or displayed on a high-resolution television monitor. For display, the image is stored in a rapid-access buffer memory, which refreshes the monitor at a rate of 25 frames per second to produce a visually continuous display.
1.2 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
The field of "digital image processing" refers to processing digital images by means of a digital computer. In a broader sense, it can be considered the processing of any two-dimensional data, where an image (optical information) is represented as an array of real or complex numbers encoded with a definite number of bits. An image is represented as a two-dimensional function f(x, y), where x and y are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) represents the intensity, or gray level, of the image at that point.
A digital image is one for which both the coordinates and the amplitude values of f are finite, discrete quantities. Hence, a digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are called "pixels". A digital image is discrete in both spatial coordinates and brightness, so it can be considered a matrix whose row and column indices identify a point on the image and whose corresponding element value identifies the gray level at that point.
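The matrix view of f(x, y) described above can be sketched with a small array. NumPy is used here as a stand-in for the report's MATLAB environment, and the pixel values are arbitrary illustrations:

```python
import numpy as np

# A toy 4x4 grayscale "digital image": the row and column indices locate
# a pixel, and the stored value is its gray level (0 = black, 255 = white).
f = np.array([
    [  0,  64, 128, 255],
    [ 32,  96, 160, 224],
    [ 16,  80, 144, 208],
    [  8,  72, 136, 200],
], dtype=np.uint8)

x, y = 1, 2          # spatial coordinates (row x, column y)
print(f[x, y])       # the gray level of f at (1, 2)
print(f.shape)       # (N rows, M columns)
```

Both the coordinates and the amplitudes are finite and discrete, which is exactly what makes this array a digital image.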
One of the first applications of digital images was in the newspaper industry, when pictures were first sent by submarine cable between London and New York. The introduction of the Bartlane cable picture transmission system in the early 1920s reduced the time required to transport a picture across the Atlantic from more than a week to less than three hours.
1.2.1 PREPROCESSING
In imaging science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible, and the general techniques described here apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging.
Image processing refers to the processing of a 2D picture by a computer. Basic definitions:
An image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g., brightness) of the image at the real coordinate position (x, y). Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:
Image processing (image in -> image out)
Image analysis (image in -> measurements out)
Image understanding (image in -> high-level description out)
An image may be considered to contain sub-images, sometimes referred to as regions of interest (ROIs), or simply regions. This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. In a sophisticated image processing system, it should be possible to apply specific image processing operations to selected regions. Thus one part of an image (region) might be processed to suppress motion blur while another part might be processed to improve colour rendition.
Usually, image processing systems require that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid, and each sample, or pixel, is quantized using a finite number of bits. The digitized image is then processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display. Closely related to image processing are computer graphics and computer vision. In computer graphics, images are manually made from physical models of objects, environments, and lighting, instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most animated movies. Computer vision, on the other hand, is often considered high-level image processing, out of which a machine/computer/software intends to decipher the physical contents of an image or a sequence of images (e.g., videos or 3D full-body magnetic resonance scans).
In modern sciences and technologies, images also gain a much broader scope due to the ever-growing importance of scientific visualization (of often large-scale, complex scientific/experimental data). Examples include microarray data in genetic research or real-time multi-asset portfolio trading in finance. Before processing, an image is converted into digital form. Digitization includes sampling of the image and quantization of the sampled values. After converting the image into bit information, processing is performed. This processing may be image enhancement, image restoration, or image compression.
1.2.2 IMAGE ENHANCEMENT
Image enhancement refers to the accentuation or sharpening of image features, such as boundaries or contrast, to make a graphic display more useful for display and analysis. This process does not increase the inherent information content in the data. It includes gray level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on.
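One of the contrast manipulations listed above, linear contrast stretching, can be sketched as follows (a minimal NumPy illustration, not the report's implementation):

```python
import numpy as np

def stretch_contrast(img):
    """Linear contrast stretching: map the image's [min, max] range to
    [0, 255]. The display becomes more useful, but no information is
    added, as the section above notes."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:                      # flat image: nothing to stretch
        return img.astype(np.uint8)
    return np.round((img - lo) / (hi - lo) * 255).astype(np.uint8)

dim = np.array([[100, 110], [120, 130]], dtype=np.uint8)  # low-contrast patch
print(stretch_contrast(dim))   # values now span the full 0..255 range
```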
1.2.3 IMAGE RESTORATION
Image restoration is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration depends on the extent and accuracy of the knowledge of the degradation process, as well as on the filter design. Image restoration differs from image enhancement in that the latter is concerned with the mere extraction or accentuation of image features.
1.2.4 IMAGE COMPRESSION
Image compression is concerned with minimizing the number of bits required to represent an image. Applications of compression include broadcast TV; remote sensing via satellite; military communication via aircraft; radar; teleconferencing; facsimile transmission of educational and business documents; medical images that arise in computer tomography, magnetic resonance imaging, and digital radiology; motion pictures; satellite images; weather maps; geological surveys; and so on.
Text compression - CCITT GROUP3 & GROUP4
Still image compression - JPEG
Video image compression - MPEG
1.2.5 SEGMENTATION
In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.
The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the resulting contours after image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms like marching cubes.
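The simplest form of the labeling described above is global thresholding, where each pixel is assigned one of two labels based on its brightness (a toy NumPy sketch with made-up pixel values):

```python
import numpy as np

# A tiny image with dark background pixels and bright object pixels.
img = np.array([
    [ 10,  12, 200, 210],
    [ 11,  14, 220, 215],
    [  9, 190, 205,  13],
], dtype=np.uint8)

threshold = 100
labels = (img > threshold).astype(np.uint8)   # 1 = object, 0 = background
print(labels)
# Pixels sharing a label share the visual characteristic (here: brightness).
```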
1.2.6 IMAGE RESTORATION
Image restoration, like enhancement, improves the quality of an image, but all the operations are based on known, measured, or estimated degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise, and camera motion, and to correct images for known degradations.
1.2.7 FUNDAMENTAL STEPS
Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the chances for success of the other processes.
Image segmentation: to partition an input image into its constituent parts or objects.
Image representation: to convert the input data to a form suitable for computer processing.
Image description: to extract features that result in some quantitative information of interest, or features that are basic for differentiating one class of objects from another.
Image recognition: to assign a label to an object based on the information provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized objects.
Knowledge about a problem domain is coded into an image processing system in the form of a knowledge database.
1.3 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude. Digitization of the spatial coordinates (x, y) is called image sampling; amplitude digitization is called gray-level quantization. The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels.
Example: a 256-gray-level image of size 256x256 occupies 64K bytes of memory.
Images of very low spatial resolution produce a checkerboard effect. The use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.
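The storage figure above follows directly from the sampling and quantization parameters; a quick check, plus a sketch of re-quantizing a smooth ramp to fewer gray levels (the operation that produces false contouring):

```python
import numpy as np

# Storage: 256 gray levels need 8 bits (1 byte) per pixel,
# so a 256 x 256 image occupies 256 * 256 * 1 = 65536 bytes = 64K.
rows, cols, bytes_per_pixel = 256, 256, 1
print(rows * cols * bytes_per_pixel)

# Quantizing a smooth gray ramp to too few levels: the continuous
# gradient collapses into a handful of flat bands (false contours).
ramp = np.arange(0, 256, dtype=np.uint8)    # smooth ramp, 256 levels
levels = 4
step = 256 // levels
quantized = (ramp // step) * step           # only 4 distinct gray levels remain
print(np.unique(quantized))
```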
1.4 IMAGE FILE FORMATS
There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based 'images'). Some of the most common file formats are:
GIF - Graphics Interchange Format. An 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.
JPEG - Joint Photographic Experts Group. A very efficient (i.e., much information per byte), destructively compressed, 24-bit (16 million colours) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited).
TIFF - Tagged Image File Format. The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.
PS - PostScript. A standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.
PSD - Adobe Photoshop Document. A dedicated Photoshop format that keeps all the information in an image, including all the layers.
BMP - bitmap file format.
1.5 TYPE OF IMAGES
There are four types of images:
1. Binary image
2. Gray scale image
3. Color image
4. Indexed image
1.5.1 BINARY IMAGES
A binary image is a digital image that has only two possible values for each pixel. Typically the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level. This means that each pixel is stored as a single bit, i.e., a 0 or a 1. The names black-and-white and B&W are also used for this concept.
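The "single bit per pixel" storage can be made concrete with NumPy's bit packing (a small illustration with arbitrary pixel values, not the report's code):

```python
import numpy as np

# A binary image: every pixel is 0 or 1.
binary = np.array([[0, 1, 1, 0],
                   [1, 0, 0, 1]], dtype=np.uint8)

assert set(np.unique(binary)) <= {0, 1}     # only two possible values

# Packing 8 pixels into each byte shows the one-bit-per-pixel storage:
packed = np.packbits(binary)
print(binary.size, "pixels ->", packed.size, "byte(s)")
```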
1.5.2 GRAY SCALE IMAGE
In an 8-bit grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grayscale image is what people normally call a black-and-white image, but the name emphasizes that such an image will also include many shades of grey.
[Fig. 1.2: Gray scale image]
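A color pixel is commonly reduced to one such 0-255 gray level with a weighted sum of its R, G, and B components; the ITU-R BT.601 luminance weights below are one standard choice, not the only one:

```python
# Convert an RGB pixel to an 8-bit gray level (ITU-R BT.601 weights).
def to_gray(r, g, b):
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))

print(to_gray(255, 255, 255))   # white maps to the top of the range
print(to_gray(0, 0, 0))         # black maps to the bottom
print(to_gray(255, 0, 0))       # pure red maps to a fairly dark gray
```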
1.5.3 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive colour, with the R, G, and B receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production.
The secondary colours of RGB (cyan, magenta, and yellow) are formed by mixing two of the primary colours (red, green, or blue) and excluding the third colour. Red and green combine to make yellow, green and blue make cyan, and blue and red form magenta. The combination of red, green, and blue at full intensity makes white.
In Photoshop, using the "screen" mode for the different layers in an image makes the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.
[Fig. 1.3: The additive model of RGB]
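The "screen" blend mode mentioned above has a simple per-channel formula, result = 255 - (255 - a)(255 - b)/255, which mimics shining light through stacked slides (a small integer sketch):

```python
# Screen blend mode: combining two layers can only brighten, never darken,
# which is what makes it behave like additive light mixing.
def screen(a, b):
    return 255 - ((255 - a) * (255 - b)) // 255

print(screen(255, 0))    # full intensity screened with anything stays full
print(screen(0, 0))      # black on black stays black
print(screen(128, 128))  # two mid-grays brighten toward white
```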
CMYK: The four-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M), and yellow (Y) inks, to which a layer of black (K) ink can be added. The CMYK model uses the subtractive colour model.
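A naive RGB-to-CMYK conversion illustrates the subtractive model; real print workflows use ICC colour profiles instead, so treat this as a device-independent sketch only:

```python
# Naive RGB -> CMYK: K takes out the common darkness, then C, M, Y
# express how much of each ink subtracts the remaining light.
def rgb_to_cmyk(r, g, b):
    rp, gp, bp = r / 255, g / 255, b / 255
    k = 1 - max(rp, gp, bp)
    if k == 1:                        # pure black: inks C, M, Y not needed
        return (0.0, 0.0, 0.0, 1.0)
    c = (1 - rp - k) / (1 - k)
    m = (1 - gp - k) / (1 - k)
    y = (1 - bp - k) / (1 - k)
    return (round(c, 3), round(m, 3), round(y, 3), round(k, 3))

print(rgb_to_cmyk(255, 0, 0))    # red needs full magenta and yellow ink
print(rgb_to_cmyk(0, 0, 0))      # black is carried entirely by K
```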
1.5.4 INDEXED IMAGE
An indexed image consists of an array and a colormap matrix. The pixel values in the array are direct indices into the colormap. By convention, this documentation uses the variable name X to refer to the array and map to refer to the colormap. In computing, indexed color is a technique to manage digital image colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, color information is not directly carried by the image pixel data but is stored in a separate piece of data called a palette: an array of color elements, in which every element (a color) is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only its index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.
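The index-array-plus-colormap arrangement described above can be sketched in a few lines (NumPy stands in for MATLAB's X/map convention; the palette entries are arbitrary):

```python
import numpy as np

# The colormap ("map"): each row is one RGB colour.
palette = np.array([[  0,   0,   0],     # index 0: black
                    [255,   0,   0],     # index 1: red
                    [255, 255, 255]],    # index 2: white
                   dtype=np.uint8)

# The index array ("X"): pixels store palette indices, not RGB triples.
X = np.array([[0, 1],
              [2, 1]], dtype=np.uint8)

rgb = palette[X]        # expand indices into true colour for display
print(rgb.shape)        # each 1-byte index became a 3-byte colour
print(rgb[0, 1])
```

The 2x2 image needs only 4 bytes of pixel data plus the shared palette, which is the memory saving the text refers to.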
Perhaps the first device that supported palette colors was a random-access frame buffer described in 1975 by Kajiya, Sutherland, and Cheadle, which supported a palette of 256 36-bit RGB colors.
1.6 APPLICATIONS OF IMAGE PROCESSING
Interest in digital image processing methods stems from two principal application areas:
1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception
In the second application area, interest focuses on procedures for extracting, from an image, information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.
1.7 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches, a Speeded-Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching, for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1. Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large, preoccupy GPU memory, and slow down data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.
1.8 LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-face recognition algorithms perform well when constrained (frontal, well-illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. The recognition performance of a subset of existing still-image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time, in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.
With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such big data draws great concern. Even though modern processors are equipped with more cores and memory capacity, careful design is still required in order to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of a biometric application. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.
Modern iris recognition algorithms can be computationally intensive, yet are designed for traditional sequential processing elements, such as a personal computer. However, a parallel processing alternative using field-programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the means of this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ version, determining the information content of an iris approximately 324 times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.
A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye, and the vein pattern seen in the sclera region is unique to each person. Thus, the sclera vein pattern is a well-suited biometric for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using bilinear interpolation. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than previous methods.
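The HOG feature mentioned above can be sketched, in a much-simplified form, as a histogram of gradient orientations over a patch. This toy version omits the cell/block structure and normalization of full HOG, so treat it as an illustration of the building block only:

```python
import numpy as np

def orientation_histogram(patch, bins=9):
    """Toy HOG building block: a histogram of unsigned gradient
    orientations (0-180 degrees), weighted by gradient magnitude."""
    patch = patch.astype(np.float64)
    gx = np.zeros_like(patch)
    gy = np.zeros_like(patch)
    gx[:, 1:-1] = patch[:, 2:] - patch[:, :-2]     # horizontal gradient
    gy[1:-1, :] = patch[2:, :] - patch[:-2, :]     # vertical gradient
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 180     # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0, 180), weights=mag)
    return hist

# A patch of identical horizontal ramps: all gradients point along x,
# so the entire histogram mass should land in the 0-degree bin.
patch = np.tile(np.arange(8, dtype=np.float64), (8, 1))
h = orientation_histogram(patch)
print(np.argmax(h))
```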
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.
The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.
This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image. The global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-
stage parallel approach for registration and matching: a parallel sclera
matching solution based on our sequential line-descriptor method,
implemented on the CUDA GPU architecture. CUDA is a highly parallel,
multithreaded, many-core processor architecture with tremendous
computational power.
It supports not only a traditional graphics pipeline but also computation
on non-graphical data. It is relatively straightforward to implement our C
program for CUDA on an AMD-based GPU using OpenCL. Our CUDA
kernels can be directly converted to OpenCL kernels by accounting for the
different syntax of various keywords and built-in functions. The mapping
strategy is also effective in OpenCL if we regard the thread and block in
CUDA as the work-item and work-group in OpenCL. Most of our optimization
techniques, such as coalesced memory access and prefix sum, work in
OpenCL too. Moreover, since CUDA is a data-parallel architecture, the
implementation of our approach in OpenCL should be programmed in the
data-parallel model.
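Prefix sum, listed above among the optimization techniques, can be sketched as follows. This is a sequential Python simulation (our own illustration, not code from this project) of the work-efficient up-sweep/down-sweep scan that a CUDA or OpenCL work-group would execute in parallel; each while-loop level corresponds to one synchronized parallel step, and the inner for-loop plays the role of the threads.

```python
def exclusive_scan(data):
    """Work-efficient exclusive prefix sum (Blelloch scan).

    Each while-loop level corresponds to one parallel step on the GPU;
    the inner for-loop simulates the threads of that step.
    """
    n = len(data)
    assert n & (n - 1) == 0, "sketch assumes power-of-two length"
    tree = list(data)
    # Up-sweep (reduce) phase: build partial sums in place.
    step = 1
    while step < n:
        for i in range(step * 2 - 1, n, step * 2):
            tree[i] += tree[i - step]
        step *= 2
    # Down-sweep phase: convert partial sums to exclusive prefix sums.
    tree[n - 1] = 0
    step = n // 2
    while step >= 1:
        for i in range(step * 2 - 1, n, step * 2):
            left = tree[i - step]
            tree[i - step] = tree[i]
            tree[i] += left
        step //= 2
    return tree
```

On a GPU each level runs in O(1) time across the threads, giving O(log n) steps overall; the sequential loop order here reproduces the same arithmetic.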
In this research, we first discuss why the naïve parallel approach would
not work. We then propose the new sclera descriptor - the Y-shape sclera
feature-based efficient registration method - to speed up the mapping scheme,
introduce the "weighted polar line (WPL) descriptor" that is better
suited for parallel computing to mitigate the mask size issue, and develop
our coarse-to-fine two-stage matching process to dramatically improve the
matching speed. These new approaches make the parallel processing
possible and efficient.
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve efficiency, we propose a new descriptor, the Y-shape descriptor, which can greatly improve the efficiency of
the coarse registration of two images and can be used to filter out some
non-matching pairs before refined matching.
2. We propose the coarse-to-fine two-stage matching process. In the first
stage, we match two images coarsely using the Y-shape descriptors,
which is very fast because no registration is needed. The
matching result in this stage helps filter out image pairs with low
similarities.
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each person,
so it can be used for human identification. Several researchers have
designed different sclera vein recognition methods and have shown that it
is promising to use sclera vein recognition for human identification.
Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Among these three methods,
the SURF method achieves the best accuracy. It takes an average of 1.5
seconds using the SURF method to perform a one-to-one matching. Zhou
et al. proposed a line descriptor-based method for sclera vein recognition.
The matching step (including registration) is the most time-consuming step
in this sclera vein recognition system, costing about 1.2 seconds to
perform a one-to-one matching. Both speeds were measured on a PC with an
Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently,
sclera vein recognition algorithms are designed using central processing
unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential
processing devices, which may not be efficient in data processing where the
data can be parallelized. Because of the large time consumption of the matching
step, sclera vein recognition using a sequential method would be very
challenging to implement in a real-time biometric system, especially
when there is a large number of templates in the database for matching. GPGPUs
(general-purpose graphics processing units) are now popularly used for
parallel computing to improve computational processing speed and
efficiency. The highly parallel structure of GPUs makes them more effective
than CPUs for data processing where the processing can be performed in
parallel. GPUs have been widely used in biometric recognition, such as speech
recognition, text detection, handwriting recognition, and face recognition. In iris
recognition, the GPU has been used to extract features, construct descriptors,
and match templates.
GPUs are also used for object retrieval and image search. Park et al.
presented a performance evaluation of image processing algorithms, such
as linear feature extraction and multi-view stereo matching, on GPUs.
However, these approaches were designed for their specific biometric
recognition applications and feature searching methods. Therefore, they may
not be efficient for sclera vein recognition. Compute Unified Device
Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in
this research. CUDA is a highly parallel, multithreaded, many-core
processor architecture with tremendous computational power. It supports not only a
traditional graphics pipeline but also computation on non-graphical data.
More importantly, it offers an easier programming platform that
outperforms its CPU counterparts in terms of peak arithmetic intensity and
memory bandwidth. In this research, the goal is not to develop a unified
strategy to parallelize all sclera matching methods, because each method is
quite different from the others and would need a customized design; an
efficient parallel computing scheme would need different
strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
sclera vein recognition using our sequential line-descriptor method on
the CUDA GPU architecture. However, the parallelization strategies
developed in this research can be applied to design parallel approaches for
other sclera vein recognition methods and help parallelize general pattern
recognition methods. Based on the matching approach, there are three
challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large in size and will preoccupy the GPU memory and slow
down the data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance. New designs are
necessary to help narrow down the search range. In summary, a naïve
implementation of the algorithms in parallel would not work efficiently.
Note that it is relatively straightforward to implement our C program for
CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be
directly converted to OpenCL kernels by accounting for the different syntax of
various keywords and built-in functions. The mapping strategy is also
effective in OpenCL if we regard the thread and block in CUDA as the work-item
and work-group in OpenCL. Most of our optimization techniques, such as
coalesced memory access and prefix sum, work in OpenCL too.
Moreover, since CUDA is a data-parallel architecture, the implementation
of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not
work (Section 3). We then propose the new sclera descriptor - the Y-shape
sclera feature-based efficient registration method - to speed up the mapping
scheme (Section 4), introduce the "weighted polar line (WPL) descriptor"
that is better suited for parallel computing to mitigate the mask size
issue (Section 5), and develop our coarse-to-fine two-stage matching
process to dramatically improve the matching speed (Section 6). These new
approaches make the parallel processing possible and efficient. However, it
is non-trivial to implement these algorithms in CUDA. We then develop
the implementation schemes to map our algorithms onto CUDA (Section 7).
In Section 2, we give a brief introduction to sclera vein recognition. In
Section 8, we present experiments using the proposed system.
In Section 9, we draw conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation, feature enhancement, feature extraction, and feature
matching (Figure 1).
Sclera image segmentation is the first step in sclera vein recognition,
and several methods have been designed for it. Crihalmeanu
et al. presented a semi-automated system for sclera segmentation. They
used a clustering algorithm to classify color eye images into three
clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross
designed a segmentation approach based on a normalized sclera index
measure, which includes coarse sclera segmentation, pupil region
segmentation, and fine sclera segmentation. Zhou et al. developed a skin-
tone plus "white color"-based voting method for sclera segmentation in
color images and an Otsu's thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera
features, since the sclera vein patterns often lack contrast and are hard to
detect. Zhou et al. used a bank of multi-directional Gabor filters for
vascular pattern enhancement. Derakhshani et al. used contrast-limited
adaptive histogram equalization (CLAHE) to enhance the green color plane
of the RGB image, and a multi-scale region-growing approach to identify
the sclera veins from the image background. Crihalmeanu and Ross applied
a selective enhancement filter for blood vessels to extract features from the
green component of a color image. In the feature matching step,
Crihalmeanu and Ross proposed
three registration and matching approaches: Speeded Up Robust
Features (SURF), which is based on interest-point detection; minutiae
detection, which is based on minutiae points of the vasculature structure;
and direct correlation matching, which relies on image registration. Zhou et
al. designed a line descriptor-based feature registration and matching
method.
The proposed sclera recognition consists of five steps: sclera
segmentation, vein pattern enhancement, feature extraction, feature
matching, and the matching decision. Fig. 2 shows the block diagram of sclera
recognition. Two types of feature extraction are used in the proposed
method to achieve good identification accuracy. The characteristics
elicited from the blood vessel structure seen in the sclera region are the
Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-
polar conversion. HOG is used to determine the gradient orientation and
edge orientations of the vein pattern in the sclera region of an eye image. To
become more computationally efficient, the image data are converted
to polar form, which is mainly used for circular or quasi-circular shapes of
objects. These two characteristics are extracted from all the images in the
database and compared with the features of the query image to decide whether the
person is correctly identified. This procedure is done in the feature
matching step, which ultimately makes the matching decision. By using the
proposed feature extraction methods and matching techniques, human
identification is more accurate than in existing studies. In the proposed
method, two features of an image are extracted.
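The Cartesian-to-polar conversion described above can be sketched as follows. This is our own minimal nearest-neighbour illustration (the report's method uses interpolation), and the function and parameter names are hypothetical:

```python
import math

def to_polar(image, cx, cy, n_radii, n_angles):
    """Unwrap an image (list of rows) around the center (cx, cy) into an
    (n_radii x n_angles) polar grid using nearest-neighbour sampling.
    Samples that fall outside the image are left at 0."""
    h, w = len(image), len(image[0])
    max_r = min(cx, cy, w - 1 - cx, h - 1 - cy)  # stay inside the image
    polar = [[0] * n_angles for _ in range(n_radii)]
    for ri in range(n_radii):
        r = max_r * ri / (n_radii - 1) if n_radii > 1 else 0
        for ai in range(n_angles):
            theta = 2 * math.pi * ai / n_angles
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            if 0 <= x < w and 0 <= y < h:
                polar[ri][ai] = image[y][x]
    return polar
```

In the polar grid, a rotation of the quasi-circular region about the iris center becomes a simple column shift, which is what makes the representation convenient for matching.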
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists of
three steps: glare area detection, sclera area estimation, and iris and eyelid
detection and refinement. The figure shows the steps of segmentation.
Glare Area Detection: The glare area is a small bright area near the
pupil or iris. It is an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. It runs
only on grayscale images: if the image is in color, it must first be
converted to grayscale, after which the Sobel filter is applied to
detect the glare area. Fig. 4 shows the result of the glare area detection.
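A minimal sketch of this step: the Sobel gradient magnitude highlights the boundary of the small bright glare spot, and pixels that are both bright and on a strong edge are flagged. The thresholds, helper names, and the bright-and-edge rule are our own illustrative choices, not values from the report:

```python
def sobel_magnitude(gray):
    """Approximate gradient magnitude |Gx| + |Gy| with 3x3 Sobel kernels.
    `gray` is a list of rows of grayscale values; the one-pixel border
    is left at 0."""
    h, w = len(gray), len(gray[0])
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    mag = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    v = gray[y + dy - 1][x + dx - 1]
                    gx += gx_k[dy][dx] * v
                    gy += gy_k[dy][dx] * v
            mag[y][x] = abs(gx) + abs(gy)
    return mag

def glare_mask(gray, grad_thresh, bright_thresh):
    """Flag pixels that are both bright and on a strong edge, a rough
    proxy for the small specular glare spot (illustrative rule)."""
    mag = sobel_magnitude(gray)
    return [[1 if gray[y][x] >= bright_thresh and mag[y][x] >= grad_thresh else 0
             for x in range(len(gray[0]))] for y in range(len(gray))]
```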
Sclera Area Estimation: For the estimation of the sclera area, Otsu's
thresholding method is applied. The steps of sclera area detection are:
selection of the region of interest (ROI), Otsu's thresholding, and sclera area
detection. The left and right sclera areas are selected based on the iris boundaries.
Once the region of interest is selected, Otsu's thresholding is applied to
obtain the potential sclera areas. The correct left sclera area should be
placed in the right and center positions, and the correct right sclera area should
be placed in the left and center. In this way, non-sclera areas are wiped out.
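Otsu's thresholding, used above to obtain the potential sclera areas, picks the gray level that maximizes the between-class variance of the image histogram. A compact sketch (our own illustration):

```python
def otsu_threshold(values, levels=256):
    """Otsu's method: choose the threshold that maximizes the
    between-class variance of the grayscale histogram."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    total_sum = sum(i * hist[i] for i in range(levels))
    best_t, best_var = 0, -1.0
    w_b = sum_b = 0                      # background weight and sum
    for t in range(levels):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b                # foreground weight
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (total_sum - sum_b) / w_f
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

For a bimodal histogram (dark background, bright sclera) the chosen threshold falls between the two modes, separating the candidate sclera pixels from the rest of the ROI.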
2.2.3 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the
sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are
refined. These are all unwanted portions for recognition. To
eliminate these effects, refinement is done following the detection
of the sclera area. The figure shows the result after Otsu's thresholding and after iris and
eyelid refinement to detect the right sclera area. The left sclera
area is detected in the same way.
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the effect of
segmentation faults. The vein patterns in the sclera area are not clearly visible after the
segmentation process, so vein pattern enhancement must be performed to
make them more visible.
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of fingers (Miura et al., 2004), the palm (Lin and
Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a
special optical device for imaging the back of the eyeball is needed (Hill,
1999). Due to its perceived invasiveness and the required degree of subject
cooperation, the use of retinal biometrics may not be acceptable to some
individuals. The conjunctiva is a thin, transparent, and moist tissue that
covers the outer surface of the eye. The part of the conjunctiva that covers
the inner lining of the eyelids is called the palpebral conjunctiva, and the part
that covers the outer surface of the eye is called the ocular (or bulbar)
conjunctiva, which is the focus of this study. The ocular conjunctiva is very
thin and clear; thus, the vasculature (including that of the episclera) is
easily visible through it. The visible microcirculation of the conjunctiva offers a
rich and complex network of veins and fine microcirculation (Fig. 1). The
apparent complexity and specificity of these vascular patterns motivated us
to utilize them for personal identification (Derakhshani and Ross, 2006).
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have a vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as those of the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus, the conjunctival vascular
structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial off-
the-shelf digital cameras under normal lighting conditions, making this
modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact. Besides being a
stand-alone biometric modality, we anticipate that the addition of
conjunctival biometrics will enhance the performance of current iris-based
biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portions of the eye, the sclera
vein patterns will be further exposed. This feature makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g.,
due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
availability for simultaneous computation. The figure below demonstrates the
possibility of parallel directional filtering. Since the filter is computed over
different portions of the input image, the computation can be performed in
parallel (denoted by Elements below). In addition, individual parallelization
of each element of the filtering can also be performed. A detailed discussion of
our proposed parallelization is outside the scope of this paper.
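The element-wise decomposition can be illustrated with a 1-D horizontal filter standing in for one directional Gabor filter. Because each output row depends only on its own input row, blocks of rows can be filtered independently (the "Elements") and concatenated, giving exactly the whole-image result. This is our own Python sketch, not the project's CUDA code:

```python
def filter_1d_rows(image, kernel):
    """Apply a horizontal 1-D filter (a stand-in for one directional
    Gabor filter) to every row; borders use zero padding."""
    k = len(kernel) // 2
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for i, kv in enumerate(kernel):
                xx = x + i - k
                if 0 <= xx < w:
                    acc += kv * image[y][xx]
            out[y][x] = acc
    return out

def filter_by_row_blocks(image, kernel, n_blocks):
    """Split the rows into blocks and filter each block independently,
    as parallel 'Elements' would; rows are independent for a row
    filter, so the result matches whole-image filtering exactly."""
    h = len(image)
    step = (h + n_blocks - 1) // n_blocks
    out = []
    for b in range(0, h, step):
        out.extend(filter_1d_rows(image[b:b + step], kernel))
    return out
```

For 2-D filters the same decomposition applies, except that each block needs a halo of neighbouring rows so that pixels near the block boundary see their full neighbourhood.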
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching stage of the line-descriptor based method is a
bottleneck with regard to matching speed. In this section, we briefly
describe the line descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and remove the branch
points. The line descriptor is used to describe the segments in the vein
structure. Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle θ to some
reference angle at the iris center, the segment's distance r to the iris center,
and the dominant angular orientation ɸ of the line segment. Thus, the
descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor
are calculated as
FIG
Here, fline(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. In order to register the segments of the
vascular patterns, a RANSAC-based algorithm is used to estimate the best-
fit parameters for registration between the two sclera vascular patterns. The
registration algorithm randomly chooses two points - one from the
test template and one from the target template - along with a random
scaling factor and a rotation value based on a priori knowledge of the
database, and then calculates a fitness value for the registration
using these parameters.
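The RANSAC-style registration just described can be sketched as follows. This is our own simplified illustration: it picks one point from each template plus a scale and rotation from small candidate sets, derives the implied translation, and scores the hypothesis by how many test points land near a target point. The candidate sets, the tolerance, and the fitness rule are illustrative assumptions, not the report's actual values:

```python
import math
import random

def register_ransac(test_pts, target_pts, iterations=2000, seed=0,
                    scales=(0.9, 1.0, 1.1), angles=(-0.1, 0.0, 0.1)):
    """RANSAC-style search for registration parameters: pick one point
    from each template, pick a scale and rotation, derive the implied
    translation, and keep the hypothesis with the best fitness (count
    of test points landing within 0.5 units of some target point)."""
    rng = random.Random(seed)
    best = (0.0, None)
    for _ in range(iterations):
        p = rng.choice(test_pts)
        q = rng.choice(target_pts)
        s = rng.choice(scales)
        a = rng.choice(angles)
        ca, sa = math.cos(a), math.sin(a)
        # Translation that maps the transformed p exactly onto q.
        tx = q[0] - s * (ca * p[0] - sa * p[1])
        ty = q[1] - s * (sa * p[0] + ca * p[1])
        fit = 0
        for (x, y) in test_pts:
            xt = s * (ca * x - sa * y) + tx
            yt = s * (sa * x + ca * y) + ty
            if any(abs(xt - u) <= 0.5 and abs(yt - v) <= 0.5
                   for (u, v) in target_pts):
                fit += 1
        if fit > best[0]:
            best = (fit, (s, a, tx, ty))
    return best
```

With enough iterations, a hypothesis built from a genuinely corresponding point pair scores highest, recovering the true transform.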
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. In order to reduce the effect of segmentation errors, we created the
weighting image (Figure 3) from the sclera mask by setting interior pixels
in the sclera mask to 1, pixels within some distance of the boundary of the
mask to 0.5, and pixels outside the mask to 0.
The matching score m(Si, Sj) between two segment descriptors Si and Sj
is computed from d(Si, Sj), the Euclidean distance between the segment
descriptors' center points (from Eqs. 6-8), where Dmatch is the matching
distance threshold and ɸmatch is the matching angle threshold.
The total matching score M is the sum of the individual matching scores
divided by the maximum matching score for the minimal set between the
test and target templates. That is, one of the test or target templates has fewer
points, and the sum of its descriptors' weights sets the maximum score
that can be attained.
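The pair and total matching scores described above can be sketched in Python as follows. The threshold values, the use of the smaller of the two weights as a pair score, and the tuple layout (x, y, ɸ, w) are our own illustrative assumptions about the method:

```python
import math

def pair_score(si, sj, d_match=5.0, phi_match=0.2):
    """Score one descriptor pair: descriptors are (x, y, phi, w)
    tuples; a pair matches when the centre distance and orientation
    difference are both under their thresholds (illustrative values)."""
    d = math.hypot(si[0] - sj[0], si[1] - sj[1])
    if d <= d_match and abs(si[2] - sj[2]) <= phi_match:
        return min(si[3], sj[3])
    return 0.0

def template_score(test, target):
    """Total score M: best pair score for each descriptor of the
    smaller template, summed, divided by the maximum attainable score
    (the sum of that template's weights)."""
    small, large = (test, target) if len(test) <= len(target) else (target, test)
    raw = sum(max(pair_score(s, t) for t in large) for s in small)
    max_score = sum(s[3] for s in small)
    return raw / max_score if max_score else 0.0
```

Identical templates score 1.0 and unrelated templates score near 0, so a decision threshold between these extremes yields the match/non-match decision.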
Since they do not change with the movement of the eye, Y-shape branches are observed to be a stable feature and
can be used as a sclera feature descriptor. To detect the Y-shape branches in
the original template, we search for the set of nearest neighbors of every line
segment within a regular distance and classify the angles among these neighbors.
If there are two types of angle values in the line segment set, the set may
be inferred as a Y-shape structure, and the line segment angles are
recorded as a new feature of the sclera.
There are two ways to measure both the orientation and the relationship of
every branch of a Y-shape vessel: one is to use the angle of every branch to
the x-axis; the other is to use the angles between each branch and the iris radial
direction. The first method needs an additional rotation operation to align the
template. In our approach, we employed the second method. As Figure 6
shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius
from the pupil center. Even when the head tilts, the eye moves, or the camera
zooms during image acquisition, ϕ1, ϕ2, and ϕ3 remain quite stable.
To tolerate errors from the pupil center calculation in the segmentation step,
we also recorded the center position (x, y) of the Y-shape branches as
auxiliary parameters. So our rotation-, shift-, and scale-invariant feature
vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated
with reference to the iris center; therefore, it is automatically aligned to the
iris center. It is a rotation- and scale-invariant descriptor.
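Measuring each branch against the iris radial direction makes the descriptor rotation invariant, since a rotation about the pupil center shifts the branch directions and the radial direction by the same amount. A small Python sketch of this idea (our own illustration; function names are hypothetical):

```python
import math

def y_shape_descriptor(center, branch_dirs, pupil_center):
    """Angles phi_i between each branch direction and the radial
    direction from the pupil centre through the branch point; these
    are unchanged by rotation about the pupil centre."""
    radial = math.atan2(center[1] - pupil_center[1],
                        center[0] - pupil_center[0])
    phis = []
    for d in branch_dirs:
        a = (d - radial) % (2 * math.pi)
        if a > math.pi:          # normalize to (-pi, pi]
            a -= 2 * math.pi
        phis.append(a)
    return phis

def rotate_about(p, c, a):
    """Rotate point p about centre c by angle a (helper for checking
    the invariance)."""
    x, y = p[0] - c[0], p[1] - c[1]
    return (c[0] + x * math.cos(a) - y * math.sin(a),
            c[1] + x * math.sin(a) + y * math.cos(a))
```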
WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line descriptor is extracted from the
skeleton of the vessel structure in binary images (Figure 7). The skeleton is
then broken into smaller segments. For each segment, a line descriptor is
created to record the center and orientation of the segment. This descriptor
is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is
its orientation. Because of the limits of segmentation accuracy, the
descriptors at the boundary of the sclera area might not be accurate and may
contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be
tolerant of such errors, the mask file is designed to indicate whether a line
segment belongs to the edge of the sclera or not.
FIG: The line descriptor of the sclera vessel pattern. (a) An eye image. (b) Vessel
patterns in the sclera. (c) Enhanced sclera vessel patterns. (d) Centers of line
segments of vessel patterns.
However, in a GPU application, using the mask is challenging, since the
mask files are large in size and will occupy GPU memory and slow down
the data transfer. During matching and registration, a RANSAC-type
algorithm is used to randomly select corresponding descriptors, and the
transform parameters between them are used to generate the template-
transform affine matrix. After every template transform, the mask data must
also be transformed and a new boundary calculated to evaluate the weight
of the transformed descriptor. This results in too many convolutions in the
processing unit.
To reduce heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the
mask information and can be automatically aligned. We extracted the
geometric relationships of the descriptors and stored them as a new
descriptor. We use a weighted image created by setting various weight
values according to position: the weights of descriptors that lie outside
the sclera are set to 0, those near the sclera boundary to 0.5, and interior
descriptors to 1. In our work, descriptor weights were calculated on their
own mask by the CPU, only once.
The result was saved as a component of the descriptor, which
becomes s(x, y, ɸ, w), where w denotes the weight of the point and may
be 0, 0.5, or 1. To align two templates, when a template is shifted to
another location along the line connecting their centers, all the descriptors
of that template are transformed. This is faster if the two templates have
similar reference points. If we use the center of the iris as the reference
point, then when two templates are compared, the correspondences are
automatically aligned to each other, since they have the same reference
point. Every feature vector of the template is a set of line segment
descriptors composed of three variables (Figure 8): the segment's angle θ
to the reference line through the iris center, the distance r between the
segment's center and the pupil center, and the dominant angular orientation
ɸ of the segment. To minimize the GPU computation, we also convert the
descriptor values from polar coordinates to rectangular coordinates in a
CPU preprocess.
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters. For
example, as an eyeball moves left, the left-part sclera patterns of the eye may be
compressed while the right-part sclera patterns are stretched.
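Constructing the WPL descriptor s(x, y, r, θ, ɸ, w) described above can be sketched as follows. The mask lookup is abstracted into two caller-supplied predicates, which is our own simplification of the one-time CPU weight computation:

```python
import math

def make_wpl(segments, iris_center, in_sclera, near_boundary):
    """Build WPL descriptors s = (x, y, r, theta, phi, w). The weight
    encodes the mask once on the CPU: 0 outside the sclera, 0.5 near
    the boundary, 1 in the interior. in_sclera/near_boundary are
    caller-supplied predicates standing in for the mask lookup."""
    out = []
    for (x, y, phi) in segments:
        dx, dy = x - iris_center[0], y - iris_center[1]
        r = math.hypot(dx, dy)        # distance to the iris centre
        theta = math.atan2(dy, dx)    # angle to the reference line
        if not in_sclera(x, y):
            w = 0.0
        elif near_boundary(x, y):
            w = 0.5
        else:
            w = 1.0
        out.append((x, y, r, theta, phi, w))
    return out
```

Because both (x, y) and (r, θ) are stored, the GPU kernels can use whichever form is cheaper at each stage without re-deriving it, and the mask never has to be shipped to the device.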
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads called warps. We
reorganized the descriptors from the same side and saved them at
continuous addresses. This meets the requirement of coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no
longer needed on the GPU. Matching with this feature is very fast because
the templates do not need to be re-registered every time after shifting. Thus,
the cost of data transfer and computation on the GPU is reduced. Matching
on the new descriptor, the shift parameter generator in Figure 4 is then
simplified as in Figure 9.
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
the more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger
limits on their size and resource consumption, with more fully featured
instruction sets, and with more flexible control-flow operations. After many
years of separate instruction sets for vertex and fragment operations, current
GPUs support the unified Shader Model 4.0 on both vertex and fragment
shaders:
The hardware must support shader programs of at least 65k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-
bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units.
General-Purpose Computing on the GPU: Mapping general-
purpose computation onto the GPU uses the graphics hardware in much the
same way as any standard graphics application. Because of this similarity, it
is both easier and more difficult to explain the process. On one hand, the
actual operations are the same and are easy to follow; on the other hand, the
terminology differs between graphics and general-purpose use. Harris
provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology,
then show how the same steps are used in a general-purpose way to author
GPGPU applications, and finally use the same steps to show the more
simple and direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II,
concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as a texture on future passes through
the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation
involves the exact same steps, but different terminology. A motivating
example is a fluid simulation computed over a grid: at each time step, we
compute the next state of the fluid for each grid point from the current state
at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest. The rasterizer generates a fragment at each
pixel location covered by that geometry. (In our example, our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation.)
Each fragment is shaded by an SPMD general-purpose fragment
program. (Each grid point runs the same program to update the state of its
fluid.)
The fragment program computes the value of the fragment by a
combination of math operations and "gather" accesses from global
memory. (Each grid point can access the state of its neighbors from the
previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes. (The current state of the fluid will be used on the next time
step.)
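The fluid-grid example above can be sketched as a sequential simulation of the SPMD pattern: every grid point runs the same update, gathering its neighbours from the previous time step's buffer. The averaging rule below is our own stand-in for a real fluid update:

```python
def step_grid(state):
    """One time step of a toy 'fluid' update: each grid point runs the
    same program (SPMD) and gathers its four neighbours from the
    previous time step's buffer; the double loop plays the role of the
    fragment/thread grid."""
    h, w = len(state), len(state[0])
    new = [[0.0] * w for _ in range(h)]   # separate output buffer
    for y in range(h):
        for x in range(w):
            total, count = state[y][x], 1
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                if 0 <= y + dy < h and 0 <= x + dx < w:
                    total += state[y + dy][x + dx]
                    count += 1
            new[y][x] = total / count
    return new
```

Note that the output goes to a separate buffer, mirroring the old GPGPU restriction that a pass could only gather from one buffer and write to another.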
23 2 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications
has been that, despite their general-purpose tasks' having nothing to do with
graphics, the applications still had to be programmed using graphics APIs.
In addition, the program had to be structured in terms of the graphics
pipeline, with the programmable units only accessible as an intermediate
step in that pipeline, when the programmer would almost certainly prefer to
access the programmable units directly. The programming environments we
describe in detail in Section IV solve this difficulty by providing a
more natural, direct, non-graphics interface to the hardware and,
specifically, the programmable units. Today, GPU computing applications
are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads
An SPMD general-purpose program computes the value of each thread
The value for each thread is computed by a combination of math
operations and both "gather" (read) accesses from and "scatter" (write)
accesses to global memory. Unlike in the previous two
methods, the same buffer can be used for both reading and writing,
allowing more flexible algorithms (for example, in-place algorithms that
use less memory)
The resulting buffer in global memory can then be used as an input in
future computation
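A minimal Python stand-in for this thread-grid model (illustrative, not CUDA): every "thread" runs the same program over a grid of thread ids and both gathers from and scatters to the same global buffer, i.e. in place.

```python
# Illustrative Python stand-in for the SPMD model described above (not CUDA):
# each "thread" runs the same program with its own thread id, and reads from
# and writes to the SAME global buffer, i.e. in place.
def thread_program(tid, buf):
    v = buf[tid]        # "gather": read from global memory
    buf[tid] = v * v    # "scatter": write back into the same buffer

buf = [1, 2, 3, 4]
for tid in range(len(buf)):     # a real GPU launches these concurrently
    thread_program(tid, buf)    # in-place update; no second buffer needed
```

Because each thread writes only its own location, the in-place update is safe even when the threads run concurrently.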
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because
no registration is needed. The matching result in this stage helps filter
out image pairs with low similarities. After this step, some false positive
matches may still remain. In the second stage, we use the WPL descriptor
to register the two images for more detailed descriptor matching, including
scale and translation invariance. This stage includes the shift transform, affine
matrix generation, and final WPL descriptor matching. Overall, we
partitioned the registration and matching processing into four kernels in
CUDA (Figure 10): matching on the Y-shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster
and achieves a more accurate score.
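The two-stage idea can be sketched generically: a cheap Stage-I score prunes the gallery, and only the survivors pay for the expensive Stage-II matching. The scoring functions and threshold below are stand-ins, not the paper's actual descriptors.

```python
# Hedged sketch of the coarse-to-fine strategy: the stage-1 score is fast and
# rough; stage 2 (registration + detailed matching) runs only on survivors.
def coarse_to_fine(test, gallery, stage1_score, stage2_score, t):
    survivors = [g for g in gallery if stage1_score(test, g) >= t]  # Stage I
    if not survivors:
        return None, 0.0
    # Stage II: detailed matching on the shortlisted candidates only
    best = max(survivors, key=lambda g: stage2_score(test, g))
    return best, stage2_score(test, best)

# Toy usage: templates are numbers; "similarity" is closeness.
gallery = [10, 42, 99]
s1 = lambda a, b: 1.0 / (1.0 + abs(a - b))   # fast, coarse score
s2 = lambda a, b: -abs(a - b)                # precise score (pretend costly)
best, score = coarse_to_fine(41, gallery, s1, s2, t=0.2)
```

Only one of the three gallery entries survives the coarse filter here, so the expensive stage runs a single comparison.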
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features,
registration is unnecessary before matching on the Y-shape descriptor. The
whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of test template Tte
and target template Tta respectively. dϕ is the Euclidean distance of the angle
elements of the descriptor vectors, defined as (3); dxy is the Euclidean distance of
two descriptor centers, defined as (4). ni and di are the number of matched descriptor
pairs and the distance between their centers, respectively. tϕ is a distance
threshold and txy is the threshold that restricts the search area. We set tϕ to
30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all
the Y-shape branches. The search area is limited to the corresponding left or
right half of the sclera in order to reduce the search range and time. The
distance of two branches is defined in (3), where ϕij is the angle between
the jth branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y-shape
branch centers di are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch
centers as in (2). Here α is a factor to fuse the matching score, which was set
to 30 in our study. Ni and Nj are the total numbers of feature vectors in
templates i and j respectively. The decision is regulated by the threshold t: if
a sclera's matching score is lower than t, the sclera is discarded. A
sclera with a high matching score is passed to the next, more precise
matching process.
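The Stage-I matching just described can be sketched as follows. The descriptor layout (three branch angles, a center, and a left/right "side") and the fusion formula are assumptions for illustration; the thresholds tϕ = 30 and txy = 675 follow the values given in the text.

```python
import math

# Hedged sketch of Stage-I Y-shape matching (Algorithm 1 in the text).
T_PHI, T_XY, ALPHA = 30.0, 675.0, 30.0

def d_phi(a, b):
    """Euclidean distance of the angle elements, in the spirit of (3)."""
    return math.dist(a["angles"], b["angles"])

def d_xy(a, b):
    """Euclidean distance of the two descriptor centers, as in (4)."""
    return math.dist(a["center"], b["center"])

def stage1_score(test_desc, target_desc):
    n, dist_sum = 0, 0.0
    for yte in test_desc:
        for yta in target_desc:
            if yte["side"] != yta["side"]:   # search only the same sclera half
                continue
            if d_xy(yte, yta) < T_XY and d_phi(yte, yta) < T_PHI:
                n += 1
                dist_sum += d_xy(yte, yta)
    if n == 0:
        return 0.0
    # Assumed fusion of match count and mean center distance (stand-in for (2))
    return n / (1.0 + (dist_sum / n) / ALPHA)

a = [{"angles": (10, 20, 30), "center": (5, 5), "side": "L"}]
b = [{"angles": (12, 21, 29), "center": (8, 9), "side": "L"}]
score = stage1_score(a, b)
```

Restricting candidates to the same half of the sclera is what keeps the search range (and time) small, as the text notes.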
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of the
sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is
nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure
will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and
endothelium. There are slight differences among the movements of these
layers. Considering these factors, our registration employs both a single
shift transform and a multi-parameter transform which combines shift,
rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before,
segmentation may not be accurate; as a result, the detected iris center may
not be very accurate. The shift transform is designed to tolerate possible errors
in pupil center detection in the segmentation step. If there is no deformation,
or only very minor deformation, registration with the shift transform alone
would be adequate to achieve an accurate result. We designed Algorithm 2
to get the optimized shift parameter, where Tte is the test template and stei is
the ith WPL descriptor of Tte, Tta is the target template and staj is the jth
WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek
and staj.
Δsk is the shift value of the two descriptors, defined as
We first randomly select an equal number of segment descriptors
stek in test template Tte from each quad and find their nearest neighbors staj
in target template Tta. The shift offset of each pair is recorded as a possible
registration shift factor Δsk. The final offset registration factor is Δsoptim,
which has the smallest standard deviation among these candidate offsets.
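A hedged sketch of this shift search: descriptors are reduced here to 2-D centers, and "smallest standard deviation" is interpreted as picking the candidate offset that deviates least from the consensus of all candidates. Both are assumptions for illustration.

```python
import statistics

# Hedged sketch of the shift-parameter search (Algorithm 2 in the text).
def nearest(p, pts):
    """Nearest neighbor by squared Euclidean distance."""
    return min(pts, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def shift_search(test_pts, target_pts):
    candidates = []
    for p in test_pts:                    # sampled descriptors from each quad
        q = nearest(p, target_pts)        # nearest neighbor in the target
        candidates.append((q[0] - p[0], q[1] - p[1]))   # candidate offset Δs_k
    mx = statistics.mean(dx for dx, _ in candidates)
    my = statistics.mean(dy for _, dy in candidates)
    # Δs_optim: the candidate that deviates least from the consensus offset
    return min(candidates, key=lambda s: (s[0] - mx) ** 2 + (s[1] - my) ** 2)

test_pts = [(0, 0), (10, 0), (0, 10)]
target_pts = [(3, 4), (13, 4), (3, 14)]   # test pattern shifted by (3, 4)
```

With a clean shift between the two point sets, every candidate offset agrees and the consensus recovers the true displacement.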
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of sclera
patterns in the matching step. The affine transform algorithm is shown in
Algorithm 3. The shift value in the parameter set is obtained by randomly
selecting a descriptor ste(it) and calculating the distance from its nearest
neighbor staj in Tta. We transform the test template by the matrix in (7).
At the end of each iteration, we count the number of matched descriptor pairs
from the transformed template and the target template. The factor β is
used to determine whether a pair of descriptors is matched; we set it to
20 pixels in our experiment. After N iterations, the optimized transform
parameter set is determined by selecting the maximum matching number
m(it). Here stei, Tte, staj, and Tta are defined the same as in Algorithm 2;
tr(it)shift, θ(it), and tr(it)scale are the parameters of the shift, rotation, and scale
transforms generated in the itth iteration; R(θ(it)), T(tr(it)shift), and S(tr(it)scale)
are the transform matrices defined as (7). To search for the optimal
transform parameters, we iterate N times to generate these parameters. In
our experiment, we set the number of iterations to 512.
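A hedged sketch of this random parameter search. The sampling ranges for shift, rotation, and scale are assumptions; β = 20 pixels and N = 512 iterations follow the text.

```python
import math, random

# Hedged sketch of the affine parameter search (Algorithm 3 in the text):
# randomly draw shift/rotation/scale parameter sets, transform the test
# points, and keep the set with the most matches within BETA pixels.
BETA = 20.0

def transform(p, sh, theta, sc):
    """Apply scale, rotation, then shift, in the spirit of (7)."""
    c, s = math.cos(theta), math.sin(theta)
    return (sc * (c * p[0] - s * p[1]) + sh[0],
            sc * (s * p[0] + c * p[1]) + sh[1])

def affine_search(test_pts, target_pts, n_iter=512, seed=1):
    rng = random.Random(seed)
    best, best_m = None, -1
    for _ in range(n_iter):
        params = ((rng.uniform(-30, 30), rng.uniform(-30, 30)),  # shift
                  rng.uniform(-0.2, 0.2),                        # rotation
                  rng.uniform(0.9, 1.1))                         # scale
        moved = [transform(p, *params) for p in test_pts]
        m = sum(1 for q in moved
                if min(math.dist(q, t) for t in target_pts) < BETA)
        if m > best_m:               # keep the maximum matching number m(it)
            best, best_m = params, m
    return best, best_m

test_pts = [(0, 0), (100, 0), (0, 100)]
target_pts = [(5, 5), (105, 5), (5, 105)]   # roughly shifted by (5, 5)
params, matches = affine_search(test_pts, target_pts)
```

Each iteration is independent, which is exactly why the text can later map one whole parameter-set trial to one GPU thread.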
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined from Algorithms 2
and 3, the test template is registered and matched simultaneously. The
registration and matching algorithm is listed in Algorithm 4. Here stei, Tte,
staj, and Tta are defined the same as in Algorithms 2 and 3; θ(optm), tr(optm)shift,
tr(optm)scale, and Δsoptim are the registration parameters attained from
Algorithms 2 and 3; R(θ(optm))T(tr(optm)shift)S(tr(optm)scale)
is the descriptor transform matrix defined in Algorithm 3. ɸ is the angle
between the segment descriptor and the radius direction; w is the weight of the
descriptor, which indicates whether the descriptor is at the edge of the sclera or
not. To ensure that the nearest descriptors have a similar orientation, we
use a constant factor α to check the absolute difference of the two ɸ values; in our
experiment we set α to 5. The total matching score is the minimal score of the two
transformed results divided by the minimal matching score of the test template
and the target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and
works as a coprocessor with a CPU. A CUDA device consists of many streaming
multiprocessors (SMs); the parallel part of the program should be
partitioned into threads by the programmer and mapped onto those SMs.
There are multiple memory spaces in the CUDA memory hierarchy:
registers, local memory, shared memory, global memory, constant memory,
and texture memory. Registers and shared memory are on-chip and take
little time to access; local memory, despite its name, resides in off-chip
device memory. Only shared memory can be accessed by other threads
within the same block; however, shared memory is available only in limited
amounts. Global memory, constant memory, and texture memory are off-chip
memories accessible by all threads, and accessing them is very time
consuming.
Constant memory and texture memory are read-only and cacheable.
Mapping algorithms to CUDA to achieve efficient processing is
not a trivial task; there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be
executed serially. To improve performance, branch divergence within a
warp should be avoided.
Global memory is slower to access than on-chip memory. To
hide this latency, we should preferentially use on-chip
memory rather than global memory. When a global
memory access occurs, threads in the same warp should access consecutive
words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces,
but it is organized into banks which are equal in size. If two
memory requests from different threads within a warp fall in the
same memory bank, the accesses are serialized. To get maximum
performance, memory requests should be scheduled to minimize bank
conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four
independent modules, all the modules are converted to different kernels
on the GPU. These kernels differ in computation density, so we
map them to the GPU with various mapping strategies to fully utilize the
computing power of CUDA. Figure 11 shows our scheme of CPU-GPU
task distribution and the partition among blocks and threads. Algorithm 1 is
partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of
templates in the database. As the upper middle column of Figure 11 shows,
each target template is assigned to one thread, and one thread performs the
comparison of one pair of templates. In our work we use an NVIDIA C2070
as our GPU; the numbers of threads and blocks are each set to
1024. That means we can match our test template with up to 1024 × 1024
target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, each of which
processes a section of descriptors in one thread. As the lower portion of the
middle column of Figure 11 shows, we assign a target template to one
block; inside a block, one thread corresponds to a set of descriptors in this
template. This partition makes every block execute independently, and there
is no data exchange required between different blocks. When all
threads complete their corresponding descriptor fractions, the sum of the
intermediate results needs to be computed or compared. A parallel prefix-sum
algorithm is used to calculate the sum of the intermediate results, as
shown on the right of Figure 11. First, all odd-numbered threads compute the sum
of consecutive pairs of results; then, recursively, every first of i (= 4, 8,
16, 32, 64, ...) threads computes the prefix sum on the new results. The final
result is saved in the first address, which has the same variable name as the
first intermediate result.
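The reduction just described can be sketched sequentially in Python (a hedged stand-in: on the GPU, every combination at a given stride runs in parallel, one per thread).

```python
# Hedged sketch of the tree-style reduction described above: consecutive pairs
# are summed first, then partial results combine at strides 4, 8, 16, ...
# until the total lands in the first address.
def tree_reduce_sum(vals):
    a = list(vals)
    n, stride = len(vals), 2
    while stride // 2 < n:
        for i in range(0, n, stride):        # one "thread" per combination
            if i + stride // 2 < n:
                a[i] += a[i + stride // 2]
        stride *= 2
    return a[0]                              # result stored in the first slot

total = tree_reduce_sum([1, 2, 3, 4, 5, 6, 7, 8])
```

Each level halves the number of active threads, so the whole reduction takes log2(n) parallel steps instead of n serial additions.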
252 MAPPING INSIDE BLOCK
In the shift parameter search there are two schemes we can choose to
map the task:
Map one pair of templates to all the threads in a block, so that every
thread takes charge of a fraction of the descriptors and cooperates with the
other threads.
Assign a single possible shift offset to a thread, so that all the threads
compute independently except that the final result must be compared with the
other possible offsets.
Due to the great number of sum and synchronization operations in every
nearest-neighbor search step, we choose the second method to parallelize
the shift search. In the affine matrix generator, we map an entire parameter-set
search to a thread: every thread randomly generates a set of
parameters and tries them independently. The generated iterations are
assigned to all threads. The challenge of this step is that the randomly generated
numbers might be correlated among threads. In the step of generating the
rotation and scale registration parameters, we used the Mersenne Twister
pseudorandom number generator because it can use bitwise arithmetic and
has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative;
therefore it is hard to parallelize a single twister state update step among
several execution threads. To make sure that the thousands of threads in the
launch grid generate uncorrelated random sequences, many simultaneous
Mersenne Twisters need to run with different initial states in parallel.
But even "very different" (by any definition) initial state values do not
prevent the emission of correlated sequences by generators sharing
identical parameters. To solve this problem, and to enable an efficient
implementation of the Mersenne Twister on parallel architectures, we used a
special offline tool for the dynamic creation of Mersenne Twister
parameters, modified from the algorithm developed by Makoto Matsumoto
and Takuji Nishimura. In the registration and matching step, when
searching for the nearest neighbor, a line segment that has already been matched
with others should not be used again.
FIG
FIG
In our approach, a flag variable denoting whether the line has been matched
is stored in shared memory. To share the flags, all the threads in a block
would have to wait for a synchronization operation at every query step; our
solution is to use a single thread in a block to process the matching.
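The per-thread generator idea can be approximated in Python. This is only a hedged analogy: the dynamic-creation tool gives each Mersenne Twister its own parameters, whereas here we merely derive a distinct seed per "thread" id; the seed-mixing constant is an arbitrary choice.

```python
import random

# Hedged Python analogy for the per-thread generators described above.
# This is a stand-in, not the Matsumoto-Nishimura dynamic-creation algorithm.
def make_thread_rngs(base_seed, n_threads):
    # Simple seed derivation; a real system would use a stronger scheme.
    return [random.Random(base_seed * 1_000_003 + tid)
            for tid in range(n_threads)]

rngs = make_thread_rngs(base_seed=42, n_threads=4)
draws = [rng.random() for rng in rngs]   # each "thread" draws independently
```

Because every generator is seeded deterministically, re-creating the generators reproduces the same per-thread sequences, which is useful when debugging a parallel run.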
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the
bandwidth between host memory and device memory, and data transfer
between host and device can lead to long latency. As shown in Figure 11,
we load the entire target template set from the database without considering
when it will be processed; therefore there is no data transfer from
host to device during the matching procedure. In global memory, the
components in descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored
separately. This guarantees that the consecutive kernels of Algorithms 2 to 4
can access their data at successive addresses. Although such coalesced
access reduces the latency, frequent global memory access is still a
slow way to get data, so in our kernels we load the test template into shared
memory to accelerate memory access. Because Algorithms 2 to 4
execute different numbers of iterations on the same data, bank conflicts do
not occur. To maximize our texture memory space, we set the system
cache to the lowest value and bound our target descriptors to texture
memory; using this cacheable memory, our data access was accelerated
further.
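The storage decision above (components stored separately) is the structure-of-arrays layout; a hedged Python illustration with made-up descriptor values:

```python
# Hedged sketch of the layout choice described above: storing descriptor
# components separately (structure-of-arrays) puts each component's values at
# consecutive addresses, which is what coalesced global-memory access needs.
descriptors = [(1.0, 2.0, 0.5), (3.0, 4.0, 0.6), (5.0, 6.0, 0.7)]  # (x, y, r)

# Array-of-structures: a thread reading only x must stride over y and r too.
aos = [comp for desc in descriptors for comp in desc]

# Structure-of-arrays: all x's contiguous, then all y's, then all r's.
soa = {
    "x": [d[0] for d in descriptors],
    "y": [d[1] for d in descriptors],
    "r": [d[2] for d in descriptors],
}
```

With the SoA layout, consecutive "threads" reading the x component touch consecutive addresses, so one memory transaction can serve a whole warp.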
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily
applied in target detection. In this paper it is applied as the
feature for human recognition. In the sclera region, the vein patterns are the
edges of an image, so HOG is used to determine the gradient orientations
and edge orientations of the vein pattern in the sclera region of an eye image.
To apply this technique, first divide the image into small
connected regions called cells. For each cell, compute the histogram of
gradient directions or edge orientations of the pixels. The combination
of the histograms of the different cells then represents the descriptor. To improve
accuracy, the histograms can be contrast-normalized by calculating the intensity
over a larger block and then using this value to normalize all cells within the
block. This normalization makes the result invariant to geometric and
photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y)
are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This step is used
to create the cell histograms: each pixel within the cell casts a weighted vote
for the orientation bin found in the gradient computation, with the gradient
magnitude used as the weight. The cells are rectangular. The
binning of gradient orientation is spread over 0 to 180 degrees, with
opposite directions counting as the same. Fig. 8 depicts the edge
orientations of the picture elements. If the images have any illumination or
contrast changes, then the gradient strength must be locally normalized; for
that, cells are grouped together into larger blocks. These blocks
overlap, so that each cell contributes more than once to the final
descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are
mainly square grids. The performance of HOG is improved by applying
a Gaussian window to each block.
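The per-cell computation can be sketched as follows: central-difference gradients, then magnitude-weighted votes into unsigned-orientation bins over 0-180 degrees, with opposite directions folded together as described above. The 9-bin count and the toy image are assumptions for illustration.

```python
import math

# Hedged sketch of one HOG cell's orientation histogram.
def cell_histogram(img, n_bins=9):
    h, w = len(img), len(img[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = img[y][x + 1] - img[y][x - 1]       # dx(x, y)
            dy = img[y + 1][x] - img[y - 1][x]       # dy(x, y)
            mag = math.hypot(dx, dy)                 # m(x, y)
            ang = math.degrees(math.atan2(dy, dx)) % 180   # fold to [0, 180)
            hist[int(ang / 180 * n_bins) % n_bins] += mag  # magnitude-weighted
    return hist

# A vertical edge: gradients point horizontally, so all votes land in bin 0.
img = [[0, 0, 9, 9]] * 4
hist = cell_histogram(img)
```

In a full HOG descriptor, such cell histograms would then be grouped into overlapping blocks and contrast-normalized, as the text describes.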
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical
computing environment and fourth-generation programming language.
Developed by MathWorks, MATLAB allows matrix manipulations,
plotting of functions and data, implementation of algorithms, creation
of user interfaces, and interfacing with programs written in other languages,
including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an
optional toolbox uses the MuPAD symbolic engine, allowing access
to symbolic computing capabilities. An additional package, Simulink, adds
graphical multi-domain simulation and Model-Based
Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry
and academia. MATLAB users come from various backgrounds
of engineering, science, and economics. MATLAB is widely used in
academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners
in control engineering, Little's specialty, but quickly spread to many other
domains. It is now also used in education, in particular the teaching
of linear algebra and numerical analysis, and is popular amongst scientists
involved in image processing. The MATLAB application is built around the
MATLAB language. The simplest way to execute MATLAB code is to type
it in the Command Window, which is one of the elements of the MATLAB
Desktop. When code is entered in the Command Window, MATLAB can
be used as an interactive mathematical shell. Sequences of commands can
be saved in a text file, typically using the MATLAB Editor, as a script, or
encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and
sharing your work. You can integrate your MATLAB code with other
languages and applications, and distribute your MATLAB algorithms and
applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external
applications and languages such as C, C++, Fortran, Java™, COM,
and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image
processing, communications, control design, test and measurement,
financial modeling and analysis, and computational biology. Add-on toolboxes
(collections of special-purpose MATLAB functions) extend the MATLAB
environment to solve particular classes of problems in these application
areas.
MATLAB can be used on personal computers and powerful
server systems, including the Cheaha compute cluster. With the addition of
the Parallel Computing Toolbox, the language can be extended with parallel
implementations for common computational functions, including for-loop
unrolling. Additionally, this toolbox supports offloading computationally
intensive workloads to Cheaha, the campus compute cluster. MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) that knows how big it is. Moreover, the fundamental operators
(e.g., addition, multiplication) are programmed to deal with matrices when
required. And the MATLAB environment handles much of the bothersome
housekeeping that makes all this possible. Since so many of the procedures
required for Macro-Investment Analysis involve matrices, MATLAB
proves to be an extremely efficient language for both communication and
implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or Fortran. A wrapper function is created
allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed MEX-files
(for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called
from MATLAB, and many MATLAB libraries (for
example XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries. Calling MATLAB from Java is more complicated, but
can be done with a MATLAB extension, which is sold separately by MathWorks,
or using an undocumented mechanism called JMI (Java-to-MATLAB
Interface), which should not be confused with the unrelated Java Metadata
Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox
available from MathWorks, MATLAB can be connected
to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems It enables fast
development and execution With the MATLAB language you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables specifying data types and allocating memory In many
cases MATLAB eliminates the need for 'for' loops. As a result, one line of
MATLAB code can often replace several lines of C or C++ code
At the same time MATLAB provides all the features of a traditional
programming language including arithmetic operators flow control data
structures data types object-oriented programming (OOP) and debugging
features
MATLAB lets you execute commands or groups of commands one
at a time without compiling and linking enabling you to quickly iterate to
the optimal solution For fast execution of heavy matrix and vector
computations MATLAB uses processor-optimized libraries For general-
purpose scalar computations MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology
This technology which is available on most platforms provides
execution speeds that rival those of traditional programming languages
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
Use the interactive tool GUIDE (Graphical User Interface
Development Environment) to lay out, design, and edit user interfaces.
GUIDE lets you include list boxes, pull-down menus, push buttons, radio
buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX® controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process from acquiring
data from external devices and databases through preprocessing
visualization and numerical analysis to producing presentation-quality
output
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from
files other applications databases and external devices You can read data
from popular file formats such as Microsoft Excel ASCII text or binary
files image sound and video files and scientific files such as HDF and
HDF5 Low-level binary file I/O functions let you work with data files in
any format Additional functions let you read data from Web pages and
XML
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB These include 2-D and 3-D
plotting functions 3-D volume visualization functions tools for
interactively creating plots and the ability to export results to all popular
graphics formats You can customize plots by adding multiple axes
changing line colors and markers adding annotation Latex equations and
legends and drawing shapes
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatterbubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar
data, and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting effects,
light source locations, and transparency.
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical statistical and engineering
functions to support all common engineering and science operations These
functions developed by experts in mathematics are the foundation of the
MATLAB language The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW Discrete Fourier
Transform library Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports they execute
faster than the equivalent C or C++ code
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types
including doubles singles and integers
CHAPTER 4
IMPLEMENTATION
41 GENERAL
Matlab is a program that was originally designed to simplify the
implementation of numerical linear algebra routines It has since grown into
something much bigger and it is used to implement numerical algorithms
for a wide range of applications The basic language used is very similar to
standard linear algebra notation but there are a few extensions that will
likely cause you some problems at first
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
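The grey-to-binary step in the snapshots above uses Otsu's threshold. The report applies it through MATLAB tooling; below is a hedged pure-Python sketch of the underlying computation on 8-bit gray values, with made-up pixel data for illustration.

```python
# Hedged sketch of Otsu's thresholding: pick the threshold that maximizes the
# between-class variance of the background/foreground split of the histogram.
def otsu_threshold(pixels):
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    w_bg = sum_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]                      # background pixel count
        if w_bg == 0 or w_bg == total:
            continue
        sum_bg += t * hist[t]
        w_fg = total - w_bg                  # foreground pixel count
        mu_bg = sum_bg / w_bg
        mu_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mu_bg - mu_fg) ** 2
        if var_between > best_var:           # maximize between-class variance
            best_var, best_t = var_between, t
    return best_t

pixels = [10] * 50 + [200] * 50              # two well-separated gray levels
t = otsu_threshold(pixels)
binary = [1 if p > t else 0 for p in pixels]
```

On a clearly bimodal image like this one, the chosen threshold falls at the upper edge of the darker class, cleanly separating sclera-like bright pixels from the rest.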
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition
method which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions for other sclera vein recognition methods
and general pattern recognition methods. We designed the Y-shape
descriptor to narrow the search range and increase the matching efficiency,
which is a new feature extraction method that takes advantage of GPU
structures. We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing, which can
dramatically reduce data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, which is an important
step in achieving parallel computation efficiency on a GPU. A work flow
with high arithmetic intensity, to hide the memory access latency, was
designed to partition the computation task across the heterogeneous system of
CPU and GPU, and even across the threads in the GPU. The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland:
Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object
detection for real-time augmented reality applications in a GPGPU," IEEE
Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep,
big, simple neural nets for handwritten digit recognition," Neural Comput.,
vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris
recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput.
Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari,
"Nonnegative tensor factorization accelerated using GPGPU," IEEE
Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks,"
Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognition,"
Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of
the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner,
"Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4,
no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular
biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869,
Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G.
Xiaoguang, "Robust spatial matching for object retrieval and its parallel
implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp.
1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for
real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control,
vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4,
pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2,
pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human
identification method: Sclera recognition," IEEE Trans. Syst., Man,
Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
CERTIFICATE
This is to certify that the project report entitled "AN EFFICIENT PARALLEL APPROACH FOR SCLERA VEIN RECOGNITION", being submitted by A. KIRANMAYI, A. SRILATHA, and M. AYESHA MUBEEN to Jawaharlal Nehru Technological University, Hyderabad, for the award of the degree of Bachelor of Technology in Electronics and Communication Engineering, is a record of bonafide work carried out by them under my supervision and guidance.
The matter contained in this report has not been submitted to any other university or institute for the award of any degree or diploma.
MS. SYEDA SANA FATIMA                    MS. S. SUNEETA
INTERNAL GUIDE                           HEAD OF THE DEPARTMENT
EXTERNAL GUIDE
ACKNOWLEDGEMENT
This is a report giving details of our project work titled "AN EFFICIENT PARALLEL APPROACH FOR SCLERA VEIN RECOGNITION". Through it, an attempt has been made to present the description of all the theoretical and practical aspects of our project to the fullest possible extent.
We take this opportunity to express our sincere appreciation to Ms. S. Suneeta, Head of the Department, and the staff of the Bachelor of Technology programme for their invaluable suggestions and the keen interest they have shown in the successful completion of this project.
We express our deep gratitude to our guide, Ms. Syeda Sana Fatima, whose invaluable references, suggestions, and encouragement have immensely helped in the successful completion of the project. This project would add as an asset to our academic profile.
It is with a profound sense of gratitude that we acknowledge our project guide, Ms. Syeda Sana Fatima, for providing us with live specifications and her valuable suggestions, which encouraged us to complete this project successfully.
We are happy to express our gratitude to one and all who helped us in the successful fulfilment of the project.
We are thankful to our principal, Dr. MAZHER SALEEM, Shadan Women's College of Engineering and Technology, for encouraging us to do the project.
ADLA KIRANMAYI
ANNABATHULA SRILATHA
MAYESHA MUBEEN
DECLARATION
We hereby declare that the work presented in this project, entitled "AN EFFICIENT PARALLEL APPROACH FOR SCLERA VEIN RECOGNITION", submitted towards the partial fulfilment of the requirement for the award of the Degree of Bachelor of Technology in "Electronics and Communication Engineering", is an authentic record of our work carried out under the supervision of Ms. Syeda Sana Fatima, Assistant Professor, and Ms. S. Suneeta, Head of the Department of Electronics and Communication Engineering, SHADAN WOMENS COLLEGE OF ENGINEERING AND TECHNOLOGY, affiliated to Jawaharlal Nehru Technological University, Hyderabad.
The matter embodied in this report has not been submitted for the award of any other degree.
ADLA KIRANMAYI
ANNABATHULA SRILATHA
MAYESHA MUBEEN
INDEX
ABSTRACT
CHAPTER 1: INTRODUCTION
1.1 GENERAL
1.2 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
1.2.1 PREPROCESSING
1.2.2 IMAGE ENHANCEMENT
1.2.3 IMAGE RESTORATION
1.2.4 IMAGE COMPRESSION
1.2.5 SEGMENTATION
1.2.6 IMAGE RESTORATION
1.2.7 FUNDAMENTAL STEPS
1.3 A SIMPLE IMAGE MODEL
1.4 IMAGE FILE FORMATS
1.5 TYPES OF IMAGES
1.5.1 BINARY IMAGES
1.5.2 GRAY SCALE IMAGE
1.5.3 COLOR IMAGE
1.5.4 INDEXED IMAGE
1.6 APPLICATIONS OF IMAGE PROCESSING
1.7 EXISTING SYSTEM
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1.8 LITERATURE SURVEY
1.9 PROPOSED SYSTEM
1.9.1 ADVANTAGES
CHAPTER 2: PROJECT DESCRIPTION
2.1 INTRODUCTION
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
2.2.2 SCLERA SEGMENTATION
2.2.3 IRIS AND EYELID REFINEMENT
2.2.4 OCULAR SURFACE VASCULATURE
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN
2.3 EVOLUTION OF GPU ARCHITECTURE
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
2.4.1 STAGE I: MATCHING WITH Y SHAPE DESCRIPTOR
2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR
2.5 MAPPING THE SUBTASKS TO CUDA
2.5.1 MAPPING ALGORITHM TO BLOCKS
2.5.2 MAPPING INSIDE BLOCK
2.5.3 MEMORY MANAGEMENT
2.6 HISTOGRAM OF ORIENTED GRADIENTS
CHAPTER 3: SOFTWARE SPECIFICATION
3.1 GENERAL
3.2 SOFTWARE REQUIREMENTS
3.3 INTRODUCTION
3.4 FEATURES OF MATLAB
3.4.1 INTERFACING WITH OTHER LANGUAGES
3.5 THE MATLAB SYSTEM
3.5.1 DESKTOP TOOLS
3.5.2 ANALYZING AND ACCESSING DATA
3.5.3 PERFORMING NUMERIC COMPUTATION
CHAPTER 4: IMPLEMENTATION
4.1 GENERAL
4.2 CODING IMPLEMENTATION
4.3 SNAPSHOTS
CHAPTER 5
CHAPTER 6: CONCLUSION & FUTURE SCOPE
6.1 CONCLUSION
6.2 REFERENCES
APPLICATION
LIST OF FIGURES
FIG NO   FIG NAME
1.1   Fundamental blocks of digital image processing
1.2   Gray scale image
1.3   The additive model of RGB
1.4   The colors created by the subtractive model of CMYK
2.1   The diagram of a typical sclera vein recognition approach
2.2   Steps of segmentation
2.3   Glare area detection
2.4   Detection of the sclera area
2.5   Pattern of veins
2.6   Sclera region and its vein patterns
2.7   Filtering can take place simultaneously on different parts of the iris image
2.8   The sketch of parameters of segment descriptor
2.9   The weighting image
2.10  The module of sclera template matching
2.11  The Y shape vessel branch in sclera
2.12  The rotation and scale invariant character of Y shape vessel branch
2.13  The line descriptor of the sclera vessel pattern
2.14  The key elements of descriptor vector
2.15  Simplified sclera matching steps on GPU
2.16  Two-stage matching scheme
2.17  Example image from the UBIRIS database
2.18  Occupancy on various thread numbers per block
2.19  The task assignment inside and outside the GPU
2.20  HOG features
4.1   Original sclera image
4.2   Binarised sclera image
4.3   Edge map subtracted image
4.4   Cropping ROI
4.5   ROI mask
4.6   ROI finger sclera image
4.7   Enhanced sclera image
4.8   Feature extracted sclera image
4.9   Matching with images in database
4.10  Result
ABSTRACT
Sclera vein recognition is shown to be a promising method for human identification. However, its matching speed is slow, which could limit its use in real-time applications. To improve the matching efficiency, we propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching. First, we designed a rotation- and scale-invariant Y shape descriptor-based feature extraction method to efficiently eliminate most unlikely matches. Second, we developed a weighted polar line (WPL) sclera descriptor structure that incorporates mask information to reduce GPU memory cost. Third, we designed a coarse-to-fine two-stage matching method. Finally, we developed a mapping scheme to map the subtasks to GPU processing units. The experimental results show that our proposed method achieves a dramatic improvement in processing speed without compromising recognition accuracy.
CHAPTER 1
INTRODUCTION
1.1 GENERAL
Digital image processing is the use of computer algorithms to perform image processing on digital images. The 2D continuous image is divided into N rows and M columns; the intersection of a row and a column is called a pixel. The image can also be a function of other variables, including depth, color, and time. An image given in the form of a transparency, slide, photograph, or X-ray is first digitized and stored as a matrix of binary digits in computer memory. This digitized image can then be processed and/or displayed on a high-resolution television monitor. For display, the image is stored in a rapid-access buffer memory, which refreshes the monitor at a rate of 25 frames per second to produce a visually continuous display.
1.2 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
The field of "digital image processing" refers to processing digital images by means of a digital computer. In a broader sense, it can be considered the processing of any two-dimensional data, where an image (optical information) is represented as an array of real or complex numbers encoded with a definite number of bits. An image is represented as a two-dimensional function f(x, y), where x and y are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) represents the intensity, or gray level, of the image at that point.
A digital image is one for which both the coordinates and the amplitude values of f are finite, discrete quantities. Hence a digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are called "pixels". A digital image is discrete in both spatial coordinates and brightness, and it can be considered a matrix whose row and column indices identify a point on the image, with the corresponding matrix element value identifying the gray level at that point.
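The matrix view described above can be sketched in a few lines. This is an illustrative Python fragment (the report's implementation itself uses MATLAB, per Chapter 3); the names `image` and `gray_level` are ours, introduced only for the example:

```python
# A digital image as a matrix: rows x columns of gray levels (0-255).
# Plain Python lists are used here purely for illustration; a real system
# would use a numeric array library or MATLAB matrices.
image = [
    [  0,  64, 128],
    [ 64, 128, 192],
    [128, 192, 255],
]

def gray_level(img, x, y):
    """Return f(x, y): the gray level at row x, column y."""
    return img[x][y]

rows, cols = len(image), len(image[0])
print(rows, cols)                 # 3 3
print(gray_level(image, 2, 2))    # 255
```

Each matrix entry is one pixel; indexing the matrix recovers the gray level f(x, y) at that point.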
One of the first applications of digital images was in the newspaper industry, when pictures were first sent by submarine cable between London and New York. The introduction of the Bartlane cable picture transmission system in the early 1920s reduced the time required to transport a picture across the Atlantic from more than a week to less than three hours.
[Fig. 1.1: Fundamental blocks of digital image processing]
1.2.1 PREPROCESSING
In imaging science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible, and the general techniques described here apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging.
Image processing refers to the processing of a 2D picture by a computer. Basic definitions:
An image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g., brightness) of the image at the real coordinate position (x, y). Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:
Image processing (image in -> image out)
Image analysis (image in -> measurements out)
Image understanding (image in -> high-level description out)
An image may be considered to contain sub-images, sometimes referred to as regions of interest (ROIs), or simply regions. This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. In a sophisticated image processing system, it should be possible to apply specific image processing operations to selected regions. Thus one part of an image (region) might be processed to suppress motion blur, while another part might be processed to improve colour rendition.
Most usually, image processing systems require that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid and each sample, or pixel, is quantized using a finite number of bits. The digitized image is then processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display. Closely related to image processing are computer graphics and computer vision. In computer graphics, images are manually made from physical models of objects, environments, and lighting, instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most animated movies. Computer vision, on the other hand, is often considered high-level image processing, out of which a machine/computer/software intends to decipher the physical contents of an image or a sequence of images (e.g., videos or 3D full-body magnetic resonance scans).
In modern sciences and technologies, images also gain much broader scope, due to the ever-growing importance of scientific visualization (of often large-scale, complex scientific/experimental data). Examples include microarray data in genetic research, or real-time multi-asset portfolio trading in finance. Before processing, an image is converted into a digital form: digitization includes sampling of the image and quantization of the sampled values. After converting the image into bit information, processing is performed. This processing may be image enhancement, image restoration, or image compression.
1.2.2 IMAGE ENHANCEMENT
It refers to the accentuation or sharpening of image features, such as boundaries or contrast, to make a graphic display more useful for display and analysis. This process does not increase the inherent information content of the data. It includes gray-level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on.
1.2.3 IMAGE RESTORATION
It is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration depends on the extent and accuracy of the knowledge of the degradation process, as well as on the filter design. Image restoration differs from image enhancement in that the latter is concerned with the extraction or accentuation of image features.
1.2.4 IMAGE COMPRESSION
It is concerned with minimizing the number of bits required to represent an image. Applications of compression include broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission of educational and business documents, medical images that arise in computer tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.
Text compression – CCITT Group 3 & Group 4
Still image compression – JPEG
Video compression – MPEG
1.2.5 SEGMENTATION
In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.
The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the resulting contours after image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms like marching cubes.
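The per-pixel labeling idea above can be shown with a minimal sketch (illustrative Python, not the report's MATLAB code; a simple intensity threshold stands in for a real segmentation criterion, and `segment_by_threshold` is a name of ours):

```python
# Segmentation as per-pixel labeling: every pixel gets a label, and pixels
# with the same label share a visual characteristic (here, simply being
# above or below an intensity threshold).
def segment_by_threshold(img, threshold):
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in img]

img = [
    [10,  12, 200],
    [11, 210, 220],
    [ 9,  13, 205],
]
labels = segment_by_threshold(img, 128)
# Label 1 marks the bright region, label 0 the dark background; the labels
# collectively cover the entire image.
```

Real methods replace the threshold test with criteria on colour, texture, or region contours, but the output has the same shape: one label per pixel.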
1.2.6 IMAGE RESTORATION
Image restoration, like enhancement, improves the quality of an image, but all the operations are based on known, measured, or modeled degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise, and camera motion; it corrects images for known degradations.
1.2.7 FUNDAMENTAL STEPS
Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the chances of success of the other processes.
Image segmentation: to partition an input image into its constituent parts or objects.
Image representation: to convert the input data to a form suitable for computer processing.
Image description: to extract features that result in some quantitative information of interest, or features that are basic for differentiating one class of objects from another.
Image recognition: to assign a label to an object based on the information provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized objects.
Knowledge about a problem domain is coded into an image processing system in the form of a knowledge database.
1.3 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude. Digitization of the spatial coordinates (x, y) is called image sampling; amplitude digitization is called gray-level quantization. The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels.
Example: a 256-gray-level image of size 256x256 occupies 64K bytes of memory.
Images of very low spatial resolution produce a checkerboard effect, and the use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.
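The storage figure in the example follows directly from width x height x bits per pixel, where the bits per pixel are log2 of the number of gray levels. A small illustrative sketch (Python, for checking the arithmetic; `image_storage_bytes` is our own name):

```python
import math

# Storage required by an uncompressed grayscale image:
# bytes = width * height * log2(gray_levels) / 8
def image_storage_bytes(width, height, gray_levels):
    bits_per_pixel = math.log2(gray_levels)   # 256 levels -> 8 bits
    return int(width * height * bits_per_pixel // 8)

# The example from the text: a 256-gray-level image of size 256x256.
print(image_storage_bytes(256, 256, 256))     # 65536 bytes = 64K
```

Doubling the resolution quadruples the storage, which is why requirements "increase rapidly" with spatial resolution.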
1.4 IMAGE FILE FORMATS
There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based images). Some of the most common file formats are:
GIF — Graphics Interchange Format: an 8-bit (256-colour), non-destructively compressed bitmap format. Mostly used for the web; has several sub-standards, one of which is the animated GIF.
JPEG — Joint Photographic Experts Group: a very efficient (i.e., much information per byte), destructively compressed, 24-bit (16 million colours) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited).
TIFF — Tagged Image File Format: the standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.
PS — PostScript: a standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.
PSD — Adobe Photoshop Document: a dedicated Photoshop format that keeps all the information in an image, including all the layers.
BMP — bitmap file format.
1.5 TYPES OF IMAGES
There are four types of images:
1. Binary image
2. Gray scale image
3. Color image
4. Indexed image
1.5.1 BINARY IMAGES
A binary image is a digital image that has only two possible values for each pixel. Typically the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level; each pixel is stored as a single bit, i.e., a 0 or 1. Such images are also known as black-and-white (B&W) images.
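Since each pixel is a single bit, eight binary pixels fit in one byte. A toy sketch of this one-bit-per-pixel idea (illustrative Python; `pack_row` is our own helper, not part of any image format):

```python
# Pack a row of 0/1 binary pixels into a single integer, most significant
# bit first, to show how eight pixels occupy one byte.
def pack_row(bits):
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

row = [1, 0, 1, 1, 0, 0, 0, 1]
print(pack_row(row))   # 0b10110001 == 177: eight pixels in one byte
```

This is why binary images are by far the most compact of the four image types listed above.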
1.5.2 GRAY SCALE IMAGE
In an (8-bit) grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grey scale image is what people normally call a black and white image, but the name emphasizes that such an image will also include many shades of grey.
[Fig. 1.2: Gray scale image]
1.5.3 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive colour, with the R, G, and B receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production. The secondary colours of RGB (cyan, magenta, and yellow) are formed by mixing two of the primary colours (red, green, or blue) and excluding the third colour. Red and green combine to make yellow, green and blue to make cyan, and blue and red form magenta. The combination of red, green, and blue at full intensity makes white.
In Photoshop, using the "screen" mode for the different layers of an image will make the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.
[Fig. 1.3: The additive model of RGB]
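The additive mixing rules above can be checked channel by channel: combining two lights adds their intensities, clipped at full scale. An illustrative Python sketch (`add_light` is our own name):

```python
# Additive RGB mixing: combining two lights adds their channel intensities,
# saturating at 255 (full intensity).
def add_light(c1, c2):
    return tuple(min(a + b, 255) for a, b in zip(c1, c2))

RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
print(add_light(RED, GREEN))                    # (255, 255, 0)   yellow
print(add_light(GREEN, BLUE))                   # (0, 255, 255)   cyan
print(add_light(BLUE, RED))                     # (255, 0, 255)   magenta
print(add_light(add_light(RED, GREEN), BLUE))   # (255, 255, 255) white
```

Each secondary colour is exactly two primaries at full intensity, and all three together give white, as the text states.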
CMYK: the 4-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M), and yellow (Y) inks. In addition, a layer of black (K) ink can be added. The CMYK model uses the subtractive colour model.
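Because CMY is the complement of RGB, a colour can be moved between the two models with a common textbook conversion (values normalised to [0, 1]); the shared darkness of the three inks is pulled out into the black (K) channel. A hedged sketch in Python (`rgb_to_cmyk` is our own name, and this is one standard conversion, not the only one used in print workflows):

```python
# Convert an RGB colour (0-255 per channel) to CMYK fractions in [0, 1].
def rgb_to_cmyk(r, g, b):
    c, m, y = 1 - r / 255, 1 - g / 255, 1 - b / 255
    k = min(c, m, y)                  # shared darkness goes to black ink
    if k == 1.0:
        return (0.0, 0.0, 0.0, 1.0)   # pure black: black ink only
    return tuple((x - k) / (1 - k) for x in (c, m, y)) + (k,)

print(rgb_to_cmyk(255, 0, 0))   # (0.0, 1.0, 1.0, 0.0): red = magenta + yellow
print(rgb_to_cmyk(0, 0, 0))     # (0.0, 0.0, 0.0, 1.0): black ink only
```

The K channel exists because printing dark colours from three overlapping inks wastes ink and muddies the result; a dedicated black separation fixes both.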
[Fig. 1.4: The colors created by the subtractive model of CMYK]
1.5.4 INDEXED IMAGE
An indexed image consists of an array and a colormap matrix. The pixel values in the array are direct indices into the colormap. By convention, this documentation uses the variable name X to refer to the array and map to refer to the colormap. In computing, indexed color is a technique to manage digital image colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, color information is not directly carried by the image pixel data, but is stored in a separate piece of data called a palette: an array of color elements, in which every element (a color) is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only an index into the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.
Perhaps the first device that supported palette colors was a random-access frame buffer described in 1975 by Kajiya, Sutherland, and Cheadle. This supported a palette of 256 36-bit RGB colors.
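The palette lookup described above takes only a few lines. An illustrative Python sketch (MATLAB's built-in `ind2rgb(X, map)` performs the same resolution; `to_rgb` and the sample palette here are our own):

```python
# An indexed image: the array X holds palette indices, not colours.
palette = [
    (0, 0, 0),        # index 0: black
    (255, 255, 255),  # index 1: white
    (255, 0, 0),      # index 2: red
]
X = [
    [0, 1],
    [2, 1],
]

def to_rgb(indexed, cmap):
    """Resolve each index through the colormap to recover full RGB pixels."""
    return [[cmap[i] for i in row] for row in indexed]

rgb = to_rgb(X, palette)
print(rgb[1][0])   # (255, 0, 0): index 2 resolved through the palette
```

The saving is clear from the sizes: each pixel stores one small index instead of three full colour channels, at the cost of an extra lookup on display.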
1.6 APPLICATIONS OF IMAGE PROCESSING
Interest in digital image processing methods stems from two principal application areas:
1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception
In the second application area, interest focuses on procedures for extracting, from an image, information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.
1.7 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches for feature registration and matching: a Speeded-Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition; the matching step (including registration) is the most time-consuming step in this system, costing about 1.2 seconds per one-to-one matching. Both speeds were measured on a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1. Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large, so they preoccupy GPU memory and slow down data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units in CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements; there is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.
1.8 LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-face recognition algorithms perform well when constrained (frontal, well illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. Recognition performance of a subset of existing still-image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time, in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such Big Data draws great concern. Even though modern processors are equipped with more cores and memory capacity, it still requires careful design in order to utilize the hardware resources effectively and the power consumption efficiently. This research addresses the issue by investigating the workload characteristics of a biometric application. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, and motivate us to move the biometrics computation to a high-performance architecture.
Modern iris recognition algorithms can be computationally intensive, yet are designed for traditional sequential processing elements, such as a personal computer. However, a parallel processing alternative using field-programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the means of this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ code version, determining the information content of an iris approximately 324 times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.
A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye, and the vein pattern seen in the sclera region is unique to each person; thus, the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG), and conversion of the image into polar form using the bilinear interpolation technique. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than the previous methods.
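As a rough illustration of the HOG idea mentioned above (per-pixel gradient orientations collected into a magnitude-weighted histogram), here is a toy Python sketch. Real HOG additionally divides the image into cells and applies block normalisation; `orientation_histogram` is our own simplified function, not the method used in this report:

```python
import math

# Core HOG step: estimate gradients with central differences, then
# accumulate gradient magnitude into orientation bins over 0-180 degrees.
def orientation_histogram(img, bins=9):
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180  # unsigned orientation
            hist[min(int(ang / (180 / bins)), bins - 1)] += mag
    return hist

# A vertical edge: all gradient energy falls into the 0-degree bin.
img = [[0, 0, 255, 255]] * 4
hist = orientation_histogram(img)
```

Because the histogram depends on gradient directions rather than absolute intensities, descriptors built this way tolerate illumination changes, which is the property the survey entry relies on.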
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.
The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine, but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image
database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp.
970-977.
This paper proposes algorithms for iris segmentation, quality
enhancement, match score fusion, and indexing to improve both the
accuracy and the speed of iris recognition. A curve evolution approach is
proposed to effectively segment a nonideal iris image using the modified
Mumford-Shah functional. Different enhancement algorithms are
concurrently applied on the segmented iris image to produce multiple
enhanced versions of the iris image. A support-vector-machine-based
learning algorithm selects locally enhanced regions from each globally
enhanced image and combines these good-quality regions to create a single
high-quality iris image. Two distinct features are extracted from the high-
quality iris image: the global textural feature is extracted using the 1-D
log-polar Gabor transform, and the local topological feature is extracted
using Euler numbers. An intelligent fusion algorithm combines the textural
and topological matching scores to further improve iris recognition
performance and reduce the false rejection rate, whereas an indexing
algorithm enables fast and accurate iris identification. The verification and
identification performance of the proposed algorithms is validated and
compared with other algorithms using the CASIA Version 3, ICE 2005, and
UBIRIS iris databases.
18 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-
stage parallel approach for registration and matching: a parallel sclera
matching solution for sclera vein recognition built on our sequential line-
descriptor method using the CUDA GPU architecture. CUDA is a highly
parallel, multithreaded, many-core processor architecture with tremendous
computational power.
It supports not only a traditional graphics pipeline but also computation
on non-graphical data. It is relatively straightforward to implement our C
program for CUDA on an AMD-based GPU using OpenCL: our CUDA
kernels can be directly converted to OpenCL kernels by accounting for the
different syntax of various keywords and built-in functions. The mapping
strategy is also effective in OpenCL if we regard the thread and block in
CUDA as the work-item and work-group in OpenCL. Most of our
optimization techniques, such as coalesced memory access and prefix sum,
work in OpenCL too. Moreover, since CUDA is a data-parallel
architecture, the OpenCL implementation of our approach should also be
programmed in the data-parallel model.
In this research we first discuss why the naïve parallel approach would
not work. We then propose a new sclera descriptor, the Y-shape sclera
feature-based efficient registration method, to speed up the mapping
scheme; introduce the "weighted polar line (WPL) descriptor," which is
better suited for parallel computing and mitigates the mask-size issue; and
develop our coarse-to-fine two-stage matching process to dramatically
improve the matching speed. These new approaches make parallel
processing possible and efficient.
191 PROPOSED SYSTEM ADVANTAGES
1 To improve efficiency, in this research we propose a new descriptor,
the Y-shape descriptor, which can greatly improve the efficiency of the
coarse registration of two images and can be used to filter out non-
matching pairs before refined matching.
2 We propose a coarse-to-fine two-stage matching process. In the first
stage, we match two images coarsely using the Y-shape descriptors, which
is very fast because no registration is needed. The matching result in this
stage helps filter out image pairs with low similarities.
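The two-stage idea above can be put in a short sketch. This is illustrative pure Python, not the report's implementation: the descriptor layout (triples of branch angles), the angle tolerance, and the coarse threshold are all assumptions, and the fine stage is left as a placeholder.

```python
def coarse_score(y_desc_a, y_desc_b, tol=0.1):
    """Coarse stage: compare Y-shape branch-angle triples directly.
    No registration is needed because the angles are measured relative
    to the iris radial direction (rotation invariant)."""
    matches = 0
    for a in y_desc_a:
        if any(all(abs(p - q) < tol for p, q in zip(a, b)) for b in y_desc_b):
            matches += 1
    return matches / max(len(y_desc_a), 1)

def two_stage_match(test, targets, coarse_threshold=0.5):
    """Stage 1 filters out low-similarity pairs; the fine registration-
    based matching (stage 2) then runs only on the survivors."""
    survivors = [t for t in targets
                 if coarse_score(test["y"], t["y"]) >= coarse_threshold]
    return survivors  # stage 2 (fine matching) would process these
```

Because the coarse stage needs no registration, each comparison is a handful of angle differences, which is what makes it cheap enough to run against an entire gallery.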
CHAPTER 2
PROJECT DESCRIPTION
21 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each
person, so it can be used for human identification. Several researchers have
designed different sclera vein recognition methods and have shown that it
is promising to use sclera vein recognition for human identification.
Crihalmeanu and Ross proposed three approaches for feature registration
and matching: a Speeded-Up Robust Features (SURF)-based method,
minutiae detection, and direct correlation matching. Of these three
methods, the SURF method achieves the best accuracy; it takes an average
of 15 seconds to perform a one-to-one matching. Zhou et al. proposed a
line descriptor-based method for sclera vein recognition. The matching
step (including registration) is the most time-consuming step in this sclera
vein recognition system, costing about 12 seconds to perform a one-to-one
matching. Both speeds were measured using a PC with an Intel Core 2
Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein
recognition algorithms are designed using central processing unit
(CPU)-based systems.
As discussed, CPU-based systems are designed as sequential
processing devices, which may not be efficient where the data processing
can be parallelized. Because of the large time consumption of the matching
step, sclera vein recognition using a sequential method would be very
challenging to implement in a real-time biometric system, especially when
there is a large number of templates in the database for matching. General-
purpose graphics processing units (GPGPUs) are now popularly used for
parallel computing to improve computational speed and efficiency. The
highly parallel structure of GPUs makes them more effective than CPUs
for data processing where the processing can be performed in parallel.
GPUs have been widely used in biometric recognition, such as speech
recognition, text detection, handwriting recognition, and face recognition.
In iris recognition, the GPU was used to extract the features, construct
descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al.
presented a performance evaluation of image processing algorithms, such
as linear feature extraction and multi-view stereo matching, on GPUs.
However, these approaches were designed for their specific biometric
recognition applications and feature searching methods; therefore, they
may not be efficient for sclera vein recognition. Compute Unified Device
Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in
this research. CUDA is a highly parallel, multithreaded, many-core
processor architecture with tremendous computational power. It supports
not only a traditional graphics pipeline but also computation on
non-graphical data. More importantly, it offers an easier programming
platform that outperforms its CPU counterparts in terms of peak arithmetic
intensity and memory bandwidth. In this research, the goal is not to
develop a unified strategy to parallelize all sclera matching methods,
because each method is quite different from the others and would need a
customized design; an efficient parallel computing scheme would need
different strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
sclera vein recognition based on our sequential line-descriptor method
using the CUDA GPU architecture. However, the parallelization strategies
developed in this research can be applied to design parallel approaches for
other sclera vein recognition methods and help parallelize general pattern
recognition methods. Based on the matching approach, there are three
challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large in size; they preoccupy the GPU memory and slow
down the data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable
to satisfy the requirement of real-time performance. New designs are
necessary to help narrow down the search range. In summary, a naïve
parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to implement our C program for
CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be
directly converted to OpenCL kernels by accounting for the different
syntax of various keywords and built-in functions. The mapping strategy is
also effective in OpenCL if we regard the thread and block in CUDA as the
work-item and work-group in OpenCL. Most of our optimization
techniques, such as coalesced memory access and prefix sum, work in
OpenCL too. Moreover, since CUDA is a data-parallel architecture, the
OpenCL implementation of our approach should also be programmed in
the data-parallel model.
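Prefix sum, mentioned above as one of the optimization techniques, can be illustrated with a Blelloch-style work-efficient exclusive scan. The sequential Python below is only a model of the parallel passes: within each up-sweep or down-sweep level, every loop iteration is independent, which is what makes the scan a natural fit for CUDA/OpenCL thread blocks.

```python
def exclusive_scan(data):
    """Blelloch exclusive prefix sum: up-sweep (reduce) then down-sweep.
    Pads the input to the next power of two, as a GPU implementation
    typically does per block."""
    n = 1
    while n < len(data):
        n *= 2
    a = list(data) + [0] * (n - len(data))
    d = 1
    while d < n:                      # up-sweep: build partial sums
        for i in range(0, n, 2 * d):  # iterations independent -> parallel
            a[i + 2 * d - 1] += a[i + d - 1]
        d *= 2
    a[n - 1] = 0                      # clear the root
    d = n // 2
    while d >= 1:                     # down-sweep: distribute prefixes
        for i in range(0, n, 2 * d):  # iterations independent -> parallel
            t = a[i + d - 1]
            a[i + d - 1] = a[i + 2 * d - 1]
            a[i + 2 * d - 1] += t
        d //= 2
    return a[:len(data)]
```

For example, `exclusive_scan([1, 2, 3, 4])` yields `[0, 1, 3, 6]`: each output element is the sum of all inputs strictly before it.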
In this research, we first discuss why the naïve parallel approach would not
work (Section 3). We then propose a new sclera descriptor, the Y-shape
sclera feature-based efficient registration method, to speed up the mapping
scheme (Section 4); introduce the "weighted polar line (WPL) descriptor,"
which is better suited for parallel computing and mitigates the mask-size
issue (Section 5); and develop our coarse-to-fine two-stage matching
process to dramatically improve the matching speed (Section 6). These new
approaches make parallel processing possible and efficient. However, it is
non-trivial to implement these algorithms in CUDA, so we then develop
implementation schemes to map our algorithms onto CUDA (Section 7).
In Section 2, we give a brief introduction to sclera vein recognition; in
Section 8, we report experiments using the proposed system; and in
Section 9, we draw conclusions.
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation, feature enhancement, feature extraction, and feature
matching (Figure 1).
FIG
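The stages in Figure 1 can be read as a simple function pipeline. The sketch below is a toy skeleton: every stage is a stand-in (brightness thresholding for segmentation, mask multiplication for enhancement, a scalar feature), not the actual methods, which are described later in this chapter.

```python
def segment_sclera(image):
    # Stand-in segmentation: treat bright pixels as candidate sclera
    return [[1 if p > 128 else 0 for p in row] for row in image]

def enhance_veins(image, mask):
    # Stand-in for the Gabor-filter-bank vein enhancement
    return [[p * m for p, m in zip(ri, rm)] for ri, rm in zip(image, mask)]

def extract_features(enhanced):
    # Stand-in scalar feature; the real system builds line descriptors
    return sum(sum(row) for row in enhanced)

def match(features, gallery_features):
    # Stand-in similarity: closer scalar features score higher
    return -abs(features - gallery_features)

def sclera_recognition(image, gallery):
    """Skeleton of the pipeline: segmentation, enhancement, feature
    extraction, matching, and the final matching decision."""
    mask = segment_sclera(image)
    feats = extract_features(enhance_veins(image, mask))
    scores = [match(feats, g) for g in gallery]
    return scores.index(max(scores))  # index of best-matching identity
```

The point of the skeleton is the data flow: each stage consumes the previous stage's output, and only the final stage compares against the gallery.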
Sclera image segmentation is the first step in sclera vein recognition,
and several methods have been designed for it. Crihalmeanu et al.
presented a semi-automated system for sclera segmentation: they used a
clustering algorithm to classify color eye images into three clusters
(sclera, iris, and background). Later on, Crihalmeanu and Ross designed a
segmentation approach based on a normalized sclera index measure,
which includes coarse sclera segmentation, pupil region segmentation, and
fine sclera segmentation. Zhou et al. developed a skin tone plus "white
color"-based voting method for sclera segmentation in color images, and
an Otsu's-thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera
features, since the sclera vein patterns often lack contrast and are hard to
detect. Zhou et al. used a bank of multi-directional Gabor filters for
vascular pattern enhancement. Derakhshani et al. used contrast-limited
adaptive histogram equalization (CLAHE) to enhance the green color
plane of the RGB image, and a multi-scale region growing approach to
identify the sclera veins from the image background. Crihalmeanu and
Ross applied a selective enhancement filter for blood vessels to extract
features from the green component of a color image. In the feature
matching step, Crihalmeanu and Ross proposed three registration and
matching approaches: Speeded-Up Robust Features (SURF), which is
based on interest-point detection; minutiae detection, which is based on
minutiae points of the vasculature structure; and direct correlation
matching, which relies on image registration. Zhou et al. designed a line
descriptor-based feature registration and matching method.
The proposed sclera recognition method consists of five steps: sclera
segmentation, vein pattern enhancement, feature extraction, feature
matching, and the matching decision. Fig. 2 shows the block diagram of
sclera recognition. Two types of feature extraction are used in the
proposed method to achieve good identification accuracy. The
characteristics elicited from the blood vessel structure seen in the sclera
region are the Histogram of Oriented Gradients (HOG) and an interpolated
Cartesian-to-polar conversion. HOG is used to determine the gradient and
edge orientations of the vein pattern in the sclera region of an eye image.
To become more computationally efficient, the image data are converted
to polar form; this is mainly useful for circular or quasi-circular objects.
These two characteristics are extracted from all the images in the database
and compared with the features of the query image to decide whether the
person is correctly identified. This comparison is done in the feature
matching step, which ultimately makes the matching decision. By using
the proposed feature extraction methods and matching techniques, human
identification is more accurate than in existing studies. In the proposed
method, two features of an image are extracted.
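As a rough illustration of the HOG feature mentioned above, the following pure-Python sketch accumulates a single 9-bin histogram of gradient orientations over a grayscale image. Real HOG additionally divides the image into cells and normalizes over blocks; this single-cell simplification is an assumption for brevity.

```python
import math

def hog_histogram(img, bins=9):
    """Accumulate a magnitude-weighted histogram of gradient
    orientations (unsigned, 0-180 degrees) using central differences;
    border pixels are skipped."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # horizontal gradient
            gy = img[y + 1][x] - img[y - 1][x]   # vertical gradient
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180
            hist[min(int(ang / (180 / bins)), bins - 1)] += mag
    return hist
```

For a vertical edge, all gradient energy falls into the first (near-horizontal-gradient) bin, which is how the histogram encodes the dominant vein orientation.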
222 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists
of three steps: glare area detection, sclera area estimation, and iris and
eyelid detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare area detection: The glare area is a small bright area near the
pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. The filter
operates only on grayscale images, so a color image must first be
converted to grayscale before applying the Sobel filter to detect the glare
area. Fig. 4 shows the result of glare area detection.
FIG
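A minimal pure-Python version of the 3x3 Sobel gradient magnitude used for glare detection might look as follows; border pixels are left at zero for simplicity, and thresholding the resulting magnitude map to localize the glare spot is left out.

```python
def sobel_magnitude(gray):
    """Apply the 3x3 Sobel operator; glare shows up as strong edges
    around the saturated bright spot near the pupil."""
    KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal kernel
    KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical kernel
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(KX[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(KY[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```

A flat region produces zero response, while the sharp boundary of a saturated glare spot produces a strong one, so a simple threshold on this map isolates the glare area.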
Sclera area estimation: To estimate the sclera area, Otsu's
thresholding method is applied. The steps of sclera area detection are:
selection of the region of interest (ROI), Otsu's thresholding, and sclera
area detection. The left and right sclera areas are selected based on the iris
boundaries. Once the region of interest is selected, Otsu's thresholding is
applied to obtain the potential sclera areas. The correct left sclera area
should lie in the right and center positions, and the correct right sclera area
should lie in the left and center. In this way, non-sclera areas are
eliminated.
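Otsu's thresholding, used above for sclera area estimation, can be sketched as follows for an 8-bit grayscale image: the threshold is chosen to maximize the between-class variance of the two resulting pixel classes.

```python
def otsu_threshold(gray):
    """Return Otsu's global threshold by maximizing between-class
    variance over the 256-bin histogram of an 8-bit image."""
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    total_sum = sum(i * hist[i] for i in range(256))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0.0
    for t in range(256):
        w0 += hist[t]                 # background pixel count
        if w0 == 0:
            continue
        w1 = total - w0               # foreground pixel count
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0                # background mean
        m1 = (total_sum - sum0) / w1  # foreground mean
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

On a bimodal image (dark iris/background versus bright sclera), the returned threshold falls between the two modes, so pixels above it form the candidate sclera region.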
223 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the
sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are
refined; these are all unwanted portions for recognition. To eliminate their
effects, refinement is done following the detection of the sclera area. Fig.
shows the result after Otsu's thresholding and iris and eyelid refinement to
detect the right sclera area; the left sclera area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the effect of
segmentation faults. The vein patterns in the sclera area are not clearly
visible after segmentation, so vein pattern enhancement is performed to
make them more visible.
224 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of fingers (Miura et al., 2004), the palm (Lin
and Fan, 2004), and the retina (Hill, 1999). In the case of retinal
biometrics, a special optical device for imaging the back of the eyeball is
needed (Hill, 1999). Due to its perceived invasiveness and the required
degree of subject cooperation, the use of retinal biometrics may not be
acceptable to some individuals. The conjunctiva is a thin, transparent, and
moist tissue that covers the outer surface of the eye. The part of the
conjunctiva that covers the inner lining of the eyelids is called the palpebral
conjunctiva, and the part that covers the outer surface of the eye is called
the ocular (or bulbar) conjunctiva, which is the focus of this study. The
ocular conjunctiva is very thin and clear; thus the vasculature (including
that of the episclera) is easily visible through it. The visible
microcirculation of the conjunctiva offers a rich and complex network of
veins and fine microcirculation (Fig. 1). The apparent complexity and
specificity of these vascular patterns motivated us to utilize them for
personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus the conjunctival vascular
structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with
commercial off-the-shelf digital cameras under normal lighting conditions,
making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the
addition of conjunctival biometrics will enhance the performance of
current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portions of the eye, the sclera
vein patterns will be further exposed. This feature makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable
(e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
opportunities for simultaneous computation. The figure below
demonstrates the possibility of parallel directional filtering. Since the filter
is computed over different portions of the input image, the computation
can be performed in parallel (denoted by "Elements" below). In addition,
individual parallelization of each element of the filtering can also be
performed. A detailed discussion of our proposed parallelization is outside
the scope of this paper.
FIG
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA
VEIN RECOGNITION METHOD
The matching stage of the line-descriptor-based method is a
bottleneck with regard to matching speed. In this section, we briefly
describe the line-descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and remove the branch
points. The line descriptor is used to describe the segments in the vein
structure; Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle θ to some
reference angle at the iris center, the segment's distance r to the iris center,
and the dominant angular orientation ɸ of the line segment. Thus the
descriptor is S = (θ, r, ɸ)T. The individual components of the line
descriptor are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. In order to register the segments of the
vascular patterns, a RANSAC-based algorithm is used to estimate the
best-fit parameters for registration between the two sclera vascular
patterns. The registration algorithm randomly chooses two points, one
from the test template and one from the target template, along with a
scaling factor and a rotation value based on a priori knowledge of the
database. Using these values, it calculates a fitness value for the
registration with these parameters.
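The RANSAC-style registration described above can be sketched as follows. This is an illustrative simplification: the scale and rotation ranges and the inlier radius are stand-ins for the a priori knowledge of the database, and bare center points are used in place of full segment descriptors.

```python
import math, random

def fitness(test_pts, target_pts, params):
    """Fitness = how many transformed test points land near some
    target point (a simple inlier count)."""
    dx, dy, s, rot = params
    cos_r, sin_r = math.cos(rot), math.sin(rot)
    inliers = 0
    for (x, y) in test_pts:
        tx = s * (x * cos_r - y * sin_r) + dx
        ty = s * (x * sin_r + y * cos_r) + dy
        if any(math.hypot(tx - u, ty - v) < 1.0 for (u, v) in target_pts):
            inliers += 1
    return inliers

def ransac_register(test_pts, target_pts, iters=500, seed=0):
    """Each iteration pairs one random test point with one random
    target point, draws scale and rotation from assumed a priori
    ranges, and keeps the parameters with the best fitness."""
    rng = random.Random(seed)
    best, best_fit = (0.0, 0.0, 1.0, 0.0), -1
    for _ in range(iters):
        (x, y) = rng.choice(test_pts)
        (u, v) = rng.choice(target_pts)
        s = rng.uniform(0.9, 1.1)        # assumed scale range
        rot = rng.uniform(-0.2, 0.2)     # assumed rotation range
        cos_r, sin_r = math.cos(rot), math.sin(rot)
        # translation chosen so the sampled pair coincides exactly
        params = (u - s * (x * cos_r - y * sin_r),
                  v - s * (x * sin_r + y * cos_r), s, rot)
        fit = fitness(test_pts, target_pts, params)
        if fit > best_fit:
            best_fit, best = fit, params
    return best, best_fit
```

Because the translation is derived from the sampled pair, every hypothesis explains at least that pair; the random scale and rotation draws decide how many of the remaining points become inliers.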
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. In order to reduce the effect of segmentation errors, we created a
weighting image (Figure 3) from the sclera mask by setting interior pixels
of the sclera mask to 1, pixels within some distance of the mask boundary
to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as follows:
Si and Sj are two segment descriptors, m(Si, Sj) is the matching score
between segments Si and Sj, d(Si, Sj) is the Euclidean distance between
the segment descriptors' center points (from Eqs. 6-8), Dmatch is the
matching distance threshold, and ɸmatch is the matching angle threshold.
The total matching score M is the sum of the individual matching scores
divided by the maximum matching score of the minimal set between the
test and target templates; that is, one of the test or target templates has
fewer points, and the sum of its descriptors' weights sets the maximum
score that can be attained.
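A simplified version of the segment matching score and the normalized total score M might look like the sketch below. The descriptor layout (x, y, ɸ, w), the threshold values, and the choice to weight a matched pair by the product of the two mask weights are illustrative assumptions, not the report's exact formulas.

```python
import math

def segment_match(si, sj, d_match=5.0, phi_match=math.radians(10)):
    """Two segments match when their centers are within d_match and
    their orientations within phi_match; the pair then contributes
    according to the segments' mask weights."""
    (xi, yi, phi_i, wi) = si
    (xj, yj, phi_j, wj) = sj
    if (math.hypot(xi - xj, yi - yj) <= d_match
            and abs(phi_i - phi_j) <= phi_match):
        return wi * wj
    return 0.0

def template_score(test, target):
    """Total score M: best per-segment match sums, normalized by the
    maximum attainable score of the smaller (minimal) template."""
    raw = sum(max((segment_match(si, sj) for sj in target), default=0.0)
              for si in test)
    minimal = min((test, target), key=len)
    max_score = sum(w for (_, _, _, w) in minimal)
    return raw / max_score if max_score else 0.0
```

Normalizing by the smaller template's total weight keeps M in [0, 1] regardless of how many segments each eye image produced.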
FIG
FIG
FIG
FIG
Even with movement of the eye, Y-shape branches are observed to be a
stable feature and can be used as a sclera feature descriptor. To detect the
Y-shape branches in the original template, we search for the nearest-
neighbor set of every line segment within a regular distance and classify
the angles among these neighbors. If there are two types of angle values in
the line segment set, the set may be inferred as a Y-shape structure, and
the line segment angles are recorded as a new feature of the sclera.
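The "two types of angle values" test, together with measuring each branch angle against the iris radial direction, can be sketched as follows; the clustering tolerance is an illustrative assumption.

```python
import math

def branch_angles_to_radial(branch_dirs, center, pupil):
    """Angles between each branch direction and the radial direction
    from the pupil center through the branch center (this reference
    makes the feature rotation invariant)."""
    radial = math.atan2(center[1] - pupil[1], center[0] - pupil[0])
    # wrap each difference into (-pi, pi] and take its magnitude
    return [abs((d - radial + math.pi) % (2 * math.pi) - math.pi)
            for d in branch_dirs]

def is_y_shape(neighbor_angles, tol=math.radians(5)):
    """A neighbor set is inferred as a Y-branch when its segment
    angles cluster into exactly two distinct values."""
    clusters = []
    for a in neighbor_angles:
        for c in clusters:
            if abs(a - c[0]) < tol:
                c.append(a)
                break
        else:
            clusters.append([a])
    return len(clusters) == 2
```

For example, neighbor angles {0.10, 0.11, 1.2} form two clusters and are accepted as a Y-branch, while three well-separated angles are not.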
There are two ways to measure both the orientation and the relationship
of every branch of a Y-shape vessel: one is to use the angle of every
branch to the x-axis; the other is to use the angles between each branch
and the iris radial direction. The first method needs an additional rotation
operation to align the template, so in our approach we employed the
second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angle
between each branch and the radius from the pupil center. Even when the
head tilts, the eye moves, or the camera zooms during image acquisition,
ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors from the pupil center
calculation in the segmentation step, we also record the center position
(x, y) of the Y-shape branches as auxiliary parameters. So our rotation-,
shift- and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y).
The Y-shape descriptor is generated with reference to the iris center;
therefore it is automatically aligned to the iris center. It is a rotation- and
scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 22, the line descriptor is extracted from the
skeleton of the vessel structure in binary images (Figure 7). The skeleton
is then broken into smaller segments. For each segment, a line descriptor
is created to record the center and orientation of the segment. This
descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the
center and ɸ is its orientation. Because of the limited segmentation
accuracy, descriptors at the boundary of the sclera area might not be
accurate and may contain spur edges resulting from the iris, eyelid, and/or
eyelashes. To tolerate such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel
patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of
line segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is
challenging, since the mask files are large in size and will occupy GPU
memory and slow down the data transfer. During matching, a RANSAC-
type registration algorithm was used to randomly select corresponding
descriptors, and the transform parameters between them were used to
generate the template-transform affine matrix. After every template
transform, the mask data must also be transformed and a new boundary
calculated to evaluate the weight of the transformed descriptor. This
results in too many convolutions in the processing unit.
To reduce the heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the mask
information and can be automatically aligned. We extracted the geometric
relationships of the descriptors and stored them as a new descriptor. We
use a weighted image created by setting various weight values according
to position: the weight of descriptors outside the sclera is set to 0, those
near the sclera boundary to 0.5, and interior descriptors to 1. In our work,
descriptor weights are calculated on their own mask by the CPU, and only
once.
The result is saved as a component of the descriptor, so the sclera
descriptor becomes s(x, y, ɸ, w), where w denotes the weight of the point
and takes the value 0, 0.5, or 1. To align two templates, when a template is
shifted to another location along the line connecting their centers, all the
descriptors of that template must be transformed. Alignment is faster if the
two templates share a similar reference point: if we use the center of the
iris as the reference point, then when two templates are compared, the
correspondence is automatically aligned. Every feature vector of the
template is a set of line segment descriptors composed of three variables
(Figure 8): the segment's angle θ to the reference line through the iris
center, the distance r between the segment's center and the pupil center,
and the dominant angular orientation ɸ of the segment. To minimize the
GPU computation, we also convert the descriptor values from polar
coordinates to rectangular coordinates in a CPU preprocess.
The descriptor vector then becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters; for
example, as an eyeball moves left, the left-part sclera patterns of the eye
may be compressed while the right-part sclera patterns are stretched.
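The CPU-side preprocess that attaches the precomputed mask weight and converts each descriptor from polar to rectangular coordinates can be sketched as below; the function name and argument order are illustrative.

```python
import math

def wpl_descriptor(r, theta, phi, weight, iris_center=(0.0, 0.0)):
    """Build a WPL descriptor s(x, y, r, theta, phi, w): rectangular
    coordinates are precomputed once on the CPU so GPU kernels avoid
    repeated trigonometry; w is 0, 0.5, or 1 depending on the
    segment's position in the sclera mask."""
    cx, cy = iris_center
    x = cx + r * math.cos(theta)
    y = cy + r * math.sin(theta)
    return (x, y, r, theta, phi, weight)
```

Keeping both the polar pair (r, θ) and the rectangular pair (x, y) in the descriptor trades a little memory for removing per-comparison trigonometry from the matching kernels.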
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads called warps. We
reorganized the descriptors from the same side and saved
FIG
FIG
them at continuous addresses, which meets the requirement of coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no
longer needed on the GPU. Matching with this feature is very fast because
the templates do not need to be re-registered after every shift; thus the cost
of data transfer and computation on the GPU is reduced. With matching
on the new descriptor, the shift parameter generator of Figure 4 is
simplified as shown in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
the more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger limits
on their size and resource consumption, more fully featured instruction
sets, and more flexible control-flow operations. After many years of
separate instruction sets for vertex and fragment operations, current GPUs
support the unified Shader Model 4.0 on both vertex and fragment
shaders:
The hardware must support shader programs of at least 65k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and
32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units.
GENERAL-PURPOSE COMPUTING ON THE GPU
Mapping general-purpose computation onto the GPU uses the graphics
hardware in much the same way as any standard graphics application.
Because of this similarity, it is both easier and more difficult to explain the
process: on one hand, the actual operations are the same and easy to
follow; on the other hand, the terminology differs between graphics and
general-purpose use. Harris provides an excellent description of this
mapping process. We begin by describing GPU programming using
graphics terminology, then show how the same steps are used in a
general-purpose way to author GPGPU applications, and finally use the
same steps to show the simpler and more direct way that today's GPU
computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline described in Section II,
concentrating on its programmable aspects:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as texture on future passes through
the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Coopting this pipeline to perform general-purpose computation involves the exact same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at its grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
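As a CPU-side illustration of this gather pattern, here is a minimal Python/NumPy sketch in which every grid point gathers its four neighbors from the previous time step; the neighbor-averaging update rule is an illustrative stand-in, not the report's actual fluid solver.

```python
import numpy as np

def diffuse_step(state):
    """One 'rendering pass': each grid point gathers its 4 neighbors
    from the previous time step and averages them (Jacobi-style update)."""
    padded = np.pad(state, 1, mode="edge")   # clamp-to-edge, like texture sampling
    return 0.25 * (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                   padded[1:-1, :-2] + padded[1:-1, 2:])

grid = np.zeros((8, 8))
grid[4, 4] = 1.0                     # a point source in the 'current state' buffer
next_grid = diffuse_step(grid)       # the 'output buffer' written by this pass
```

Note that, exactly as in the old GPGPU model, the pass reads only from the previous buffer and writes only to a new one; there is no in-place update.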
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities. After this step, some false positive matches may still remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale- and translation-invariance. This stage includes shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster and achieve a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively. dϕ is the Euclidean distance of the angle element of the descriptor vector, defined as (3); dxy is the Euclidean distance of two descriptor centers, defined as (4). ni and di are the matched descriptor pairs' number and their centers' distance, respectively. tϕ is a distance threshold and txy is the threshold to restrict the searching area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas nearby all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the searching range and time. The distance of two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as (2). Here α is a factor to fuse the matching score, which was set to 30 in our study. Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera will be discarded. The sclera with a high matching score will be passed to the next, more precise matching process.
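The coarse stage can be sketched as follows. The thresholds tϕ = 30 and txy = 675 come from the text, but the fusion formula is a simplified stand-in for Eq. (2), and the descriptor layout (branch angle, center x, center y) is an assumption for illustration.

```python
import math

T_PHI, T_XY, ALPHA = 30.0, 675.0, 30.0   # thresholds and fusion factor from the text

def coarse_match(test_desc, target_desc):
    """Coarse Y-shape matching: count descriptor pairs whose angle distance
    and center distance fall under the thresholds (no registration needed)."""
    matched, dists = 0, []
    for (phi_i, x_i, y_i) in test_desc:
        for (phi_j, x_j, y_j) in target_desc:
            d_phi = abs(phi_i - phi_j)              # angle distance, cf. Eq. (3)
            d_xy = math.hypot(x_i - x_j, y_i - y_j)  # center distance, cf. Eq. (4)
            if d_phi < T_PHI and d_xy < T_XY:
                matched += 1
                dists.append(d_xy)
                break                                # each test branch matches at most once
    if matched == 0:
        return 0.0
    # illustrative fusion of pair count and average center distance (stand-in for Eq. (2))
    return matched / min(len(test_desc), len(target_desc)) \
        - (sum(dists) / matched) / (ALPHA * T_XY)
```

With this stand-in score, identical templates score 1.0 and dissimilar ones score near 0, mirroring the thresholding decision described above.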
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure will shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers. Considering these factors, our registration employs both a single shift transform and a multi-parameter transform which combines shift, rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to get the optimized shift parameter, where Tte is the test template and stei is the i-th WPL descriptor of Tte, Tta is the target template and staj is the j-th WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value of two descriptors, defined as
We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find their nearest neighbors staj in the target template Tta. The shift offset of each pair is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
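A minimal sketch of this shift search, under two stated assumptions: descriptors are reduced to 2-D center points, and "smallest standard deviation" is interpreted as the candidate offset closest to the consensus of all offsets.

```python
import math
import statistics

def nearest(p, pts):
    """Nearest-neighbor lookup by Euclidean distance of descriptor centers."""
    return min(pts, key=lambda q: math.hypot(p[0] - q[0], p[1] - q[1]))

def shift_search(test_pts, target_pts):
    """Sketch of Algorithm 2: each sampled test descriptor votes with the
    offset to its nearest target descriptor; the candidate offset with the
    smallest deviation from the consensus is kept as the registration shift."""
    offsets = []
    for p in test_pts:
        q = nearest(p, target_pts)
        offsets.append((q[0] - p[0], q[1] - p[1]))   # candidate shift factor
    mx = statistics.mean(o[0] for o in offsets)
    my = statistics.mean(o[1] for o in offsets)
    # keep the candidate closest to the consensus offset
    return min(offsets, key=lambda o: math.hypot(o[0] - mx, o[1] - my))
```

For a target template that is simply a shifted copy of the test template, every descriptor votes with the same offset, and that offset is returned.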
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined the same as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the it-th iteration; R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters. In our experiment, we set the iteration count to 512.
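This iteration amounts to a randomized parameter search. The sketch below assumes 2-D descriptor centers and illustrative search ranges for rotation, shift, and scale; only β = 20 pixels and N = 512 come from the text.

```python
import math
import random

BETA = 20.0     # match tolerance in pixels (from the text)
N_ITERS = 512   # iteration count used in the experiment

def transform(pts, theta, shift, scale):
    """Apply rotation R(theta), translation T(shift), and scaling S(scale)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(scale * (c * x - s * y) + shift[0],
             scale * (s * x + c * y) + shift[1]) for x, y in pts]

def count_matches(pts_a, pts_b):
    """A pair of descriptors is 'matched' if their centers lie within BETA."""
    return sum(1 for p in pts_a
               if any(math.hypot(p[0] - q[0], p[1] - q[1]) < BETA for q in pts_b))

def affine_search(test_pts, target_pts, rng):
    """Sketch of Algorithm 3: try N random (rotation, shift, scale) parameter
    sets and keep the one producing the most matched descriptor pairs."""
    best, best_params = -1, None
    for _ in range(N_ITERS):
        theta = rng.uniform(-0.2, 0.2)                      # assumed search range
        shift = (rng.uniform(-30, 30), rng.uniform(-30, 30))  # assumed search range
        scale = rng.uniform(0.9, 1.1)                       # assumed search range
        m = count_matches(transform(test_pts, theta, shift, scale), target_pts)
        if m > best:
            best, best_params = m, (theta, shift, scale)
    return best_params, best
```

On the GPU, each of the N parameter sets is tried by an independent thread, which is exactly why the per-thread random-number correlation discussed in Section 25 matters.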
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined the same as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3. ϕ is the angle between the segment descriptor and the radius direction; w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of two ϕ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and target template.
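A simplified sketch of this orientation-checked matching, assuming each WPL descriptor is reduced to a tuple (x, y, ϕ, w) and using a stand-in normalization (weighted matches over the smaller template's total weight) rather than the paper's exact score:

```python
import math

ALPHA_PHI = 5.0   # maximum orientation difference, from the text

def fine_match_score(test_desc, target_desc):
    """Sketch of the final WPL matching: a test segment matches its nearest
    target segment only if their orientations phi agree within ALPHA_PHI;
    matched weights w are accumulated and normalized by the smaller template."""
    score = 0.0
    for (x, y, phi, w) in test_desc:
        qx, qy, qphi, qw = min(target_desc,
                               key=lambda q: math.hypot(x - q[0], y - q[1]))
        if abs(phi - qphi) < ALPHA_PHI:        # orientation consistency check
            score += min(w, qw)
    denom = min(sum(d[3] for d in test_desc), sum(d[3] for d in target_desc))
    return score / denom if denom else 0.0
```

The orientation check is what rejects spatially close but structurally different vessel segments.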
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those processors. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip, and it takes little time to access these memories. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited quantity. Global memory, constant memory, and texture memory are off-chip memories, accessible by all threads, and very time consuming to access.
Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower than on-chip memory in terms of access latency. To hide this latency, we should preferentially use on-chip memory rather than global memory. When global memory access does occur, threads in the same warp should access consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks which are equal in size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses will be serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column in Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU; the thread and block numbers are set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column in Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of the results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
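The reduction just described can be sketched sequentially; each level of the while loop below corresponds to one synchronized step in which the active threads add their partner's partial sum at the current stride, leaving the total in the first slot.

```python
def tree_reduce_sum(values):
    """Sketch of the block-level reduction described above: pairwise partial
    sums with a doubling stride (1, 2, 4, 8, ...); the final result lands in
    the first address, as in the text."""
    vals = list(values)
    n = len(vals)
    stride = 1
    while stride < n:
        # on the GPU, the iterations of this inner loop run in parallel,
        # separated from the next stride level by a block-wide sync
        for i in range(0, n, 2 * stride):
            if i + stride < n:
                vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]
```

For n partial results this takes ⌈log2 n⌉ synchronized steps instead of n-1 sequential additions.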
252 MAPPING INSIDE BLOCK
In shift argument searching, there are two schemes we can choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to a thread, so that all the threads compute independently, except that the final result must be compared with the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we choose the second method to parallelize shift searching. In the affine matrix generator, we map an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently. The generated iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration, we use the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore, it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to process different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already matched with others should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
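The per-thread random-stream problem discussed above can be illustrated with NumPy, whose SeedSequence.spawn mechanism plays a role analogous to the offline dynamic-creation tool: it derives statistically independent child streams from one master seed, rather than relying on "very different" seed values.

```python
import numpy as np

def make_thread_rngs(master_seed, n_threads):
    """One independent generator per simulated 'thread'. Naively seeding
    n generators with nearby integers can yield correlated streams;
    SeedSequence.spawn derives provably well-separated child states."""
    children = np.random.SeedSequence(master_seed).spawn(n_threads)
    return [np.random.default_rng(c) for c in children]

rngs = make_thread_rngs(42, 4)
draws = [rng.random() for rng in rngs]   # each 'thread' samples independently
```

In the CUDA setting, each generator state would live in a thread's registers or local memory, and each thread would drive its own parameter search in Algorithm 3.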
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the different histograms of the different cells then represents the descriptor. To improve accuracy, histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell gives a weight to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination or contrast changes, then the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
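The gradient, binning, and normalization steps above can be sketched as follows; the cell size, bin count, and per-cell (rather than overlapping R-HOG block) normalization are simplifying assumptions made for illustration.

```python
import numpy as np

def hog_cell_histograms(img, cell=4, bins=9):
    """Sketch of HOG: x/y gradients, magnitude and unsigned orientation
    (0-180 degrees, opposite directions counted the same), then a per-cell
    orientation histogram weighted by gradient magnitude."""
    gy, gx = np.gradient(img.astype(float))          # dy(x,y), dx(x,y)
    mag = np.hypot(gx, gy)                           # gradient magnitude m(x,y)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0     # unsigned orientation
    h, w = img.shape
    hists = np.zeros((h // cell, w // cell, bins))
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    for i in range(h // cell * cell):
        for j in range(w // cell * cell):
            hists[i // cell, j // cell, bin_idx[i, j]] += mag[i, j]
    # L2 normalization (simplified: per cell instead of overlapping blocks)
    norm = np.linalg.norm(hists, axis=2, keepdims=True)
    return hists / np.maximum(norm, 1e-6)
```

A vertical step edge, for example, produces purely horizontal gradients, so all of the histogram mass in the affected cells lands in the 0-degree bin.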
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, which is one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, FORTRAN, Java, COM, and Microsoft Excel
MATLAB is used in vast areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, and it enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology.
This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
Using the interactive tool GUIDE (Graphical User Interface Development Environment), you can lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files, such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW Discrete Fourier Transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
ACKNOWLEDGEMENT
This is a report giving details of our project work titled "AN EFFICIENT
PARALLEL APPROACH FOR SCLERA VEIN RECOGNITION".
Through it, an attempt has been made to present the description of all the
theoretical and practical aspects of our project to the fullest possible extent.
We take this opportunity to express our sincere appreciation to Professor
Ms. S. Suneeta, Head of the Department, and the staff of the Bachelor of
Technology programme for their invaluable suggestions and the keen interest
they have shown in the successful completion of this project.
We express our deep gratitude to our guide, Ms. Syeda Sana Fatima,
whose invaluable references, suggestions, and encouragement have
immensely helped in the successful completion of the project. This project
will add as an asset to our academic profile.
It is with a profound sense of gratitude that we acknowledge our project
guide, Ms. Syeda Sana Fatima, for providing us with live specifications and
her valuable suggestions, which encouraged us to complete this project
successfully.
We are happy to express our gratitude to one and all who helped us in the
successful fulfilment of the project.
We are thankful to our principal, Dr. MAZHER SALEEM, Shadan Women's
College of Engineering and Technology, for encouraging us to do the project.
ADLA KIRANMAYI
ANNABATHULA SRILATHA
MAYESHA MUBEEN
DECLARATION
We hereby declare that the work being presented in this project,
entitled "AN EFFICIENT PARALLEL APPROACH FOR SCLERA
VEIN RECOGNITION", submitted towards the partial fulfilment of the
requirement for the award of the Degree of Bachelor of Technology in
"Electronics and Communication Engineering", is an authentic record of
our work carried out under the supervision of Ms. Syeda Sana Fatima, Assistant
Professor, and Ms. S. Suneeta, Head of the Department of Electronics and
Communication Engineering, SHADAN WOMEN'S COLLEGE OF
ENGINEERING AND TECHNOLOGY, affiliated to Jawaharlal Nehru
Technological University, Hyderabad.
The matter embodied in this report has not been submitted for the award of
any other degree.
ADLA KIRANMAYI
ANNABATHULA SRILATHA
MAYESHA MUBEEN
INDEX
ABSTRACT
CHAPTER 1 INTRODUCTION
11 GENERAL
12 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
121 PREPROCESSING
122 IMAGE ENHANCEMENT
123 IMAGE RESTORATION
124 IMAGE COMPRESSION
125 SEGMENTATION
126 IMAGE RESTORATION
127 FUNDAMENTAL STEPS
13 A SIMPLE IMAGE MODEL
14 IMAGE FILE FORMATS
15 TYPE OF IMAGES
151 BINARY IMAGES
152 GRAY SCALE IMAGE
153 COLOR IMAGE
154 INDEXED IMAGE
16 APPLICATIONS OF IMAGE PROCESSING
17 EXISTING SYSTEM
171 DISADVANTAGES OF EXISTING SYSTEM
18 LITERATURE SURVEY
19 PROPOSED SYSTEM
191 ADVANTAGES
CHAPTER 2 PROJECT DESCRIPTION
21 INTRODUCTION
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
222 SCLERA SEGMENTATION
223 IRIS AND EYELID REFINEMENT
224 OCULAR SURFACE VASCULATURE
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA
VEIN
23 EVOLUTION OF GPU ARCHITECTURE
231 PROGRAMMING A GPU FOR GRAPHICS
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
25 MAPPING THE SUBTASKS TO CUDA
251 MAPPING ALGORITHM TO BLOCKS
252 MAPPING INSIDE BLOCK
253 MEMORY MANAGEMENT
26 HISTOGRAM OF ORIENTED GRADIENTS
CHAPTER 3 SOFTWARE SPECIFICATION
31 GENERAL
32 SOFTWARE REQUIREMENTS
33 INTRODUCTION
34 FEATURES OF MATLAB
341 INTERFACING WITH OTHER LANGUAGES
35 THE MATLAB SYSTEM
351 DESKTOP TOOLS
352 ANALYZING AND ACCESSING DATA
353 PERFORMING NUMERIC COMPUTATION
CHAPTER 4 IMPLEMENTATION
41 GENERAL
42 CODING IMPLEMENTATION
43 SNAPSHOTS
CHAPTER 5
CHAPTER 6 CONCLUSION & FUTURE SCOPE
61 CONCLUSION
62 REFERENCES
APPLICATION
LIST OF FIGURES
FIG NO FIG NAME
11 Fundamental blocks of digital image processing
12 Gray scale image
13 The additive model of RGB
14 The colors created by the subtractive model of CMYK
21 The diagram of a typical sclera vein recognition approach
22 Steps of segmentation
23 Glare area detection
24 Detection of the sclera area
25 Pattern of veins
26 Sclera region and its vein patterns
27 Filtering can take place simultaneously on different parts of the iris image
28 The sketch of parameters of segment descriptor
29 The weighting image
210 The module of sclera template matching
211 The Y shape vessel branch in sclera
212 The rotation and scale invariant character of Y shape vessel branch
213 The line descriptor of the sclera vessel pattern
214 The key elements of descriptor vector
215 Simplified sclera matching steps on GPU
216 Two-stage matching scheme
217 Example image from the UBIRIS database
218 Occupancy on various thread numbers per block
219 The task assignment inside and outside the GPU
220 HOG features
41 Original sclera image
42 Binarised sclera image
43 Edge map subtracted image
44 Cropping ROI
45 ROI mask
46 ROI finger sclera image
47 Enhanced sclera image
48 Feature extracted sclera image
49 Matching with images in database
410 Result
ABSTRACT
Sclera vein recognition is shown to be a promising method for human
identification. However, its matching speed is slow, which could limit its
use in real-time applications. To improve the matching efficiency,
we proposed a new parallel sclera vein recognition method using a two-
stage parallel approach for registration and matching First we designed a
rotation- and scale-invariant Y shape descriptor based feature extraction
method to efficiently eliminate most unlikely matches Second we
developed a weighted polar line sclera descriptor structure to incorporate
mask information to reduce GPU memory cost Third we designed a
coarse-to-fine two-stage matching method Finally we developed a
mapping scheme to map the subtasks to GPU processing units The
experimental results show that our proposed method can achieve dramatic
processing speed improvement without compromising the recognition
accuracy
CHAPTER 1
INTRODUCTION
11 GENERAL
Digital image processing is the use of computer algorithms to
perform image processing on digital images. The 2D continuous image is
divided into N rows and M columns; the intersection of a row and a
column is called a pixel. The image can also be a function of other variables,
including depth, color, and time. An image given in the form of a
transparency, slide, photograph, or X-ray is first digitized and stored as a
matrix of binary digits in computer memory. This digitized image can then
be processed and/or displayed on a high-resolution television monitor. For
display, the image is stored in a rapid-access buffer memory, which
refreshes the monitor at a rate of 25 frames per second to produce a visually
continuous display.
12 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
The field of "Digital Image Processing" refers to processing digital
images by means of a digital computer. In a broader sense, it can be
considered the processing of any two-dimensional data, where an image
(optical information) is represented as an array of real or complex numbers
represented by a definite number of bits. An image is represented as a two-
dimensional function f(x, y), where 'x' and 'y' are spatial (plane)
coordinates, and the amplitude of f at any pair of coordinates (x, y)
represents the intensity or gray level of the image at that point.
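The definition above can be made concrete with a toy example. The sketch below (Python, for illustration only; the project itself is implemented in MATLAB, and the pixel values are made up) treats a small image as the discrete function f(x, y):

```python
# A digital image modeled as a discrete function f(x, y): each entry is the
# gray level at spatial coordinates (x, y). Hypothetical 4x4 example.
image = [
    [  0,  64, 128, 255],
    [ 32,  96, 160, 224],
    [ 16,  80, 144, 208],
    [  8,  72, 136, 200],
]

def f(x, y):
    """Gray level at row x, column y of the sampled image."""
    return image[x][y]

print(f(0, 3))  # 255: the brightest pixel in this toy image
```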
A digital image is one for which both the coordinates and the
amplitude values of f are finite, discrete quantities. Hence, a digital
image is composed of a finite number of elements, each of which has a
particular location and value. These elements are called "pixels". A digital
image is discrete in both spatial coordinates and brightness, and it can be
considered as a matrix whose row and column indices identify a point on
the image and whose corresponding element value identifies the gray
level at that point.
One of the first applications of digital images was in the newspaper
industry when pictures were first sent by submarine cable between London
and New York Introduction of the Bartlane cable picture transmission
system in the early 1920s reduced the time required to transport a picture
across the Atlantic from more than a week to less than three hours
[Figure 1.1: Fundamental blocks of digital image processing]
121 PREPROCESSING
In imaging science, image processing is any form of signal
processing for which the input is an image, such as a photograph or video
frame; the output of image processing may be either an image or a set of
characteristics or parameters related to the image. Most image-processing
techniques involve treating the image as a two-dimensional signal and
applying standard signal-processing techniques to it. Image processing
usually refers to digital image processing, but optical and analog image
processing are also possible. This section covers general techniques that
apply to all of them. The acquisition of images (producing the input image
in the first place) is referred to as imaging.
Image processing refers to the processing of a 2D picture by a
computer. Basic definitions:
An image defined in the "real world" is considered to be a function
of two real variables, for example a(x, y), with a as the amplitude (e.g.,
brightness) of the image at the real coordinate position (x, y). Modern digital
technology has made it possible to manipulate multi-dimensional signals
with systems that range from simple digital circuits to advanced parallel
computers. The goal of this manipulation can be divided into three
categories:
Image processing (image in -> image out)
Image analysis (image in -> measurements out)
Image understanding (image in -> high-level description out)
An image may be considered to contain sub-images, sometimes referred
to as regions of interest (ROIs), or simply regions. This concept reflects the
fact that images frequently contain collections of objects, each of which can
be the basis for a region. In a sophisticated image processing system, it
should be possible to apply specific image processing operations to selected
regions. Thus one part of an image (region) might be processed to suppress
motion blur, while another part might be processed to improve colour
rendition.
Usually, image processing systems require that the images be
available in digitized form, that is, as arrays of finite-length binary words. For
digitization, the given image is sampled on a discrete grid, and each sample,
or pixel, is quantized using a finite number of bits. The digitized image is then
processed by a computer. To display a digital image, it is first converted
into an analog signal, which is scanned onto a display. Closely related to
image processing are computer graphics and computer vision. In computer
graphics, images are manually made from physical models of objects,
environments, and lighting, instead of being acquired (via imaging devices
such as cameras) from natural scenes, as in most animated movies.
Computer vision, on the other hand, is often considered high-level image
processing, out of which a machine/computer/software intends to decipher
the physical contents of an image or a sequence of images (e.g., videos or
3D full-body magnetic resonance scans).
In modern sciences and technologies, images also gain much
broader scope due to the ever-growing importance of scientific
visualization (of often large-scale, complex scientific/experimental data).
Examples include microarray data in genetic research and real-time multi-
asset portfolio trading in finance. Before processing, an image is
converted into a digital form. Digitization includes sampling of the image and
quantization of the sampled values. After converting the image into bit
information, processing is performed. This processing technique may be
image enhancement, image restoration, or image compression.
122 IMAGE ENHANCEMENT
Image enhancement refers to the accentuation or sharpening of image
features, such as boundaries or contrast, to make a graphic display more
useful for display and analysis. This process does not increase the inherent
information content of the data. It includes gray-level and contrast
manipulation, noise reduction, edge crispening and sharpening, filtering,
interpolation and magnification, pseudo-coloring, and so on.
123 IMAGE RESTORATION
Image restoration is concerned with filtering the observed image to minimize
the effect of degradations. The effectiveness of image restoration depends on the
extent and accuracy of the knowledge of the degradation process, as well as on the
filter design. Image restoration differs from image enhancement in that the
latter is concerned with the mere extraction or accentuation of image features.
124 IMAGE COMPRESSION
Image compression is concerned with minimizing the number of bits required to
represent an image. Applications of compression include broadcast TV, remote
sensing via satellite, military communication via aircraft, radar,
teleconferencing, facsimile transmission of educational and business documents,
medical images that arise in computed tomography, magnetic resonance imaging,
and digital radiology, motion pictures, satellite images, weather maps,
geological surveys, and so on.
Text compression - CCITT Group 3 & Group 4
Still image compression - JPEG
Video compression - MPEG
125 SEGMENTATION
In computer vision, image segmentation is the process of
partitioning a digital image into multiple segments (sets of pixels, also
known as superpixels). The goal of segmentation is to simplify and/or
change the representation of an image into something that is more
meaningful and easier to analyze. Image segmentation is typically used to
locate objects and boundaries (lines, curves, etc.) in images. More precisely,
image segmentation is the process of assigning a label to every pixel in an
image such that pixels with the same label share certain visual
characteristics.
The result of image segmentation is a set of segments that
collectively cover the entire image, or a set of contours extracted from the
image (see edge detection). Each of the pixels in a region is similar with
respect to some characteristic or computed property, such as
colour, intensity, or texture. Adjacent regions are significantly different
with respect to the same characteristic(s). When applied to a stack of
images, typical in medical imaging, the contours resulting from image
segmentation can be used to create 3D reconstructions with the help of
interpolation algorithms such as marching cubes.
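A minimal example of the label-assignment idea is global thresholding. The Python sketch below (illustrative only; real segmentation in a sclera pipeline is far more involved) labels each pixel as object or background:

```python
def threshold_segment(image, t):
    """Assign label 1 to pixels >= t (object) and 0 otherwise (background).
    Pixels sharing a label share the 'brighter/darker than t' characteristic."""
    return [[1 if p >= t else 0 for p in row] for row in image]

img = [[12, 200],
       [180, 30]]
print(threshold_segment(img, 128))  # [[0, 1], [1, 0]]
```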
126 IMAGE RESTORATION
Image restoration, like enhancement, improves the quality of an image,
but all the operations are mainly based on known or measured
degradations of the original image. Image restoration is used to restore
images with problems such as geometric distortion, improper focus,
repetitive noise, and camera motion. It is used to correct images for known
degradations.
127 FUNDAMENTAL STEPS
Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the
chances for success of the other processes.
Image segmentation: to partition an input image into its constituent parts or
objects.
Image representation: to convert the input data to a form suitable for
computer processing.
Image description: to extract features that result in some quantitative
information of interest, or features that are basic for differentiating one
class of objects from another.
Image recognition: to assign a label to an object based on the
information provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized
objects.
Knowledge about a problem domain is coded into an image processing
system in the form of a knowledge database.
13 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x, y) must be digitized
both spatially and in amplitude.
Digitization of the spatial coordinates (x, y) is called image sampling;
amplitude digitization is called gray-level quantization.
The storage and processing requirements increase rapidly with the spatial
resolution and the number of gray levels.
Example: a 256-gray-level image of size 256x256 occupies 64 KB of
memory.
Images of very low spatial resolution produce a checkerboard effect, and
the use of an insufficient number of gray levels in smooth areas of a digital
image results in false contouring.
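Both the storage figure and the false-contouring effect can be checked with a small sketch (Python, illustrative only): requantizing a smooth gray ramp to too few levels collapses it into coarse bands.

```python
# Storage grows with resolution and gray levels: a 256-gray-level (8-bit)
# 256x256 image needs 256 * 256 = 65536 bytes, i.e. 64 KB.

def quantize(pixels, levels):
    """Requantize 8-bit gray levels to a smaller number of levels.
    Too few levels in smooth regions produces visible false contouring."""
    step = 256 // levels
    return [(p // step) * step for p in pixels]

ramp = list(range(0, 256, 32))  # a smooth gray ramp: 0, 32, ..., 224
print(quantize(ramp, 4))        # coarse bands: [0, 0, 64, 64, 128, 128, 192, 192]
```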
14 IMAGE FILE FORMATS
There are two general groups of 'images': vector graphics (or line art)
and bitmaps (pixel-based images). Some of the most common file
formats are:
GIF (Graphics Interchange Format): an 8-bit (256-colour), non-
destructively compressed bitmap format. Mostly used for the web. Has several
sub-standards, one of which is the animated GIF.
JPEG (Joint Photographic Experts Group): a very efficient (i.e., much
information per byte) destructively compressed 24-bit (16 million colours)
bitmap format. Widely used, especially for the web and Internet (bandwidth-
limited).
TIFF (Tagged Image File Format): the standard 24-bit publication bitmap
format. Compresses non-destructively with, for instance, Lempel-Ziv-
Welch (LZW) compression.
PS (PostScript): a standard vector format. Has numerous sub-standards
and can be difficult to transport across platforms and operating systems.
PSD (Adobe Photoshop Document): a dedicated Photoshop format that
keeps all the information in an image, including all the layers.
BMP: bitmap file format.
15 TYPE OF IMAGES
There are four types of images:
1 Binary image
2 Gray scale image
3 Color image
4 Indexed image
151 BINARY IMAGES
A binary image is a digital image that has only two possible values for
each pixel. Typically, the two colors used for a binary image are black and
white, though any two colors can be used. Binary images are also called bi-
level or two-level. Each pixel is stored as a single bit, i.e.,
a 0 or 1; such images are also known as black-and-white (B&W) images.
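Since each pixel needs only one bit, eight binary pixels fit in a byte. A hypothetical packing routine (Python, illustration only; row lengths are assumed to be multiples of eight):

```python
def pack_row(bits):
    """Pack a row of 0/1 pixels into bytes, 8 pixels per byte (MSB first)."""
    out = []
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b  # shift in one pixel at a time
        out.append(byte)
    return out

print(pack_row([1, 0, 1, 0, 1, 0, 1, 0]))  # [170], i.e. 0b10101010
```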
152 GRAY SCALE IMAGE
In an (8-bit) grayscale image, each picture element has an assigned intensity
that ranges from 0 to 255. A grayscale image is what people normally call
a black-and-white image, but the name emphasizes that such an image will
also include many shades of grey.
[Figure 1.2: Gray scale image]
153 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive
colour, with the R, G, and B receptors in our retinas. RGB uses additive colour
mixing and is the basic colour model used in television or any other
medium that projects colour with light. It is the basic colour model used in
computers and for web graphics, but it cannot be used for print production.
The secondary colours of RGB (cyan, magenta, and yellow) are formed
by mixing two of the primary colours (red, green, or blue) and excluding the
third colour. Red and green combine to make yellow, green and blue to
make cyan, and blue and red form magenta. The combination of red, green,
and blue at full intensity makes white.
In Photoshop, using the "screen" mode for the different layers in an
image will make the intensities mix together according to the additive
colour mixing model. This is analogous to stacking slide images on top of
each other and shining light through them.
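The additive mixing rules above can be verified with a tiny sketch (Python, illustrative; channel values follow the usual 0-255 convention):

```python
def mix(*colors):
    """Additively mix RGB triples, clipping each channel at 255."""
    return tuple(min(255, sum(c[i] for c in colors)) for i in range(3))

RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
print(mix(RED, GREEN))         # (255, 255, 0): yellow
print(mix(GREEN, BLUE))        # (0, 255, 255): cyan
print(mix(RED, GREEN, BLUE))   # (255, 255, 255): white
```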
[Figure 1.3: The additive model of RGB]
CMYK: the 4-colour CMYK model used in printing lays down
overlapping layers of varying percentages of transparent cyan (C), magenta
(M), and yellow (Y) inks. In addition, a layer of black (K) ink can be added.
The CMYK model uses subtractive colour mixing.
154 INDEXED IMAGE
[Figure 1.4: The colors created by the subtractive model of CMYK]
An indexed image consists of an array and a colormap matrix. The
pixel values in the array are direct indices into the colormap. By convention,
this documentation uses the variable name X to refer to the array and map
to refer to the colormap. In computing, indexed colour is a technique to
manage digital images' colours in a limited fashion, in order to save
computer memory and file storage while speeding up display refresh and
file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, colour information is not
directly carried by the image pixel data but is stored in a separate piece of
data called a palette: an array of colour elements, in which every element (a
colour) is indexed by its position within the array. Each image pixel does not
contain the full specification of its colour, but only its index in the palette.
This technique is sometimes referred to as pseudocolour or indirect colour, as
colours are addressed indirectly.
Perhaps the first device that supported palette colours was a random-
access frame buffer described in 1975 by Kajiya, Sutherland, and Cheadle.
This supported a palette of 256 36-bit RGB colours.
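A minimal sketch of the palette lookup (Python, illustrative; the 4-entry palette and index array below are hypothetical):

```python
# Indexed image: pixel values in X are indices into a palette of RGB colours.
palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]  # hypothetical map
X = [[0, 1],
     [2, 3]]  # the index array

# Resolving the image to full RGB replaces each index with its palette entry.
rgb = [[palette[i] for i in row] for row in X]
print(rgb[0][1])  # (255, 0, 0): index 1 resolves to red via the palette
```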
16 APPLICATIONS OF IMAGE PROCESSING
Interest in digital image processing methods stems from two principal
application areas:
1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception
In the second application area, interest focuses on procedures for
extracting information from an image in a form suitable for computer
processing. Examples include automatic character recognition, industrial
machine vision for product assembly and inspection, military reconnaissance,
automatic processing of fingerprints, etc.
17 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Among these three methods,
the SURF-based method achieves the best accuracy; it takes an average of 1.5
seconds to perform a one-to-one matching. Zhou et al. proposed a line-
descriptor-based method for sclera vein recognition. The matching step
(including registration) is the most time-consuming step in this sclera vein
recognition system, costing about 1.2 seconds per one-to-one matching.
Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz
processor and 4 GB DRAM. Currently, sclera vein recognition algorithms
are designed using central processing unit (CPU)-based systems.
171 DISADVANTAGES OF EXISTING SYSTEM
1. Mask files are used to calculate the valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large, preoccupy GPU memory, and slow down data transfer.
Also, some of the processing on the mask files involves convolution, whose
performance is difficult to improve on the scalar processing units of CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance.
18 LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular
biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp.
1860-1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the
most challenging problems that has been actively researched in recent
years. It is well known that many state-of-the-art still-image face recognition
algorithms perform well when constrained (frontal, well-illuminated, high-
resolution, sharp, and full) face images are acquired. However, their
performance degrades significantly when the test images contain variations
that are not present in the training images. In this paper we highlight some
of the key issues in remote face recognition. We define remote face
recognition as that in which faces are several tens of meters (10-250 m) from
the cameras. We then describe a remote face database which has been
acquired in an unconstrained outdoor maritime environment. The recognition
performance of a subset of existing still-image-based face recognition
algorithms is evaluated on the remote face data set. Further, we define the
remote re-identification problem as matching a subject at one location with
candidate sets acquired at a different location and over time in remote
conditions. We provide preliminary experimental results on remote re-
identification. It is demonstrated that, in addition to applying a good
classification algorithm, finding features that are robust to the variations
mentioned above and developing statistical models which can account for
these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N.
Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics
Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
With the rapidly expanding biometric data collected by various sectors
of government and industry for identification and verification purposes,
how to manage and process such Big Data draws great concern. Even
though modern processors are equipped with more cores and memory
capacity, careful design is still required in order to utilize the hardware
resources effectively and the power consumption efficiently. This research
addresses the issue by investigating the workload characteristics of a
biometric application. Taking Daugman's iris matching algorithm, which
has been proven to be the most reliable iris matching method, as a case
study, we conduct performance profiling and binary instrumentation on the
benchmark to capture its execution behavior. The results show that data
loading and memory access incur great performance overhead, which
motivates us to move the biometrics computation to a high-performance
architecture.
Modern iris recognition algorithms can be computationally intensive,
yet are designed for traditional sequential processing elements such as a
personal computer. However, a parallel processing alternative using field-
programmable gate arrays (FPGAs) offers an opportunity to speed up iris
recognition. Within this project, iris template generation with
directional filtering, which is a computationally expensive yet parallel
portion of a modern iris recognition algorithm, is parallelized on an FPGA
system. We present a performance comparison of the parallelized
algorithm on the FPGA system to a traditional CPU-based version. The
parallelized template generation outperforms an optimized C++ code
version, determining the information content of an iris approximately 324
times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric
modality based on conjunctival vasculature," in Proc. Artif. Neural
Netw. Eng., 2006, pp. 1-8.
A new biometric indicator based on the patterns of conjunctival
vasculature is proposed. Conjunctival vessels can be observed on the visible
part of the sclera that is exposed to the outside world. These vessels
demonstrate rich and specific details in visible light and can be easily
photographed using a regular digital camera. In this paper we discuss
methods for conjunctival imaging, preprocessing, and feature extraction in
order to derive a suitable conjunctival vascular template for biometric
authentication. Commensurate classification methods, along with the
observed accuracy, are discussed. Experimental results suggest the potential
of using conjunctival vasculature as a biometric measure. Identification of
a person based on some unique set of features is an important task. Human
identification is possible with several biometric systems, and sclera
recognition is one of the promising biometrics. The sclera is the white
portion of the human eye, and the vein pattern seen in the sclera region is
unique to each person. Thus the sclera vein pattern is a well-suited
biometric technology for human identification. The existing methods used
for sclera recognition have some drawbacks: only frontal-looking
images are preferred for matching, and rotation variance is another problem.
These problems are completely eliminated in the proposed system by using
two feature extraction techniques: Histogram of Oriented
Gradients (HOG) and conversion of the image into polar form using the
bilinear interpolation technique. These two features help the proposed
system to become illumination invariant and rotation invariant. The
experimentation is done with the help of the UBIRIS database. The
experimental results show that the proposed sclera recognition method can
achieve better accuracy than the previous methods.
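As a rough illustration of the HOG idea mentioned above, the Python sketch below computes an orientation histogram for a single cell. It is a simplified stand-in for a full HOG implementation: the central-difference gradients, unsigned 0-180 degree orientations, and 9-bin layout are common choices but assumptions here, not the cited paper's exact scheme.

```python
import math

def hog_cell(cell, bins=9):
    """Orientation histogram for one cell (a 2D list of gray levels).
    Each interior pixel votes for its gradient orientation bin,
    weighted by its gradient magnitude."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical gradient
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180  # unsigned orientation
            hist[int(ang / 180 * bins) % bins] += mag
    return hist
```

For a cell with a uniform horizontal gradient, all the magnitude lands in the first (near-0-degree) bin, which is what makes the descriptor sensitive to edge orientation.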
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J.
C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899,
May 2008.
The graphics processing unit (GPU) has become an integral part of
today's mainstream computing systems. Over the past six years, there has
been a marked increase in the performance and capabilities of GPUs. The
modern GPU is not only a powerful graphics engine but also a highly
parallel programmable processor, featuring peak arithmetic and memory
bandwidth that substantially outpaces its CPU counterpart. The GPU's
rapid increase in both programmability and capability has spawned a
research community that has successfully mapped a broad range of
computationally demanding, complex problems to the GPU. This effort in
general-purpose computing on the GPU, also known as GPU computing,
has positioned the GPU as a compelling alternative to traditional
microprocessors in high-performance computer systems of the future. We
describe the background, hardware, and programming model for GPU
computing, summarize the state of the art in tools and techniques, and
present four GPU computing successes in game physics and computational
biophysics that deliver order-of-magnitude performance gains over
optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image
database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–
977.
This paper proposes algorithms for iris segmentation, quality
enhancement, match score fusion, and indexing to improve both the
accuracy and the speed of iris recognition. A curve evolution approach is
proposed to effectively segment a nonideal iris image using the modified
Mumford–Shah functional. Different enhancement algorithms are
concurrently applied on the segmented iris image to produce multiple
enhanced versions of the iris image. A support-vector-machine-based
learning algorithm selects locally enhanced regions from each globally
enhanced image and combines these good-quality regions to create a single
high-quality iris image. Two distinct features are extracted from the high-
quality iris image: the global textural feature is extracted using the 1-D log
polar Gabor transform, and the local topological feature is extracted using
Euler numbers. An intelligent fusion algorithm combines the textural and
topological matching scores to further improve the iris recognition
performance and reduce the false rejection rate, whereas an indexing
algorithm enables fast and accurate iris identification. The verification and
identification performance of the proposed algorithms is validated and
compared with other algorithms using the CASIA Version 3, ICE 2005, and
UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-
stage parallel approach for registration and matching. It is a parallel sclera
matching solution for sclera vein recognition, built on our sequential line-
descriptor method, using the CUDA GPU architecture. CUDA is a highly
parallel, multithreaded, many-core processor architecture with tremendous
computational power.
It supports not only a traditional graphics pipeline but also computation
on non-graphical data. It is relatively straightforward to implement our C
program for CUDA on an AMD-based GPU using OpenCL: our CUDA
kernels can be directly converted to OpenCL kernels by accounting for the
different syntax of various keywords and built-in functions. The mapping
strategy is also effective in OpenCL if we regard the thread and block in
CUDA as the work-item and work-group in OpenCL. Most of our optimization
techniques, such as coalesced memory access and prefix sum, work in
OpenCL too. Moreover, since CUDA is a data-parallel architecture, the
implementation of our approach in OpenCL should be programmed in the
data-parallel model.
In this research we first discuss why the naïve parallel approach would
not work. We then propose the new sclera descriptor, the Y-shape sclera
feature-based efficient registration method, to speed up the mapping scheme;
introduce the "weighted polar line (WPL) descriptor," which is better
suited for parallel computing and mitigates the mask size issue; and develop
our coarse-to-fine two-stage matching process to dramatically improve the
matching speed. These new approaches make parallel processing
possible and efficient.
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve efficiency, we propose a new descriptor, the Y-shape
descriptor, which greatly helps improve the efficiency of
the coarse registration of two images and can be used to filter out some
non-matching pairs before refined matching.
2. We propose the coarse-to-fine two-stage matching process. In the first
stage, we match two images coarsely using the Y-shape descriptors,
which is very fast because no registration is needed. The
matching result of this stage helps filter out image pairs with low
similarities.
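The two-stage idea can be sketched as follows. This is a minimal Python illustration, not the CUDA implementation: the descriptor layout, the thresholds, and the `fine_match` callback are hypothetical placeholders.

```python
import numpy as np

def coarse_score(y_desc_a, y_desc_b, angle_tol=0.1):
    """Coarse stage: compare Y-shape branch-angle descriptors directly.
    No registration is needed because the angles are measured relative to
    the iris radial direction. Each descriptor is a (phi1, phi2, phi3) row."""
    matches = 0
    for da in y_desc_a:
        # a Y-shape descriptor matches if all three branch angles agree
        if any(np.all(np.abs(da - db) < angle_tol) for db in y_desc_b):
            matches += 1
    return matches / max(len(y_desc_a), 1)

def two_stage_match(templates, query_y, query_fine, fine_match, coarse_thresh=0.3):
    """Stage 1 filters out templates with low Y-descriptor similarity;
    stage 2 runs the expensive fine matcher only on the survivors."""
    results = {}
    for name, (tpl_y, tpl_fine) in templates.items():
        if coarse_score(query_y, tpl_y) < coarse_thresh:
            continue  # filtered out: fine matching never runs
        results[name] = fine_match(query_fine, tpl_fine)
    return results
```

Because the coarse stage is only a threshold test over a handful of angles, it is cheap enough to apply to every template in a large database before any registration work is attempted.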
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque and white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each person,
so it can be used for human identification. Several researchers have
designed different sclera vein recognition methods and have shown that it
is promising to use sclera vein recognition for human identification.
Crihalmeanu and Ross proposed three approaches, a Speeded Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching, for feature registration and matching. Among these three methods,
the SURF method achieves the best accuracy; it takes an average of 1.5
seconds to perform a one-to-one matching. Zhou
et al. proposed a line descriptor-based method for sclera vein recognition.
The matching step (including registration) is the most time-consuming step
in this sclera vein recognition system, costing about 1.2 seconds to
perform a one-to-one matching. Both speeds were measured using a PC with
Intel® Core™ 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently,
sclera vein recognition algorithms are designed using central processing
unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential
processing devices, which may not be efficient for data processing where the
data can be parallelized. Because of the large time consumption of the matching
step, sclera vein recognition using a sequential method would be very
challenging to implement in a real-time biometric system, especially
when there is a large number of templates in the database for matching.
General-purpose graphics processing units (GPGPUs)
are now popularly used for parallel computing to improve
computational processing speed and efficiency. The highly parallel
structure of GPUs makes them more effective than CPUs for data
processing where processing can be performed in parallel. GPUs have been
widely used in biometric recognition, such as speech recognition, text
detection, handwriting recognition, and face recognition. In iris
recognition, the GPU has been used to extract the features, construct descriptors,
and match templates.
GPUs are also used for object retrieval and image search. Park et al.
presented a performance evaluation of image processing algorithms, such
as linear feature extraction and multi-view stereo matching, on GPUs.
However, these approaches were designed for their specific biometric
recognition applications and feature searching methods; therefore they may
not be efficient for sclera vein recognition. Compute Unified Device
Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in
this research. CUDA is a highly parallel, multithreaded, many-core
processor architecture with tremendous computational power. It supports not only a
traditional graphics pipeline but also computation on non-graphical data.
More importantly, it offers an easier programming platform that
outperforms its CPU counterparts in terms of peak arithmetic intensity and
memory bandwidth. In this research, the goal is not to develop a unified
strategy to parallelize all sclera matching methods, because each method is
quite different from the others and would need a customized design; an
efficient parallel computing scheme would need different
strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
sclera vein recognition using our sequential line-descriptor method on
the CUDA GPU architecture. However, the parallelization strategies
developed in this research can be applied to design parallel approaches for
other sclera vein recognition methods, and can help parallelize general pattern
recognition methods. Based on the matching approach, there are three
challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large in size and will preoccupy the GPU memory and slow
down the data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units in CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements;
there is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance, and new designs are
necessary to help narrow down the search range. In summary, a naïve
implementation of the algorithms in parallel would not work efficiently.
Note that it is relatively straightforward to implement our C program for
CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be
directly converted to OpenCL kernels by accounting for the different syntax of
various keywords and built-in functions. The mapping strategy is also
effective in OpenCL if we regard the thread and block in CUDA as the work-item
and work-group in OpenCL. Most of our optimization techniques, such as
coalesced memory access and prefix sum, work in OpenCL too.
Moreover, since CUDA is a data-parallel architecture, the implementation
of our approach in OpenCL should be programmed in the data-parallel model.
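One of the optimizations named above, the prefix sum (scan), can be illustrated sequentially. The doubling-offset loop below mirrors the structure of a Hillis–Steele scan kernel, with each outer iteration standing in for one synchronized parallel step; it is an explanatory sketch in Python, not our CUDA kernel.

```python
def inclusive_prefix_sum(values):
    """Hillis-Steele style inclusive scan, written with the same
    doubling-offset structure a scan kernel uses: on a GPU, every
    'thread' i >= offset would execute one addition per step in parallel."""
    out = list(values)
    n = len(out)
    offset = 1
    while offset < n:
        # double buffering stands in for the barrier between parallel steps
        prev = list(out)
        for i in range(offset, n):
            out[i] = prev[i] + prev[i - offset]
        offset *= 2
    return out
```

In parallel matching, a scan like this is typically used to compute contiguous output indices, for example when compacting the surviving descriptor matches into a dense array for coalesced access.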
In this research we first discuss why the naïve parallel approach would not
work (Section 3). We then propose the new sclera descriptor, the Y-shape
sclera feature-based efficient registration method, to speed up the mapping
scheme (Section 4); introduce the "weighted polar line (WPL) descriptor,"
which is better suited for parallel computing and mitigates the mask size
issue (Section 5); and develop our coarse-to-fine two-stage matching
process to dramatically improve the matching speed (Section 6). These new
approaches make parallel processing possible and efficient. However, it
is non-trivial to implement these algorithms in CUDA, so we then develop
the implementation schemes to map our algorithms onto CUDA (Section 7).
In Section 2 we give a brief introduction to sclera vein recognition, in
Section 8 we present experiments using the proposed system, and in
Section 9 we draw conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation, feature enhancement, feature extraction, and feature
matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition,
and several methods have been designed for it. Crihalmeanu
et al. presented a semi-automated system for sclera segmentation; they
used a clustering algorithm to classify the color eye images into three
clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross
designed a segmentation approach based on a normalized sclera index
measure, which includes coarse sclera segmentation, pupil region
segmentation, and fine sclera segmentation. Zhou et al. developed a skin
tone plus "white color"-based voting method for sclera segmentation in
color images, and an Otsu's thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera
features, since the sclera vein patterns often lack contrast and are hard to
detect. Zhou et al. used a bank of multi-directional Gabor filters for
vascular pattern enhancement. Derakhshani et al. used contrast-limited
adaptive histogram equalization (CLAHE) to enhance the green color plane
of the RGB image, and a multi-scale region growing approach to identify
the sclera veins from the image background. Crihalmeanu and Ross applied
a selective enhancement filter for blood vessels to extract features from the
green component of a color image. In the feature matching step,
Crihalmeanu and Ross proposed
three registration and matching approaches: Speeded Up Robust
Features (SURF), which is based on interest-point detection; minutiae
detection, which is based on minutiae points on the vasculature structure;
and direct correlation matching, which relies on image registration. Zhou et
al. designed a line descriptor-based feature registration and matching
method.
The proposed sclera recognition consists of five steps: sclera
segmentation, vein pattern enhancement, feature extraction, feature
matching, and matching decision. Fig. 2 shows the block diagram of sclera
recognition. Two types of feature extraction are used in the proposed
method to achieve good identification accuracy. The characteristics
elicited from the blood vessel structure seen in the sclera region are the
Histogram of Oriented Gradients (HOG) and a bilinear-interpolated
Cartesian-to-polar conversion. HOG is used to determine the gradient
orientations and edge orientations of the vein pattern in the sclera region of
an eye image. To become more computationally efficient, the image data
are converted to polar form; this is mainly useful for circular or quasi-circular
objects. These two characteristics are extracted from all the images in the
database and compared with the features of the query image to decide whether the
person is correctly identified or not. This procedure is done in the feature
matching step, which ultimately makes the matching decision. By using the
proposed feature extraction methods and matching techniques, human
identification is more accurate than in existing studies. In the proposed
method, two features of an image are extracted.
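The Cartesian-to-polar resampling with bilinear interpolation can be sketched as follows. The grid sizes and the choice of center are illustrative parameters, not the values used in this work.

```python
import numpy as np

def cartesian_to_polar(img, center, n_r=64, n_theta=128):
    """Resample a grayscale image onto an (r, theta) grid using bilinear
    interpolation, so that rotations about `center` become shifts along
    the theta axis -- the property that makes matching rotation invariant."""
    cx, cy = center
    h, w = img.shape
    r_max = min(cx, cy, w - 1 - cx, h - 1 - cy)
    rs = np.linspace(0, r_max, n_r)
    thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    polar = np.zeros((n_r, n_theta))
    for i, r in enumerate(rs):
        for j, t in enumerate(thetas):
            x, y = cx + r * np.cos(t), cy + r * np.sin(t)
            # clamp so the 2x2 neighbourhood stays inside the image
            x0 = min(int(np.floor(x)), w - 2)
            y0 = min(int(np.floor(y)), h - 2)
            dx, dy = x - x0, y - y0
            # bilinear interpolation from the four surrounding pixels
            polar[i, j] = (img[y0, x0] * (1 - dx) * (1 - dy)
                           + img[y0, x0 + 1] * dx * (1 - dy)
                           + img[y0 + 1, x0] * (1 - dx) * dy
                           + img[y0 + 1, x0 + 1] * dx * dy)
    return polar
```

With the iris center as the pole, a rotated eye image produces the same polar template circularly shifted along the theta axis, which can be matched without re-registering the image.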
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It involves
three steps: glare area detection, sclera area estimation, and iris and eyelid
detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare area detection: the glare area is a small bright area near the
pupil or iris, an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. It operates
only on grayscale images, so a color image must first be converted to
grayscale before the Sobel filter is applied to
detect the glare area. Fig. 4 shows the result of the glare area detection.
FIG
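A minimal sketch of this step in Python: grayscale conversion followed by a 3x3 Sobel response, with the near-saturated pixels flagged as glare. The brightness threshold of 200 is an assumed value, not the one used in the project.

```python
import numpy as np

def sobel_magnitude(gray):
    """Gradient magnitude from the 3x3 Sobel kernels (interior pixels only;
    the one-pixel border is left at zero)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    h, w = gray.shape
    gx, gy = np.zeros((h, w)), np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            win = gray[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(win * kx)
            gy[i, j] = np.sum(win * ky)
    return np.hypot(gx, gy)

def detect_glare(image, bright_thresh=200):
    """Flag near-saturated pixels as glare; the Sobel response outlines the
    glare boundary. A color image is first averaged down to grayscale."""
    gray = image.mean(axis=2) if image.ndim == 3 else image.astype(float)
    edges = sobel_magnitude(gray)
    glare = gray > bright_thresh  # glare interiors are near-saturated
    return glare, edges
```

The strong Sobel responses ring the bright spot, which is what lets the glare region be localized and masked out before sclera estimation.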
Sclera area estimation: for the estimation of the sclera area, Otsu's
thresholding method is applied. The steps of sclera area detection are:
selection of the region of interest (ROI), Otsu's thresholding, and sclera area
detection. The left and right sclera areas are selected based on the iris boundaries.
Once the region of interest is selected, Otsu's thresholding is applied to
obtain the potential sclera areas. The correct left sclera area should be
placed in the right and center positions, and the correct right sclera area should
be placed in the left and center. In this way, non-sclera areas are eliminated.
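Otsu's method chooses the gray level that maximizes the between-class variance of the histogram; a minimal NumPy version (the bright class as sclera candidate is the simplifying assumption here):

```python
import numpy as np

def otsu_threshold(gray):
    """Exhaustively search the threshold that maximizes the between-class
    variance of the 256-bin gray-level histogram (Otsu's criterion)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0.0
    for t in range(256):
        w0 += hist[t]          # pixels at or below t
        if w0 == 0:
            continue
        w1 = total - w0        # pixels above t
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def estimate_sclera(gray):
    """Candidate sclera pixels: the bright class under Otsu's threshold."""
    return gray > otsu_threshold(gray)
```

The resulting binary map still contains skin and glare regions, which is why the position checks described above (left/right placement relative to the iris) are applied afterwards.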
2.2.3 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the
sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are
refined; these are all unwanted portions for recognition. To
eliminate their effects, refinement is done after the detection
of the sclera area. Fig. shows the result after Otsu's thresholding and iris and
eyelid refinement to detect the right sclera area. The left sclera
area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented.
Hence feature extraction and matching are needed to reduce the
segmentation fault. The vein patterns in the sclera area are not clearly
visible after segmentation; to make them more visible, vein pattern
enhancement is performed.
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of fingers (Miura et al., 2004), palm (Lin and
Fan, 2004), and retina (Hill, 1999). In the case of retinal biometrics, a
special optical device for imaging the back of the eyeball is needed (Hill,
1999). Due to its perceived invasiveness and the required degree of subject
cooperation, the use of retinal biometrics may not be acceptable to some
individuals. The conjunctiva is a thin, transparent, and moist tissue that
covers the outer surface of the eye. The part of the conjunctiva that covers
the inner lining of the eyelids is called the palpebral conjunctiva, and the part
that covers the outer surface of the eye is called the ocular (or bulbar)
conjunctiva, which is the focus of this study. The ocular conjunctiva is very
thin and clear; thus the vasculature (including that of the episclera) is
easily visible through it. The visible microcirculation of the conjunctiva offers a
rich and complex network of veins and fine microcirculation (Fig. 1). The
apparent complexity and specificity of these vascular patterns motivated us
to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as those of the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus the conjunctival vascular
structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial off-
the-shelf digital cameras under normal lighting conditions, making this
modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the
addition of conjunctival biometrics will enhance the performance of
current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portions of the eye, the sclera
vein patterns will be further exposed. This feature makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g.,
due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
availability of simultaneous computation. The figure below demonstrates the
possibility of parallel directional filtering. Since the filter is computed over
different portions of the input image, the computation can be performed in
parallel (denoted by Elements below). In addition, individual parallelization
of each element of the filtering can also be performed. A detailed discussion of
our proposed parallelization is outside the scope of this paper.
FIG
FIG
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA
VEIN RECOGNITION METHOD
The matching segment of the line descriptor-based method is a
bottleneck with regard to matching speed. In this section we briefly
describe the line descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and remove the branch
points. The line descriptor is used to describe the segments in the vein
structure; Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle to some
reference angle at the iris center, θ; the segment's distance to the iris center, r;
and the dominant angular orientation of the line segment, ɸ. Thus the
descriptor is S = (θ, r, ɸ)ᵀ. The individual components of the line descriptor
are calculated as
FIG
Here f_line(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. In order to register the segments of the
vascular patterns, a RANSAC-based algorithm is used to estimate the best-
fit parameters for registration between the two sclera vascular patterns.
The registration algorithm randomly chooses two points, one from the
test template and one from the target template, and randomly chooses a
scaling factor and a rotation value based on a priori knowledge of the
database. Using these values, it calculates a fitness value for the registration
under these parameters.
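The three descriptor components can be computed from a segment's skeleton points roughly as follows. A first-order fit stands in for the polynomial approximation f_line(x); the fit degree and the angle conventions are assumptions of this sketch, not the exact implementation.

```python
import numpy as np

def line_descriptor(xs, ys, iris_center):
    """Build S = (theta, r, phi) for one skeleton segment: theta and r
    locate the segment's center (xl, yl) in polar coordinates about the
    iris center (xi, yi); phi is the segment's own dominant orientation,
    taken from the slope of a degree-1 polynomial fit f_line(x)."""
    xi, yi = iris_center
    xl, yl = xs.mean(), ys.mean()          # segment center point
    theta = np.arctan2(yl - yi, xl - xi)   # angle about the iris center
    r = np.hypot(xl - xi, yl - yi)         # distance to the iris center
    slope = np.polyfit(xs, ys, 1)[0]       # linear approximation of the segment
    phi = np.arctan(slope)                 # dominant orientation
    return np.array([theta, r, phi])
```

Because theta and r are measured about the iris center, two templates that share that reference point are already in a common coordinate frame before any registration is attempted.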
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. In order to reduce the effect of segmentation errors, we create a
weighting image (Figure 3) from the sclera mask by setting interior pixels
in the sclera mask to 1, pixels within some distance of the boundary of the
mask to 0.5, and pixels outside the mask to 0.
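The weighting image can be derived from the binary sclera mask by erosion. In this sketch the boundary band width (`border=2`) is an assumed parameter, and a 4-neighbour erosion stands in for whatever distance measure was actually used.

```python
import numpy as np

def weighting_image(mask, border=2):
    """Interior sclera pixels weigh 1.0, pixels within `border` pixels of
    the mask boundary weigh 0.5, and everything outside weighs 0.0.
    `mask` must be a boolean array; the interior is found by AND-ing the
    mask with shifted copies of itself (a 4-neighbour erosion per pass)."""
    interior = mask.copy()
    for _ in range(border):
        shrunk = interior.copy()
        # a pixel stays interior only if all 4 neighbours are still set
        shrunk[1:, :] &= interior[:-1, :]
        shrunk[:-1, :] &= interior[1:, :]
        shrunk[:, 1:] &= interior[:, :-1]
        shrunk[:, :-1] &= interior[:, 1:]
        interior = shrunk
    weights = np.zeros(mask.shape)
    weights[mask] = 0.5       # whole mask starts at the boundary weight
    weights[interior] = 1.0   # then the eroded interior is promoted
    return weights
```

Down-weighting the boundary band means that spur edges from the iris, eyelids, or eyelashes contribute at most half credit to a match instead of corrupting the score outright.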
The matching score for two segment descriptors is calculated as follows:
Si and Sj are two segment descriptors, m(Si, Sj) is the matching
score between segments Si and Sj, and d(Si, Sj) is the Euclidean distance
between the segment descriptors' center points (from Eqs. 6-8); Dmatch is
the matching distance threshold and ɸmatch is the matching angle threshold.
The total matching score M is the sum of the individual matching scores
divided by the maximum matching score for the minimal set between the
test and target templates. That is, one of the test or target templates has fewer
points, and the sum of its descriptors' weights sets the maximum score
that can be attained.
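In code, the per-segment score and the normalized total score look roughly like this. The threshold values, the 4-tuple descriptor layout s = (x, y, ɸ, w), and crediting min(wi, wj) per matched pair are simplifying assumptions of the sketch, not the exact scoring formula.

```python
import numpy as np

def segment_score(si, sj, d_match=5.0, phi_match=0.2):
    """m(Si, Sj): a pair of segment descriptors matches when their centers
    are within d_match and their orientations within phi_match; the score
    credited is the smaller of the two descriptor weights."""
    (xi, yi, phii, wi), (xj, yj, phij, wj) = si, sj
    d = np.hypot(xi - xj, yi - yj)
    if d <= d_match and abs(phii - phij) <= phi_match:
        return min(wi, wj)
    return 0.0

def total_score(test, target, **kw):
    """M: sum of best per-segment scores, normalized by the maximum
    attainable score of the smaller template (the sum of its weights)."""
    s = sum(max(segment_score(si, sj, **kw) for sj in target) for si in test)
    smaller = min((test, target), key=len)
    max_score = sum(seg[3] for seg in smaller)
    return s / max_score if max_score else 0.0
```

Normalizing by the smaller template's weight sum keeps M in [0, 1], so a fixed decision threshold works regardless of how many segments each eye image produced.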
FIG
FIG
FIG
FIG
With movement of the eye, Y-shape branches are observed to be a stable feature and
can be used as a sclera feature descriptor. To detect the Y-shape branches in
the original template, we search for the nearest-neighbor set of every line
segment within a regular distance and classify the angles among these
neighbors. If there are two types of angle values in the line segment set, the set
may be inferred to be a Y-shape structure, and the line segment angles are
recorded as a new feature of the sclera.
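A sketch of turning one detected branch point into the angle-based feature. Representing each branch by a single endpoint is an assumed input format for illustration.

```python
import numpy as np

def y_shape_descriptor(branch_points, branch_center, pupil_center):
    """For a detected Y-shape branch point, measure each branch's angle
    relative to the iris radial direction (the ray from the pupil center
    through the branch center), giving (phi1, phi2, phi3) plus the center
    (x, y) kept as auxiliary data. Measuring against the radial direction
    instead of the x-axis is what makes the feature rotation invariant."""
    cx, cy = branch_center
    px, py = pupil_center
    radial = np.arctan2(cy - py, cx - px)   # iris radial direction
    phis = []
    for bx, by in branch_points:            # one endpoint per branch
        branch_angle = np.arctan2(by - cy, bx - cx)
        # wrap the difference into (-pi, pi]
        diff = (branch_angle - radial + np.pi) % (2 * np.pi) - np.pi
        phis.append(diff)
    return np.array(phis), (cx, cy)
```

If the whole eye rotates, `radial` and every `branch_angle` rotate together, so the differences phi1, phi2, phi3 stay unchanged; only the auxiliary (x, y) moves.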
There are two ways to measure both the orientation and the relationship of
every branch of Y-shape vessels: one is to use the angles of every branch to
the x-axis; the other is to use the angles between each branch and the iris radial
direction. The first method needs an additional rotation operation to align the
template, so in our approach we employ the second method. As Figure 6
shows, ϕ1, ϕ2 and ϕ3 denote the angle between each branch and the radius
from the pupil center. Even when the head tilts, the eye moves, or the camera
zooms during image acquisition, ϕ1, ϕ2 and ϕ3 are quite stable.
To tolerate errors from the pupil center calculation in the segmentation step,
we also record the center position (x, y) of the Y-shape branches as
auxiliary parameters. Our rotation-, shift- and scale-invariant feature
vector is thus defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated
with reference to the iris center; therefore it is automatically aligned to the
iris center, and it is a rotation- and scale-invariant descriptor.
2.2.6 WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary images
(Figure 7). The skeleton is then broken into smaller segments. For each
segment, a line descriptor is created to record the center and orientation of
the segment; this descriptor is expressed as s(x, y, ɸ), where (x, y) is the
position of the center and ɸ is its orientation. Because of the limitation of
segmentation accuracy, the descriptors at the boundary of the sclera area might
not be accurate and may contain spur edges resulting from the iris, eyelid,
and/or eyelashes. To tolerate such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel
patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line
segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is challenging,
since the mask files are large in size and will occupy the GPU memory and
slow down the data transfer. During matching, a RANSAC-type registration
algorithm is used to randomly select corresponding descriptors,
and the transform parameters between them are used to generate the
template transform affine matrix. After every template transform, the mask
data should also be transformed and a new boundary should be calculated to
evaluate the weight of the transformed descriptor. This results in too many
convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the
mask information and can be automatically aligned. We extract the
geometric relationships of the descriptors and store them as a new
descriptor. We use a weighted image created by setting various weight
values according to position: the weights of descriptors outside the sclera
are set to 0, those near the sclera boundary to 0.5, and interior descriptors
to 1. In our work, descriptor weights are calculated on their own mask by
the CPU, only once.
The result is saved as a component of the descriptor, which becomes
s(x, y, ɸ, w), where w denotes the weight of the point and takes the value
0, 0.5, or 1. To align two templates, when a
template is shifted to another location along the line connecting their
centers, all the descriptors of that template are transformed. This is
faster if the two templates share a similar reference point: if we use the center of
the iris as the reference point, then when two templates are compared, the
correspondences are automatically aligned to each other, since they have
the same kind of reference point. Every feature vector of the template is a set of
line segment descriptors composed of three variables (Figure 8): the
segment's angle to the reference line through the iris center,
denoted θ; the distance between the segment's center and the pupil center,
denoted r; and the dominant angular orientation of the segment,
denoted ɸ. To minimize GPU computation, we also convert the
descriptor values from polar coordinates to rectangular coordinates during CPU
preprocessing.
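This CPU-side preprocessing can be sketched as a single conversion function; the field order in the resulting tuple is an assumption of the sketch.

```python
import numpy as np

def wpl_descriptor(x, y, phi, weight, iris_center):
    """Extend a line descriptor s(x, y, phi, w) into the WPL form
    s(x, y, r, theta, phi, w): the polar coordinates (r, theta) are taken
    about the iris center, so two templates compared via their iris centers
    are automatically aligned, while the rectangular (x, y) is kept as well
    so GPU kernels avoid repeated polar/rectangular conversions."""
    cx, cy = iris_center
    r = np.hypot(x - cx, y - cy)
    theta = np.arctan2(y - cy, x - cx)
    return (x, y, r, theta, phi, weight)
```

Precomputing both coordinate forms trades a few bytes per descriptor for the removal of trigonometric work from every shift-and-compare step on the GPU.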
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters. For
example, as an eyeball moves left, the left-part sclera patterns of the eye may be
compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. (The multiprocessor in
CUDA manages threads in groups of 32 parallel threads called warps.) We
reorganized the descriptors from the same sides and saved
FIG
FIG
them in continuous addresses. This meets the requirement for coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no longer
needed on the GPU. Matching with this feature is very fast because the
templates do not need to be re-registered after every shift; thus the cost of
data transfer and computation on the GPU is reduced. With matching on the
new descriptor, the shift parameter generator in Figure 4 is simplified
as in Figure 9.
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express the
more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger
limits on their size and resource consumption, with more fully featured
instruction sets, and with more flexible control-flow operations. After many
years of separate instruction sets for vertex and fragment operations, current
GPUs support the unified Shader Model 4.0 on both vertex and fragment
shaders:
The hardware must support shader programs of at least 65 k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-
bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units. General-purpose computing on the GPU maps general-
purpose computation onto the GPU using the graphics hardware in much the
same way as any standard graphics application. Because of this similarity, it
is both easier and more difficult to explain the process: on one hand, the
actual operations are the same and are easy to follow; on the other hand, the
terminology differs between graphics and general-purpose use. Harris
provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology,
then show how the same steps are used in a general-purpose way to author
GPGPU applications, and finally use the same steps to show the more
simple and direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline described in Section II,
concentrating on the programmable aspects of this pipeline.
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as a texture on future passes through
the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves the exact same steps, but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at its grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
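As a hedged illustration of this old GPGPU style (a Python stand-in, not the authors' code), one time step of the grid computation can be pictured as every grid point gathering its neighbors' previous state and writing to a separate output buffer:

```python
# Sketch (assumed example): one time step of a grid computation in the
# old GPGPU style. Every grid point runs the same "fragment program",
# gathering its neighbors' previous state; results go to a NEW buffer,
# because the input buffer is read-only within a pass.

def step(grid):
    h, w = len(grid), len(grid[0])
    new = [[0.0] * w for _ in range(h)]
    for y in range(h):          # the rasterizer covers the whole domain
        for x in range(w):      # each (x, y) is one fragment
            # "Gather" reads: current cell plus 4 neighbors (edges clamped)
            up = grid[max(y - 1, 0)][x]
            down = grid[min(y + 1, h - 1)][x]
            left = grid[y][max(x - 1, 0)]
            right = grid[y][min(x + 1, w - 1)]
            new[y][x] = (grid[y][x] + up + down + left + right) / 5.0
    return new  # used as the input buffer on the next pass

state = [[0.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 0.0]]
state = step(state)
```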
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks' having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV are solving this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
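A minimal sketch of this newer model (a sequential Python stand-in for a parallel thread grid; the names are invented for the example) shows the key difference: the same buffer is both gathered from and scattered to:

```python
# Sketch of the newer GPU computing model: the programmer defines a grid
# of "threads" directly, and the same SPMD program may both read (gather)
# from and write (scatter) to ONE buffer, enabling in-place algorithms.
# Note: on a real GPU, simultaneous reads and writes to shared locations
# would need synchronization; this sequential loop sidesteps that.

def kernel(tid, buf):
    # Each thread gathers its own element and its right neighbor
    # (clamped at the end), then scatters the sum back in place.
    right = buf[min(tid + 1, len(buf) - 1)]
    buf[tid] = buf[tid] + right

buf = [1, 2, 3, 4]
for tid in range(len(buf)):   # sequential stand-in for parallel threads
    kernel(tid, buf)
```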
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities. After this step, some false positive matches are still possible. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster and achieve a more accurate score.
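The two-stage idea can be sketched as follows (illustrative Python; the scoring functions and the threshold value here are invented placeholders, not the paper's equations):

```python
# Sketch of coarse-to-fine matching: a cheap Stage-1 score discards
# unlikely pairs, so only survivors reach the expensive Stage-2 matcher.
# Both scoring functions below are invented stand-ins for the example.

def coarse_score(a, b):
    # Cheap, registration-free comparison (stand-in for Y-shape matching)
    return 1.0 - abs(sum(a) - sum(b)) / max(sum(a), sum(b))

def fine_score(a, b):
    # Expensive, detailed comparison (stand-in for WPL matching)
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def two_stage_match(test, targets, t_coarse=0.8):
    # Stage 1: filter out low-similarity pairs with the cheap score
    survivors = [g for g in targets if coarse_score(test, g) >= t_coarse]
    # Stage 2: run the fine matcher only on the survivors
    return max((fine_score(test, g) for g in survivors), default=0.0)
```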
2.4.1 STAGE I: MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here, ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined as (3); dxy is the Euclidean distance of two descriptor centers, defined as (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs ni and the distance di between Y shape branch centers are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as (2). Here, α is a factor to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise matching process.
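A hedged sketch of Stage I in Python follows. Equation (2) is not reproduced above, so the fusion formula below is an assumed placeholder that rewards more matched pairs and a smaller average center distance, using the thresholds quoted in the text:

```python
# Hedged sketch of Stage-I Y-shape matching. Descriptors are simplified
# to (angle, center_x, center_y) triples; the fusion formula is an
# ASSUMED placeholder, not the paper's equation (2).
import math

def y_shape_score(test, target, t_phi=30.0, t_xy=675.0, alpha=30.0):
    matched, dist_sum = 0, 0.0
    for (phi_i, cx_i, cy_i) in test:
        for (phi_j, cx_j, cy_j) in target:
            d_phi = abs(phi_i - phi_j)                   # angle distance
            d_xy = math.hypot(cx_i - cx_j, cy_i - cy_j)  # center distance
            if d_phi < t_phi and d_xy < t_xy:            # thresholds from text
                matched += 1
                dist_sum += d_xy
    if matched == 0:
        return 0.0
    avg_dist = dist_sum / matched
    n_norm = matched / min(len(test), len(target))
    # Assumed fusion: normalized match count, damped by average distance
    return n_norm * alpha / (alpha + avg_dist)
```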
2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.
Considering these factors, our registration employs both a single shift transform and a multi-parameter transform, which combines shift, rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection from the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the i-th WPL descriptor of Tte, Tta is the target template and stai is the i-th WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value of two descriptors, defined as
We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find each one's nearest neighbor staj in the target template Tta. The shift offset of each pair is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
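The idea of Algorithm 2 can be sketched as follows (a reconstruction from the description above, not the authors' code; for simplicity this version keeps the candidate offset closest to the mean offset rather than literally minimizing a standard deviation):

```python
# Sketch of the shift-parameter search: record the offset from each
# sampled test descriptor (reduced here to a 2-D center point) to its
# nearest target descriptor, then keep the most consensual candidate.
import math

def shift_search(test_pts, target_pts):
    candidates = []
    for (tx, ty) in test_pts:
        # Nearest neighbor of this test descriptor in the target template
        nx, ny = min(target_pts,
                     key=lambda p: math.hypot(p[0] - tx, p[1] - ty))
        candidates.append((nx - tx, ny - ty))   # possible shift Δs_k
    # Consensus: pick the candidate closest to the mean offset
    mx = sum(dx for dx, _ in candidates) / len(candidates)
    my = sum(dy for _, dy in candidates) / len(candidates)
    return min(candidates, key=lambda d: math.hypot(d[0] - mx, d[1] - my))
```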
2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor stei and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here, stei, Tte, staj, and Tta are defined the same as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined as (7). To search for the optimized transform parameters, we iterate N times to generate these parameters. In our experiment, we set the iteration count to 512.
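Algorithm 3's random search can be sketched like this (a hedged reconstruction in Python; the parameter ranges are assumptions, and plain 2-D points stand in for the WPL descriptors transformed by matrix (7)):

```python
# Sketch of the random affine-parameter search: draw (rotation, shift,
# scale) candidates, transform the test points, count matches within
# beta pixels, and keep the best-scoring parameter set.
import math
import random

def affine_search(test_pts, target_pts, n_iter=512, beta=20.0, seed=1):
    rng = random.Random(seed)
    best, best_matches = (0.0, 0.0, 0.0, 1.0), -1
    for _ in range(n_iter):
        theta = rng.uniform(-0.2, 0.2)        # rotation (assumed range)
        scale = rng.uniform(0.9, 1.1)         # scale (assumed range)
        sx, sy = rng.uniform(-5, 5), rng.uniform(-5, 5)  # shift
        matches = 0
        for (x, y) in test_pts:
            # Apply scale * rotation, then shift (stand-in for matrix (7))
            tx = scale * (x * math.cos(theta) - y * math.sin(theta)) + sx
            ty = scale * (x * math.sin(theta) + y * math.cos(theta)) + sy
            # A pair is "matched" if a target point lies within beta pixels
            if any(math.hypot(tx - px, ty - py) < beta
                   for px, py in target_pts):
                matches += 1
        if matches > best_matches:
            best_matches, best = matches, (theta, sx, sy, scale)
    return best, best_matches
```

Because every iteration is independent, this loop is exactly the kind of work the paper later assigns one-parameter-set-per-thread on the GPU.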
3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter set determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here, stei, Tte, staj, and Tta are defined the same as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3; ϕ is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ϕ values; in our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and target template.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those SMs. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and accessing these memories takes little time. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited quantity. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To completely hide the latency of the small instruction set, we should preferentially use on-chip memory rather than global memory. When a global memory access occurs, threads in the same warp should access consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
2.5.1 MAPPING THE ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density; thus, we map them to the GPU using various strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work, we use an NVIDIA C2070 as our GPU. The thread and block numbers are set to 1024, which means we can match our test template with up to 1024x1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which a section of descriptors is processed by one thread. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block. Inside a block, one thread corresponds to a set of descriptors in that template. This partition makes every block execute independently, with no data exchange requirements between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown at the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new result. The final result is saved at the first address, which has the same variable name as the first intermediate result.
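The pairwise reduction described above can be sketched sequentially (each inner loop models the threads that act in parallel at one stride; the stride doubles each round until the total sits in the first slot):

```python
# Sketch of the tree-style reduction used to combine the threads'
# intermediate results: pairs are summed at stride 1, then stride 2,
# 4, ..., and the final sum ends up at the first address.

def block_reduce_sum(vals):
    vals = list(vals)
    stride = 1
    while stride < len(vals):
        # Threads at indices 0, 2*stride, 4*stride, ... act in parallel;
        # each adds its partner's value one stride away.
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]   # final result stored at the first address
```

On the GPU the inner loop is one synchronized step across the block, so the whole reduction takes O(log n) steps instead of n.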
2.5.2 MAPPING INSIDE A BLOCK
In shift argument searching, there are two schemes we can choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to a thread, so that all the threads compute independently, except that the final result must be compared against the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter set search to a thread, and every thread randomly generates a set of parameters and tries them independently. The generation iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore, it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data. In our kernel, we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor. It is primarily applied in the design of target detection; in this paper, it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, histograms can be contrast-normalized by calculating the intensity over a larger block and then using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y):
m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2), θ(x, y) = arctan(dy(x, y) / dx(x, y))
Orientation binning is the second step of HOG. This method is used to create the cell histograms. Each pixel within the cell contributes a weight to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular in form. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination or contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
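The gradient and binning steps can be sketched as follows (a minimal assumed version in Python, using central differences and 9 unsigned-orientation bins; the report's actual implementation is in MATLAB):

```python
# Sketch of per-cell HOG: central-difference gradients, then
# magnitude-weighted orientation binning over 0-180 degrees
# (opposite directions counted as the same).
import math

def hog_cell_histogram(cell, n_bins=9):
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = cell[y][x + 1] - cell[y][x - 1]   # x-direction gradient
            dy = cell[y + 1][x] - cell[y - 1][x]   # y-direction gradient
            m = math.hypot(dx, dy)                 # gradient magnitude
            # Unsigned orientation folded into [0, 180) degrees
            theta = math.degrees(math.atan2(dy, dx)) % 180.0
            # Vote into the bin covering theta, weighted by magnitude
            hist[int(theta // (180.0 / n_bins)) % n_bins] += m
    return hist
```

A vertical edge, for example, produces purely horizontal gradients, so all of its weight lands in the 0-degree bin.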
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for the teaching of linear algebra and numerical analysis, and is popular among scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing.
Development environment for managing code, files, and data.
Interactive tools for iterative exploration, design, and problem solving.
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration.
2-D and 3-D graphics functions for visualizing data.
Tools for building custom graphical user interfaces.
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel.
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations.
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data.
Readability and navigation improvements to warning and error messages in the MATLAB command window.
Automatic variable and function renaming in the MATLAB Editor.
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, and it enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages, because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithms efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
Using the interactive tool GUIDE (Graphical User Interface Development Environment), you can lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh
Image plots
Cone, slice, stream, and isosurface
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
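The start of the snapshot chain above (grayscale values, Otsu's threshold, binarization) can be sketched in Python (illustrative only; the report's actual implementation is in MATLAB, and the toy pixel values below are invented):

```python
# Sketch of Otsu's thresholding as used in the snapshots: pick the
# threshold that maximizes between-class variance of the grayscale
# histogram, then binarize. Pixels are a flat list of 0-255 values.

def otsu_threshold(pixels, levels=256):
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(levels))
    w_bg, sum_bg, best_t, best_var = 0, 0.0, 0, -1.0
    for t in range(levels):
        w_bg += hist[t]                 # background pixel count
        if w_bg == 0 or w_bg == total:
            continue
        sum_bg += t * hist[t]
        m_bg = sum_bg / w_bg                          # background mean
        m_fg = (sum_all - sum_bg) / (total - w_bg)    # foreground mean
        var_between = w_bg * (total - w_bg) * (m_bg - m_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

gray = [10, 12, 11, 10, 200, 210, 205, 198]   # toy grayscale values
t = otsu_threshold(gray)
binary = [1 if p > t else 0 for p in gray]    # binary image
```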
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit cards, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
DECLARATION
We hereby declare that the work being presented in this project, entitled "AN EFFICIENT PARALLEL APPROACH FOR SCLERA VEIN RECOGNITION", submitted towards the partial fulfilment of the requirement for the award of the Degree of Bachelor of Technology in "Electronics and Communication Engineering", is an authentic record of our own work carried out under the supervision of Ms. Syeda Sana Fatima, Assistant Professor, and Ms. S. Suneeta, Head of the Department of Electronics and Communication Engineering, SHADAN WOMEN'S COLLEGE OF ENGINEERING AND TECHNOLOGY, affiliated to Jawaharlal Nehru Technological University, Hyderabad.
The matter embodied in this report has not been submitted for the award of any other degree.
ADLA KIRANMAYI
ANNABATHULA SRILATHA
MAYESHA MUBEEN
INDEX
ABSTRACT .......................................................... i
CHAPTER 1 INTRODUCTION ......................................... 1-16
11 GENERAL
12 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
121 PREPROCESSING
122 IMAGE ENHANCEMENT
123 IMAGE RESTORATION
124 IMAGE COMPRESSION
125 SEGMENTATION
126 IMAGE RESTORATION
127 FUNDAMENTAL STEPS
13 A SIMPLE IMAGE MODEL
14 IMAGE FILE FORMATS
15 TYPE OF IMAGES
151 BINARY IMAGES
152 GRAY SCALE IMAGE
153 COLOR IMAGE
154 INDEXED IMAGE
16 APPLICATIONS OF IMAGE PROCESSING
17 EXISTING SYSTEM
171 DISADVANTAGES OF EXISTING SYSTEM
18 LITERATURE SURVEY
19 PROPOSED SYSTEM
191 ADVANTAGES
CHAPTER 2 PROJECT DESCRIPTION ................................. 17-46
21 INTRODUCTION
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
222 SCLERA SEGMENTATION
223 IRIS AND EYELID REFINEMENT
224 OCULAR SURFACE VASCULATURE
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA
VEIN
23 EVOLUTION OF GPU ARCHITECTURE
231 PROGRAMMING A GPU FOR GRAPHICS
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
25 MAPPING THE SUBTASKS TO CUDA
251 MAPPING ALGORITHM TO BLOCKS
252 MAPPING INSIDE BLOCK
253 MEMORY MANAGEMENT
26 HISTOGRAM OF ORIENTED GRADIENTS
CHAPTER 3 SOFTWARE SPECIFICATION .............................. 47-53
31 GENERAL
32 SOFTWARE REQUIREMENTS
33 INTRODUCTION
34 FEATURES OF MATLAB
341 INTERFACING WITH OTHER LANGUAGES
35 THE MATLAB SYSTEM
351 DESKTOP TOOLS
352 ANALYZING AND ACCESSING DATA
353 PERFORMING NUMERIC COMPUTATION
CHAPTER 4 IMPLEMENTATION ...................................... 54-69
41 GENERAL
42 CODING IMPLEMENTATION
43 SNAPSHOTS
CHAPTER 5 ........................................................ 70
CHAPTER 6 CONCLUSION & FUTURE SCOPE ........................... 71-72
61 CONCLUSION
62 REFERENCES
APPLICATION
LIST OF FIGURES
FIG NO  FIG NAME                                                              PG NO
11      Fundamental blocks of digital image processing                        2
12      Gray scale image                                                      8
13      The additive model of RGB                                             9
14      The colors created by the subtractive model of CMYK                   9
21      The diagram of a typical sclera vein recognition approach             19
22      Steps of segmentation                                                 21
23      Glare area detection                                                  21
24      Detection of the sclera area                                          22
25      Pattern of veins                                                      23
26      Sclera region and its vein patterns                                   25
27      Filtering can take place simultaneously on different parts of the
        iris image                                                            25
28      The sketch of parameters of segment descriptor                        26
29      The weighting image                                                   28
210     The module of sclera template matching                                28
211     The Y shape vessel branch in sclera                                   28
212     The rotation and scale invariant character of Y shape vessel branch   29
213     The line descriptor of the sclera vessel pattern                      30
214     The key elements of descriptor vector                                 31
215     Simplified sclera matching steps on GPU                               32
216     Two-stage matching scheme                                             35
217     Example image from the UBIRIS database                                42
218     Occupancy on various thread numbers per block                         43
219     The task assignment inside and outside the GPU                        44
220     HOG features                                                          46
41      Original sclera image                                                 65
42      Binarised sclera image                                                65
43      Edge map subtracted image                                             66
44      Cropping ROI                                                          66
45      ROI mask                                                              67
46      ROI finger sclera image                                               67
47      Enhanced sclera image                                                 68
48      Feature extracted sclera image                                        68
49      Matching with images in database                                      69
410     Result                                                                69
ABSTRACT
Sclera vein recognition has been shown to be a promising method for human identification. However, its matching speed is slow, which limits its use in real-time applications. To improve the matching efficiency, we proposed a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching. First, we designed a rotation- and scale-invariant Y shape descriptor based feature extraction method to efficiently eliminate most unlikely matches. Second, we developed a weighted polar line (WPL) sclera descriptor structure that incorporates mask information to reduce GPU memory cost. Third, we designed a coarse-to-fine two-stage matching method. Finally, we developed a mapping scheme to map the subtasks to GPU processing units. The experimental results show that our proposed method can achieve a dramatic processing speed improvement without compromising recognition accuracy.
CHAPTER 1
INTRODUCTION
11 GENERAL
Digital image processing is the use of computer algorithms to perform image processing on digital images. The 2D continuous image is divided into N rows and M columns; the intersection of a row and a column is called a pixel. The image can also be a function of other variables, including depth, color, and time. An image given in the form of a transparency, slide, photograph, or X-ray is first digitized and stored as a matrix of binary digits in computer memory. This digitized image can then be processed and/or displayed on a high-resolution television monitor. For display, the image is stored in a rapid-access buffer memory, which refreshes the monitor at a rate of 25 frames per second to produce a visually continuous display.
12 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
The field of "Digital Image Processing" refers to processing digital images by means of a digital computer. In a broader sense, it can be considered the processing of any two-dimensional data, where any image (optical information) is represented as an array of real or complex numbers represented by a definite number of bits. An image is represented as a two-dimensional function f(x, y), where x and y are spatial (plane) coordinates and the amplitude of f at any pair of coordinates (x, y) represents the intensity or gray level of the image at that point.
A digital image is one for which both the coordinates and the amplitude values of f are all finite, discrete quantities. Hence a digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are called "pixels". A digital image is discrete in both spatial coordinates and brightness, and it can be considered as a matrix whose row and column indices identify a point on the image and whose corresponding matrix element value identifies the gray level at that point.
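The matrix view described above can be made concrete with a short sketch. (Python with NumPy is used here purely for illustration; the project itself is implemented in MATLAB.)

```python
import numpy as np

# A tiny 3x3 "digital image": row and column indices locate a pixel,
# and each element stores the gray level (0 = black, 255 = white).
f = np.array([[  0,  64, 128],
              [ 64, 128, 192],
              [128, 192, 255]], dtype=np.uint8)

print(f[0, 2])   # gray level of the pixel at row 0, column 2 -> 128
print(f.shape)   # (rows, columns) -> (3, 3)
```

Indexing the matrix at (row, column) returns exactly the gray level f(x, y) discussed in the text.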
One of the first applications of digital images was in the newspaper industry, when pictures were first sent by submarine cable between London and New York. Introduction of the Bartlane cable picture transmission system in the early 1920s reduced the time required to transport a picture across the Atlantic from more than a week to less than three hours.
FIG 11 Fundamental blocks of digital image processing
121 PREPROCESSING
In imaging science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible. This section covers general techniques that apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging.
Image processing refers to the processing of a 2D picture by a computer. Basic definitions:
An image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g., brightness) of the image at the real coordinate position (x, y). Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:
Image processing (image in -> image out)
Image analysis (image in -> measurements out)
Image understanding (image in -> high-level description out)
An image may be considered to contain sub-images, sometimes referred to as regions-of-interest (ROIs), or simply regions. This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. In a sophisticated image processing system, it should be possible to apply specific image processing operations to selected regions. Thus one part of an image (region) might be processed to suppress motion blur while another part might be processed to improve colour rendition.
Usually, image processing systems require that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid and each sample, or pixel, is quantized using a finite number of bits. The digitized image is then processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display. Closely related to image processing are computer graphics and computer vision. In computer graphics, images are manually made from physical models of objects, environments, and lighting, instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most animated movies. Computer vision, on the other hand, is often considered high-level image processing, in which a machine/computer/software intends to decipher the physical contents of an image or a sequence of images (e.g., videos or 3D full-body magnetic resonance scans).
In modern sciences and technologies, images also gain much broader scope due to the ever-growing importance of scientific visualization (of often large-scale, complex scientific/experimental data). Examples include microarray data in genetic research and real-time multi-asset portfolio trading in finance. Before processing, an image is converted into a digital form. Digitization includes sampling of the image and quantization of the sampled values. After converting the image into bit information, processing is performed. This processing may be image enhancement, image restoration, or image compression.
122 IMAGE ENHANCEMENT
Image enhancement refers to the accentuation or sharpening of image features, such as boundaries or contrast, to make a graphic display more useful for display and analysis. This process does not increase the inherent information content in the data. It includes gray-level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on.
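A minimal example of the gray-level and contrast manipulation mentioned above is a linear contrast stretch, sketched below. (Python/NumPy is used only for illustration; the `contrast_stretch` helper is hypothetical, not part of any standard library.)

```python
import numpy as np

def contrast_stretch(img, out_min=0, out_max=255):
    """Linearly stretch the gray levels of img to span [out_min, out_max]."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:                          # flat image: nothing to stretch
        return np.full(img.shape, out_min, dtype=np.uint8)
    out = (img - lo) / (hi - lo) * (out_max - out_min) + out_min
    return np.round(out).astype(np.uint8)

# A low-contrast image whose gray levels span only 100..130
dim = np.array([[100, 110],
                [120, 130]], dtype=np.uint8)
print(contrast_stretch(dim))   # gray levels now span the full 0..255 range
```

The same idea underlies MATLAB-style intensity adjustment: the darkest input level maps to 0, the brightest to 255, and everything in between scales linearly.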
123 IMAGE RESTORATION
Image restoration is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration depends on the extent and accuracy of the knowledge of the degradation process as well as on the filter design. Image restoration differs from image enhancement in that the latter is concerned with the extraction or accentuation of image features.
124 IMAGE COMPRESSION
Image compression is concerned with minimizing the number of bits required to represent an image. Applications of compression include broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission of educational and business documents, medical images that arise in computed tomography, magnetic resonance imaging and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.
Text compression - CCITT Group 3 & Group 4
Still image compression - JPEG
Video image compression - MPEG
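One basic idea behind binary-image compression schemes such as the CCITT fax standards is run-length coding: storing runs of identical pixels instead of the pixels themselves. A toy sketch of that idea follows (Python, illustrative only; this is not the actual CCITT algorithm, which additionally entropy-codes the run lengths).

```python
def run_length_encode(bits):
    """Encode a sequence of 0/1 pixels as (value, run_length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((b, 1))               # start a new run
    return runs

def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the pixel sequence."""
    return [v for v, n in runs for _ in range(n)]

row = [0, 0, 0, 1, 1, 0, 0, 0, 0]             # one scan line of a binary image
encoded = run_length_encode(row)
print(encoded)                                 # [(0, 3), (1, 2), (0, 4)]
assert run_length_decode(encoded) == row       # lossless round trip
```

For document images, which contain long runs of white pixels, this representation is much smaller than storing every pixel.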
125 SEGMENTATION
In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.
The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the resulting contours after image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms like marching cubes.
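The simplest form of the pixel-labeling described above is global thresholding: every pixel receives a label by comparing its gray level to a threshold. An illustrative sketch (Python/NumPy; the threshold value 128 is chosen arbitrarily):

```python
import numpy as np

def threshold_segment(img, t):
    """Assign label 1 to pixels brighter than t, label 0 otherwise."""
    return (img > t).astype(np.uint8)

img = np.array([[ 10, 200,  30],
                [220,  40, 210],
                [ 50, 230,  20]], dtype=np.uint8)

labels = threshold_segment(img, 128)
print(labels)   # every pixel now carries a segment label, 0 or 1
```

Pixels with the same label share the visual characteristic "brighter (or darker) than the threshold", which is exactly the labeling definition of segmentation given in the text.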
126 IMAGE RESTORATION
Image restoration, like enhancement, improves the quality of an image, but all of its operations are based on known, measured, or estimated degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise, and camera motion, and to correct images for known degradations.
127 FUNDAMENTAL STEPS
Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the chances for success of the other processes.
Image segmentation: to partition an input image into its constituent parts or objects.
Image representation: to convert the input data to a form suitable for computer processing.
Image description: to extract features that result in some quantitative information of interest, or features that are basic for differentiating one class of objects from another.
Image recognition: to assign a label to an object based on the information provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized objects.
Knowledge about a problem domain is coded into an image processing system in the form of a knowledge database.
13 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude. Digitization of the spatial coordinates (x, y) is called image sampling; amplitude digitization is called gray-level quantization. The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels.
Example: a 256 gray-level image of size 256x256 occupies 64K bytes of memory.
Images of very low spatial resolution produce a checkerboard effect. The use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.
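The 64K-byte figure in the example above follows directly from rows x columns x bits-per-pixel. A small sketch of the arithmetic (Python, illustrative only):

```python
def image_size_bytes(rows, cols, gray_levels):
    """Storage for an uncompressed image: rows * cols * bits-per-pixel / 8."""
    bits_per_pixel = (gray_levels - 1).bit_length()   # 256 levels -> 8 bits
    return rows * cols * bits_per_pixel // 8

print(image_size_bytes(256, 256, 256))   # 65536 bytes = 64K, as in the example
print(image_size_bytes(512, 512, 256))   # 262144 bytes = 256K: 4x for 2x resolution
```

Doubling the spatial resolution quadruples the storage, which is the rapid growth the text refers to.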
14 IMAGE FILE FORMATS
There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based images). Some of the most common file formats are:
GIF - Graphics Interchange Format. An 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.
JPEG - Joint Photographic Experts Group. A very efficient (i.e., much information per byte) destructively compressed 24-bit (16 million colours) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited).
TIFF - Tagged Image File Format. The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.
PS - PostScript. A standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.
PSD - Adobe Photoshop Document. A dedicated Photoshop format that keeps all the information in an image, including all the layers.
BMP - Bitmap file format.
15 TYPE OF IMAGES
There are four types of images:
1 Binary image
2 Gray scale image
3 Color image
4 Indexed image
151 BINARY IMAGES
A binary image is a digital image that has only two possible values for each pixel. Typically the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level. Each pixel is stored as a single bit, i.e., a 0 or 1. Such images are also known as black-and-white (B&W) images.
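Because each pixel of a binary image needs only a single bit, eight pixels can be packed into one byte. A sketch of this single-bit storage (Python/NumPy, used here only for illustration):

```python
import numpy as np

# A 2x8 binary image: 16 pixels, each either 0 or 1.
binary = np.array([[1, 0, 1, 0, 1, 0, 1, 0],
                   [1, 1, 1, 1, 0, 0, 0, 0]], dtype=np.uint8)

packed = np.packbits(binary)        # 16 one-bit pixels -> just 2 bytes
print(packed)                       # each byte holds 8 pixels

restored = np.unpackbits(packed).reshape(binary.shape)
assert (restored == binary).all()   # packing is lossless
```

This is why a binary image takes one eighth of the storage of an 8-bit grayscale image of the same size.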
152 GRAY SCALE IMAGE
In an (8-bit) grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grayscale image is what people normally call a black-and-white image, but the name emphasizes that such an image will also include many shades of grey.
FIG 12 Gray scale image
153 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive colour, with the r, g, and b receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production. The secondary colours of RGB (cyan, magenta, and yellow) are formed by mixing two of the primary colours (red, green, or blue) and excluding the third colour. Red and green combine to make yellow, green and blue to make cyan, and blue and red form magenta. The combination of red, green, and blue at full intensity makes white.
In Photoshop, using the "screen" mode for the different layers in an image will make the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.
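The additive mixing rules above can be checked numerically: adding full-intensity primaries channel by channel yields the secondary colours, and all three together give white. An illustrative sketch (Python/NumPy, not part of the project's MATLAB code):

```python
import numpy as np

# Full-intensity primaries as [R, G, B] triples (uint16 to avoid overflow on add).
red   = np.array([255, 0, 0], dtype=np.uint16)
green = np.array([0, 255, 0], dtype=np.uint16)
blue  = np.array([0, 0, 255], dtype=np.uint16)

yellow  = np.clip(red + green, 0, 255)          # red + green -> yellow
cyan    = np.clip(green + blue, 0, 255)         # green + blue -> cyan
magenta = np.clip(blue + red, 0, 255)           # blue + red -> magenta
white   = np.clip(red + green + blue, 0, 255)   # all three -> white

print(yellow, cyan, magenta, white)
```

Clipping to 255 mirrors what a display does when channel intensities saturate.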
FIG 13 The additive model of RGB
CMYK: The 4-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M), and yellow (Y) inks. In addition, a layer of black (K) ink can be added. The CMYK model uses the subtractive colour model.
154 INDEXED IMAGE
An indexed image consists of an array and a color map matrix. The pixel values in the array are direct indices into the color map. By convention, this documentation uses the variable name X to refer to the array and map to refer to the color map. In computing, indexed color is a technique to manage digital image colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, color information is not directly carried by the image pixel data, but is stored in a separate piece of data called a palette: an array of color elements, in which every element (a color) is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only an index into the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.
Perhaps the first device that supported palette colors was a random-access frame buffer described in 1975 by Kajiya, Sutherland, and Cheadle. This supported a palette of 256 36-bit RGB colors.
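The array-plus-color-map structure described above can be sketched as follows (Python/NumPy, used here only as an illustration; MATLAB's `ind2rgb` performs the equivalent palette lookup):

```python
import numpy as np

# Color map (palette): each row is an RGB color; pixels store row indices.
palette = np.array([[  0,   0,   0],    # index 0: black
                    [255,   0,   0],    # index 1: red
                    [255, 255, 255]],   # index 2: white
                   dtype=np.uint8)

# The indexed image array X: each element is an index into the palette.
X = np.array([[0, 1],
              [2, 1]], dtype=np.uint8)

rgb = palette[X]        # look up every pixel's index in the color map
print(rgb.shape)        # (2, 2, 3): the full-color reconstruction
print(rgb[0, 1])        # pixel (0, 1) holds index 1 -> red
```

The image itself stores one small index per pixel; the full color specification lives only once, in the palette, which is where the memory saving comes from.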
16 APPLICATIONS OF IMAGE PROCESSING
Interest in digital image processing methods stems from two principal application areas:
1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception
In the second application area, interest focuses on procedures for extracting, from an image, information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.
17 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 15 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 12 seconds to perform a one-to-one matching. Both speeds were measured using a PC with Intel Core 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
171 DISADVANTAGES OF EXISTING SYSTEM
1 Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large and will preoccupy the GPU memory and slow down data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2 The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.
3 When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.
18 LITERATURE SURVEY
1 S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-face recognition algorithms perform well when constrained (frontal, well-illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. Recognition performance of a subset of existing still image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.
2 R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such Big Data draws great concern. Even though modern processors are equipped with more cores and memory capacity, it still requires careful design in order to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of a biometric application. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to high-performance architecture.
Modern iris recognition algorithms can be computationally intensive, yet are designed for traditional sequential processing elements such as a personal computer. However, a parallel processing alternative using field programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the means of this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We will present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ code version, determining the information content of an iris approximately 324 times faster.
3 R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.
A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye. The vein pattern seen in the sclera region is unique to each person. Thus, the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are completely eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using the bilinear interpolation technique. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than the previous methods.
4 J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.
The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.
5 H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.
This paper proposes algorithms for iris segmentation quality
enhancement match score fusion and indexing to improve both the
accuracy and the speed of iris recognition A curve evolution approach is
proposed to effectively segment a nonideal iris image using the modified Mumford–Shah functional. Different enhancement algorithms are
concurrently applied on the segmented iris image to produce multiple
enhanced versions of the iris image A support-vector-machine-based
learning algorithm selects locally enhanced regions from each globally
enhanced image and combines these good-quality regions to create a single
high-quality iris image Two distinct features are extracted from the high-
quality iris image The global textural feature is extracted using the 1-D log
polar Gabor transform and the local topological feature is extracted using
Euler numbers An intelligent fusion algorithm combines the textural and
topological matching scores to further improve the iris recognition
performance and reduce the false rejection rate whereas an indexing
algorithm enables fast and accurate iris identification The verification and
identification performance of the proposed algorithms is validated and
compared with other algorithms using the CASIA Version 3 ICE 2005 and
UBIRIS iris databases
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method that uses a two-stage parallel approach for registration and matching: a parallel sclera matching solution, built on our sequential line-descriptor method, using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.
It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to port our CUDA C program to AMD-based GPUs using OpenCL: our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the OpenCL implementation of our approach should also be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not work. We then propose a new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor," which is better suited for parallel computing and mitigates the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make parallel processing possible and efficient.
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve the efficiency, in this research we propose a new descriptor, the Y-shape descriptor, which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out non-matching pairs before refined matching.
2. We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities.
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque and white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches, Speeded-Up Robust Features (SURF)-based matching, minutiae detection, and direct correlation matching, for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds per one-to-one matching. Both speeds were measured on a PC with Intel® Core™ 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient for data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. General-purpose graphics processing units (GPGPUs) are now popularly used for parallel computing to improve computational speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, GPUs have been used to extract features, construct descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al. evaluated the performance of image processing algorithms such as linear feature extraction and multi-view stereo matching on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform that outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design; an efficient parallel computing scheme would need different strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method using the CUDA GPU architecture. The parallelization strategies developed in this research can also be applied to design parallel approaches for other sclera vein recognition methods and to help parallelize general pattern recognition methods. Based on this matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large, will preoccupy the GPU memory, and will slow down the data transfer. In addition, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to port our CUDA C program to AMD-based GPUs using OpenCL: our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the OpenCL implementation of our approach should also be programmed in the data-parallel model.
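Prefix sum, listed above among the optimization techniques, is a classic data-parallel primitive. As a minimal sketch (the function name is ours, and NumPy stands in for the GPU), the work-efficient Blelloch exclusive scan below is written with the same up-sweep/down-sweep phases a CUDA or OpenCL kernel would run:

```python
import numpy as np

def exclusive_scan(a):
    """Work-efficient (Blelloch) exclusive prefix sum, structured as the
    up-sweep and down-sweep passes of the data-parallel GPU algorithm."""
    n = 1
    while n < len(a):                 # pad to a power of two
        n *= 2
    x = np.zeros(n, dtype=np.asarray(a).dtype)
    x[:len(a)] = a
    d = 1
    while d < n:                      # up-sweep: build partial sums
        x[2 * d - 1::2 * d] += x[d - 1::2 * d]
        d *= 2
    x[-1] = 0
    while d > 1:                      # down-sweep: distribute prefixes
        d //= 2
        left = x[d - 1::2 * d].copy()
        x[d - 1::2 * d] = x[2 * d - 1::2 * d]
        x[2 * d - 1::2 * d] += left
    return x[:len(a)]
```

Each strided slice operation corresponds to one parallel step over the array, which is why the same structure maps directly onto GPU threads.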
In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor," which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then develop implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2 we give a brief introduction to sclera vein recognition, in Section 8 we report experiments using the proposed system, and in Section 9 we draw conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation feature enhancement feature extraction and feature
matching (Figure 1)
FIG
Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation. They used a clustering algorithm to classify color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin-tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's-thresholding-based method for grayscale images.
After sclera segmentation it is necessary to enhance and extract the sclera
features since the sclera vein patterns often lack contrast and are hard to
detect Zhou et al used a bank of multi-directional Gabor filters for
vascular pattern enhancement Derakhshani et al used contrast limited
adaptive histogram equalization (CLAHE) to enhance the green color plane
of the RGB image and a multi-scale region growing approach to identify
the sclera veins from the image background Crihalmeanu and Ross applied
a selective enhancement filter for blood vessels to extract features from the
green component in a color image In the feature matching step
Crihalmeanu and Ross proposed
three registration and matching approaches including Speed Up Robust
Features (SURF) which is based on interest-point detection minutiae
detection which is based on minutiae points on the vasculature structure
and direct correlation matching which relies on image registration Zhou et
al designed a line descriptor based feature registration and matching
method
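As an illustration of the enhancement step, a multi-directional Gabor filter bank of the kind mentioned above can be generated as follows. This is a hedged sketch: the kernel size, sigma, and wavelength are assumed values, not parameters from the cited work.

```python
import numpy as np

def gabor_bank(ksize=15, sigma=3.0, lam=8.0, n_orient=4):
    """Bank of even-symmetric Gabor kernels at n_orient directions,
    of the kind used to enhance sclera vein patterns."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    kernels = []
    for theta in np.arange(n_orient) * np.pi / n_orient:
        xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
        yr = -x * np.sin(theta) + y * np.cos(theta)
        g = (np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
             * np.cos(2 * np.pi * xr / lam))
        kernels.append(g - g.mean())                  # zero-mean: ignore flat regions
    return kernels
```

Convolving the sclera region with each kernel and keeping the maximum response per pixel emphasizes vessels regardless of their direction.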
The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The features elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and a bilinear-interpolation-based Cartesian-to-polar conversion. HOG is used to determine the gradient and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form; this is mainly useful for circular or quasi-circular objects. These two features are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are extracted.
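The HOG idea described above can be sketched for a single cell as follows; the bin count and the use of unsigned orientations over [0, 180) degrees are standard HOG choices assumed here, not details taken from the report.

```python
import numpy as np

def hog_cell_histogram(cell, n_bins=9):
    """Gradient-orientation histogram for one HOG cell: gradient by
    central differences, orientations binned over [0, 180) degrees,
    each pixel voting with its gradient magnitude."""
    gy, gx = np.gradient(cell.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0      # unsigned orientation
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())        # magnitude-weighted votes
    return hist
```

Concatenating the histograms of all cells over the sclera region yields the HOG feature vector that is later compared against the query image.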
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare Area Detection: The glare area is a small bright area near the pupil or iris. This is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It runs only on grayscale images; if the image is in color, it must first be converted to grayscale before the Sobel filter is applied. Fig. 4 shows the result of glare area detection.
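The glare-detection step above can be sketched as follows. The helper names and the fixed threshold are hypothetical: a constant edge threshold stands in for whatever decision rule the report actually uses.

```python
import numpy as np

def sobel_magnitude(gray):
    """Gradient magnitude via 3x3 Sobel kernels (the edges of a bright
    glare blob respond strongly)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(gray.astype(float), 1, mode="edge")
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for dy in range(3):                     # explicit 3x3 correlation
        for dx in range(3):
            win = pad[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * win
            gy += ky[dy, dx] * win
    return np.hypot(gx, gy)

def glare_mask(img, edge_thresh=2.0):
    """Hypothetical glare detector: convert to grayscale if needed, then
    flag pixels whose Sobel response exceeds edge_thresh."""
    if img.ndim == 3:                       # color image -> grayscale first
        img = img.mean(axis=2)
    return sobel_magnitude(img) > edge_thresh
```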
FIG
Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. A correct left sclera area should be placed in the right and center positions, and a correct right sclera area should be placed in the left and center. In this way, non-sclera areas are wiped out.
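Otsu's thresholding, used above for sclera area estimation, selects the threshold that maximizes the between-class variance of foreground versus background. A compact NumPy sketch (function name and bin count are our own choices):

```python
import numpy as np

def otsu_threshold(gray, n_bins=256):
    """Otsu's method: choose the threshold that maximizes the
    between-class variance of the two resulting pixel classes."""
    hist, edges = np.histogram(gray.ravel(), bins=n_bins)
    p = hist.astype(float) / hist.sum()
    mids = (edges[:-1] + edges[1:]) / 2
    omega = np.cumsum(p)                          # class probability
    mu = np.cumsum(p * mids)                      # cumulative class mean
    mu_t = mu[-1]
    with np.errstate(invalid="ignore", divide="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b = np.nan_to_num(sigma_b)              # ignore empty classes
    return mids[np.argmax(sigma_b)]
```

Pixels above the returned threshold form the candidate (bright) sclera regions within the ROI.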
2.2.3 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; all of these are unwanted portions for recognition. In order to eliminate their effects, refinement follows the detection of the sclera area. Fig. shows the result after Otsu's thresholding and iris and eyelid refinement to detect the right sclera area. The left sclera area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY All normal living tissues including that of the
conjunctiva and episclera have vascular structure
UNIQUENESS Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.
ACCEPTABILITY Since the subject is not required to stare directly into
the camera lens and given the possibility of capturing the conjunctival
vasculature from several feet away this modality is non-intrusive and thus
more acceptable
SPOOF-PROOFNESS The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact. Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features
Facilitating recognition using off-angle iris images For instance if the iris
information is relegated to the left or right portions of the eye the sclera
vein patterns will be further exposed This feature makes sclera vasculature
a natural complement to the iris biometric
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks For instance when implemented
alongside iris systems an attacker needs to reproduce not only the iris but
also different surfaces of the sclera along with the associated
microcirculation and make them available on commensurate eye surfaces
The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.
FIG
FIG
2.2.5 OVERVIEW OF THE LINE-DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching stage of the line-descriptor-based method is the bottleneck with regard to matching speed. In this section, we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as
FIG
Here f_line(x) is the polynomial approximation of the line segment, (x_l, y_l) is the center point of the line segment, (x_i, y_i) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, along with a scaling factor and a rotation value based on a priori knowledge of the database, and then calculates a fitness value for the registration under these parameters.
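The RANSAC-style registration described above can be sketched as follows. The scale and rotation ranges, the inlier tolerance, and the inlier-count fitness are illustrative assumptions standing in for the paper's a-priori parameters; each trial hypothesizes a transform from one random correspondence and scores it.

```python
import numpy as np

rng = np.random.default_rng(0)

def register_ransac(test_pts, target_pts, n_iter=500, tol=2.0):
    """RANSAC-style search for shift/scale/rotation: each trial picks one
    random point from each template, a random scale and rotation, builds
    the transform that maps the pair together, and counts how many test
    points then land near some target point (the fitness value)."""
    best = (0, np.eye(2), np.zeros(2))
    for _ in range(n_iter):
        a = test_pts[rng.integers(len(test_pts))]
        b = target_pts[rng.integers(len(target_pts))]
        s = rng.uniform(0.9, 1.1)                 # assumed a-priori scale range
        th = rng.uniform(-0.1, 0.1)               # assumed a-priori rotation range
        R = s * np.array([[np.cos(th), -np.sin(th)],
                          [np.sin(th), np.cos(th)]])
        t = b - R @ a                             # transform mapping a onto b
        moved = test_pts @ R.T + t
        d = np.linalg.norm(moved[:, None, :] - target_pts[None, :, :], axis=2)
        fitness = int((d.min(axis=1) < tol).sum())    # inlier count
        if fitness > best[0]:
            best = (fitness, R, t)
    return best
```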
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as follows, where S_i and S_j are two segment descriptors, m(S_i, S_j) is the matching score between segments S_i and S_j, d(S_i, S_j) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), D_match is the matching distance threshold, and ɸ_match is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
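The matching-score computation described above can be sketched as follows. Since the report's exact equations (Eqs. 6-8) are not reproduced here, the thresholds and the min-weight scoring rule are assumptions; the sketch only illustrates the structure of m(S_i, S_j) and the normalization by the smaller template's total weight.

```python
import numpy as np

def match_templates(test, target, d_match=5.0, phi_match=0.2):
    """Illustrative sketch of line-descriptor matching: a test segment
    matches a target segment when their centers are within d_match and
    their orientations within phi_match; the total score is the matched
    weight divided by the smaller template's total weight.
    Each descriptor row is a NumPy array (x, y, phi, w)."""
    matched = 0.0
    for x, y, phi, w in test:
        for xt, yt, phit, wt in target:
            if (np.hypot(x - xt, y - yt) <= d_match
                    and abs(phi - phit) <= phi_match):
                matched += min(w, wt)      # assumed weighting rule
                break                      # count each test segment once
    denom = min(test[:, 3].sum(), target[:, 3].sum())
    return matched / denom if denom > 0 else 0.0
```

A template compared against itself scores 1.0, and templates with no nearby segments score 0.0, matching the normalized score M described above.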
FIG
FIG
FIG
FIG
Even with movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.
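The Y-shape search described above can be sketched as follows. The neighborhood radius, the angle tolerance, and the greedy angle clustering are our own simplifications of "a regular distance" and "two types of angle values", not the paper's parameters.

```python
import numpy as np

def find_y_branches(segments, radius=10.0, angle_tol=0.2):
    """Hypothetical sketch of Y-shape detection: for each line segment,
    gather neighbours within `radius`, cluster their orientations, and
    report a Y-shape candidate when the neighbourhood contains exactly
    two distinct angle values. Each segment row is (x, y, angle)."""
    ys = []
    pts = segments[:, :2]
    for x, y, _ in segments:
        d = np.hypot(pts[:, 0] - x, pts[:, 1] - y)
        near = segments[(d > 0) & (d <= radius)]   # exclude the segment itself
        if len(near) < 2:
            continue
        distinct = []
        for a in near[:, 2]:                       # greedy angle clustering
            if not any(abs(a - b) <= angle_tol for b in distinct):
                distinct.append(a)
        if len(distinct) == 2:                     # two angle types -> Y-shape
            ys.append((x, y, sorted(distinct)))
    return ys
```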
There are two ways to measure both the orientation and the relationship of every branch of the Y-shape vessels: one is to use the angle of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employ the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angle between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.
2.2.6 WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limits of segmentation accuracy, descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern. (a) An eye image. (b) Vessel patterns in the sclera. (c) Enhanced sclera vessel patterns. (d) Centers of line segments of vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large and will occupy the GPU memory and slow down the data transfer. For matching registration, a RANSAC-type algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights are calculated on their own mask by the CPU, and only once.
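The weighting rule above can be sketched as follows; the border width and the window test are our own simplifications of "near the sclera boundary", not the report's exact definition.

```python
import numpy as np

def descriptor_weight(mask, x, y, border=2):
    """Weight for a descriptor centred at (x, y): 0 outside the boolean
    sclera mask, 0.5 within `border` pixels of the mask boundary, and 1
    in the interior. Computed once per template on the CPU, then stored
    as the w component of the WPL descriptor."""
    if not mask[y, x]:
        return 0.0
    y0, y1 = max(0, y - border), y + border + 1
    x0, x1 = max(0, x - border), x + border + 1
    window = mask[y0:y1, x0:x1]
    near_edge = ((not window.all())                 # mask boundary nearby
                 or y < border or x < border        # image edge counts too
                 or y >= mask.shape[0] - border
                 or x >= mask.shape[1] - border)
    return 0.5 if near_edge else 1.0
```

Storing this precomputed weight inside each descriptor is what lets the GPU skip the mask file entirely during matching.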
The result is saved as a component of the descriptor, which becomes s(x, y, ɸ, w), where w denotes the weight of the point and may be 0, 0.5, or 1. To align two templates, when one template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates share a similar reference point: if we use the center of the iris as the reference point, then when two templates are compared, the correspondences are automatically aligned to each other. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in the CPU preprocessing.
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left part of the sclera patterns may be compressed while the right part is stretched.
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. (The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps.) We reorganized the descriptors from the same side and saved
FIG
FIG
them at contiguous addresses, which meets the requirement of coalesced memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as in Figure 9.
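The reorganization described above amounts to sorting descriptors by side and switching from an array-of-structures to a structure-of-arrays layout. A NumPy sketch (the function name is ours; the field names come from the WPL descriptor s(x, y, r, θ, ɸ, w)):

```python
import numpy as np

def to_struct_of_arrays(descriptors, sides):
    """Reorder WPL descriptor rows so all left-side (0) rows come first,
    then all right-side (1) rows, and store each field in its own
    contiguous array -- the layout that lets consecutive GPU threads
    read consecutive addresses (coalesced access)."""
    order = np.argsort(sides, kind="stable")      # left block, then right block
    d = descriptors[order]
    fields = {name: np.ascontiguousarray(d[:, i])
              for i, name in enumerate(["x", "y", "r", "theta", "phi", "w"])}
    return fields, order
```

With this layout, thread k of a warp reads element k of one field array, so a warp's 32 loads fall in one contiguous memory region.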
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
more complicated shading and lighting operations that are essential for
complex effects The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment Over the past six years these vertex programs and
fragment programs have become increasingly more capable with larger
limits on their size and resource consumption with more fully featured
instruction sets and with more flexible control-flow operations After many
years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
GENERAL-PURPOSE COMPUTING ON THE GPU
Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on its programmable aspects:
The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as texture on future passes through the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest The rasterizer generates a fragment at each
pixel location covered by that geometry (In our example our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation)
Each fragment is shaded by an SPMD general-purpose fragment
program (Each grid point runs the same program to update the state of its
fluid)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes (The current state of the fluid will be used on the next time
step)
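The fluid-simulation example above can be sketched in ordinary sequential code. The following is a hypothetical minimal diffusion step, written as an illustration only: every grid point plays the role of a fragment, gathering from its neighbours in the previous buffer and writing into a separate output buffer, since this model cannot read and write the same texture in one pass.

```python
import numpy as np

def step(prev):
    """One 'render pass': every grid point (fragment) runs the same
    program, gathering only from the previous time step's buffer."""
    h, w = prev.shape
    nxt = np.empty_like(prev)
    for y in range(h):          # the rasterizer would generate one
        for x in range(w):      # fragment per covered grid point
            nxt[y, x] = 0.25 * (prev[(y - 1) % h, x] + prev[(y + 1) % h, x]
                                + prev[y, (x - 1) % w] + prev[y, (x + 1) % w])
    return nxt                  # written to a separate output buffer

state = np.zeros((8, 8))
state[4, 4] = 1.0               # a blob of "fluid"
state = step(state)
```

On a GPU the two nested loops would run as independent fragments; the ping-pong between `prev` and `nxt` mirrors the render-to-texture pattern described above.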
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications
has been that, despite their general-purpose tasks' having nothing to do with
graphics, the applications still had to be programmed using graphics APIs.
In addition, the program had to be structured in terms of the graphics
pipeline, with the programmable units only accessible as an intermediate
step in that pipeline, when the programmer would almost certainly prefer to
access the programmable units directly. The programming environments we
describe in detail in Section IV are solving this difficulty by providing a
more natural, direct, non-graphics interface to the hardware and,
specifically, the programmable units. Today, GPU computing applications
are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math
operations and both "gather" (read) accesses from and "scatter" (write)
accesses to global memory. Unlike in the previous two
methods, the same buffer can be used for both reading and writing,
allowing more flexible algorithms (for example, in-place algorithms that
use less memory).
The resulting buffer in global memory can then be used as an input in
future computation.
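The thread-grid model above can be illustrated with a toy sequential simulation. This is an assumption-laden sketch, not CUDA itself: each call of `kernel` stands for one thread of the structured grid, and the in-place update shows the gather-plus-scatter access to the same buffer that the old pipeline model disallowed.

```python
import numpy as np

GLOBAL = np.arange(8, dtype=np.int64)     # "global memory" buffer

def kernel(tid, buf):
    """SPMD thread body: every thread runs the same program on its own
    thread id, and may both gather from and scatter to global memory."""
    buf[tid] = buf[tid] * buf[tid]        # read and write the *same*
                                          # buffer, in place

for tid in range(GLOBAL.size):            # launched in parallel on a GPU
    kernel(tid, GLOBAL)
```

Because each thread touches only its own element, the in-place update is race-free even when the loop iterations run concurrently.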
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because
no registration is needed. The matching result of this stage helps filter
out image pairs with low similarity. After this step, some false positive
matches may still remain. In the second stage, we use the WPL descriptor
to register the two images for more detailed descriptor matching, including
scale and translation invariance. This stage includes the shift transform, affine
matrix generation, and final WPL descriptor matching. Overall, we
partitioned the registration and matching processing into four kernels in
CUDA (Figure 10): matching on the Y shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster
and achieves a more accurate score.
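The coarse-to-fine control flow can be sketched as a small filtering pipeline. The function and score names below are illustrative assumptions, not the report's actual interfaces; the point is only the structure: a cheap stage-I score prunes the database, and only survivors receive the expensive stage-II registered match.

```python
def identify(test, database, coarse_score, fine_score, t_coarse):
    """Coarse-to-fine sketch: a cheap Y-shape score filters the database;
    only the survivors get the expensive registered WPL match."""
    survivors = [tmpl for tmpl in database
                 if coarse_score(test, tmpl) >= t_coarse]   # stage I filter
    if not survivors:
        return None                                         # no candidate
    # stage II: full registration + matching on the remaining candidates
    return max(survivors, key=lambda tmpl: fine_score(test, tmpl))
```

With toy numeric "templates" and a similarity score of negative absolute difference, `identify(5, [1, 4, 6, 9], score, score, -2)` keeps only the near candidates and returns the best of them.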
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features,
registration is unnecessary before matching on the Y shape descriptor. The
whole matching algorithm is listed as Algorithm 1.
FIG
Here y_te_i and y_ta_j are the Y shape descriptors of the test template T_te
and the target template T_ta, respectively. dϕ is the Euclidean distance of the
angle elements of the descriptor vectors, defined in (3), and dxy is the
Euclidean distance of two descriptor centers, defined in (4). n_i and d_i are
the number of matched descriptor pairs and the distance of their centers,
respectively. tϕ is a distance threshold and txy is the threshold that restricts
the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all
the Y shape branches. The search area is limited to the corresponding left or
right half of the sclera in order to reduce the search range and time. The
distance of two branches is defined in (3), where ϕ_ij is the angle between
the j-th branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs n_i and the distance between the Y shape
branch centers d_i are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch
centers as in (2). Here α is a factor to fuse the matching score, which was set
to 30 in our study, and N_i and N_j are the total numbers of feature vectors in
templates i and j, respectively. The decision is regulated by the threshold t:
if a sclera's matching score is lower than t, the sclera is discarded. A
sclera with a high matching score is passed to the next, more precise
matching process.
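A minimal sketch of the stage-I matcher, under stated assumptions: each Y-shape descriptor is modelled as a tuple of branch angles, a centre point, and a left/right half label; the thresholds come from the text, but the exact fusion of match count and mean centre distance is an assumed form (the report's equation (2) defines the real formula).

```python
import math

T_PHI, T_XY, ALPHA = 30.0, 675.0, 30.0   # thresholds from the text

def coarse_match(test, target):
    """Stage-I sketch. Each descriptor is (phi_angles, (x, y), half)."""
    n, dist_sum = 0, 0.0
    for phi_te, c_te, half_te in test:
        for phi_ta, c_ta, half_ta in target:
            if half_te != half_ta:        # search only same half of sclera
                continue
            d_phi = math.dist(phi_te, phi_ta)   # angle-element distance (3)
            d_xy = math.dist(c_te, c_ta)        # centre distance (4)
            if d_phi < T_PHI and d_xy < T_XY:
                n += 1
                dist_sum += d_xy
                break                     # this descriptor matched, move on
    if n == 0:
        return 0.0
    # assumed fusion: more matches and smaller distances give a higher score
    return n / (min(len(test), len(target)) * (1.0 + dist_sum / (ALPHA * n)))
```

Restricting the inner search to the same sclera half mirrors the text's reduction of the search range.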
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of the
sclera than the Y shape descriptor. The variation of the sclera vessel pattern is
nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure
will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers (episclera, stroma, lamina fusca, and
endothelium), and there are slight differences among the movements of these
layers. Considering these factors, our registration employs both a single
shift transform and a multi-parameter transform which combines shift,
rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before,
segmentation may not be accurate; as a result, the detected iris center may
not be very accurate. The shift transform is designed to tolerate possible errors
in pupil center detection in the segmentation step. If there is no deformation,
or only very minor deformation, registration with the shift transform alone
would be adequate to achieve an accurate result. We designed Algorithm 2
to obtain the optimized shift parameter, where T_te is the test template and s_te_i is
the i-th WPL descriptor of T_te; T_ta is the target template and s_ta_i is the i-th
WPL descriptor of T_ta; and d(s_te_k, s_ta_j) is the Euclidean distance of descriptors s_te_k
and s_ta_j.
Δs_k is the shift value of the two descriptors, defined as follows.
We first randomly select an equal number of segment descriptors
s_te_k in the test template T_te from each quad and find each one's nearest neighbor s_ta_j
in the target template T_ta. Their shift offset is recorded as a candidate
registration shift factor Δs_k. The final offset registration factor is Δs_optim,
which has the smallest standard deviation among these candidate offsets.
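The shift search can be sketched as follows, assuming descriptors reduced to their (x, y) centres. The selection of the winning offset is an assumption: the text asks for the candidate with the smallest deviation among the offsets, which this sketch approximates by taking the candidate closest to the consensus mean.

```python
import random
import statistics

def search_shift(test, target, samples=8):
    """Algorithm-2 sketch: randomly sampled test descriptors vote with
    the offset to their nearest target neighbour; the candidate closest
    to the consensus offset wins."""
    offsets = []
    for sx, sy in random.sample(test, min(samples, len(test))):
        # nearest neighbour of the sampled descriptor in the target
        tx, ty = min(target, key=lambda p: (p[0] - sx) ** 2 + (p[1] - sy) ** 2)
        offsets.append((tx - sx, ty - sy))          # candidate shift
    mx = statistics.mean(o[0] for o in offsets)
    my = statistics.mean(o[1] for o in offsets)
    # keep the candidate with the smallest deviation from the consensus
    return min(offsets, key=lambda o: (o[0] - mx) ** 2 + (o[1] - my) ** 2)
```

For a target that is an exact translate of the test template, every vote agrees and the true shift is recovered.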
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera
patterns in the matching step. The affine transform algorithm is shown in
Algorithm 3. The shift value in the parameter set is obtained by randomly
selecting a descriptor s_te_(it) and calculating the distance from its nearest
neighbor s_ta_j in T_ta. We transform the test template by the matrix in (7).
At the end of each iteration, we count the number of matched descriptor pairs
from the transformed template and the target template. The factor β is
used to determine whether a pair of descriptors is matched, and we set it to
20 pixels in our experiment. After N iterations, the optimized transform
parameter set is determined by selecting the maximum matching number
m_(it). Here s_te_i, T_te, s_ta_j, and T_ta are defined as in Algorithm 2;
tr_(it)_shift, θ_(it), and tr_(it)_scale are the shift, rotation, and scale
parameters generated in the it-th iteration; and R(θ_(it)), T(tr_(it)_shift), and
S(tr_(it)_scale) are the transform matrices defined in (7). To search for the
optimized transform parameters, we iterate N times to generate these
parameters. In our experiment, we set the iteration count to 512.
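The random parameter search can be sketched as below. The β threshold and the 512-iteration count come from the text; the parameter ranges are illustrative assumptions, and in CUDA each iteration would be one independent thread with its own random stream.

```python
import math
import random

BETA, N_ITER = 20.0, 512

def search_affine(test, target):
    """Algorithm-3 sketch: each iteration (one 'thread') tries a random
    (shift, rotation, scale) set; the set matching the most descriptor
    pairs wins. Parameter ranges are assumptions."""
    best, best_m = None, -1
    rng = random.Random(0)                 # fixed seed for reproducibility
    for _ in range(N_ITER):
        theta = rng.uniform(-0.2, 0.2)
        scale = rng.uniform(0.9, 1.1)
        dx, dy = rng.uniform(-30, 30), rng.uniform(-30, 30)
        c, s = math.cos(theta), math.sin(theta)
        m = 0
        for x, y in test:                  # apply S(scale) R(theta) + shift
            tx = scale * (c * x - s * y) + dx
            ty = scale * (s * x + c * y) + dy
            if any(math.dist((tx, ty), p) < BETA for p in target):
                m += 1                     # pair matched within beta
        if m > best_m:
            best_m, best = m, (theta, scale, (dx, dy))
    return best, best_m
```

When the target equals the test template, some iteration finds a near-identity parameter set that matches every descriptor.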
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined from Algorithms 2
and 3, the test template is registered and matched simultaneously. The
registration and matching algorithm is listed in Algorithm 4. Here s_te_i, T_te,
s_ta_j, and T_ta are defined as in Algorithms 2 and 3; θ_(optm),
tr_(optm)_shift, tr_(optm)_scale, and Δs_optim are the registration parameters obtained from
Algorithms 2 and 3; and R(θ_(optm)) T(tr_(optm)_shift) S(tr_(optm)_scale)
is the descriptor transform matrix defined in Algorithm 3. ϕ is the angle
between the segment descriptor and the radius direction, and w is the weight of the
descriptor, which indicates whether or not the descriptor is at the edge of the sclera.
To ensure that the nearest descriptors have a similar orientation, we
use a constant factor α to check the absolute difference of two ϕ values; in our
experiment we set α to 5. The total matching score is the minimal score of the two
transformed results divided by the minimal matching score for the test template
and the target template.
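A minimal sketch of the registered matching step, under stated assumptions: descriptors are reduced to (x, y, ϕ, w) tuples, the transform is passed in as a function, and the greedy nearest-unused-neighbour rule with the α orientation check stands in for Algorithm 4's full scoring.

```python
import math

ALPHA = 5.0   # max absolute orientation difference between matched segments

def register_and_match(test, target, transform):
    """Algorithm-4 sketch. Each descriptor is (x, y, phi, w); the test
    template is transformed with the optimized parameters, then each
    descriptor greedily takes its nearest *unused* target neighbour
    with a similar orientation phi."""
    used = [False] * len(target)          # per-segment 'matched' flags
    score = 0.0
    for x, y, phi, w in test:
        tx, ty = transform(x, y)
        best, best_d = -1, float("inf")
        for j, (ux, uy, uphi, uw) in enumerate(target):
            if used[j] or abs(phi - uphi) > ALPHA:
                continue
            d = math.dist((tx, ty), (ux, uy))
            if d < best_d:
                best, best_d = j, d
        if best >= 0:
            used[best] = True             # a segment is matched only once
            score += w * target[best][3]  # edge descriptors weigh less
    return score
```

The `used` flags implement the rule that an already-matched line segment may not be matched again, which is exactly what is later stored in shared memory in the CUDA mapping.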
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and
works as a coprocessor with a CPU. A CUDA device consists of many streaming
multiprocessors (SMs), and the parallel part of the program should be
partitioned into threads by the programmer and mapped onto those threads.
There are multiple memory spaces in the CUDA memory hierarchy:
registers, local memory, shared memory, global memory, constant memory,
and texture memory. Registers and shared memory are on-chip
and take very little time to access. Only
shared memory can be accessed by other threads within the same block;
however, shared memory is available only in limited amounts. Global
memory, constant memory, and texture memory are off-chip memories
accessible by all threads, and accessing these memories is very time
consuming.
Constant memory and texture memory are read-only and cacheable
memories. Mapping algorithms to CUDA to achieve efficient processing is
not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be
executed serially. To improve performance, branch divergence within a
warp should be avoided.
Global memory is slower than on-chip memory in terms of access time. To
hide this latency, we should preferentially use on-chip memory rather than
global memory. When a global memory access does occur, threads in the same
warp should access consecutive words in sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces,
but shared memory is organized into banks which are equal in size. If two
addresses of memory requests from different threads within a warp fall in the
same memory bank, the accesses will be serialized. To get maximum
performance, memory requests should be scheduled to minimize bank
conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four
independent modules, each module is converted to a different kernel
on the GPU. These kernels differ in computation density, so we
map them to the GPU with various mapping strategies to fully utilize the
computing power of CUDA. Figure 11 shows our scheme of CPU-GPU
task distribution and the partition among blocks and threads. Algorithm 1 is
partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of
templates in the database. As the upper middle
column of Figure 11 shows, each target template is assigned to one
thread, and one thread performs the comparison of one pair of templates. In our work we
use an NVIDIA C2070 as our GPU, and the thread and block counts are both set to
1024. That means we can match our test template with up to 1024 × 1024
target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which each
thread processes a section of descriptors. As the lower portion of the
middle column of Figure 11 shows, we assign a target template to one
block; inside a block, one thread corresponds to a set of descriptors in this
template. This partition lets every block execute independently, with
no data exchange required between different blocks. When all
threads complete their corresponding descriptor fractions, the sum of the
intermediate results needs to be computed or compared. A parallel prefix
sum algorithm is used to calculate the sum of the intermediate results, as
shown on the right of Figure 11. First, all odd-numbered threads compute the sum
of consecutive pairs of results. Then, recursively, every first of i (= 4, 8,
16, 32, 64, ...) threads
computes the prefix sum on the new results. The final result is saved at
the first address, which has the same variable name as the first intermediate
result.
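The pairwise reduction described above can be sketched sequentially; this is an illustrative tree reduction, where each pass stands for one synchronized step of the block's threads and the final sum lands in the first slot, reusing the first intermediate result's storage.

```python
def block_reduce(vals):
    """Tree-reduction sketch of the prefix-sum step in Figure 11.
    At stride s, 'thread' i adds the partial result s positions away;
    after log2(n) passes the total is in vals[0]."""
    vals = list(vals)
    n, s = len(vals), 1
    while s < n:
        for i in range(0, n - s, 2 * s):   # these run in parallel on the GPU
            vals[i] += vals[i + s]
        s *= 2
    return vals[0]                         # same address as the first result
```

In CUDA each inner loop body would be one thread, with a `__syncthreads()` barrier between strides.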
252 MAPPING INSIDE BLOCK
In shift argument searching, there are two schemes we can choose to
map the task:
Mapping one pair of templates to all the threads in a block, so that every
thread takes charge of a fraction of the descriptors and cooperates with the
other threads.
Assigning a single possible shift offset to a thread, so that all the threads
compute independently, except that the final result must be compared with the
other possible offsets.
Due to the great number of sum and synchronization operations in every
nearest-neighbor search step, we chose the second method to parallelize the
shift search. In the affine matrix generator, we mapped an entire parameter-set
search to a thread: every thread randomly generates a set of
parameters and tries them independently, and the iterations are
distributed across all threads. The challenge of this step is that the randomly generated
numbers might be correlated among threads. In the rotation and
scale registration generation step, we used the Mersenne Twister pseudorandom
number generator, because it can use bitwise arithmetic and has a long
period.
The Mersenne Twister, like most pseudorandom generators, is iterative;
therefore it is hard to parallelize a single twister state update step among
several execution threads. To make sure that the thousands of threads in the
launch grid generate uncorrelated random sequences, many simultaneous
Mersenne Twisters need to run with different initial states in parallel.
But even "very different" (by any definition) initial state values do not
prevent the emission of correlated sequences by generators sharing
identical parameters. To solve this problem, and to enable efficient
implementation of the Mersenne Twister on parallel architectures, we used a
special offline tool for the dynamic creation of Mersenne Twister
parameters, modified from the algorithm developed by Makoto Matsumoto
and Takuji Nishimura.
In the registration and matching step, when
searching for the nearest neighbor, a line segment that has already been matched
with others should not be used again. In our approach, a flag
variable denoting whether the line has been matched is stored in
shared memory. To share the flags, all the threads in a block would have to
synchronize at every query step. Our solution is to use a single
thread in each block to process the matching.
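The requirement described above, one statistically independent random stream per thread, can be illustrated on the host side with NumPy's `SeedSequence.spawn`. This is a different mechanism from the dynamically created Mersenne Twister parameter sets the report uses, offered only as a modern analogue of the same idea.

```python
import numpy as np

# One independent stream per simulated "thread". SeedSequence.spawn
# derives child seeds whose generators are designed to be statistically
# independent, the property the dynamically created Twisters provide.
root = np.random.SeedSequence(42)
streams = [np.random.default_rng(child) for child in root.spawn(4)]

# each thread draws from its own stream without touching the others
draws = [int(rng.integers(0, 1 << 30)) for rng in streams]
```

Naively seeding each thread's generator with `seed + thread_id` would risk exactly the correlated sequences the text warns about; seed derivation (or distinct generator parameters, as in the report) avoids this.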
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the
bandwidth between host memory and device memory, and data transfer
between host and device can lead to long latency. As shown in Figure 11,
we load the entire target template set from the database without considering
when each template will be processed; therefore, there is no data transfer from
host to device during the matching procedure. In global memory, the
components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored
separately. This guarantees that consecutive kernels of Algorithms 2 to 4
can access their data at successive addresses. Although such coalesced
access reduces the latency, frequent global memory access is still a
slow way to get data, so in our kernels we load the test template into shared
memory to accelerate memory access. Because Algorithms 2 to 4
execute different numbers of iterations on the same data, bank conflicts do
not occur. To maximize our texture memory space, we set the system
cache to the lowest value and bound our target descriptors to texture
memory; using this cacheable memory, data access was accelerated
further.
FIG
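The component-wise storage described above is a structure-of-arrays layout. A small sketch of the stride difference it buys (field names here are assumptions matching the s(x, y, r, θ, ϕ, w) notation):

```python
import numpy as np

# Array-of-structures: one packed record per descriptor s(x, y, r, theta, phi, w)
aos = np.zeros(1024, dtype=[("x", "f4"), ("y", "f4"), ("r", "f4"),
                            ("theta", "f4"), ("phi", "f4"), ("w", "f4")])

# Structure-of-arrays: each component stored contiguously, so consecutive
# threads read consecutive addresses (the coalesced pattern the text uses)
soa = {name: np.ascontiguousarray(aos[name]) for name in aos.dtype.names}
```

In the AoS layout, consecutive threads reading the `x` component touch addresses 24 bytes apart; in the SoA layout the reads are 4 bytes apart, which is what allows coalescing into a single memory transaction on the GPU.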
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily
applied in target detection. In this paper it is applied as the
feature for human recognition. In the sclera region, the vein patterns are the
edges of an image, so HOG is used to determine the gradient orientations
and edge orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small
connected regions called cells. For each cell, compute the histogram of
gradient directions or edge orientations of the pixels. The combination
of the different histograms of the different cells then represents the descriptor. To improve
accuracy, the histograms can be contrast-normalized by calculating the intensity
over a block and then using this value to normalize all cells within the
block. This normalization makes the result invariant to geometric and
photometric changes. The gradient magnitude m(x, y) and orientation θ(x,
y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is utilized
to create the cell histograms. Each pixel within the cell gives a weighted vote to
the orientation bin found in the gradient computation, with the gradient
magnitude used as the weight. The cells are rectangular in form. The
binning of gradient orientation is spread over 0 to 180 degrees, with
opposite directions counting as the same. Fig. 8 depicts the edge
orientations of the picture elements. If the images have any illumination and
contrast changes, then the gradient strength must be locally normalized. For
that, cells are grouped together into larger blocks; these blocks are
overlapping, so that each cell contributes more than once to the final
descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are
mainly square grids. The performance of HOG is improved by applying
a Gaussian window to each block.
FIG
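The two HOG steps described above (gradient computation, then magnitude-weighted orientation binning per cell) can be sketched as follows. This is a minimal illustration only: cell size and bin count are the common 8-pixel / 9-bin choices, and the block grouping and normalization stages are omitted.

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    """Minimal HOG sketch: per-pixel gradients dx, dy, then per-cell
    histograms of unsigned orientation (0-180 deg) weighted by m(x, y)."""
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]          # dx(x, y)
    gy[1:-1, :] = img[2:, :] - img[:-2, :]          # dy(x, y)
    mag = np.hypot(gx, gy)                          # m(x, y)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # theta(x, y), unsigned
    h, w = img.shape
    hists = np.zeros((h // cell, w // cell, bins))
    for cy in range(h // cell):
        for cx in range(w // cell):
            sl = np.s_[cy * cell:(cy + 1) * cell, cx * cell:(cx + 1) * cell]
            idx = (ang[sl] / (180.0 / bins)).astype(int) % bins
            # magnitude-weighted vote into the cell's orientation bins
            np.add.at(hists[cy, cx], idx.ravel(), mag[sl].ravel())
    return hists
```

A vertical step edge, for example, produces purely horizontal gradients, so all of its magnitude lands in the first (0-degree) bin of the cell histogram.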
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical
computing environment and fourth-generation programming language.
Developed by MathWorks, MATLAB allows matrix manipulations,
plotting of functions and data, implementation of algorithms, creation
of user interfaces, and interfacing with programs written in other languages,
including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an
optional toolbox uses the MuPAD symbolic engine, allowing access
to symbolic computing capabilities. An additional package, Simulink, adds
graphical multi-domain simulation and Model-Based
Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry
and academia. MATLAB users come from various backgrounds
in engineering, science, and economics. MATLAB is widely used in
academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners
in control engineering, Little's specialty, but quickly spread to many other
domains. It is now also used in education, in particular the teaching
of linear algebra and numerical analysis, and is popular amongst scientists
involved in image processing. The MATLAB application is built around the
MATLAB language. The simplest way to execute MATLAB code is to type
it in the Command Window, which is one of the elements of the MATLAB
Desktop. When code is entered in the Command Window, MATLAB can
be used as an interactive mathematical shell. Sequences of commands can
be saved in a text file, typically using the MATLAB Editor, as a script or
encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and
sharing your work You can integrate your MATLAB code with other
languages and applications and distribute your MATLAB algorithms and
applications
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external
applications and languages such as C, C++, FORTRAN, Java™, COM,
and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image
processing, communications, control design, test and measurement,
financial modeling and analysis, and computational biology. Add-on toolboxes
(collections of special-purpose MATLAB functions) extend the MATLAB
environment to solve particular classes of problems in these application
areas.
MATLAB can be used on personal computers and powerful
server systems, including the Cheaha compute cluster. With the addition of
the Parallel Computing Toolbox, the language can be extended with parallel
implementations for common computational functions, including for-loop
unrolling. Additionally, this toolbox supports offloading computationally
intensive workloads to Cheaha, the campus compute cluster. MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) that knows how big it is. Moreover, the fundamental operators
(e.g., addition, multiplication) are programmed to deal with matrices when
required, and the MATLAB environment handles much of the bothersome
housekeeping that makes all this possible. Since so many of the procedures
required for Macro-Investment Analysis involve matrices, MATLAB
proves to be an extremely efficient language for both communication and
implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or FORTRAN. A wrapper function is created,
allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed MEX-files
(for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called
from MATLAB, and many MATLAB libraries (for
example, XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries. Calling MATLAB from Java is more complicated but
can be done with a MATLAB extension, which is sold separately by
MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB
Interface), which should not be confused with the unrelated Java Metadata
Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox
available from MathWorks, MATLAB can be connected
to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems, and it enables fast
development and execution. With the MATLAB language, you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables, specifying data types, and allocating memory. In many
cases, MATLAB eliminates the need for 'for' loops. As a result, one line of
MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
MATLAB lets you execute commands or groups of commands one
at a time without compiling and linking enabling you to quickly iterate to
the optimal solution For fast execution of heavy matrix and vector
computations MATLAB uses processor-optimized libraries For general-
purpose scalar computations MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology
This technology which is available on most platforms provides
execution speeds that rival those of traditional programming languages
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
GUIDE (Graphical User Interface Development Environment) is an
interactive tool for laying out, designing, and editing user interfaces.
GUIDE lets you include list boxes, pull-down menus, push buttons, radio
buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX® controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process from acquiring
data from external devices and databases through preprocessing
visualization and numerical analysis to producing presentation-quality
output
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats such as Microsoft Excel, ASCII text or binary
files, image, sound, and video files, and scientific files such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format. Additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB These include 2-D and 3-D
plotting functions 3-D volume visualization functions tools for
interactively creating plots and the ability to export results to all popular
graphics formats You can customize plots by adding multiple axes
changing line colors and markers adding annotation Latex equations and
legends and drawing shapes
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatterbubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices 3-
D scalar and 3-D vector data You can use these functions to visualize and
understand large often complex multidimensional data Specifying plot
characteristics such as camera viewing angle perspective lighting effect
light source locations and transparency
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical statistical and engineering
functions to support all common engineering and science operations These
functions developed by experts in mathematics are the foundation of the
MATLAB language The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW Discrete Fourier
Transform library Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports they execute
faster than the equivalent C or C++ code
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types
including doubles singles and integers
CHAPTER 4
IMPLEMENTATION
41 GENERAL
Matlab is a program that was originally designed to simplify the
implementation of numerical linear algebra routines It has since grown into
something much bigger and it is used to implement numerical algorithms
for a wide range of applications The basic language used is very similar to
standard linear algebra notation but there are a few extensions that will
likely cause you some problems at first
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications: the Schiphol Privium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition
method which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions for other sclera vein recognition methods
and for general pattern recognition methods. We designed the Y shape
descriptor to narrow the search range and increase the matching efficiency;
it is a new feature extraction method that takes advantage of the GPU
structure. We developed the WPL descriptor to incorporate mask
information and make the method more suitable for parallel computing, which can
dramatically reduce data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, an important
step in achieving parallel computation efficiency on a GPU. A work flow
with high arithmetic intensity, designed to hide the memory access latency,
partitions the computation task across the heterogeneous system of
CPU and GPU, and even across the threads in the GPU. The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
INDEX
ABSTRACT i
CHAPTER 1 INTRODUCTION 1-16
1.1 GENERAL
1.2 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
1.2.1 PREPROCESSING
1.2.2 IMAGE ENHANCEMENT
1.2.3 IMAGE RESTORATION
1.2.4 IMAGE COMPRESSION
1.2.5 SEGMENTATION
1.2.6 IMAGE RESTORATION
1.2.7 FUNDAMENTAL STEPS
1.3 A SIMPLE IMAGE MODEL
1.4 IMAGE FILE FORMATS
1.5 TYPES OF IMAGES
1.5.1 BINARY IMAGES
1.5.2 GRAY SCALE IMAGE
1.5.3 COLOR IMAGE
1.5.4 INDEXED IMAGE
1.6 APPLICATIONS OF IMAGE PROCESSING
1.7 EXISTING SYSTEM
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1.8 LITERATURE SURVEY
1.9 PROPOSED SYSTEM
1.9.1 ADVANTAGES
CHAPTER 2 PROJECT DESCRIPTION 17-46
2.1 INTRODUCTION
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
2.2.2 SCLERA SEGMENTATION
2.2.3 IRIS AND EYELID REFINEMENT
2.2.4 OCULAR SURFACE VASCULATURE
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN
2.3 EVOLUTION OF GPU ARCHITECTURE
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
2.4.1 STAGE I: MATCHING WITH Y SHAPE DESCRIPTOR
2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR
2.5 MAPPING THE SUBTASKS TO CUDA
2.5.1 MAPPING ALGORITHM TO BLOCKS
2.5.2 MAPPING INSIDE BLOCK
2.5.3 MEMORY MANAGEMENT
2.6 HISTOGRAM OF ORIENTED GRADIENTS
CHAPTER 3 SOFTWARE SPECIFICATION 47-53
3.1 GENERAL
3.2 SOFTWARE REQUIREMENTS
3.3 INTRODUCTION
3.4 FEATURES OF MATLAB
3.4.1 INTERFACING WITH OTHER LANGUAGES
3.5 THE MATLAB SYSTEM
3.5.1 DESKTOP TOOLS
3.5.2 ANALYZING AND ACCESSING DATA
3.5.3 PERFORMING NUMERIC COMPUTATION
CHAPTER 4 IMPLEMENTATION 54-69
4.1 GENERAL
4.2 CODING IMPLEMENTATION
4.3 SNAPSHOTS
CHAPTER 5 70
CHAPTER 6 CONCLUSION & FUTURE SCOPE 71-72
6.1 CONCLUSION
6.2 REFERENCES
APPLICATION
LIST OF FIGURES
FIG NO  FIG NAME  PG NO
1.1 Fundamental blocks of digital image processing 2
1.2 Gray scale image 8
1.3 The additive model of RGB 9
1.4 The colors created by the subtractive model of CMYK 9
2.1 The diagram of a typical sclera vein recognition approach 19
2.2 Steps of segmentation 21
2.3 Glare area detection 21
2.4 Detection of the sclera area 22
2.5 Pattern of veins 23
2.6 Sclera region and its vein patterns 25
2.7 Filtering can take place simultaneously on different parts of the iris image 25
2.8 The sketch of parameters of segment descriptor 26
2.9 The weighting image 28
2.10 The module of sclera template matching 28
2.11 The Y shape vessel branch in sclera 28
2.12 The rotation and scale invariant character of Y shape vessel branch 29
2.13 The line descriptor of the sclera vessel pattern 30
2.14 The key elements of descriptor vector 31
2.15 Simplified sclera matching steps on GPU 32
2.16 Two-stage matching scheme 35
2.17 Example image from the UBIRIS database 42
2.18 Occupancy on various thread numbers per block 43
2.19 The task assignment inside and outside the GPU 44
2.20 HOG features 46
4.1 Original sclera image 65
4.2 Binarised sclera image 65
4.3 Edge map subtracted image 66
4.4 Cropping ROI 66
4.5 ROI mask 67
4.6 ROI finger sclera image 67
4.7 Enhanced sclera image 68
4.8 Feature extracted sclera image 68
4.9 Matching with images in database 69
4.10 Result 69
ABSTRACT
Sclera vein recognition has been shown to be a promising method for human identification. However, its matching speed is slow, which limits its use in real-time applications. To improve the matching efficiency, we proposed a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching. First, we designed a rotation- and scale-invariant Y shape descriptor-based feature extraction method to efficiently eliminate most unlikely matches. Second, we developed a weighted polar line (WPL) sclera descriptor structure that incorporates mask information to reduce GPU memory cost. Third, we designed a coarse-to-fine two-stage matching method. Finally, we developed a mapping scheme to map the subtasks to GPU processing units. The experimental results show that our proposed method achieves a dramatic processing speed improvement without compromising recognition accuracy.
CHAPTER 1
INTRODUCTION
1.1 GENERAL
Digital image processing is the use of computer algorithms to perform image processing on digital images. The 2D continuous image is divided into N rows and M columns; the intersection of a row and a column is called a pixel. The image can also be a function of other variables, including depth, color, and time. An image given in the form of a transparency, slide, photograph, or X-ray is first digitized and stored as a matrix of binary digits in computer memory. This digitized image can then be processed and/or displayed on a high-resolution television monitor. For display, the image is stored in a rapid-access buffer memory, which refreshes the monitor at a rate of 25 frames per second to produce a visually continuous display.
1.2 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
The field of "digital image processing" refers to processing digital images by means of a digital computer. In a broader sense, it can be considered the processing of any two-dimensional data, where any image (optical information) is represented as an array of real or complex numbers represented by a definite number of bits. An image is represented as a two-dimensional function f(x, y), where x and y are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) represents the intensity or gray level of the image at that point.
A digital image is one for which both the coordinates and the amplitude values of f are finite, discrete quantities. Hence, a digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are called "pixels". A digital image is discrete in both spatial coordinates and brightness, and it can be considered a matrix whose row and column indices identify a point on the image and whose corresponding element value identifies the gray level at that point.
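This matrix view can be made concrete with a small sketch (the 3x4 array of gray levels below is a made-up example, not data from this project):

```python
# A digital image as a matrix f(x, y): the row and column indices identify a
# point on the image, and the stored element is its gray level (0-255).
image = [
    [0,  64, 128, 255],
    [32, 96, 160, 224],
    [16, 80, 144, 208],
]

def gray_level(f, x, y):
    """Return the gray level of the pixel at spatial coordinates (x, y)."""
    return f[x][y]

print(gray_level(image, 0, 3))  # gray level at row 0, column 3 -> 255
```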
One of the first applications of digital images was in the newspaper industry, when pictures were first sent by submarine cable between London and New York. Introduction of the Bartlane cable picture transmission system in the early 1920s reduced the time required to transport a picture across the Atlantic from more than a week to less than three hours.
FIG 1.1: Fundamental blocks of digital image processing
1.2.1 PREPROCESSING
In imaging science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible. The acquisition of images (producing the input image in the first place) is referred to as imaging.
Image processing refers to the processing of a 2D picture by a computer. Some basic definitions:
An image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g., brightness) of the image at the real coordinate position (x, y). Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:
Image processing (image in → image out)
Image analysis (image in → measurements out)
Image understanding (image in → high-level description out)
An image may be considered to contain sub-images, sometimes referred to as regions of interest (ROIs), or simply regions. This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. In a sophisticated image processing system, it should be possible to apply specific image processing operations to selected regions. Thus one part of an image (region) might be processed to suppress motion blur, while another part might be processed to improve colour rendition.
Most usually, image processing systems require that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid, and each sample, or pixel, is quantized using a finite number of bits. The digitized image is then processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display. Closely related to image processing are computer graphics and computer vision. In computer graphics, images are manually made from physical models of objects, environments, and lighting, instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most animated movies. Computer vision, on the other hand, is often considered high-level image processing, out of which a machine/computer/software intends to decipher the physical contents of an image or a sequence of images (e.g., videos or 3D full-body magnetic resonance scans).
In modern sciences and technologies, images also gain much broader scope due to the ever-growing importance of scientific visualization (of often large-scale, complex scientific/experimental data). Examples include microarray data in genetic research and real-time multi-asset portfolio trading in finance. Before being processed, an image is converted into a digital form. Digitization includes sampling of the image and quantization of the sampled values. After converting the image into bit information, processing is performed. This processing may be image enhancement, image restoration, or image compression.
1.2.2 IMAGE ENHANCEMENT
Image enhancement refers to the accentuation or sharpening of image features, such as boundaries or contrast, to make a graphic display more useful for display and analysis. This process does not increase the inherent information content of the data. It includes gray level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on.
1.2.3 IMAGE RESTORATION
Image restoration is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration depends on the extent and accuracy of the knowledge of the degradation process, as well as on the filter design. Image restoration differs from image enhancement in that the latter is concerned with the extraction or accentuation of image features.
1.2.4 IMAGE COMPRESSION
Image compression is concerned with minimizing the number of bits required to represent an image. Applications of compression include broadcast TV, remote sensing via satellite, military communication via aircraft, radar, teleconferencing, facsimile transmission of educational and business documents, medical images that arise in computer tomography, magnetic resonance imaging, and digital radiology, motion pictures, satellite images, weather maps, geological surveys, and so on.
Text compression – CCITT GROUP3 and GROUP4
Still image compression – JPEG
Video image compression – MPEG
1.2.5 SEGMENTATION
In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.
The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture; adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the resulting contours after image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms like marching cubes.
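A minimal sketch of the labeling idea described above (the threshold and pixel values are illustrative, not from this project): every pixel receives a label, and pixels sharing a label share a visual characteristic, here simply intensity above or below a threshold:

```python
def segment_by_threshold(image, threshold):
    # Label 1 marks pixels at or above the threshold (foreground); label 0
    # marks the rest (background). The two segments together cover the image.
    return [[1 if pixel >= threshold else 0 for pixel in row]
            for row in image]

image = [[10,  20, 200],
         [15, 210, 220],
         [12,  18,  25]]
labels = segment_by_threshold(image, 128)
print(labels)  # [[0, 0, 1], [0, 1, 1], [0, 0, 0]]
```

Real segmentation methods (edge detection, region growing, curve evolution) are far richer, but they share this output form: a label per pixel.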
1.2.6 IMAGE RESTORATION
Image restoration, like enhancement, improves the qualities of an image, but all the operations are mainly based on known or measured degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise, and camera motion, and to correct images for known degradations.
1.2.7 FUNDAMENTAL STEPS
Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the chances for success of the other processes.
Image segmentation: to partition an input image into its constituent parts or objects.
Image representation: to convert the input data to a form suitable for computer processing.
Image description: to extract features that result in some quantitative information of interest, or features that are basic for differentiating one class of objects from another.
Image recognition: to assign a label to an object based on the information provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized objects.
Knowledge about a problem domain is coded into an image processing system in the form of a knowledge database.
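The fundamental steps above form a pipeline from acquisition to labeling; the sketch below chains hypothetical stub stages (every function body is an illustrative placeholder, not an algorithm from this report):

```python
# Each stage is a deliberately trivial stand-in for the corresponding
# fundamental step; only the shape of the pipeline matters here.
def acquire():
    return [[10, 200], [220, 15]]           # image acquisition

def preprocess(img):
    return img                               # e.g., noise reduction (no-op here)

def segment(img):
    return [[1 if p > 128 else 0 for p in row] for row in img]  # partition

def describe(seg):
    return sum(map(sum, seg))                # a crude quantitative feature

def recognize(feature):
    return "object" if feature > 0 else "background"  # assign a label

label = recognize(describe(segment(preprocess(acquire()))))
print(label)
```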
1.3 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude. Digitization of the spatial coordinates (x, y) is called image sampling, and amplitude digitization is called gray-level quantization. The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels.
Example: A 256-gray-level image of size 256x256 occupies 64K bytes of memory.
Images of very low spatial resolution produce a checkerboard effect. The use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.
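The 64K-byte figure follows directly from the model: 256 gray levels require log2(256) = 8 bits per pixel, and 256 x 256 pixels x 8 bits = 65536 bytes. A small helper (written for this example, not taken from the report) reproduces the arithmetic:

```python
import math

def storage_bytes(rows, cols, gray_levels):
    """Bytes needed to store an image at the given resolution and gray depth."""
    bits_per_pixel = math.ceil(math.log2(gray_levels))
    return rows * cols * bits_per_pixel // 8

print(storage_bytes(256, 256, 256))  # 65536 bytes, i.e., 64K
```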
1.4 IMAGE FILE FORMATS
There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based images). Some of the most common file formats are:
GIF – Graphics Interchange Format. An 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.
JPEG – Joint Photographic Experts Group. A very efficient (i.e., much information per byte), destructively compressed, 24-bit (16 million colours) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited applications).
TIFF – Tagged Image File Format. The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.
PS – PostScript. A standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.
PSD – Adobe Photoshop Document. A dedicated Photoshop format that keeps all the information in an image, including all the layers.
BMP – bitmap file format.
1.5 TYPES OF IMAGES
There are four types of images:
1. Binary image
2. Gray scale image
3. Color image
4. Indexed image
1.5.1 BINARY IMAGES
A binary image is a digital image that has only two possible values for each pixel. Typically, the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level. Each pixel is stored as a single bit, i.e., a 0 or 1. The names black-and-white and B&W are also common.
1.5.2 GRAY SCALE IMAGE
In an (8-bit) grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grey scale image is what people normally call a black-and-white image, but the name emphasizes that such an image will also include many shades of grey.
FIG 1.2: Gray scale image
1.5.3 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive colour, with the R, G, and B receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production. The secondary colours of RGB (cyan, magenta, and yellow) are formed by mixing two of the primary colours (red, green, or blue) and excluding the third colour. Red and green combine to make yellow, green and blue to make cyan, and blue and red to form magenta. The combination of red, green, and blue at full intensity makes white.
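These mixing rules can be verified channel by channel; the helper below (an illustrative sketch, not part of the report) adds 8-bit channels and clips at full intensity:

```python
def mix(c1, c2):
    # Additive mixing: sum each channel, clipping at the 8-bit maximum (255).
    return tuple(min(a + b, 255) for a, b in zip(c1, c2))

RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
print(mix(RED, GREEN))             # (255, 255, 0)   -> yellow
print(mix(GREEN, BLUE))            # (0, 255, 255)   -> cyan
print(mix(BLUE, RED))              # (255, 0, 255)   -> magenta
print(mix(mix(RED, GREEN), BLUE))  # (255, 255, 255) -> white
```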
In Photoshop, using the "screen" mode for the different layers in an image will make the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.
FIG 1.3: The additive model of RGB
CMYK: The four-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M), and yellow (Y) inks. In addition, a layer of black (K) ink can be added. The CMYK model uses subtractive colour mixing.
1.5.4 INDEXED IMAGE
An indexed image consists of an array and a color map matrix. The pixel values in the array are direct indices into the color map. By convention, this documentation uses the variable name X to refer to the array and map to refer to the color map. In computing, indexed color is a technique to manage digital image colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, color information is not directly carried by the image pixel data but is stored in a separate piece of data called a palette: an array of color elements in which every element (a color) is indexed by its position within the array. The image pixels do not contain the full specification of their colors, but only their indices in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.
Perhaps the first device that supported palette colors was a random-access frame buffer, described in 1975 by Kajiya, Sutherland, and Cheadle. This supported a palette of 256 36-bit RGB colors.
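A toy version of the scheme (the four-entry palette below is made up; real palettes often hold 256 colors): the array X stores only small indices, and full colors are recovered by a palette lookup, which is the memory saving behind indexed color:

```python
# The palette maps an index to a full RGB triple; the image array X stores
# only the indices.
palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
X = [[0, 1],
     [2, 3]]

def decode(X, palette):
    """Resolve every index through the palette to recover the RGB image."""
    return [[palette[i] for i in row] for row in X]

rgb = decode(X, palette)
print(rgb[0][1])  # index 1 -> (255, 0, 0)
```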
1.6 APPLICATIONS OF IMAGE PROCESSING
Interest in digital image processing methods stems from two principal application areas:
1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception
In the second application area, interest focuses on procedures for extracting, from an image, information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.
1.7 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1. Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large, preoccupy the GPU memory, and slow down data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.
1.8 LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-face recognition algorithms perform well when constrained (frontal, well-illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. Recognition performance of a subset of existing still-image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time, in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such Big Data draws great concern. Even though modern processors are equipped with more cores and memory capacity, careful design is still required in order to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of biometric applications. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.
Modern iris recognition algorithms can be computationally intensive, yet are designed for traditional sequential processing elements, such as a personal computer. However, a parallel processing alternative using field programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the means of this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ code version, determining the information content of an iris approximately 324 times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.
A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can easily be photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye, and the vein pattern seen in the sclera region is unique to each person. Thus, the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using the bilinear interpolation technique. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than the previous methods.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.
The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.
This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford–Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image: the global textural feature, extracted using the 1-D log-polar Gabor transform, and the local topological feature, extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.
1.9 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.
It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
In this research we first discuss why the naiumlve parallel approach would
not work We then propose the new sclera descriptor ndash the Y shape sclera
feature-based efficient registration method to speed up the mapping scheme
introduce the ldquoweighted polar line (WPL) descriptorrdquo that would be better
suited for parallel computing to mitigate the mask size issue and develop
our coarse to fine two-stage matching process to dramatically improve the
matching speed These new approaches make the parallel processing
possible and efficient
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve efficiency, we propose a new descriptor, the Y-shape descriptor, which greatly helps the coarse registration of two images and can be used to filter out non-matching pairs before refined matching.
2. We propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity.
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that sclera vein recognition is a promising means of human identification. Crihalmeanu and Ross proposed three approaches for feature registration and matching: a Speeded-Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed for central processing unit (CPU)-based systems.
As discussed, CPU-based systems are sequential processing devices, which may not be efficient for data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database to match against. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used in parallel computing to improve computational speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing that can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, GPUs have been used to extract features, construct descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al. presented a performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only the traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform that outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design; an efficient parallel computing scheme needs different strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method using the CUDA GPU architecture. The parallelization strategies developed in this research can, however, be applied to design parallel approaches for other sclera vein recognition methods and to help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large and will preoccupy the GPU memory and slow down data transfer. Also, some of the processing of the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to port our C program for CUDA to an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, an OpenCL implementation of our approach should likewise be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature, for efficient registration to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor," which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then develop implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition; in Section 8, we present experiments using the proposed system; and in Section 9, we draw conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation that uses a clustering algorithm to classify color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin-tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's-thresholding-based method for grayscale images. After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image and a multi-scale region-growing approach to identify the sclera veins against the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded-Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points of the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line-descriptor-based feature registration and matching method.
The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly useful for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified. This comparison is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are drawn out.
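The Cartesian-to-polar conversion mentioned above maps each pixel position to a radius and angle about a reference point (here assumed to be the iris center). A minimal sketch in Python; the function name and degree convention are illustrative, not from the report:

```python
import math

def to_polar(points, center):
    """Convert (x, y) pixel coordinates to (r, theta) about a center point.

    points: list of (x, y) tuples; center: (cx, cy), e.g. the iris center.
    theta is in degrees, measured counter-clockwise from the +x axis.
    """
    cx, cy = center
    polar = []
    for x, y in points:
        r = math.hypot(x - cx, y - cy)          # radial distance from center
        theta = math.degrees(math.atan2(y - cy, x - cx))  # angular position
        polar.append((r, theta))
    return polar
```

In a full pipeline the polar samples would then be interpolated onto a regular (r, theta) grid before feature extraction.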
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare area detection: The glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. The filter operates only on grayscale images, so a color image must first be converted to grayscale before the Sobel filter is applied. Fig. 4 shows the result of the glare area detection.
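The Sobel step above computes a gradient magnitude whose strong responses outline the glare boundary. A minimal pure-Python sketch of the 3x3 Sobel magnitude (border handling and the subsequent brightness test are simplified assumptions):

```python
def sobel_magnitude(img):
    """Gradient magnitude of a 2-D grayscale image (list of lists) via 3x3 Sobel.

    Border pixels are left at 0 for simplicity.
    """
    h, w = len(img), len(img[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal-gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical-gradient kernel
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```

Glare candidates would then be pixels that are both bright and lie inside a high-gradient contour near the pupil.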
FIG
Sclera area estimation: To estimate the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be located in the right and center positions, and the correct right sclera area in the left and center. In this way, non-sclera areas are wiped out.
2.2.3 IRIS AND EYELID REFINEMENT
The top and underside of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; these are all unwanted portions for recognition. To eliminate their effects, refinement is done after the detection of the sclera area. Fig. shows the result after Otsu's thresholding and iris and eyelid refinement to detect the right sclera area; the left sclera area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.
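For the enhancement step, Zhou et al. (Section 2.2.5) use a bank of directional Gabor filters. A minimal sketch of one real-valued Gabor kernel; the parameter values (sigma, wavelength, aspect ratio) are illustrative assumptions, not the report's:

```python
import math

def gabor_kernel(size, theta, sigma=2.0, lambd=4.0, gamma=0.5):
    """One real Gabor kernel oriented at angle theta (radians).

    Convolving the sclera image with a bank of these at several orientations
    enhances vessels running along each direction.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # rotate coordinates into the filter's frame
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            g = math.exp(-(xr * xr + gamma * gamma * yr * yr) / (2 * sigma * sigma))
            row.append(g * math.cos(2 * math.pi * xr / lambd))
        kernel.append(row)
    return kernel
```

A filter bank would call this for, say, theta = 0, pi/8, ..., 7*pi/8 and keep the maximum response per pixel.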
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), palm (Lin and Fan, 2004), and retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus, the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, each filtering element can itself be parallelized. A detailed discussion of our proposed parallelization is outside the scope of this paper.
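The independence argument above (each image portion is filtered without reference to the others) can be sketched on a CPU with a thread pool standing in for the GPU's parallel elements. The row-wise mean filter here is only a stand-in for the directional filtering:

```python
from concurrent.futures import ThreadPoolExecutor

def smooth_row(row):
    """1-D three-tap mean filter on one image row (stand-in for one 'Element')."""
    n = len(row)
    out = []
    for i in range(n):
        window = row[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out

def filter_image_parallel(img, workers=4):
    """Filter every row independently; rows are the parallel elements."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(smooth_row, img))
```

Because no row reads another row's output, the parallel result is identical to a sequential pass, which is exactly the property that makes the filtering GPU-friendly.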
FIG
FIG
2.2.5 OVERVIEW OF THE LINE-DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching stage of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle θ to a reference angle at the iris center, the segment's distance r to the iris center, and the dominant angular orientation ɸ of the line segment. Thus the descriptor is S = (θ, r, ɸ)ᵀ. The individual components of the line descriptor are calculated as
FIG
Here, f_line(x) is the polynomial approximation of the line segment, (x_l, y_l) is the center point of the line segment, (x_i, y_i) is the center of the detected iris, and S is the line descriptor. To register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.
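The hypothesize-and-score loop above can be sketched as follows. For brevity this sketch samples only a translation from each random point pair (the full method also samples scale and rotation), and the fitness is simply the count of test points that land near some target point; all parameter values are assumptions:

```python
import math
import random

def ransac_register(test_pts, target_pts, iters=200, tol=0.5, seed=0):
    """Estimate a 2-D translation aligning test_pts onto target_pts.

    Each iteration hypothesizes the shift implied by one randomly chosen
    point pair, then scores it by how many test points land within `tol`
    of some target point.
    """
    rng = random.Random(seed)
    best_shift, best_score = (0, 0), -1
    for _ in range(iters):
        tx, ty = rng.choice(test_pts)       # random point from test template
        gx, gy = rng.choice(target_pts)     # random point from target template
        dx, dy = gx - tx, gy - ty           # hypothesized shift
        score = 0
        for x, y in test_pts:
            if any(math.hypot(x + dx - u, y + dy - v) <= tol
                   for u, v in target_pts):
                score += 1                  # this test point found a partner
        if score > best_score:
            best_score, best_shift = score, (dx, dy)
    return best_shift, best_score
```

With enough iterations, the correct pairing is sampled with high probability and its fitness dominates the mismatched hypotheses.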
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. To reduce the effect of segmentation errors, we created a weighting image (Figure 3) from the sclera mask by setting interior pixels of the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as follows: Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and θmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
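The scoring just described can be sketched as below. Descriptors are modeled as (x, y, phi, w) tuples; the thresholds, the contribution of a matched pair (the test segment's weight), and the naive angle difference (no wraparound) are simplifying assumptions:

```python
import math

def match_score(test, target, d_match=5.0, a_match=10.0):
    """Weighted matching score M between two sets of segment descriptors.

    A test segment matches a target segment when their centers are within
    d_match and their orientations within a_match degrees; a matched pair
    contributes the test segment's weight w. The total is normalized by the
    attainable weight of the smaller template.
    """
    total = 0.0
    for x, y, phi, w in test:
        for u, v, psi, _ in target:
            if math.hypot(x - u, y - v) <= d_match and abs(phi - psi) <= a_match:
                total += w
                break                      # count each test segment once
    denom = min(sum(s[3] for s in test), sum(s[3] for s in target))
    return total / denom if denom else 0.0
```

Identical templates thus score 1.0, and boundary segments (w = 0.5) contribute less than confident interior segments.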
FIG
FIG
FIG
FIG
Under movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.
There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angle of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angle between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors in the pupil center calculation from the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. Our rotation-, shift-, and scale-invariant feature vector is thus defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center and is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limits of segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large and will occupy the GPU memory and slow down data transfer. During matching, a RANSAC-type registration algorithm randomly selects corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data must also be transformed, and the new boundary must be calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extract the geometric relationships of the descriptors and store them as a new descriptor. We use a weighting image created by setting various weight values according to position: the weight of descriptors outside the sclera is set to 0, descriptors near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights are calculated on their own mask by the CPU, only once.
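This one-time CPU weighting can be sketched as below. The border distance of 2 pixels is an assumption (the report only says "near the boundary"); the mask is a binary grid with 1 inside the sclera:

```python
def descriptor_weight(x, y, mask, border=2):
    """Weight for a descriptor centered at (x, y) given a binary sclera mask.

    Returns 1.0 for interior points, 0.5 within `border` pixels of the mask
    edge (or the image edge), and 0.0 outside the mask, mirroring the
    0 / 0.5 / 1 scheme baked into the WPL descriptor.
    """
    h, w = len(mask), len(mask[0])
    if not (0 <= y < h and 0 <= x < w) or mask[y][x] == 0:
        return 0.0
    for dy in range(-border, border + 1):
        for dx in range(-border, border + 1):
            ny, nx = y + dy, x + dx
            # any non-sclera neighbor nearby marks this as a boundary point
            if not (0 <= ny < h and 0 <= nx < w) or mask[ny][nx] == 0:
                return 0.5
    return 1.0
```

Since the weight is stored inside each descriptor, the GPU never needs to touch the mask again after this preprocessing.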
The result is saved as a component of the descriptor, which becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the value 0, 0.5, or 1. When one template is shifted to another location along the line connecting two templates' centers, all the descriptors of that template are transformed; this is faster if the two templates share a similar reference point. If we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they have the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize GPU computation, we also convert the descriptor values from polar to rectangular coordinates in the CPU preprocessing.
The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved
FIG
FIG
them at contiguous addresses. This meets the requirement of coalesced memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered after every shift; thus, the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator of Figure 4 is simplified as shown in Figure 9.
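The reorganization of same-side descriptors into contiguous storage can be sketched as a split into left/right groups repacked as parallel arrays (a structure-of-arrays layout), so that neighboring threads in a warp read neighboring addresses. The tuple layout and the iris-x split rule are illustrative assumptions:

```python
def reorganize_by_side(descriptors, iris_x):
    """Group descriptors into left/right halves of the sclera and repack
    each group as parallel arrays (structure-of-arrays).

    Each descriptor is (x, y, phi, w); side is decided by x vs. the iris
    center. On a GPU, thread i of a warp would then read element i of each
    array, which is a coalesced access pattern.
    """
    left = [d for d in descriptors if d[0] < iris_x]
    right = [d for d in descriptors if d[0] >= iris_x]

    def pack(group):
        return {
            "x": [d[0] for d in group],
            "y": [d[1] for d in group],
            "phi": [d[2] for d in group],
            "w": [d[3] for d in group],
        }

    return pack(left), pack(right)
```

Assigning the two packed groups to threads in different warps then lets each side use its own deformation parameters, as described above.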
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
General-purpose computing on the GPU maps general-purpose computation onto the graphics hardware in much the same way as any standard graphics application. Because of this similarity, the process is both easier and more difficult to explain: on one hand, the actual operations are the same and easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline described in Section II, concentrating on its programmable aspects:
The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as a texture on future passes through the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
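The grid computation described above can be sketched in scalar code: every cell runs the same update (SPMD), gathering from its neighbors' previous state and writing exactly one output. The averaging update is an arbitrary stand-in for a real fluid step:

```python
def step_grid(state):
    """One SPMD-style time step over a 2-D grid.

    Every cell computes its next value from its own value and its
    4-neighborhood (a 'gather'); on a GPU, each (y, x) pair would be
    one fragment/thread producing one output.
    """
    h, w = len(state), len(state[0])
    nxt = [[0.0] * w for _ in range(h)]   # separate output buffer (no in-place writes)
    for y in range(h):
        for x in range(w):
            neighbors = [state[y][x]]
            if y > 0:
                neighbors.append(state[y - 1][x])
            if y < h - 1:
                neighbors.append(state[y + 1][x])
            if x > 0:
                neighbors.append(state[y][x - 1])
            if x < w - 1:
                neighbors.append(state[y][x + 1])
            nxt[y][x] = sum(neighbors) / len(neighbors)
    return nxt
```

Note the separate output buffer: in the old GPGPU model a pass could not read and write the same buffer, which is exactly the restriction the newer model (next section) lifts.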
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed; the matching result in this stage helps filter out image pairs with low similarity. After this step, some false positive matches may remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here, ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta, respectively. dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined as (3), and dxy is the Euclidean distance of two descriptor centers, defined as (4). ni and di are the number of matched descriptor pairs and the distance between their centers, respectively. tϕ is a distance threshold, and txy is the threshold to restrict the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we searched the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where ϕij is the angle between the jth branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as (2). Here, α is a factor to fuse the matching score, which was set to 30 in our study. Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera will be discarded. A sclera with a high matching score will be passed to the next, more precise matching process.
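A serial Python sketch of this coarse Y-shape matching stage. The descriptor layout (three branch angles plus a center point) and the exact fusion formula standing in for Eq. (2) are illustrative assumptions; the thresholds follow the text.

```python
import math

def d_phi(y1, y2):
    # Euclidean distance of the angle elements of two descriptors (cf. Eq. 3).
    return math.dist(y1["angles"], y2["angles"])

def d_xy(y1, y2):
    # Euclidean distance of the two descriptor centers (cf. Eq. 4).
    return math.dist(y1["center"], y2["center"])

def coarse_match(test_tpl, target_tpl, t_phi=30.0, t_xy=675.0, alpha=30.0):
    """Count matched Y-shape pairs and fuse the count with the average
    center distance. The fusion formula below is a placeholder -- the
    report's Eq. (2) is not reproduced in the text."""
    matched, dist_sum = 0, 0.0
    for yi in test_tpl:
        for yj in target_tpl:
            # t_xy restricts the search area; t_phi the angle difference.
            if d_xy(yi, yj) < t_xy and d_phi(yi, yj) < t_phi:
                matched += 1
                dist_sum += d_xy(yi, yj)
                break
    if matched == 0:
        return 0.0
    avg = dist_sum / matched
    return matched / (len(test_tpl) * (1.0 + avg / alpha))
```

Because no registration is needed, each pair of templates costs only one pass of threshold tests, which is what makes this stage a fast filter.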
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.
Considering these factors, our registration employed both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to get the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte; Tta is the target template and staj is the jth WPL descriptor of Tta; and d(stek, staj) is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value of the two descriptors.
We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find each one's nearest neighbor staj in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
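Algorithm 2's offset search can be sketched serially as follows. This is a simplified sketch: descriptors are reduced to 2-D centers, the quad-wise sampling is replaced by plain random sampling, and the "smallest standard deviation" criterion is read, illustratively, as picking the candidate offset closest to the candidates' mean.

```python
import math
import random
import statistics

def nearest(desc, template):
    # Nearest neighbor of a descriptor center in the target template.
    return min(template, key=lambda d: math.dist(desc, d))

def search_shift(test_tpl, target_tpl, n_samples=8, seed=0):
    rng = random.Random(seed)
    offsets = []
    for s_te in rng.sample(test_tpl, min(n_samples, len(test_tpl))):
        s_ta = nearest(s_te, target_tpl)
        # Candidate registration offset Δs_k for this matched pair.
        offsets.append((s_ta[0] - s_te[0], s_ta[1] - s_te[1]))
    # Δs_optim: the candidate with the least spread around the mean offset
    # (an illustrative reading of "smallest standard deviation").
    mean = (statistics.mean(o[0] for o in offsets),
            statistics.mean(o[1] for o in offsets))
    return min(offsets, key=lambda o: math.dist(o, mean))
```

In the CUDA version each candidate offset is tried by its own thread; the serial loop above stands in for that grid of threads.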
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here, stei, Tte, staj, and Tta are defined the same as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the itth iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterated N times to generate these parameters; in our experiment, we set the iteration count to 512.
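Algorithm 3's random parameter search, sketched serially in Python. The rotation and scale sampling ranges are assumptions for illustration; β = 20 pixels and N = 512 follow the text, and descriptors are again reduced to 2-D centers.

```python
import math
import random

def transform(p, theta, shift, scale):
    # Scale, rotate, then shift a 2-D point (cf. the matrix product in (7)).
    x, y = p[0] * scale, p[1] * scale
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1])

def search_affine(test_tpl, target_tpl, n_iter=512, beta=20.0, seed=0):
    rng = random.Random(seed)
    best_params, best_matches = None, -1
    for _ in range(n_iter):
        # One randomly generated parameter set per iteration; the sampling
        # ranges for rotation and scale are illustrative assumptions.
        theta = rng.uniform(-0.2, 0.2)
        scale = rng.uniform(0.9, 1.1)
        s_te = rng.choice(test_tpl)
        s_ta = min(target_tpl, key=lambda d: math.dist(s_te, d))
        shift = (s_ta[0] - s_te[0], s_ta[1] - s_te[1])
        # Count descriptor pairs matched within beta pixels after transform.
        m = sum(1 for p in test_tpl
                if min(math.dist(transform(p, theta, shift, scale), q)
                       for q in target_tpl) < beta)
        if m > best_matches:
            best_matches, best_params = m, (theta, shift, scale)
    return best_params, best_matches
```

In the CUDA mapping described later, each of the N iterations becomes an independent thread, which is why uncorrelated per-thread random numbers matter.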
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined from Algorithms 2 and 3, the test template will be registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here, stei, Tte, staj, and Tta are defined the same as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3. ϕ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ϕ values; in our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.
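A hypothetical sketch of the per-pair matching test in Algorithm 4. The descriptor fields, the reuse of a 20-pixel center-distance threshold, and the weighting and normalization are illustrative assumptions; only α = 5 follows the text.

```python
import math

def pair_matches(s_te, s_ta, alpha=5.0, beta=20.0):
    # A registered pair matches when the centers are close and the angles
    # to the radius direction (phi) differ by less than alpha.
    close = math.dist(s_te["center"], s_ta["center"]) < beta
    similar = abs(s_te["phi"] - s_ta["phi"]) < alpha
    return close and similar

def match_score(test_tpl, target_tpl, alpha=5.0, beta=20.0):
    # Weighted tally over matched pairs; w down-weights descriptors at the
    # sclera edge. This weighting/normalization is a placeholder, not the
    # report's exact score definition.
    score = sum(min(s_te["w"], s_ta["w"])
                for s_te in test_tpl
                for s_ta in target_tpl
                if pair_matches(s_te, s_ta, alpha, beta))
    return score / max(1, min(len(test_tpl), len(target_tpl)))
```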
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single instruction, multiple data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned by the programmer into threads and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and accessing these memories takes little time. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To completely hide the latency of the small instruction set, we should preferentially use on-chip memory rather than global memory. When global memory access occurs, threads in the same warp should access consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and each thread performs the comparison of one pair of templates. In our work, we use an NVIDIA C2070 as our GPU, with the thread and block counts set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, each of which processes a section of descriptors in one thread. As the lower portion of the middle column of Figure 11 shows, we assigned a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results. Then, recursively, the first thread of every group of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
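The tree-style reduction just described can be sketched serially; each inner-loop body corresponds to the work of one thread at a given stride, and the result lands in the first slot, mirroring "saved at the first address."

```python
def tree_reduce_sum(results):
    # Pairwise tree reduction: at stride 2, 4, 8, ..., the first element of
    # each group absorbs its partner's value, so after log2(n) rounds the
    # total sits in slot 0. On the GPU each round runs in parallel.
    vals = list(results)
    stride = 2
    while stride // 2 < len(vals):
        for i in range(0, len(vals), stride):
            j = i + stride // 2
            if j < len(vals):
                vals[i] += vals[j]
        stride *= 2
    return vals[0]
```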
252 MAPPING INSIDE BLOCK
In shift parameter searching, there are two schemes we could choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to a thread, so that all the threads compute independently, except that the final result must be compared against the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter set search to a thread: every thread randomly generated a set of parameters and tried them independently, and the generation iterations were distributed across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration step, we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore, it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto
and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
FIG
FIG
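For comparison, NumPy addresses the same correlated-streams problem by spawning statistically independent child states from a single SeedSequence, one per worker. This is an analogy to the offline dynamic-creation tool described above, not the tool itself.

```python
import numpy as np

# One root seed; spawn() derives independent child seed states, one per
# GPU-thread-like worker, so the Mersenne Twister streams they initialize
# are uncorrelated even though they share one base seed.
root = np.random.SeedSequence(42)
children = root.spawn(4)                  # e.g. 4 worker threads
streams = [np.random.Generator(np.random.MT19937(c)) for c in children]

draws = [g.random() for g in streams]     # each stream draws independently
```

Rebuilding the generators from the same root seed reproduces the same draws, which is the property a parallel kernel needs for debuggability.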
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access was still a slow way to get data, so in our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper, it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination or contrast changes, then the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
FIG
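A minimal Python sketch of the cell-histogram and block-normalization steps just described: central-difference gradients, magnitude-weighted votes, and unsigned orientations folded into 0-180 degrees. The 9-bin count is a conventional HOG choice, not taken from the text.

```python
import math

def hog_cell_histogram(cell, n_bins=9):
    # Gradient histogram for one cell: dx, dy by central differences,
    # votes weighted by gradient magnitude m(x, y) = hypot(dx, dy),
    # orientation theta(x, y) folded so opposite directions coincide.
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = cell[y][x + 1] - cell[y][x - 1]
            dy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(dx, dy)
            ang = math.degrees(math.atan2(dy, dx)) % 180.0
            hist[int(ang // (180.0 / n_bins)) % n_bins] += mag
    return hist

def l2_normalize(block_hist, eps=1e-6):
    # Block-level contrast normalization for photometric invariance.
    norm = math.sqrt(sum(v * v for v in block_hist)) + eps
    return [v / norm for v in block_hist]
```

Concatenating the normalized cell histograms of each overlapping block yields the final descriptor.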
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing.
Development environment for managing code, files, and data.
Interactive tools for iterative exploration, design, and problem solving.
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration.
2-D and 3-D graphics functions for visualizing data.
Tools for building custom graphical user interfaces.
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel.
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations.
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data.
Readability and navigation improvements to warning and error messages in the MATLAB command window.
Automatic variable and function renaming in the MATLAB Editor.
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, and it enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages, because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
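The vectorization claim above can be illustrated with the analogous NumPy idiom (shown in Python for comparison; MATLAB's native array syntax behaves the same way): one array expression replaces the explicit element-by-element loop.

```python
import numpy as np

x = np.arange(1.0, 6.0)          # [1, 2, 3, 4, 5]

# Loop version -- what the equivalent C code would spell out element
# by element, with explicit indexing and preallocation:
y_loop = np.empty_like(x)
for i in range(x.size):
    y_loop[i] = 3.0 * x[i] ** 2 + 1.0

# Vectorized version: one line, no loop or size bookkeeping.
y_vec = 3.0 * x ** 2 + 1.0
```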
Development Tools
MATLAB includes development tools that help you implement your algorithms efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
By using the interactive tool GUIDE (Graphical User Interface Development Environment), you can lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; and adding annotations, LaTeX equations, legends, and drawn shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh
Image plots
Cone, slice, stream, and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency, a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity to hide the memory access latency was designed to partition the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
1.6 APPLICATIONS OF IMAGE PROCESSING
1.7 EXISTING SYSTEM
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1.8 LITERATURE SURVEY
1.9 PROPOSED SYSTEM
1.9.1 ADVANTAGES
CHAPTER 2: PROJECT DESCRIPTION .......... 17-46
2.1 INTRODUCTION
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
2.2.2 SCLERA SEGMENTATION
2.2.3 IRIS AND EYELID REFINEMENT
2.2.4 OCULAR SURFACE VASCULATURE
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN
2.3 EVOLUTION OF GPU ARCHITECTURE
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
2.4.1 STAGE I: MATCHING WITH Y SHAPE DESCRIPTOR
2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR
2.5 MAPPING THE SUBTASKS TO CUDA
2.5.1 MAPPING ALGORITHM TO BLOCKS
2.5.2 MAPPING INSIDE BLOCK
2.5.3 MEMORY MANAGEMENT
2.6 HISTOGRAM OF ORIENTED GRADIENTS
CHAPTER 3: SOFTWARE SPECIFICATION .......... 47-53
3.1 GENERAL
3.2 SOFTWARE REQUIREMENTS
3.3 INTRODUCTION
3.4 FEATURES OF MATLAB
3.4.1 INTERFACING WITH OTHER LANGUAGES
3.5 THE MATLAB SYSTEM
3.5.1 DESKTOP TOOLS
3.5.2 ANALYZING AND ACCESSING DATA
3.5.3 PERFORMING NUMERIC COMPUTATION
CHAPTER 4: IMPLEMENTATION .......... 54-69
4.1 GENERAL
4.2 CODING IMPLEMENTATION
4.3 SNAPSHOTS
CHAPTER 5 .......... 70
CHAPTER 6: CONCLUSION & FUTURE SCOPE .......... 71-72
6.1 CONCLUSION
6.2 REFERENCES
APPLICATION
LIST OF FIGURES

FIG NO   FIG NAME                                                          PG NO
1.1      Fundamental blocks of digital image processing                    2
1.2      Gray scale image                                                  8
1.3      The additive model of RGB                                         9
1.4      The colors created by the subtractive model of CMYK               9
2.1      The diagram of a typical sclera vein recognition approach         19
2.2      Steps of segmentation                                             21
2.3      Glare area detection                                              21
2.4      Detection of the sclera area                                      22
2.5      Pattern of veins                                                  23
2.6      Sclera region and its vein patterns                               25
2.7      Filtering can take place simultaneously on different
         parts of the iris image                                           25
2.8      The sketch of parameters of segment descriptor                    26
2.9      The weighting image                                               28
2.10     The module of sclera template matching                            28
2.11     The Y shape vessel branch in sclera                               28
2.12     The rotation and scale invariant character of Y shape
         vessel branch                                                     29
2.13     The line descriptor of the sclera vessel pattern                  30
2.14     The key elements of descriptor vector                             31
2.15     Simplified sclera matching steps on GPU                           32
2.16     Two-stage matching scheme                                         35
2.17     Example image from the UBIRIS database                            42
2.18     Occupancy on various thread numbers per block                     43
2.19     The task assignment inside and outside the GPU                    44
2.20     HOG features                                                      46
4.1      Original sclera image                                             65
4.2      Binarised sclera image                                            65
4.3      Edge map subtracted image                                         66
4.4      Cropping ROI                                                      66
4.5      ROI mask                                                          67
4.6      ROI finger sclera image                                           67
4.7      Enhanced sclera image                                             68
4.8      Feature extracted sclera image                                    68
4.9      Matching with images in database                                  69
4.10     Result                                                            69
ABSTRACT
Sclera vein recognition has been shown to be a promising method for human identification. However, its matching speed is slow, which could limit its use in real-time applications. To improve the matching efficiency, we proposed a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching. First, we designed a rotation- and scale-invariant Y-shape-descriptor-based feature extraction method to efficiently eliminate most unlikely matches. Second, we developed a weighted polar line (WPL) sclera descriptor structure that incorporates mask information, to reduce GPU memory cost. Third, we designed a coarse-to-fine two-stage matching method. Finally, we developed a mapping scheme to map the subtasks to GPU processing units. The experimental results show that our proposed method achieves a dramatic improvement in processing speed without compromising recognition accuracy.
CHAPTER 1
INTRODUCTION
1.1 GENERAL
Digital image processing is the use of computer algorithms to perform image processing on digital images. The 2D continuous image is divided into N rows and M columns; the intersection of a row and a column is called a pixel. The image can also be a function of other variables, including depth, color, and time. An image given in the form of a transparency, slide, photograph, or X-ray is first digitized and stored as a matrix of binary digits in computer memory. This digitized image can then be processed and/or displayed on a high-resolution television monitor. For display, the image is stored in a rapid-access buffer memory, which refreshes the monitor at a rate of 25 frames per second to produce a visually continuous display.
1.2 OVERVIEW OF DIGITAL IMAGE PROCESSING
The field of "digital image processing" refers to processing digital images by means of a digital computer. In a broader sense, it can be considered the processing of any two-dimensional data, where an image (optical information) is represented as an array of real or complex numbers encoded with a definite number of bits. An image is represented as a two-dimensional function f(x, y), where x and y are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) represents the intensity, or gray level, of the image at that point.
A digital image is one for which both the coordinates and the amplitude values of f are finite, discrete quantities. Hence a digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are called "pixels". A digital image is discrete in both spatial coordinates and brightness, and it can be considered a matrix whose row and column indices identify a point on the image and whose corresponding element value identifies the gray level at that point.
One of the first applications of digital images was in the newspaper industry, when pictures were first sent by submarine cable between London and New York. Introduction of the Bartlane cable picture transmission system in the early 1920s reduced the time required to transport a picture across the Atlantic from more than a week to less than three hours.
[Fig. 1.1: Fundamental blocks of digital image processing]
1.2.1 PREPROCESSING
In imaging science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible, and the general techniques described here apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging.
Image processing refers to the processing of a 2D picture by a computer. Basic definitions:
An image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g., brightness) of the image at the real coordinate position (x, y). Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:
Image processing (image in -> image out)
Image analysis (image in -> measurements out)
Image understanding (image in -> high-level description out)
An image may be considered to contain sub-images, sometimes referred to as regions of interest (ROIs), or simply regions. This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. In a sophisticated image processing system, it should be possible to apply specific image processing operations to selected regions. Thus one part of an image (region) might be processed to suppress motion blur, while another part might be processed to improve colour rendition.
Most image processing systems require that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid and each sample, or pixel, is quantized using a finite number of bits. The digitized image is then processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display. Closely related to image processing are computer graphics and computer vision. In computer graphics, images are manually made from physical models of objects, environments, and lighting, instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most animated movies. Computer vision, on the other hand, is often considered high-level image processing, in which a machine/computer/software intends to decipher the physical contents of an image or a sequence of images (e.g., videos or 3D full-body magnetic resonance scans).
In modern sciences and technologies, images also gain much broader scope, due to the ever-growing importance of scientific visualization (of often large-scale, complex scientific/experimental data). Examples include microarray data in genetic research and real-time multi-asset portfolio trading in finance. Before processing, an image is converted into a digital form. Digitization includes sampling of the image and quantization of the sampled values. After converting the image into bit information, processing is performed. This processing may be image enhancement, image restoration, or image compression.
1.2.2 IMAGE ENHANCEMENT
Image enhancement refers to the accentuation, or sharpening, of image features such as boundaries or contrast, to make a graphic display more useful for display and analysis. This process does not increase the inherent information content in the data. It includes gray-level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on.
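As a concrete illustration of the gray-level and contrast manipulation mentioned above, a linear contrast stretch can be sketched as follows (an illustrative Python/NumPy sketch, not the project's MATLAB implementation):

```python
import numpy as np

def contrast_stretch(img, lo=0, hi=255):
    """Linearly map the image's occupied gray-level range onto [lo, hi]."""
    img = img.astype(np.float64)
    mn, mx = img.min(), img.max()
    if mx == mn:                           # flat image: nothing to stretch
        return np.full(img.shape, lo, dtype=np.uint8)
    out = (img - mn) / (mx - mn) * (hi - lo) + lo
    return out.astype(np.uint8)

# A low-contrast image occupying only gray levels 100..150
dull = np.array([[100, 120], [140, 150]], dtype=np.uint8)
print(contrast_stretch(dull))   # gray levels now span the full 0..255 range
```

The stretch improves visibility without adding any information, which is exactly the point made above: enhancement changes appearance, not content.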
1.2.3 IMAGE RESTORATION
Image restoration is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration depends on the extent and accuracy of the knowledge of the degradation process, as well as on the filter design. Image restoration differs from image enhancement in that the latter is concerned with the extraction or accentuation of image features rather than with undoing known degradations.
1.2.4 IMAGE COMPRESSION
Image compression is concerned with minimizing the number of bits required to represent an image. Applications of compression include broadcast TV; remote sensing via satellite; military communication via aircraft; radar; teleconferencing; facsimile transmission of educational and business documents; medical images that arise in computer tomography, magnetic resonance imaging, and digital radiology; motion pictures; satellite images; weather maps; geological surveys; and so on.
Text compression - CCITT Group 3 & Group 4
Still image compression - JPEG
Video compression - MPEG
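A toy example of how redundancy is removed: run-length encoding, one of the simplest lossless schemes, stores each run of identical pixels as a (value, count) pair (an illustrative Python sketch; the standards listed above use far more sophisticated methods):

```python
def rle_encode(pixels):
    """Run-length encode a 1-D pixel sequence as (value, count) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Invert the encoding: expand each (value, count) pair."""
    return [v for v, n in runs for _ in range(n)]

row = [255, 255, 255, 0, 0, 255]
runs = rle_encode(row)
print(runs)                           # [(255, 3), (0, 2), (255, 1)]
assert rle_decode(runs) == row        # lossless: decoding recovers the row
```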
1.2.5 SEGMENTATION
In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.
The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture; adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, as is typical in medical imaging, the contours resulting from image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms like marching cubes.
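The simplest instance of "assigning a label to every pixel" is intensity thresholding, where the shared characteristic is simply being above or below a cutoff (an illustrative Python/NumPy sketch, not the segmentation method used later in this project):

```python
import numpy as np

def threshold_segment(img, t):
    """Label each pixel 1 (object) if its intensity exceeds t, else 0 (background)."""
    return (img > t).astype(np.uint8)

img = np.array([[ 10,  20, 200],
                [ 15, 210, 220],
                [ 12,  18,  25]], dtype=np.uint8)
labels = threshold_segment(img, 128)
print(labels)   # bright pixels labeled 1, dark pixels labeled 0
```

Pixels with the same label (here 0 or 1) share the characteristic "intensity relative to t", exactly as in the definition above.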
1.2.6 IMAGE RESTORATION
Image restoration, like enhancement, improves the quality of an image, but all of its operations are based on known, measured, or modeled degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise, and camera motion, and to correct images for known degradations.
1.2.7 FUNDAMENTAL STEPS
Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the chances of success of the other processes.
Image segmentation: to partition an input image into its constituent parts or objects.
Image representation: to convert the input data to a form suitable for computer processing.
Image description: to extract features that yield quantitative information of interest, or features that are basic for differentiating one class of objects from another.
Image recognition: to assign a label to an object based on the information provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized objects.
Knowledge about a problem domain is coded into an image processing system in the form of a knowledge database.
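The steps above form a pipeline in which each stage consumes the previous stage's output. A toy sketch of that chaining (every function name and field here is a hypothetical placeholder, not the project's code):

```python
# Each hypothetical stage takes and returns a simple dict so the
# example stays self-contained; real stages operate on image arrays.
def acquire():        return {"image": [[0, 255], [255, 0]]}
def preprocess(d):    d["denoised"] = True;  return d
def segment(d):       d["regions"] = 2;      return d
def represent(d):     d["features"] = [0.5]; return d
def describe(d):      d["descriptor"] = sum(d["features"]); return d
def recognize(d):     d["label"] = "object" if d["descriptor"] > 0 else "none"; return d
def interpret(d):     d["meaning"] = "scene contains " + d["label"]; return d

# The whole pipeline is just the composition of the stages, in order.
result = interpret(recognize(describe(represent(segment(preprocess(acquire()))))))
print(result["meaning"])   # scene contains object
```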
1.3 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude. Digitization of the spatial coordinates (x, y) is called image sampling; amplitude digitization is called gray-level quantization. The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels.
Example: a 256-gray-level image of size 256x256 occupies 64K bytes of memory.
Images of very low spatial resolution produce a checkerboard effect. The use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.
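Both the storage figure and the effect of coarse quantization can be checked with a short sketch (illustrative Python/NumPy, not the project's MATLAB code):

```python
import numpy as np

# 256 gray levels need 8 bits per pixel, so a 256x256 image occupies
# 256 * 256 * 8 / 8 = 65536 bytes = 64K bytes, as stated in the text.
rows, cols, bits = 256, 256, 8
storage_bytes = rows * cols * bits // 8
assert storage_bytes == 64 * 1024

def quantize(img, levels):
    """Requantize an 8-bit image down to the given number of gray levels."""
    step = 256 // levels
    return (img // step) * step

# A smooth 0..255 gradient collapses to 4 flat bands: false contouring.
gradient = np.arange(256, dtype=np.uint8).reshape(1, 256)
coarse = quantize(gradient, 4)
print(sorted(set(coarse.ravel())))   # [0, 64, 128, 192]
```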
14 IMAGE FILE FORMATS
There are two general groups of lsquoimagesrsquo vector graphics (or line art)
and bitmaps (pixel-based or lsquoimagesrsquo) Some of the most common file
formats are
GIF mdash Graphical interchange Format An 8-bit (256 colour) non-
destructively compressed bitmap format Mostly used for web Has several
sub-standards one of which is the animated GIF
JPEG mdash Joint Photographic Experts Group a very efficient (ie much
information per byte) destructively compressed 24 bit (16 million colours)
bitmap format Widely used especially for web and Internet (bandwidth-
limited)
TIFF mdash Tagged Image File Format The standard 24 bit publication bitmap
format Compresses non-destructively with for instance Lempel-Ziv-
Welch (LZW) compression
PS mdash Postscript a standard vector format Has numerous sub-standards
and can be difficult to transport across platforms and operating systems
PSD ndash Adobe PhotoShop Document a dedicated Photoshop format that
keeps all the information in an image including all the layers
BMP- bit map file format
1.5 TYPES OF IMAGES
There are four types of images:
1. Binary image
2. Gray scale image
3. Color image
4. Indexed image
1.5.1 BINARY IMAGES
A binary image is a digital image that has only two possible values for each pixel. Typically, the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level. Each pixel is stored as a single bit, i.e., a 0 or 1. Such images are also known as black-and-white (B&W) images.
1.5.2 GRAY SCALE IMAGE
In an (8-bit) grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A gray scale image is what people normally call a black-and-white image, but the name emphasizes that such an image also includes many shades of grey.
[Fig. 1.2: Gray scale image]
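The relationship between the three image types described here can be shown in a few lines: a color image reduced to grayscale, then thresholded to binary (an illustrative Python/NumPy sketch; the BT.601 luminance weights are an assumption, not stated by the report):

```python
import numpy as np

def to_gray(rgb):
    """Luminance-weighted grayscale conversion (ITU-R BT.601 weights assumed)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return np.round(0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)

def to_binary(gray, t=128):
    """Each pixel becomes a single bit: 1 if above threshold t, else 0."""
    return (gray > t).astype(np.uint8)

# One white pixel and one black pixel
rgb = np.array([[[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)
gray = to_gray(rgb)        # white -> 255, black -> 0
print(to_binary(gray))     # white -> 1, black -> 0
```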
1.5.3 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive colour, with the R, G, and B receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production. The secondary colours of RGB (cyan, magenta, and yellow) are formed by mixing two of the primary colours (red, green, or blue) and excluding the third colour: red and green combine to make yellow, green and blue make cyan, and blue and red form magenta. The combination of red, green, and blue at full intensity makes white.
In Photoshop, using the "screen" mode for the different layers in an image makes the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.
[Fig. 1.3: The additive model of RGB]
CMYK: The four-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M), and yellow (Y) inks, to which a layer of black (K) ink can be added. The CMYK model uses subtractive colour mixing.
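The additive/subtractive relationship can be checked numerically with the standard RGB-to-CMYK conversion formulas (a sketch; the report does not give these formulas explicitly):

```python
def rgb_to_cmyk(r, g, b):
    """Convert 0-255 RGB values to CMYK fractions via the standard formulas."""
    rp, gp, bp = r / 255, g / 255, b / 255
    k = 1 - max(rp, gp, bp)          # black component
    if k == 1:                        # pure black: avoid division by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - rp - k) / (1 - k)
    m = (1 - gp - k) / (1 - k)
    y = (1 - bp - k) / (1 - k)
    return c, m, y, k

# Red and green light add to yellow; yellow ink needs no cyan or magenta.
print(rgb_to_cmyk(255, 255, 0))   # (0.0, 0.0, 1.0, 0.0)
```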
1.5.4 INDEXED IMAGE
An indexed image consists of an array and a color map matrix. The pixel values in the array are direct indices into the color map. By convention, this documentation uses the variable name X to refer to the array and map to refer to the color map. In computing, indexed color is a technique to manage digital images' colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, color information is not directly carried by the image pixel data, but is stored in a separate piece of data called a palette: an array of color elements, in which every element (a color) is indexed by its position within the array. The image pixels do not contain the full specification of their color, only its index into the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.
Perhaps the first device that supported palette colors was a random-access frame buffer described in 1975 by Kajiya, Sutherland, and Cheadle. This supported a palette of 256 36-bit RGB colors.
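The X/map convention described above can be mimicked directly: the index array holds small integers, and a palette lookup expands them to full RGB (an illustrative Python/NumPy sketch):

```python
import numpy as np

# A 4-entry RGB palette (the "color map"); pixel values index into it.
palette = np.array([[  0,   0,   0],    # index 0: black
                    [255,   0,   0],    # index 1: red
                    [  0, 255,   0],    # index 2: green
                    [255, 255, 255]],   # index 3: white
                   dtype=np.uint8)

# The index array X: one byte per pixel instead of three.
X = np.array([[0, 1],
              [2, 3]], dtype=np.uint8)

true_color = palette[X]                 # expand indices to full RGB
print(true_color.shape)                 # (2, 2, 3)
```

The memory saving is visible even here: X stores 4 bytes plus the shared palette, versus 12 bytes for the equivalent true-color array.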
1.6 APPLICATIONS OF IMAGE PROCESSING
Interest in digital image processing methods stems from two principal application areas:
1) Improvement of pictorial information for human interpretation.
2) Processing of scene data for autonomous machine perception.
In the second application area, interest focuses on procedures for extracting from an image information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.
1.7 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches for feature registration and matching: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1. Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. However, the mask files are large, preoccupy GPU memory, and slow down data transfer. Also, some of the processing of the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements; there is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.
1.8 LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-image face recognition algorithms perform well when constrained (frontal, well-illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as recognition where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database acquired in an unconstrained outdoor maritime environment. Recognition performance of a subset of existing still-image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time, in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models that can account for these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such big data draws great concern. Even though modern processors are equipped with more cores and memory capacity, careful design is still required in order to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of a biometric application. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.
Modern iris recognition algorithms can be computationally intensive, yet they are designed for traditional sequential processing elements, such as a personal computer. However, a parallel processing alternative using field-programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the scope of this project, iris template generation with directional filtering, which is a computationally expensive yet parallelizable portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system against a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ version, determining the information content of an iris approximately 324 times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.
A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye, and the vein pattern seen in the sclera region is unique to each person; thus, the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG), and conversion of the image into polar form using the bilinear interpolation technique. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than previous methods.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.
The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine, but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.
This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image: the global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.
1.9 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method, built on the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.
It supports not only a traditional graphics pipeline, but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the differing syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naive parallel approach would not work. We then propose a new sclera descriptor, the Y-shape sclera-feature-based efficient registration method, to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor", which is better suited for parallel computing, to mitigate the mask size issue; and develop a coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make parallel processing possible and efficient.
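The control flow of the coarse-to-fine idea can be sketched as follows. This is only a structural sketch: coarse_score and fine_score are hypothetical placeholders, not the report's actual Y-shape or WPL computations, and the descriptor fields are invented for illustration.

```python
def coarse_score(a, b):
    """Cheap rotation/scale-invariant comparison (placeholder for the Y-shape stage)."""
    return 1.0 - abs(a["y_shape"] - b["y_shape"])

def fine_score(a, b):
    """Expensive registration + matching (placeholder for the WPL stage)."""
    return 1.0 - abs(a["wpl"] - b["wpl"])

def two_stage_match(probe, gallery, coarse_thresh=0.8):
    """Stage I filters out low-similarity pairs so Stage II runs on few templates."""
    survivors = []
    for tpl in gallery:
        if coarse_score(probe, tpl) < coarse_thresh:
            continue                      # rejected cheaply: no fine matching needed
        survivors.append((tpl["id"], fine_score(probe, tpl)))
    return max(survivors, key=lambda r: r[1]) if survivors else None

probe = {"y_shape": 0.50, "wpl": 0.40}
gallery = [{"id": "A", "y_shape": 0.10, "wpl": 0.40},   # filtered out at Stage I
           {"id": "B", "y_shape": 0.55, "wpl": 0.45}]   # survives to Stage II
print(two_stage_match(probe, gallery))    # best fine match among survivors
```

The point of the structure is that the expensive fine_score runs only on the pairs that pass the cheap coarse test, which is what makes the two-stage scheme fast.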
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve efficiency, we propose a new descriptor, the Y-shape descriptor, which greatly improves the efficiency of the coarse registration of two images and can be used to filter out non-matching pairs before refined matching.
2. We propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities.
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches for feature registration and matching: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient for data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, GPUs have been used to extract features, construct descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al. evaluated the performance of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature-searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA exposes a highly parallel, multithreaded, many-core processor with tremendous computational power. It supports not only the traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform that outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design; an efficient parallel computing scheme requires different strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for our sequential line-descriptor method using the CUDA GPU architecture. The parallelization strategies developed in this research can nevertheless be applied to design parallel approaches for other sclera vein recognition methods and to help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large, so they preoccupy the GPU memory and slow down data transfer. Moreover, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance, and new designs are necessary to help narrow down the search range. In summary, a naïve parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard a thread and a block in CUDA as a work-item and a work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor – the Y-shape sclera feature – with an efficient feature-based registration method to speed up the mapping scheme (Section 4); introduce the "weighted polar line" (WPL) descriptor, which is better suited for parallel computing and mitigates the mask-size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we also develop implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2 we give a brief introduction to sclera vein recognition, in Section 8 we report experiments using the proposed system, and in Section 9 we draw conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation: they used a clustering algorithm to classify color eye images into three clusters – sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's-thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region-growing approach to identify the sclera veins against the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded-Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points of the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.
The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly suited to circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are extracted.
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure shows the steps of segmentation.
FIG
Glare area detection: The glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. The filter operates only on grayscale images, so a color image must first be converted to grayscale before the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of glare area detection.
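The glare-detection step above can be sketched as follows. This is a minimal illustrative implementation, not the report's code: the image format (a list of rows of 0-255 intensities) and the brightness/edge thresholds are assumptions made for the sketch.

```python
# Hypothetical sketch of glare detection: run a 3x3 Sobel filter on a
# grayscale image and flag pixels that are both very bright and lie on a
# strong edge. Threshold values are illustrative assumptions.

def sobel_magnitude(img):
    """Approximate gradient magnitude |Gx| + |Gy| with 3x3 Sobel kernels."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
            out[y][x] = abs(gx) + abs(gy)
    return out

def glare_mask(img, edge_thresh=400, bright_thresh=240):
    """Mark pixels that are both very bright and on a strong edge."""
    mag = sobel_magnitude(img)
    return [[1 if img[y][x] >= bright_thresh and mag[y][x] >= edge_thresh else 0
             for x in range(len(img[0]))] for y in range(len(img))]
```

In practice the detected glare pixels would simply be excluded from the subsequent sclera-area estimation.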
FIG
Sclera area estimation: To estimate the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest has been selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center positions. In this way, non-sclera areas are eliminated.
2.2.3 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the sclera area. Next, the upper eyelid, lower eyelid, and iris boundaries are refined, since all of these are unwanted portions for recognition. To eliminate these effects, refinement is performed after the detection of the sclera area. The figure shows the result after Otsu's thresholding and the iris and eyelid refinement used to detect the right sclera area; the left sclera area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has been studied in the context of the fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portion of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by "Elements" below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.
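The idea of filtering different image portions in parallel can be sketched on the CPU as follows. This is only an illustrative stand-in for the CUDA implementation: the "directional filter" here is a simple 3-tap horizontal mean (the real system uses a bank of directional Gabor filters), and the row-band decomposition is our assumption. Note a horizontal-only filter needs no halo rows between bands; a true 2-D filter would.

```python
# Sketch of parallel filtering over image "Elements": each worker filters one
# band of rows independently. A 1-D horizontal mean filter stands in for the
# directional Gabor filters of the report.
from concurrent.futures import ThreadPoolExecutor

def filter_rows(rows):
    """Horizontal 3-tap mean filter applied to a band of rows."""
    out = []
    for row in rows:
        n = len(row)
        out.append([(row[max(x-1, 0)] + row[x] + row[min(x+1, n-1)]) / 3.0
                    for x in range(n)])
    return out

def parallel_filter(img, n_workers=4):
    """Split the image into row bands (the "Elements") and filter them
    concurrently, then stitch the results back into original row order."""
    bands = [img[i::n_workers] for i in range(n_workers)]  # interleaved split
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        results = list(ex.map(filter_rows, bands))
    out = [None] * len(img)
    for i, band in enumerate(results):
        for j, row in enumerate(band):
            out[i + j * n_workers] = row   # row index = i + j * n_workers
    return out
```

Since each band is independent, the parallel result is identical to filtering the whole image sequentially.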
FIG
FIG
2.2.5 OVERVIEW OF THE LINE-DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching stage of the line-descriptor-based method is the bottleneck with regard to matching speed. In this section we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to a reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)ᵀ. The individual components of the line descriptor are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points – one from the test template and one from the target template – and randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. To reduce the effect of segmentation errors, we created a weighting image (Figure 3) from the sclera mask by setting interior pixels of the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated by the equation below, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates; that is, whichever of the test or target templates has fewer points, the sum of its descriptors' weights sets the maximum score that can be attained.
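The scoring rule just described can be sketched as follows. Since the report does not reproduce the exact form of m(Si, Sj), this follows only the verbal description: a pair matches when centers are within Dmatch and orientations within ɸmatch, the pair contributes a weighted score, and M is normalized by the smaller template's total weight. The threshold values and the use of min-weight per pair are our assumptions.

```python
# Hedged sketch of the segment-matching score m(Si, Sj) and total score M.
# Descriptor layout (x, y, phi, w) and threshold defaults are illustrative.
import math

def segment_score(si, sj, d_match=5.0, phi_match=0.2):
    """si, sj = (x, y, phi, w). Weighted score for one descriptor pair."""
    d = math.hypot(si[0] - sj[0], si[1] - sj[1])   # center distance d(Si, Sj)
    dphi = abs(si[2] - sj[2])                      # orientation difference
    if d <= d_match and dphi <= phi_match:
        return min(si[3], sj[3])                   # weight-limited score
    return 0.0

def template_score(test, target, **kw):
    """Total score M: best match per test segment, normalized by the
    maximum attainable score of the smaller template."""
    total = sum(max(segment_score(si, sj, **kw) for sj in target)
                for si in test)
    max_score = min(sum(s[3] for s in test), sum(s[3] for s in target))
    return total / max_score if max_score else 0.0
```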
FIG
FIG
FIG
FIG
Y-shape branches are observed to be a stable feature under eye movement and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the set of nearest neighbors of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.
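The rotation invariance of the Y-shape feature comes from measuring each branch angle relative to the radial direction from the pupil center, as described below. A small illustrative computation of the feature vector y(ϕ1, ϕ2, ϕ3, x, y) (point layout and function names are assumptions for the sketch):

```python
# Illustrative computation of the Y-shape feature y(phi1, phi2, phi3, x, y):
# each branch angle is measured relative to the radial direction from the
# pupil center, which makes the angles invariant to rotation about that
# center.
import math

def y_shape_descriptor(center, branch_ends, pupil_center):
    """center: (x, y) of the branch point; branch_ends: the three (x, y)
    branch tips; pupil_center: (x, y) of the detected pupil."""
    cx, cy = center
    px, py = pupil_center
    radial = math.atan2(cy - py, cx - px)        # radius direction from pupil
    phis = []
    for bx, by in branch_ends:
        branch = math.atan2(by - cy, bx - cx)    # branch direction
        phi = (branch - radial) % (2 * math.pi)  # angle relative to radius
        phis.append(phi)
    return (*phis, cx, cy)
```

Rotating the whole eye image about the pupil center leaves ϕ1, ϕ2, ϕ3 unchanged, which is exactly the stability property claimed above.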
There are two ways to measure both the orientation and the relationship of every branch of the Y-shape vessels: one is to use the angles of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method requires an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 remain quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. Thus our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments, and for each segment a line descriptor is created to record the center and orientation of that segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the sclera. However, in a GPU application, using the mask is challenging, since the mask files are large and will occupy GPU memory and slow down data transfer. During matching, a RANSAC-type registration algorithm randomly selects corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.
To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights are calculated on their own mask by the CPU, and only once.
The result is saved as a component of the descriptor, which becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the value 0, 0.5, or 1. To align two templates, when one template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. Alignment is faster if the two templates share a similar reference point: if we use the center of the iris as the reference point, then when two templates are compared, the correspondences are automatically aligned to each other. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.
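The CPU-side preprocessing just described (weight assignment from the mask plus polar-to-rectangular conversion) can be sketched as follows. The signed-distance interface to the mask and the boundary band width are assumptions made for the sketch; the report only specifies the three weight levels.

```python
# Sketch of WPL preprocessing on the CPU: the weight w is fixed once from the
# descriptor's position relative to the sclera mask (outside = 0, near the
# boundary = 0.5, interior = 1), and (r, theta) about the iris center is
# converted to rectangular (x, y) so GPU kernels avoid trigonometry.
import math

def wpl_descriptor(r, theta, phi, dist_to_boundary, boundary_band=3.0):
    """Return s = (x, y, r, theta, phi, w).

    dist_to_boundary: signed distance to the sclera-mask boundary
    (negative = outside the mask); an assumed interface to the mask."""
    if dist_to_boundary < 0:
        w = 0.0                          # beyond the sclera
    elif dist_to_boundary <= boundary_band:
        w = 0.5                          # near the sclera boundary
    else:
        w = 1.0                          # interior descriptor
    x = r * math.cos(theta)
    y = r * math.sin(theta)
    return (x, y, r, theta, phi, w)
```

Because w is baked into the descriptor here, the GPU never needs to touch the mask file during matching.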
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched. In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same sides and saved
FIG
FIG
them in contiguous addresses, which meets the requirement of coalesced memory access on the GPU. After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. With matching on the new descriptor, the shift-parameter generator in Figure 4 is simplified as shown in Figure 9.
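The descriptor reorganization described above can be sketched on the CPU side as follows. The side test by the sign of x and the structure-of-arrays packing are our assumptions; the point is only that descriptors processed by the same warp end up at neighboring addresses.

```python
# Sketch of reorganizing WPL descriptors so that each sclera side is stored
# contiguously: the 32 threads of a warp then read neighboring addresses,
# i.e., coalesced global-memory access on the GPU.

def reorganize_by_side(descriptors):
    """descriptors: list of s = (x, y, r, theta, phi, w) tuples.
    Returns (left, right) lists, each contiguous in memory order."""
    left = [s for s in descriptors if s[0] < 0]    # left half: x < 0 (assumed)
    right = [s for s in descriptors if s[0] >= 0]  # right half
    return left, right

def to_soa(descs):
    """Pack into structure-of-arrays layout (one contiguous list per field),
    the layout that coalesced warp access prefers."""
    return [[s[i] for s in descs] for i in range(6)]
```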
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units. Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, the process is both easier and more difficult to explain: on one hand, the actual operations are the same and easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process. We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler, more direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the GPU pipeline described earlier, concentrating on its programmable aspects:
The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as a texture on future passes through the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves exactly the same steps, but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step when computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. Newer programming environments solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, to the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
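The structure just listed can be emulated on the CPU to make it concrete: the computation domain is a structured grid of "threads", each running the same SPMD program that gathers from global memory (the previous grid) and writes its result. This mirrors the fluid-grid example; the averaging update rule and wraparound boundary are illustrative assumptions.

```python
# CPU emulation of the GPU-computing structure: one "kernel launch" in which
# every grid point (one thread per point) runs the same program, gathering
# its four neighbors from the previous state and writing its new value.

def spmd_step(grid):
    """Run one step: every grid point averages its 4 neighbors (wraparound)."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]        # buffer written by the threads
    for ty in range(h):                        # each (ty, tx) is one thread
        for tx in range(w):
            gather = [grid[(ty + dy) % h][(tx + dx) % w]
                      for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            out[ty][tx] = sum(gather) / 4.0    # same program at every thread
    return out                                 # becomes input of next step
```

On a real GPU the two loops disappear: each (ty, tx) pair is a hardware thread, and `out` is the global-memory buffer reused as input on the next launch.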
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity, although some false positive matches may remain after this step. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
2.4.1 STAGE I: MATCHING WITH THE Y-SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the jth branch and the polar line from the pupil center in descriptor i.
The number of matched pairs ni and the distance between the Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2), where α is a factor to fuse the matching score, set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded; scleras with high matching scores are passed on to the next, more precise matching process.
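Algorithm 1 can be sketched as follows. The pair-selection rule (centers within txy, branch angles within tϕ) and the thresholds tϕ = 30, txy = 675, α = 30 come from the text above, but since Eq. (2), the exact fusion of pair count and average center distance, is not reproduced here, the fusion formula below is only a placeholder built from the same ingredients.

```python
# Hedged sketch of Stage-I coarse matching (Algorithm 1). Descriptors are
# y = (phi1, phi2, phi3, x, y). The score fusion of matched-pair count n,
# average center distance, and template sizes is a placeholder for Eq. (2).
import math

def y_match_score(test, target, t_phi=30.0, t_xy=675.0, alpha=30.0):
    """test/target: lists of Y-shape descriptors (phi1, phi2, phi3, x, y)."""
    n, dist_sum = 0, 0.0
    for yi in test:
        for yj in target:
            d_xy = math.hypot(yi[3] - yj[3], yi[4] - yj[4])    # center dist
            d_phi = math.sqrt(sum((yi[k] - yj[k]) ** 2 for k in range(3)))
            if d_xy <= t_xy and d_phi <= t_phi:                # matched pair
                n += 1
                dist_sum += d_xy
    if n == 0:
        return 0.0
    avg_d = dist_sum / n
    # placeholder fusion: more pairs and smaller distances give higher scores,
    # normalized by the template sizes Ni + Nj
    return (alpha * n) / ((len(test) + len(target)) * (1.0 + avg_d))
```

Image pairs whose score falls below the decision threshold t would be discarded before the fine Stage-II matching.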
2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear for two reasons: when an eye image is acquired at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, because the eyeball is spherical in shape; and the sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium) whose movements differ slightly. Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate, so the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the i-th WPL descriptor of Tte; Tta is the target template and stai is the i-th WPL descriptor of Tta; d(stek, staj) is the Euclidean distance of descriptors stek and staj; and Δsk is the shift value of the two descriptors, defined as follows.
We first randomly select an equal number of segment descriptors stek of the test template Tte from each quadrant and find each one's nearest neighbor staj in the target template Tta. Their shift offset is recorded as a candidate registration shift factor Δsk. The final offset registration factor is Δs_optim, the candidate with the smallest standard deviation among these candidate offsets.
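The shift search of Algorithm 2 can be sketched as follows. This is a minimal Python sketch under stated assumptions: descriptors are reduced to (x, y) segment centres, plain random sampling stands in for the per-quadrant sampling, and the closest-to-consensus rule stands in for the smallest-standard-deviation selection; the names are ours.

```python
import math
import random
import statistics

def shift_search(test_desc, target_desc, samples=8, seed=0):
    """Sketch of the shift-parameter search (Algorithm 2 in spirit).

    test_desc, target_desc: lists of (x, y) WPL segment centres.
    Returns the candidate offset closest to the consensus of all
    candidates (a stand-in for the minimum-deviation selection).
    """
    rng = random.Random(seed)
    picks = rng.sample(test_desc, min(samples, len(test_desc)))
    offsets = []
    for sx, sy in picks:
        # nearest neighbour of the test segment in the target template
        tx, ty = min(target_desc, key=lambda p: math.hypot(p[0] - sx, p[1] - sy))
        offsets.append((tx - sx, ty - sy))  # candidate shift offset
    # consensus: keep the candidate closest to the centroid of all candidates
    mx = statistics.mean(dx for dx, dy in offsets)
    my = statistics.mean(dy for dx, dy in offsets)
    return min(offsets, key=lambda o: math.hypot(o[0] - mx, o[1] - my))
```

For a target template that is simply a shifted copy of the test template, every candidate agrees and the true shift is recovered.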
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)_shift, θ(it), and tr(it)_scale are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)_shift), and S(tr(it)_scale) are the transform matrices defined as (7). To search for the optimized transform parameters, we iterate N times to generate these parameters; in our experiment we set the iteration count to 512.
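The randomized parameter search of Algorithm 3 can be sketched as below. This is illustrative Python, not the authors' kernel: the sampling ranges, function name, and per-point nearest-neighbour test are our assumptions; the report's matrix (7), β = 20 pixels, and N = 512 define the actual operation.

```python
import math
import random

def affine_search(test_desc, target_desc, iters=512, beta=20.0, seed=1):
    """Randomized shift/rotation/scale search (Algorithm 3 in spirit).

    test_desc, target_desc: lists of (x, y) descriptor centres. Each
    iteration draws one candidate parameter set, transforms the test
    template, and counts pairs whose nearest target segment is closer
    than beta; the set with the most matches wins.
    """
    rng = random.Random(seed)
    best, best_matches = (0.0, 0.0, 0.0, 1.0), -1
    for _ in range(iters):
        dx = rng.uniform(-30, 30)          # candidate shift (assumed range)
        dy = rng.uniform(-30, 30)
        theta = rng.uniform(-0.2, 0.2)     # candidate rotation, radians
        scale = rng.uniform(0.9, 1.1)      # candidate scale
        c, s = math.cos(theta), math.sin(theta)
        matches = 0
        for x, y in test_desc:
            # apply scale and rotation, then translate (cf. matrix (7))
            xt = scale * (c * x - s * y) + dx
            yt = scale * (s * x + c * y) + dy
            d = min(math.hypot(xt - tx, yt - ty) for tx, ty in target_desc)
            if d < beta:                   # matched-pair test with factor beta
                matches += 1
        if matches > best_matches:
            best_matches, best = matches, (dx, dy, theta, scale)
    return best, best_matches
```

On the GPU, each thread would evaluate one such candidate parameter set independently, as described in Section 252.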
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed as Algorithm 4. Here stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)_shift, tr(optm)_scale, and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)_shift), and S(tr(optm)_scale) form the descriptor transform matrix defined in Algorithm 3. ϕ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ϕ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.
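A simplified stand-in for the registration-and-matching step of Algorithm 4 is sketched below. Assumptions are labeled in the code: the descriptor layout (x, y, ϕ, w), the weighted scoring, and applying only the shift part of the registration are ours; the report's Algorithm 4 also applies the rotation and scale matrices and the exact score normalization.

```python
import math

def registered_match(test_desc, target_desc, shift, alpha=5.0, beta=20.0):
    """Match a registered test template against a target (Algorithm 4 in spirit).

    Each WPL descriptor is assumed to be (x, y, phi, w): centre, orientation
    angle phi, and edge weight w. alpha bounds the orientation difference of
    matched pairs; beta bounds their distance. Only the shift registration is
    applied here for brevity.
    """
    dx, dy = shift
    used = [False] * len(target_desc)    # a matched segment is not reused
    score = 0.0
    for x, y, phi, w in test_desc:
        xr, yr = x + dx, y + dy          # registered test descriptor
        for j, (tx, ty, tphi, tw) in enumerate(target_desc):
            if used[j]:
                continue
            if abs(phi - tphi) > alpha:  # nearest pairs need similar orientation
                continue
            if math.hypot(xr - tx, yr - ty) < beta:
                used[j] = True
                score += w * tw          # down-weight edge-of-sclera segments
                break
    return score / max(1, min(len(test_desc), len(target_desc)))
```

The used flags mirror the shared-memory flag variables discussed in Section 252: a line segment already matched should not be matched again.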
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single instruction multiple data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those multiprocessors. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and accessing these memories takes little time. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming. If threads in a warp have different control paths, all the branches are executed serially; to improve performance, branch divergence within a warp should be avoided.
Global memory is slower than on-chip memory in terms of access time. To hide this latency, on-chip memory should be used preferentially rather than global memory. When global memory access does occur, threads in the same warp should access words in sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled so as to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU, and the thread and block numbers are set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate this sum, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of the results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
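The reduction described above can be simulated sequentially, which makes the pairing pattern explicit. In this illustrative sketch a loop plays the role of the threads; on the GPU each slot of the array would belong to one thread.

```python
def block_reduce(values):
    """Sequential simulation of the tree-style in-block reduction.

    Pairs are summed first, then every fourth slot adds the result two
    slots away, and so on; the total lands in slot 0, mirroring how the
    final result is saved at the first intermediate result's address.
    """
    vals = list(values)
    n = len(vals)
    stride = 1
    while stride < n:
        # at step `stride`, slot i (a multiple of 2*stride) absorbs the
        # partial sum sitting `stride` slots to its right
        for i in range(0, n - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]
```

Each while-iteration corresponds to one synchronized step of the block, so a block of n threads finishes in about log2(n) steps instead of n.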
252 MAPPING INSIDE BLOCK
In the shift-argument search there are two schemes we can choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to each thread, so that all the threads compute independently, except that the final result must be compared with the other possible offsets.
Because of the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, with the generation iterations assigned across all threads. The challenge in this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether a line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is therefore to use a single thread in a block to process the matching.
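The per-thread generator idea can be illustrated in Python, whose random.Random is itself a Mersenne Twister (MT19937). This simple seeding sketch gives each simulated thread its own generator state; note that it deliberately does not reproduce the dynamic-creation tool's per-generator parameters, which is exactly what the text says plain seeding alone cannot guarantee.

```python
import random

def make_thread_rngs(n_threads, base_seed=12345):
    """One MT19937 instance per simulated thread.

    Each "thread" gets its own independently seeded generator state, so
    no thread has to serialize on a shared twister update. Caveat from
    the text above: distinct seeds do not by themselves guarantee
    uncorrelated streams when all generators share identical parameters.
    """
    return [random.Random(base_seed + tid) for tid in range(n_threads)]

rngs = make_thread_rngs(4)
draws = [r.random() for r in rngs]   # one independent draw per "thread"
```

In CUDA one would instead precompute a distinct parameter set per generator offline and ship the table to the device, as the report describes.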
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of its pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell contributes a weight to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination and contrast changes, the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
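The cell-histogram and normalization steps above can be sketched as follows. This is illustrative Python: the 9-bin unsigned 0-180° binning and L2 block normalization follow the description in the text, while the function names and list-of-lists gradient layout are our assumptions.

```python
import math

def hog_cell_histogram(dx, dy, n_bins=9):
    """Orientation binning for one HOG cell.

    dx, dy: same-size 2-D lists holding the x- and y-direction gradients
    of the cell's pixels. Bins span 0-180 degrees (unsigned gradients:
    opposite directions count as the same), weighted by the gradient
    magnitude m(x, y).
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for row_dx, row_dy in zip(dx, dy):
        for gx, gy in zip(row_dx, row_dy):
            mag = math.hypot(gx, gy)                         # m(x, y)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0   # theta in [0, 180)
            hist[int(ang // bin_width) % n_bins] += mag
    return hist

def l2_normalize(hist, eps=1e-6):
    """Block-level contrast normalization (L2 norm), for illumination
    and contrast invariance."""
    norm = math.sqrt(sum(v * v for v in hist)) + eps
    return [v / norm for v in hist]
```

In a full R-HOG pipeline, several neighbouring cell histograms would be concatenated into one block vector before normalization, and blocks overlap so each cell is counted more than once.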
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and it is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop; when code is entered there, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, FORTRAN, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As an alternative to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, enabling fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries; for general-purpose scalar computations, it generates machine-code instructions using its JIT (Just-In-Time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
Use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process: from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format, and additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include
Surface, contour, and mesh
Image plots
Cone, slice, stream, and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make the matching more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," in Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
251 MAPPING ALGORITHM TO BLOCKS
252 MAPPING INSIDE BLOCK
253 MEMORY MANAGEMENT
26 HISTOGRAM OF ORIENTED GRADIENTS
CHAPTER 3 SOFTWARE SPECIFICATION
31 GENERAL
32 SOFTWARE REQUIREMENTS
33 INTRODUCTION
34 FEATURES OF MATLAB
341 INTERFACING WITH OTHER LANGUAGES
35 THE MATLAB SYSTEM
351 DESKTOP TOOLS
352 ANALYZING AND ACCESSING DATA
353 PERFORMING NUMERIC COMPUTATION
CHAPTER 4 IMPLEMENTATION
41 GENERAL
42 CODING IMPLEMENTATION
43 SNAPSHOTS
CHAPTER 5 APPLICATIONS
CHAPTER 6 CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
62 REFERENCES
LIST OF FIGURES
FIG NO FIG NAME
11 Fundamental blocks of digital image processing
12 Gray scale image
13 The additive model of RGB
14 The colors created by the subtractive model of CMYK
21 The diagram of a typical sclera vein recognition approach
22 Steps of segmentation
23 Glare area detection
24 Detection of the sclera area
25 Pattern of veins
26 Sclera region and its vein patterns
27 Filtering can take place simultaneously on different parts of the iris image
28 The sketch of parameters of segment descriptor
29 The weighting image
210 The module of sclera template matching
211 The Y shape vessel branch in sclera
212 The rotation and scale invariant character of Y shape vessel branch
213 The line descriptor of the sclera vessel pattern
214 The key elements of descriptor vector
215 Simplified sclera matching steps on GPU
216 Two-stage matching scheme
217 Example image from the UBIRIS database
218 Occupancy on various thread numbers per block
219 The task assignment inside and outside the GPU
220 HOG features
41 Original sclera image
42 Binarised sclera image
43 Edge map subtracted image
44 Cropping ROI
45 ROI mask
46 ROI finger sclera image
47 Enhanced sclera image
48 Feature extracted sclera image
49 Matching with images in database
410 Result
ABSTRACT
Sclera vein recognition is shown to be a promising method for human
identification. However, its matching speed is slow, which could limit its
use in real-time applications. To improve the matching efficiency, we
propose a new parallel sclera vein recognition method using a two-stage
parallel approach for registration and matching. First, we design a
rotation- and scale-invariant Y shape descriptor based feature extraction
method to efficiently eliminate most unlikely matches. Second, we
develop a weighted polar line sclera descriptor structure that incorporates
mask information to reduce GPU memory cost. Third, we design a
coarse-to-fine two-stage matching method. Finally, we develop a
mapping scheme to map the subtasks to GPU processing units. The
experimental results show that the proposed method achieves a dramatic
improvement in processing speed without compromising recognition
accuracy.
CHAPTER 1
INTRODUCTION
1.1 GENERAL
Digital image processing is the use of computer algorithms to
perform image processing on digital images. The 2D continuous image is
divided into N rows and M columns; the intersection of a row and a
column is called a pixel. The image can also be a function of other
variables, including depth, color, and time. An image given in the form of a
transparency, slide, photograph, or X-ray is first digitized and stored as a
matrix of binary digits in computer memory. This digitized image can then
be processed and/or displayed on a high-resolution television monitor. For
display, the image is stored in a rapid-access buffer memory, which
refreshes the monitor at a rate of 25 frames per second to produce a visually
continuous display.
1.2 OVERVIEW OF DIGITAL IMAGE PROCESSING
The field of "digital image processing" refers to processing digital
images by means of a digital computer. In a broader sense, it can be
considered the processing of any two-dimensional data, where an image
(optical information) is represented as an array of real or complex numbers
encoded with a definite number of bits. An image is represented as a two-
dimensional function f(x,y), where 'x' and 'y' are spatial (plane)
coordinates, and the amplitude of f at any pair of coordinates (x,y)
represents the intensity, or gray level, of the image at that point.
A digital image is one for which both the coordinates and the
amplitude values of f are finite, discrete quantities. Hence, a digital
image is composed of a finite number of elements, each of which has a
particular location and value. These elements are called "pixels". A digital
image is discrete in both spatial coordinates and brightness, and it can be
considered a matrix whose row and column indices identify a point on
the image and whose corresponding matrix element value identifies the gray
level at that point.
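To make the matrix view concrete, here is a minimal sketch in NumPy (illustrative only; the report's implementation is in MATLAB, and the array values here are invented):

```python
import numpy as np

# A tiny 3x4 digital image: row and column indices identify a point,
# and each entry is the quantized gray level (0 = black, 255 = white).
f = np.array([[  0,  64, 128, 255],
              [ 32,  96, 160, 224],
              [ 16,  80, 144, 208]], dtype=np.uint8)

rows, cols = f.shape            # the image has N=3 rows and M=4 columns
gray_level = int(f[1, 2])       # gray level at row 1, column 2 -> 160
print(rows, cols, gray_level)
```

Every operation in the later chapters (enhancement, segmentation, matching) is ultimately arithmetic on a matrix of this kind.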
One of the first applications of digital images was in the newspaper
industry, when pictures were first sent by submarine cable between London
and New York. Introduction of the Bartlane cable picture transmission
system in the early 1920s reduced the time required to transport a picture
across the Atlantic from more than a week to less than three hours.
FIG 1.1: Fundamental blocks of digital image processing
1.2.1 PREPROCESSING
In imaging science, image processing is any form of signal
processing for which the input is an image, such as a photograph or video
frame; the output of image processing may be either an image or a set of
characteristics or parameters related to the image. Most image-processing
techniques involve treating the image as a two-dimensional signal and
applying standard signal-processing techniques to it. Image processing
usually refers to digital image processing, but optical and analog image
processing are also possible. The acquisition of images (producing the input
image in the first place) is referred to as imaging.
Image processing refers to the processing of a 2D picture by a
computer. Basic definitions:
An image defined in the "real world" is considered to be a function
of two real variables, for example a(x,y), with a as the amplitude (e.g.,
brightness) of the image at the real coordinate position (x,y). Modern digital
technology has made it possible to manipulate multi-dimensional signals
with systems that range from simple digital circuits to advanced parallel
computers. The goal of this manipulation can be divided into three
categories:
Image processing (image in -> image out)
Image analysis (image in -> measurements out)
Image understanding (image in -> high-level description out)
An image may be considered to contain sub-images, sometimes referred
to as regions-of-interest (ROIs), or simply regions. This concept reflects the
fact that images frequently contain collections of objects, each of which can
be the basis for a region. In a sophisticated image processing system, it
should be possible to apply specific image processing operations to selected
regions. Thus one part of an image (region) might be processed to suppress
motion blur while another part might be processed to improve colour
rendition.
Usually, image processing systems require that the images be
available in digitized form, that is, as arrays of finite-length binary words. For
digitization, the given image is sampled on a discrete grid and each sample,
or pixel, is quantized using a finite number of bits. The digitized image is
then processed by a computer. To display a digital image, it is first converted
into an analog signal, which is scanned onto a display. Closely related to
image processing are computer graphics and computer vision. In computer
graphics, images are manually made from physical models of objects,
environments, and lighting, instead of being acquired (via imaging devices
such as cameras) from natural scenes, as in most animated movies.
Computer vision, on the other hand, is often considered high-level image
processing, out of which a machine/computer/software intends to decipher
the physical contents of an image or a sequence of images (e.g., videos or
3D full-body magnetic resonance scans).
In modern sciences and technologies, images also gain much
broader scope due to the ever-growing importance of scientific
visualization (of often large-scale, complex scientific/experimental data).
Examples include microarray data in genetic research or real-time multi-
asset portfolio trading in finance. Before processing, an image is
converted into a digital form. Digitization includes sampling of the image
and quantization of the sampled values. After converting the image into bit
information, processing is performed. This processing technique may be
image enhancement, image restoration, or image compression.
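As a toy illustration of the quantization step (a hedged sketch, not from the report; the function name and sample values are invented), continuous amplitudes in [0, 1] can be mapped to a finite set of gray levels:

```python
import numpy as np

def quantize(samples, levels):
    """Map continuous samples in [0.0, 1.0] to integer gray levels 0..levels-1."""
    step = 1.0 / levels
    # Each sample falls into one of `levels` equal-width bins.
    return np.minimum((samples / step).astype(int), levels - 1)

samples = np.array([0.0, 0.26, 0.51, 0.99])   # sampled amplitudes
print(quantize(samples, 4).tolist())          # -> [0, 1, 2, 3]
```

With 256 levels instead of 4, this is exactly the 8-bit gray-level quantization used throughout the report.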
1.2.2 IMAGE ENHANCEMENT
Image enhancement refers to accentuation, or sharpening, of image features
such as boundaries or contrast to make a graphic display more useful for
display and analysis. This process does not increase the inherent information
content of the data. It includes gray level and contrast manipulation, noise
reduction, edge crispening and sharpening, filtering, interpolation and
magnification, pseudo-coloring, and so on.
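For instance, a simple contrast-stretching operation (a minimal NumPy sketch, not the report's MATLAB code; the sample image is invented) remaps a narrow range of gray levels to the full display range:

```python
import numpy as np

def contrast_stretch(img, out_min=0, out_max=255):
    """Linearly remap gray levels so they span the full output range."""
    img = img.astype(float)
    lo, hi = img.min(), img.max()
    if hi == lo:                                   # flat image: nothing to stretch
        return np.full(img.shape, out_min, dtype=np.uint8)
    out = (img - lo) / (hi - lo) * (out_max - out_min) + out_min
    return np.round(out).astype(np.uint8)

dim = np.array([[100, 110], [120, 130]], dtype=np.uint8)   # low-contrast image
print(contrast_stretch(dim).tolist())   # -> [[0, 85], [170, 255]]
```

Note that, as stated above, no information is added: the mapping is monotonic, so only the visual separation of the existing gray levels improves.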
1.2.3 IMAGE RESTORATION
Image restoration is concerned with filtering the observed image to minimize
the effect of degradations. The effectiveness of image restoration depends on
the extent and accuracy of the knowledge of the degradation process as well
as on the filter design. Image restoration differs from image enhancement in
that the latter is concerned with the extraction or accentuation of image
features.
1.2.4 IMAGE COMPRESSION
Image compression is concerned with minimizing the number of bits required
to represent an image. Applications of compression include broadcast TV,
remote sensing via satellite, military communication via aircraft, radar,
teleconferencing, facsimile transmission of educational and business
documents, medical images that arise in computed tomography, magnetic
resonance imaging and digital radiology, motion pictures, satellite images,
weather maps, geological surveys, and so on.
Text compression – CCITT Group 3 & Group 4
Still image compression – JPEG
Video image compression – MPEG
1.2.5 SEGMENTATION
In computer vision, image segmentation is the process of
partitioning a digital image into multiple segments (sets of pixels, also
known as superpixels). The goal of segmentation is to simplify and/or
change the representation of an image into something that is more
meaningful and easier to analyze. Image segmentation is typically used to
locate objects and boundaries (lines, curves, etc.) in images. More precisely,
image segmentation is the process of assigning a label to every pixel in an
image such that pixels with the same label share certain visual
characteristics.
The result of image segmentation is a set of segments that
collectively cover the entire image, or a set of contours extracted from the
image (see edge detection). Each of the pixels in a region is similar with
respect to some characteristic or computed property, such as
colour, intensity, or texture. Adjacent regions are significantly different
with respect to the same characteristic(s). When applied to a stack of
images, typical in medical imaging, the resulting contours after image
segmentation can be used to create 3D reconstructions with the help of
interpolation algorithms like marching cubes.
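The label-assignment view can be illustrated with the simplest possible segmentation, global thresholding (a toy sketch; the threshold and image values are invented, and the report's own sclera segmentation is considerably more involved):

```python
import numpy as np

def threshold_segment(img, t):
    """Assign label 1 to every pixel brighter than t, label 0 otherwise."""
    return (img > t).astype(np.uint8)

img = np.array([[ 10, 200],
                [220,  30]], dtype=np.uint8)
labels = threshold_segment(img, 128)
print(labels.tolist())   # -> [[0, 1], [1, 0]]
```

Pixels carrying the same label share the chosen characteristic (here, brightness above the threshold), exactly as in the definition above.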
1.2.6 IMAGE RESTORATION
Image restoration, like enhancement, improves the qualities of an image,
but all the operations are mainly based on known or measured
degradations of the original image. Image restoration is used to restore
images with problems such as geometric distortion, improper focus,
repetitive noise, and camera motion, and to correct images for known
degradations.
1.2.7 FUNDAMENTAL STEPS
Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the
chances for success of the other processes.
Image segmentation: to partition an input image into its constituent parts
or objects.
Image representation: to convert the input data to a form suitable for
computer processing.
Image description: to extract features that result in some quantitative
information of interest, or features that are basic for differentiating one
class of objects from another.
Image recognition: to assign a label to an object based on the
information provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized
objects.
Knowledge about a problem domain is coded into an image processing
system in the form of a knowledge database.
1.3 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x,y) must be digitized
both spatially and in amplitude.
Digitization of the spatial coordinates (x,y) is called image sampling.
Amplitude digitization is called gray-level quantization.
The storage and processing requirements increase rapidly with the spatial
resolution and the number of gray levels.
Example: a 256 gray-level image of size 256x256 occupies 64K bytes of
memory.
Images of very low spatial resolution produce a checkerboard effect.
The use of an insufficient number of gray levels in smooth areas of a digital
image results in false contouring.
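The 64K-byte figure in the example follows directly from this model; a small sketch of the arithmetic (the helper name is ours, not the report's):

```python
import math

def image_storage_bytes(rows, cols, gray_levels):
    """Bytes needed for an image: each pixel needs log2(gray_levels) bits."""
    bits_per_pixel = math.ceil(math.log2(gray_levels))
    return rows * cols * bits_per_pixel // 8

# 256 gray levels -> 8 bits/pixel; 256*256 pixels -> 65536 bytes = 64 KB.
print(image_storage_bytes(256, 256, 256) // 1024)   # -> 64
```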
1.4 IMAGE FILE FORMATS
There are two general groups of 'images': vector graphics (or line art)
and bitmaps (pixel-based images). Some of the most common file
formats are:
GIF – Graphics Interchange Format. An 8-bit (256 colour), non-
destructively compressed bitmap format. Mostly used for the web. Has
several sub-standards, one of which is the animated GIF.
JPEG – Joint Photographic Experts Group. A very efficient (i.e., much
information per byte), destructively compressed, 24-bit (16 million colours)
bitmap format. Widely used, especially for the web and Internet (bandwidth-
limited).
TIFF – Tagged Image File Format. The standard 24-bit publication bitmap
format. Compresses non-destructively with, for instance, Lempel-Ziv-
Welch (LZW) compression.
PS – PostScript. A standard vector format. Has numerous sub-standards
and can be difficult to transport across platforms and operating systems.
PSD – Adobe Photoshop Document. A dedicated Photoshop format that
keeps all the information in an image, including all the layers.
BMP – bitmap file format.
1.5 TYPES OF IMAGES
Images are of 4 types:
1. Binary image
2. Gray scale image
3. Color image
4. Indexed image
1.5.1 BINARY IMAGES
A binary image is a digital image that has only two possible values for
each pixel. Typically, the two colors used for a binary image are black and
white, though any two colors can be used. Binary images are also called bi-
level or two-level. Each pixel is stored as a single bit, i.e., a 0 or 1. Such
images are also referred to as black-and-white (B&W) images.
1.5.2 GRAY SCALE IMAGE
In an (8-bit) grayscale image, each picture element has an assigned intensity
that ranges from 0 to 255. A grey scale image is what people normally call
a black and white image, but the name emphasizes that such an image will
also include many shades of grey.
FIG 1.2: Gray scale image
1.5.3 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive
colour, with the r, g and b receptors in our retinas. RGB uses additive colour
mixing and is the basic colour model used in television or any other
medium that projects colour with light. It is the basic colour model used in
computers and for web graphics, but it cannot be used for print production.
The secondary colours of RGB – cyan, magenta, and yellow – are formed
by mixing two of the primary colours (red, green or blue) and excluding the
third colour. Red and green combine to make yellow, green and blue to
make cyan, and blue and red form magenta. The combination of red, green
and blue at full intensity makes white.
In Photoshop, using the "screen" mode for the different layers in an
image will make the intensities mix together according to the additive
colour mixing model. This is analogous to stacking slide images on top of
each other and shining light through them.
FIG 1.3: The additive model of RGB
CMYK: The 4-colour CMYK model used in printing lays down
overlapping layers of varying percentages of transparent cyan (C), magenta
(M) and yellow (Y) inks. In addition, a layer of black (K) ink can be added.
The CMYK model uses the subtractive colour model.
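The relation between the two models can be sketched with the common naive RGB-to-CMYK conversion (illustrative only; this formula is not from the report, and real print workflows use calibrated color profiles rather than this arithmetic):

```python
def rgb_to_cmyk(r, g, b):
    """Naive conversion from RGB in [0,1] to CMYK in [0,1]."""
    k = 1.0 - max(r, g, b)          # black ink replaces the shared gray component
    if k == 1.0:                    # pure black: no colored ink needed
        return 0.0, 0.0, 0.0, 1.0
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return c, m, y, k

# Full red needs full cyan-free ink coverage: cyan 0, magenta 1, yellow 1.
print(rgb_to_cmyk(1.0, 0.0, 0.0))   # -> (0.0, 1.0, 1.0, 0.0)
```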
FIG 1.4: The colors created by the subtractive model of CMYK
1.5.4 INDEXED IMAGE
An indexed image consists of an array and a color map matrix. The
pixel values in the array are direct indices into the color map. By convention,
this documentation uses the variable name X to refer to the array and map
to refer to the color map. In computing, indexed color is a technique to
manage digital image colors in a limited fashion, in order to save
computer memory and file storage while speeding up display refresh and
file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, color information is not
directly carried by the image pixel data, but is stored in a separate piece of
data called a palette: an array of color elements, in which every element (a
color) is indexed by its position within the array. The image pixels do not
contain the full specification of their color, but only its index in the palette.
This technique is sometimes referred to as pseudocolor or indirect color, as
colors are addressed indirectly.
Perhaps the first device that supported palette colors was a random-
access frame buffer described in 1975 by Kajiya, Sutherland, and Cheadle.
This supported a palette of 256 36-bit RGB colors.
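Decoding an indexed image is a table lookup; a minimal NumPy sketch (the palette and index values are invented for illustration):

```python
import numpy as np

# A hypothetical 4-entry color map: each row is one RGB color.
palette = np.array([[  0,   0,   0],    # index 0: black
                    [255,   0,   0],    # index 1: red
                    [  0, 255,   0],    # index 2: green
                    [255, 255, 255]])   # index 3: white

# The indexed image X stores palette indices, not colors.
X = np.array([[0, 1],
              [2, 3]])

rgb = palette[X]    # decode: replace each index by its palette entry
print(rgb.shape)    # -> (2, 2, 3)
```

Each pixel here costs only 2 bits of index data instead of 24 bits of color, which is the memory saving the paragraph above describes.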
1.6 APPLICATIONS OF IMAGE PROCESSING
Interest in digital image processing methods stems from two principal
application areas:
1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception
In the second application area, interest focuses on procedures for
extracting, from an image, information in a form suitable for computer
processing. Examples include automatic character recognition, industrial
machine vision for product assembly and inspection, military
reconnaissance, automatic processing of fingerprints, etc.
1.7 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches: Speeded-Up Robust
Features (SURF)-based matching, minutiae detection, and direct correlation
matching for feature registration and matching. Of these three methods,
the SURF method achieves the best accuracy. It takes an average of 1.5
seconds using the SURF method to perform a one-to-one matching.
Zhou et al. proposed a line descriptor-based method for sclera vein
recognition. The matching step (including registration) is the most time-
consuming step in this sclera vein recognition system, costing about 1.2
seconds to perform a one-to-one matching. Both speeds were measured using
a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM.
Currently, sclera vein recognition algorithms are designed using central
processing unit (CPU)-based systems.
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1. Mask files are used to calculate valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large in size; they preoccupy GPU memory and slow
down data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance.
LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for
ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14,
pp. 1860–1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the
most challenging problems that has been actively researched in recent
years. It is well known that many state-of-the-art still-image face recognition
algorithms perform well when constrained (frontal, well-illuminated, high-
resolution, sharp, and full) face images are acquired. However, their
performance degrades significantly when the test images contain variations
that are not present in the training images In this paper we highlight some
of the key issues in remote face recognition We define the remote face
recognition as one where faces are several tens of meters (10-250m) from
the cameras We then describe a remote face database which has been
acquired in an unconstrained outdoor maritime environment Recognition
performance of a subset of existing still image-based face recognition
algorithms is evaluated on the remote face data set Further we define the
remote re-identification problem as matching a subject at one location with
candidate sets acquired at a different location and over time in remote
conditions We provide preliminary experimental results on remote re-
identification It is demonstrated that in addition to applying a good
classification algorithm finding features that are robust to variations
mentioned above and developing statistical models which can account for
these variations are very important for remote face recognition
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N.
Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics
Security.
With the rapidly expanded biometric data collected by various sectors
of government and industry for identification and verification purposes
how to manage and process such Big Data draws great concern Even
though modern processors are equipped with more cores and memory
capacity it still requires careful design in order to utilize the hardware
resource effectively and the power consumption efficiently This research
addresses this issue by investigating the workload characteristics of
biometric applications. Taking Daugman's iris matching algorithm, which
has been proven to be the most reliable iris matching method, as a case
study, we conduct performance profiling and binary instrumentation on the
benchmark to capture its execution behavior The results show that data
loading and memory access incurs great performance overhead and
motivates us to move the biometrics computation to high-performance
architecture
Modern iris recognition algorithms can be computationally intensive
yet are designed for traditional sequential processing elements such as a
personal computer However a parallel processing alternative using field
programmable gate arrays (FPGAs) offers an opportunity to speed up iris
recognition. Within the means of this project, iris template generation with
directional filtering, which is a computationally expensive yet parallelizable
portion of a modern iris recognition algorithm, is parallelized on an FPGA
system We will present a performance comparison of the parallelized
algorithm on the FPGA system to a traditional CPU-based version The
parallelized template generation outperforms an optimized C++ code
version determining the information content of an iris approximately 324
times faster
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric
modality based on conjunctival vasculature," in Proc. Artif. Neural
Netw. Eng., 2006, pp. 1–8.
A new biometric indicator based on the patterns of conjunctival
vasculature is proposed Conjunctival vessels can be observed on the visible
part of the sclera that is exposed to the outside world These vessels
demonstrate rich and specific details in visible light and can be easily
photographed using a regular digital camera In this paper we discuss
methods for conjunctival imaging preprocessing and feature extraction in
order to derive a suitable conjunctival vascular template for biometric
authentication Commensurate classification methods along with the
observed accuracy are discussed Experimental results suggest the potential
of using conjunctival vasculature as a biometric measure Identification of
a person based on some unique set of features is an important task The
human identification is possible with several biometric systems and sclera
recognition is one of the promising biometrics The sclera is the white
portion of the human eye The vein pattern seen in the sclera region is
unique to each person. Thus, the sclera vein pattern is a well-suited
biometric for human identification. The existing methods used
for sclera recognition have some drawbacks: only frontal-looking
images are suitable for matching, and rotation variance is another problem.
These problems are completely eliminated in the proposed system by using
two feature extraction techniques They are Histogram of Oriented
Gradients (HOG) and converting the image into polar form using the
bilinear interpolation technique These two features help the proposed
system to become illumination invariant and rotation invariant The
experimentation is done with the help of UBIRIS database The
experimental result shows that the proposed sclera recognition method can
achieve better accuracy than the previous methods
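The polar conversion mentioned above resamples the image at fractional coordinates, which is where bilinear interpolation comes in. A self-contained sketch of that interpolation step (our illustration, not the cited authors' code; the test image is invented):

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample img at fractional (x = column, y = row) by bilinear interpolation."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]   # blend along the upper row
    bot = (1 - fx) * img[y1, x0] + fx * img[y1, x1]   # blend along the lower row
    return (1 - fy) * top + fy * bot                  # blend between the rows

img = np.array([[ 0.0, 10.0],
                [20.0, 30.0]])
print(bilinear_sample(img, 0.5, 0.5))   # -> 15.0, the mean of the four neighbors
```

A polar unwrapping would call such a sampler at (r·cosθ, r·sinθ) positions around the detected center, producing a rotation-friendly representation.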
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J.
C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899,
May 2008.
The graphics processing unit (GPU) has become an integral part of
today's mainstream computing systems. Over the past six years, there has
been a marked increase in the performance and capabilities of GPUs. The
modern GPU is not only a powerful graphics engine but also a highly
parallel programmable processor featuring peak arithmetic and memory
bandwidth that substantially outpaces its CPU counterpart. The GPU's
rapid increase in both programmability and capability has spawned a
research community that has successfully mapped a broad range of
computationally demanding complex problems to the GPU This effort in
general purpose computing on the GPU also known as GPU computing
has positioned the GPU as a compelling alternative to traditional
microprocessors in high-performance computer systems of the future We
describe the background hardware and programming model for GPU
computing summarize the state of the art in tools and techniques and
present four GPU computing successes in game physics and computational
biophysics that deliver order-of-magnitude performance gains over
optimized CPU applications
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image
database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp.
970–977.
This paper proposes algorithms for iris segmentation quality
enhancement match score fusion and indexing to improve both the
accuracy and the speed of iris recognition A curve evolution approach is
proposed to effectively segment a nonideal iris image using the modified
Mumford–Shah functional. Different enhancement algorithms are
concurrently applied on the segmented iris image to produce multiple
enhanced versions of the iris image A support-vector-machine-based
learning algorithm selects locally enhanced regions from each globally
enhanced image and combines these good-quality regions to create a single
high-quality iris image Two distinct features are extracted from the high-
quality iris image The global textural feature is extracted using the 1-D log
polar Gabor transform and the local topological feature is extracted using
Euler numbers An intelligent fusion algorithm combines the textural and
topological matching scores to further improve the iris recognition
performance and reduce the false rejection rate whereas an indexing
algorithm enables fast and accurate iris identification. The verification and
identification performance of the proposed algorithms is validated and
compared with other algorithms using the CASIA Version 3, ICE 2005, and
UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-
stage parallel approach for registration and matching: a parallel sclera
matching solution for our sequential line-descriptor method, built on the
CUDA GPU architecture. CUDA is a highly parallel, multithreaded,
many-core processor architecture with tremendous computational power.
It supports not only a traditional graphics pipeline but also computation
on non-graphical data. It is relatively straightforward to port our C
program for CUDA to an AMD-based GPU using OpenCL. Our CUDA
kernels can be directly converted to OpenCL kernels by accounting for the
different syntax of various keywords and built-in functions. The mapping
strategy is also effective in OpenCL if we regard the thread and block in
CUDA as the work-item and work-group in OpenCL. Most of our
optimization techniques, such as coalesced memory access and prefix sum,
work in OpenCL too. Moreover, since CUDA is a data-parallel architecture,
the implementation of our approach in OpenCL should be programmed in a
data-parallel model.
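For reference, the prefix sum mentioned above turns per-element counts into output offsets, which is what lets many threads write compacted results without conflicts. A plain-Python sketch of the idea (not the CUDA kernel itself; the counts are invented):

```python
from itertools import accumulate

counts = [3, 0, 2, 5, 1]                       # e.g., results produced per thread
# Exclusive prefix sum: offsets[i] = sum of counts[0..i-1],
# i.e., where thread i should start writing its output.
offsets = [0] + list(accumulate(counts))[:-1]
print(offsets)   # -> [0, 3, 3, 5, 10]
```

On a GPU the same computation is done in O(log n) parallel steps rather than this sequential pass.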
In this research, we first discuss why the naïve parallel approach would
not work. We then propose the new sclera descriptor – the Y shape sclera
feature-based efficient registration method – to speed up the mapping
scheme; introduce the "weighted polar line (WPL) descriptor", which is
better suited for parallel computing, to mitigate the mask size issue; and
develop a coarse-to-fine two-stage matching process to dramatically improve
the matching speed. These new approaches make parallel processing
possible and efficient.
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve efficiency, we propose a new descriptor – the Y shape
descriptor – which can greatly improve the efficiency of the coarse
registration of two images and can be used to filter out non-matching
pairs before refined matching.
2. We propose the coarse-to-fine two-stage matching process. In the first
stage, we match two images coarsely using the Y shape descriptors,
which is very fast because no registration is needed. The matching
result in this stage helps filter out image pairs with low
similarities.
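The two-stage idea can be sketched generically (a toy illustration in which scalars stand in for templates and negated distance stands in for similarity; the report's actual stages use the Y shape descriptors and the WPL descriptor):

```python
def two_stage_match(probe, gallery, coarse_sim, fine_sim, tau):
    """Coarse-to-fine matching: a cheap similarity filters out unlikely
    gallery entries, and only the survivors reach the expensive fine stage."""
    candidates = [g for g in gallery if coarse_sim(probe, g) >= tau]
    if not candidates:
        return None                       # everything rejected at the coarse stage
    return max(candidates, key=lambda g: fine_sim(probe, g))

sim = lambda a, b: -abs(a - b)            # toy similarity: higher = more alike
gallery = [1.0, 4.0, 5.0, 9.0]
print(two_stage_match(5.1, gallery, sim, sim, tau=-1.0))   # -> 5.0
```

The speedup comes from the coarse stage discarding most pairs before the costly fine comparison, mirroring the advantage described above.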
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each
person, which means it can be used for human identification. Several
researchers have designed different sclera vein recognition methods and have
shown that it is promising to use sclera vein recognition for human
identification. Crihalmeanu and Ross proposed three approaches: Speeded-Up
Robust Features (SURF)-based matching, minutiae detection, and direct
correlation matching for feature registration and matching. Of these three
methods, the SURF method achieves the best accuracy. It takes an average of
1.5 seconds using the SURF method to perform a one-to-one matching. Zhou
et al. proposed a line descriptor-based method for sclera vein recognition.
The matching step (including registration) is the most time-consuming step
in this sclera vein recognition system, costing about 1.2 seconds to
perform a one-to-one matching. Both speeds were measured using a PC with
an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently,
sclera vein recognition algorithms are designed using central processing
unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential
processing devices, which may not be efficient for data processing where the
data can be parallelized. Because of the large time consumption of the
matching step, sclera vein recognition using a sequential method would be
very challenging to implement in a real-time biometric system, especially
when there is a large number of templates in the database for matching.
GPUs (general-purpose graphics processing units, GPGPUs) are now
popularly used for parallel computing to improve computational processing
speed and efficiency. The highly parallel structure of GPUs makes them
more effective than CPUs for data processing where processing can be
performed in parallel. GPUs have been widely used in biometric
recognition, such as speech recognition, text detection, handwriting
recognition, and face recognition. In iris recognition, GPUs have been used
to extract features, construct descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al.
evaluated the performance of image processing algorithms such as linear
feature extraction and multi-view stereo matching on GPUs. However, these
approaches were designed for their specific biometric recognition
applications and feature searching methods; therefore, they may not be
efficient for sclera vein recognition. Compute Unified Device Architecture
(CUDA), the computing engine of NVIDIA GPUs, is used in this research.
CUDA is a highly parallel, multithreaded, many-core processor architecture
with tremendous computational power. It supports not only a traditional
graphics pipeline but also computation on non-graphical data. More
importantly, it offers an easier programming platform that outperforms its
CPU counterparts in terms of peak arithmetic intensity and memory
bandwidth. In this research, the goal is not to develop a unified strategy to
parallelize all sclera matching methods, because each method is quite
different from the others and would need a customized design. To develop
an efficient parallel computing scheme, different strategies are needed for
different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
sclera vein recognition using our sequential line-descriptor method on
the CUDA GPU architecture. However, the parallelization strategies
developed in this research can be applied to design parallel approaches for
other sclera vein recognition methods and to help parallelize general pattern
recognition methods. Based on the matching approach, there are three
challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large in size; they preoccupy GPU memory and slow
down data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance. New designs are
necessary to help narrow down the search range. In summary, a naïve
parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to port our C program for CUDA
to an AMD-based GPU using OpenCL. Our CUDA kernels can be directly
converted to OpenCL kernels by accounting for the different syntax of
various keywords and built-in functions. The mapping strategy is also
effective in OpenCL if we regard the thread and block in CUDA as the
work-item and work-group in OpenCL. Most of our optimization techniques,
such as coalesced memory access and prefix sum, work in OpenCL too.
Moreover, since CUDA is a data-parallel architecture, the implementation
of our approach in OpenCL should be programmed in the data-parallel model.
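As an illustration of one of the optimization techniques named above, a work-efficient (Blelloch-style) exclusive prefix sum can be sketched sequentially as follows; a CUDA or OpenCL kernel would execute the two phases in parallel across threads. This is a generic sketch of the technique, not the project's kernel code.

```python
def exclusive_scan(data):
    """Work-efficient (Blelloch) exclusive prefix sum.

    Sequential sketch of the two-phase scan a GPU kernel would run in
    parallel: an up-sweep (reduce) phase followed by a down-sweep phase.
    Input length must be a power of two.
    """
    a = list(data)
    n = len(a)
    # Up-sweep: build partial sums in a binary-tree pattern.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            a[i] += a[i - d]
        d *= 2
    # Down-sweep: clear the root, then propagate prefixes back down.
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            t = a[i - d]
            a[i - d] = a[i]
            a[i] += t
        d //= 2
    return a

print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # [0, 3, 4, 11, 11, 15, 16, 22]
```

Each element of the result is the sum of all input elements before it, which is why this primitive is useful for compacting descriptor lists into contiguous output slots on the GPU.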
In this research, we first discuss why the naive parallel approach would
not work (Section 3). We then propose a new sclera descriptor, the Y-shape
sclera feature, with an efficient registration method to speed up the
mapping scheme (Section 4); introduce the "weighted polar line" (WPL)
descriptor, which is better suited for parallel computing and mitigates
the mask size issue (Section 5); and develop our coarse-to-fine two-stage
matching process to dramatically improve the matching speed (Section 6).
These new approaches make parallel processing possible and efficient.
However, it is non-trivial to implement these algorithms in CUDA, so we
then develop implementation schemes to map our algorithms onto CUDA
(Section 7). In Section 2 we give a brief introduction to sclera vein
recognition; in Section 8 we report experiments with the proposed system;
and in Section 9 we draw conclusions.
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation feature enhancement feature extraction and feature
matching (Figure 1)
FIG
Sclera image segmentation is the first step in sclera vein recognition,
and several methods have been designed for it. Crihalmeanu et al.
presented a semi-automated system for sclera segmentation: they used a
clustering algorithm to classify color eye images into three clusters,
namely sclera, iris, and background. Later on, Crihalmeanu and Ross
designed a segmentation approach based on a normalized sclera index
measure, which includes coarse sclera segmentation, pupil region
segmentation, and fine sclera segmentation. Zhou et al. developed a skin
tone plus "white color"-based voting method for sclera segmentation in
color images, and an Otsu's thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the
sclera features, since the sclera vein patterns often lack contrast and
are hard to detect. Zhou et al. used a bank of multi-directional Gabor
filters for vascular pattern enhancement. Derakhshani et al. used
contrast-limited adaptive histogram equalization (CLAHE) to enhance the
green color plane of the RGB image, and a multi-scale region growing
approach to identify the sclera veins from the image background.
Crihalmeanu and Ross applied a selective enhancement filter for blood
vessels to extract features from the green component of a color image. In
the feature matching step, Crihalmeanu and Ross proposed three
registration and matching approaches: Speeded-Up Robust Features (SURF),
which is based on interest-point detection; minutiae detection, which is
based on minutiae points on the vasculature structure; and direct
correlation matching, which relies on image registration. Zhou et al.
designed a line-descriptor-based feature registration and matching method.
The proposed sclera recognition consists of five steps: sclera
segmentation, vein pattern enhancement, feature extraction, feature
matching, and matching decision. Fig. 2 shows the block diagram of sclera
recognition. Two types of feature extraction are used in the proposed
method to achieve good identification accuracy. The characteristics
elicited from the blood vessel structure seen in the sclera region are the
Histogram of Oriented Gradients (HOG) and an interpolated
Cartesian-to-polar conversion. HOG is used to determine the gradient and
edge orientations of the vein pattern in the sclera region of an eye
image. To become more computationally efficient, the image data are
converted to polar form, which is mainly useful for circular or
quasi-circular objects. These two characteristics are extracted from all
the images in the database and compared with the features of the query
image to decide whether the person is correctly identified. This
comparison is done in the feature matching step, which ultimately makes
the matching decision. By using the proposed feature extraction methods
and matching techniques, human identification is more accurate than in
existing studies. In the proposed method, two features of an image are
thus drawn out.
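The two feature types above can be sketched as follows. This is a minimal illustration with an assumed cell size and bin count, not the report's implementation.

```python
import numpy as np

def hog_cell_histogram(patch, n_bins=9):
    """Gradient-orientation histogram for one HOG cell.

    Computes x/y gradients with central differences, then accumulates
    gradient magnitude into orientation bins over [0, 180) degrees
    (unsigned orientation). Cell size and bin count are illustrative.
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = (ang / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())
    return hist

def to_polar(x, y, cx, cy):
    """Cartesian-to-polar conversion about a center (e.g. the pupil)."""
    r = np.hypot(x - cx, y - cy)
    theta = np.arctan2(y - cy, x - cx)
    return r, theta

# A vertical edge puts all gradient energy in the first orientation bin.
patch = np.zeros((8, 8)); patch[:, 4:] = 1.0
h = hog_cell_histogram(patch)
print(int(np.argmax(h)))  # 0: gradients point along +x, orientation 0 degrees
```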
222 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists
of three steps: glare area detection, sclera area estimation, and iris and
eyelid detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare area detection: the glare area is a small bright area near the pupil
or iris, an unwanted portion of the eye image. A Sobel filter is applied
to detect the glare area present in the iris or pupil. It operates only on
grayscale images, so a color image must first be converted to grayscale
before the Sobel filter is applied. Fig. 4 shows the result of the glare
area detection.
FIG
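A minimal sketch of the glare-detection idea, combining brightness with Sobel edge strength. The brightness cutoff of 200 is an assumption (the report gives no parameters), and a morphological fill would be needed to mark the glare interior as well as its boundary.

```python
import numpy as np

def glare_mask(gray, thresh=200):
    """Flag glare candidates: bright pixels that carry strong Sobel edges.

    'gray' is a 2-D grayscale array; 'thresh' is an illustrative
    brightness cutoff. Only the bright region's boundary carries
    gradient energy, so the flat interior is not flagged here.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    def conv3(img, k):
        # 3x3 correlation via explicit shifts (no SciPy dependency);
        # the one-pixel border is left at zero.
        out = np.zeros_like(img, dtype=float)
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                out[1:-1, 1:-1] += k[di + 1, dj + 1] * \
                    img[1 + di:img.shape[0] - 1 + di, 1 + dj:img.shape[1] - 1 + dj]
        return out
    g = np.hypot(conv3(gray.astype(float), kx), conv3(gray.astype(float), ky))
    return (gray > thresh) & (g > 0)

gray = np.zeros((9, 9)); gray[3:6, 3:6] = 255.0
m = glare_mask(gray)
print(bool(m[4, 3]), bool(m[4, 4]))  # True False: boundary flagged, flat interior not
```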
Sclera area estimation: for the estimation of the sclera area, Otsu's
thresholding method is applied. The steps of sclera area detection are
selection of the region of interest (ROI), Otsu's thresholding, and sclera
area detection. The left and right sclera areas are selected based on the
iris boundaries. Once the region of interest is selected, Otsu's
thresholding is applied to obtain the potential sclera areas. The correct
left sclera area should be placed in the right and center positions, and
the correct right sclera area should be placed in the left and center. In
this way, non-sclera areas are wiped out.
223 IRIS AND EYELID REFINEMENT
The top and underside of the sclera regions are the limits of the sclera
area. Then the upper eyelid, lower eyelid, and iris boundaries are
refined; all of these are unwanted portions for recognition. To eliminate
their effects, refinement is done following the detection of the sclera
area. Fig. shows the result after Otsu's thresholding and after iris and
eyelid refinement to detect the right sclera area. The left sclera area is
detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the impact of
segmentation faults. The vein patterns in the sclera area are not clearly
visible after segmentation; to make them more visible, vein pattern
enhancement is performed.
224 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has been
studied in the context of fingers (Miura et al., 2004), palms (Lin and
Fan, 2004), and the retina (Hill, 1999). In the case of retinal
biometrics, a special optical device for imaging the back of the eyeball
is needed (Hill, 1999). Due to its perceived invasiveness and the required
degree of subject cooperation, the use of retinal biometrics may not be
acceptable to some individuals. The conjunctiva is a thin, transparent,
and moist tissue that covers the outer surface of the eye. The part of the
conjunctiva that covers the inner lining of the eyelids is called the
palpebral conjunctiva, and the part that covers the outer surface of the
eye is called the ocular (or bulbar) conjunctiva, which is the focus of
this study. The ocular conjunctiva is very thin and clear; thus the
vasculature (including that of the episclera) is easily visible through
it. The visible microcirculation of the conjunctiva offers a rich and
complex network of veins and fine microcirculation (Fig. 1). The apparent
complexity and specificity of these vascular patterns motivated us to
utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have a vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its
detailed final structure is mostly stochastic and thus unique. Even though
no comprehensive study on the uniqueness of vascular structures has been
conducted, studies of some targeted areas, such as those of the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus, the conjunctival vascular
structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial
off-the-shelf digital cameras under normal lighting conditions, making
this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and
thus more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins
makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the
addition of conjunctival biometrics will enhance the performance of
current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the
iris information is relegated to the left or right portions of the eye,
the sclera vein patterns will be further exposed. This feature makes
sclera vasculature a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable
(e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris
but also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
opportunities for simultaneous computation. The figure below demonstrates
the possibility of parallel directional filtering. Since the filter is
computed over different portions of the input image, the computation can
be performed in parallel (denoted by Elements below). In addition,
individual parallelization of each element of the filtering can also be
performed. A detailed discussion of our proposed parallelization is
outside the scope of this paper.
FIG
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching stage of the line-descriptor-based method is a bottleneck
with regard to matching speed. In this section, we briefly describe the
line-descriptor-based sclera vein recognition method. After segmentation,
vein patterns are enhanced by a bank of directional Gabor filters. Binary
morphological operations are used to thin the detected vein structure down
to a single-pixel-wide skeleton and to remove the branch points. The line
descriptor is used to describe the segments in the vein structure; Figure
2 shows a visual description of the line descriptor. Each segment is
described by three quantities: the segment's angle to some reference angle
at the iris center, θ; the segment's distance to the iris center, r; and
the dominant angular orientation of the line segment, ɸ. Thus the
descriptor is S = (θ, r, ɸ)^T. The individual components of the line
descriptor are calculated as
FIG
Here f_line(x) is the polynomial approximation of the line segment, (x_l,
y_l) is the center point of the line segment, (x_i, y_i) is the center of
the detected iris, and S is the line descriptor. To register the segments
of the vascular patterns, a RANSAC-based algorithm is used to estimate the
best-fit parameters for registration between the two sclera vascular
patterns. The registration algorithm randomly chooses two points, one from
the test template and one from the target template, and randomly chooses a
scaling factor and a rotation value based on a priori knowledge of the
database. Using these values, it calculates a fitness value for the
registration under these parameters.
After sclera template registration, each line segment in the test template
is compared to the line segments in the target template for matches. To
reduce the effect of segmentation errors, we created a weighting image
(Figure 3) from the sclera mask by setting interior pixels of the sclera
mask to 1, pixels within some distance of the mask boundary to 0.5, and
pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as follows:
Si and Sj are two segment descriptors; m(Si, Sj) is the matching score
between segments Si and Sj; d(Si, Sj) is the Euclidean distance between
the segment descriptors' center points (from Eqs. 6-8); Dmatch is the
matching distance threshold; and ɸmatch is the matching angle threshold.
The total matching score M is the sum of the individual matching scores
divided by the maximum matching score for the minimal set between the test
and target templates. That is, one of the test or target templates has
fewer points, and the sum of its descriptors' weights sets the maximum
score that can be attained.
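The per-segment and total matching scores described above can be sketched as follows; the thresholds and the exact weighting rule are assumptions of this sketch, since the equations themselves are not reproduced here.

```python
import math

def segment_match_score(si, sj, d_match=5.0, phi_match=0.17):
    """Match score for two line-segment descriptors.

    Each descriptor is (x, y, phi, w): center, orientation, mask weight.
    A pair matches when the centers are within d_match and the
    orientations within phi_match; the score is then weighted by the
    mask weights (an assumed weighting rule).
    """
    xi, yi, pi, wi = si
    xj, yj, pj, wj = sj
    d = math.hypot(xi - xj, yi - yj)
    if d <= d_match and abs(pi - pj) <= phi_match:
        return wi * wj
    return 0.0

def total_score(test, target, **kw):
    """Sum of best per-segment scores, normalized by the maximum
    attainable score of the template with fewer weighted points."""
    s = sum(max((segment_match_score(a, b, **kw) for b in target), default=0.0)
            for a in test)
    max_score = min(sum(a[3] for a in test), sum(b[3] for b in target))
    return s / max_score if max_score else 0.0

test = [(10, 10, 0.5, 1.0), (20, 15, 1.0, 0.5)]
target = [(11, 10, 0.52, 1.0), (40, 40, 1.0, 1.0)]
print(round(total_score(test, target), 4))  # 0.6667: only the first pair matches
```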
FIG
FIG
FIG
FIG
movement of the eye, Y-shape branches are observed to be a stable feature
and can be used as a sclera feature descriptor. To detect the Y-shape
branches in the original template, we search for the set of nearest
neighbors of every line segment within a regular distance and classify the
angles among these neighbors. If there are two types of angle values in
the line segment set, the set may be inferred to be a Y-shape structure,
and the line segment angles are recorded as a new feature of the sclera.
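The neighbor-angle classification just described can be sketched as follows; the grouping rule and the search radius are simplified assumptions.

```python
import math

def is_y_branch(center, segments, radius=10.0, angle_tol=0.2):
    """Decide whether the line segments near 'center' form a Y-shape branch.

    'segments' is a list of (x, y, phi) line-segment descriptors. We
    collect the neighbors within 'radius' and group their orientations;
    two distinct orientation groups among the neighbors are taken as
    evidence of a Y-shape structure, a simplified reading of the rule
    described in the text.
    """
    neighbors = [phi for (x, y, phi) in segments
                 if math.hypot(x - center[0], y - center[1]) <= radius]
    groups = []
    for phi in neighbors:
        for g in groups:
            if abs(phi - g[0]) <= angle_tol:
                g.append(phi)
                break
        else:
            groups.append([phi])   # start a new orientation group
    return len(groups) == 2

segs = [(1, 0, 0.0), (2, 1, 0.05), (1, 2, 1.0), (2, 3, 1.05), (50, 50, 2.0)]
print(is_y_branch((0, 0), segs))  # True: two orientation groups near the center
```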
There are two ways to measure both the orientation and the relationship of
every branch of the Y-shape vessels: one is to use the angle of each
branch to the x-axis; the other is to use the angle between each branch
and the iris radial direction. The first method needs an additional
rotation operation to align the template, so in our approach we employed
the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles
between each branch and the radius from the pupil center. Even when the
head tilts, the eye moves, or the camera zooms during image acquisition,
ϕ1, ϕ2 and ϕ3 remain quite stable. To tolerate errors from the pupil
center calculation in the segmentation step, we also record the center
position (x, y) of the Y-shape branches as auxiliary parameters. So our
rotation-, shift- and scale-invariant feature vector is defined as y(ϕ1,
ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the
iris center; therefore it is automatically aligned to the iris center, and
it is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 22, the line
descriptor is extracted from the skeleton of the vessel structure in
binary images (Figure 7). The skeleton is then broken into smaller
segments. For each segment, a line descriptor is created to record the
center and orientation of the segment. This descriptor is expressed as
s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its
orientation. Because of the limited segmentation accuracy, descriptors at
the boundary of the sclera area might not be accurate and may contain spur
edges resulting from the iris, eyelid, and/or eyelashes. To tolerate such
errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b)
vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d)
centers of line segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is
challenging, since the mask files are large in size and will occupy GPU
memory and slow down the data transfer. During matching and registration,
a RANSAC-type algorithm is used to randomly select corresponding
descriptors, and the transform parameters between them are used to
generate the template transform affine matrix. After every template
transform, the mask data must also be transformed and a new boundary
calculated to evaluate the weight of the transformed descriptor. This
results in too many convolutions in the processing unit.
To reduce the heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the mask
information and can be automatically aligned. We extract the geometric
relationships of the descriptors and store them as a new descriptor. We
use a weighted image created by setting various weight values according to
position: the weight of descriptors that lie outside the sclera is set to
0, descriptors near the sclera boundary are weighted 0.5, and interior
descriptors are set to 1. In our work, descriptor weights were calculated
on their own mask by the CPU, only once.
The result is saved as a component of the descriptor, so the sclera
descriptor becomes s(x, y, ɸ, w), where w denotes the weight of the point
and takes the value 0, 0.5, or 1. To align two templates, when a template
is shifted to another location along the line connecting their centers,
all the descriptors of that template are transformed. This is faster if
the two templates have similar reference points: if we use the center of
the iris as the reference point, then when two templates are compared, the
correspondences are automatically aligned to each other, since they share
a similar reference point. Every feature vector of the template is a set
of line segment descriptors composed of three variables (Figure 8): the
segment's angle to the reference line through the iris center, denoted θ;
the distance between the segment's center and the pupil center, denoted r;
and the dominant angular orientation of the segment, denoted ɸ. To
minimize the GPU computation, we also convert the descriptor values from
polar coordinates to rectangular coordinates in the CPU preprocessing.
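The CPU-side preprocessing step, converting each descriptor from polar to rectangular coordinates once, can be sketched as:

```python
import math

def wpl_descriptor(theta, r, phi, w):
    """Build the WPL descriptor s = (x, y, r, theta, phi, w).

    (r, theta) locate the segment center relative to the iris/pupil
    center; converting to rectangular (x, y) once on the CPU, as the
    text describes, spares the GPU repeated trigonometry.
    """
    x = r * math.cos(theta)
    y = r * math.sin(theta)
    return (x, y, r, theta, phi, w)

s = wpl_descriptor(theta=math.pi / 2, r=10.0, phi=0.3, w=1.0)
print(round(s[0], 6), round(s[1], 6))  # 0.0 10.0
```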
The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters;
for example, as an eyeball moves left, the left-part sclera patterns of
the eye may be compressed while the right-part sclera patterns are
stretched.
In parallel matching, these two parts are assigned to threads in different
warps to allow different deformations. The multiprocessor in CUDA manages
threads in groups of 32 parallel threads called warps. We reorganized the
descriptors from the same side and saved
FIG
FIG
them at contiguous addresses. This meets the requirement of coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no
longer needed on the GPU. Matching with this feature is very fast, because
the templates do not need to be re-registered every time after shifting;
thus the cost of data transfer and computation on the GPU is reduced. With
matching on the new descriptor, the shift parameter generator in Figure 4
is simplified as shown in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger limits on
their size and resource consumption, more fully featured instruction sets,
and more flexible control-flow operations. After many years of separate
instruction sets for vertex and fragment operations, current GPUs support
the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and
32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting
fixed-function units.
General-purpose computing on the GPU maps general-purpose computation onto
the graphics hardware in much the same way as any standard graphics
application. Because of this similarity, it is both easier and more
difficult to explain the process: on one hand, the actual operations are
the same and are easy to follow; on the other hand, the terminology
differs between graphics and general-purpose use. Harris provides an
excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then
show how the same steps are used in a general-purpose way to author GPGPU
applications, and finally use the same steps to show the simpler and more
direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline described in Section II, concentrating
on its programmable aspects:
The programmer specifies geometry that covers a region on the screen. The
rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination
of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as a texture on future passes through
the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves
exactly the same steps but different terminology. A motivating example is
a fluid simulation computed over a grid: at each time step, we compute the
next state of the fluid for each grid point from the current state at that
grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation
domain of interest. The rasterizer generates a fragment at each pixel
location covered by that geometry. (In our example, the primitive must
cover a grid of fragments equal to the domain size of our fluid
simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each
grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination
of math operations and "gather" accesses from global memory. (Each grid
point can access the state of its neighbors from the previous time step in
computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes. (The current state of the fluid will be used on the next
time step.)
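The gather-based grid update described above can be sketched as follows; the neighbor-average rule is a stand-in for a real fluid update.

```python
import numpy as np

def step(grid):
    """One SPMD-style time step.

    Every interior cell's next state is computed from its own value and
    its four neighbors ("gather" reads from the previous state), written
    to a separate output buffer, mirroring the GPGPU pattern above. The
    averaging update itself is only illustrative.
    """
    out = grid.copy()
    out[1:-1, 1:-1] = (grid[1:-1, 1:-1] + grid[:-2, 1:-1] + grid[2:, 1:-1]
                       + grid[1:-1, :-2] + grid[1:-1, 2:]) / 5.0
    return out

g = np.zeros((5, 5)); g[2, 2] = 5.0
g1 = step(g)
print(g1[2, 2], g1[1, 2])  # 1.0 1.0: the spike diffuses to its neighbors
```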
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has
been that, despite their general-purpose tasks having nothing to do with
graphics, the applications still had to be programmed using graphics APIs.
In addition, the program had to be structured in terms of the graphics
pipeline, with the programmable units only accessible as an intermediate
step in that pipeline, when the programmer would almost certainly prefer
to access the programmable units directly. The programming environments we
describe in detail in Section IV solve this difficulty by providing a more
natural, direct, non-graphics interface to the hardware and, specifically,
the programmable units. Today, GPU computing applications are structured
in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations
and both "gather" (read) accesses from and "scatter" (write) accesses to
global memory. Unlike in the previous two methods, the same buffer can be
used for both reading and writing, allowing more flexible algorithms (for
example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in
future computation.
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because no
registration is needed. The matching result in this stage helps filter out
image pairs with low similarity, although some false positive matches may
remain after this step. In the second stage, we use the WPL descriptor to
register the two images for more detailed descriptor matching, with scale
and translation invariance. This stage includes the shift transform,
affine matrix generation, and final WPL descriptor matching. Overall, we
partitioned the registration and matching processing into four kernels in
CUDA (Figure 10): matching on the Y-shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster and
achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features,
registration is unnecessary before matching on the Y-shape descriptor. The
whole matching algorithm is listed as Algorithm 1.
FIG
Here y_te,i and y_ta,j are the Y-shape descriptors of the test template
T_te and the target template T_ta, respectively; dϕ is the Euclidean
distance of the angle elements of the descriptor vectors, defined in (3);
dxy is the Euclidean distance of two descriptor centers, defined in (4);
ni and di are the number of matched descriptor pairs and the distance
between their centers, respectively; tϕ is a distance threshold; and txy
is the threshold that restricts the search area. We set tϕ to 30 and txy
to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape
branches. The search area is limited to the corresponding left or right
half of the sclera in order to reduce the search range and time. The
distance of two branches is defined in (3), where ϕi,j is the angle
between the j-th branch and the polar line from the pupil center in
descriptor i.
The number of matched pairs ni and the distance between Y-shape branch
centers di are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch centers
as in (2), where α is a factor to fuse the matching score, set to 30 in
our study, and Ni and Nj are the total numbers of feature vectors in
templates i and j, respectively. The decision is regulated by the
threshold t: if a sclera's matching score is lower than t, the sclera is
discarded; a sclera with a high matching score is passed on to the next,
more precise matching stage.
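A sketch of this coarse matching stage follows. The thresholds tϕ = 30 and txy = 675 follow the text, but since equation (2) is not reproduced in this section, the fusion of match count and mean center distance below is an assumed rule, not the report's formula.

```python
import math

def y_match_score(test_ys, target_ys, t_phi=30.0, t_xy=675.0, alpha=30.0):
    """Coarse Stage-I matching on Y-shape descriptors (Algorithm 1 sketch).

    Each descriptor is (phi1, phi2, phi3, x, y). No registration is
    needed, since branch angles are measured against the radial
    direction. Pairs within the search radius t_xy whose angle vectors
    are within t_phi count as matches; the final fusion of count and
    mean distance is an assumption of this sketch.
    """
    n, dist_sum = 0, 0.0
    for p1, p2, p3, x, y in test_ys:
        for q1, q2, q3, u, v in target_ys:
            d_xy = math.hypot(x - u, y - v)
            if d_xy > t_xy:
                continue          # restrict the search area
            d_phi = math.sqrt((p1 - q1) ** 2 + (p2 - q2) ** 2 + (p3 - q3) ** 2)
            if d_phi < t_phi:
                n += 1
                dist_sum += d_xy
    if n == 0:
        return 0.0
    mean_d = dist_sum / n
    n_max = min(len(test_ys), len(target_ys))
    return (n / n_max) * alpha / (alpha + mean_d)   # assumed fusion rule

a = [(10.0, 120.0, 240.0, 100.0, 50.0)]
b = [(12.0, 118.0, 241.0, 102.0, 51.0)]
print(y_match_score(a, b) > 0.9)  # True: one close pair out of one
```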
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of
the sclera than the Y-shape descriptor. The variation of the sclera vessel
pattern is nonlinear, because:
When acquiring an eye image at a different gaze angle, the vessel
structure will appear to shrink or extend nonlinearly, since the eyeball
is spherical in shape.
The sclera is made up of four layers (episclera, stroma, lamina fusca, and
endothelium), and there are slight differences among the movements of
these layers.
Considering these factors, our registration employs both a single shift
transform and a multi-parameter transform that combines shift, rotation,
and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be
accurate, and as a result the detected iris center may not be accurate
either. The shift transform is designed to tolerate possible errors in
pupil center detection in the segmentation step. If there is no
deformation, or only very minor deformation, registration with the shift
transform alone is adequate to achieve an accurate result. We designed
Algorithm 2 to obtain the optimized shift parameter, where T_te is the
test template and s_te,i is the i-th WPL descriptor of T_te; T_ta is the
target template and s_ta,j is the j-th WPL descriptor of T_ta; d(s_te,k,
s_ta,j) is the Euclidean distance of descriptors s_te,k and s_ta,j; and
Δs_k is the shift value of the two descriptors.
We first randomly select an equal number of segment descriptors s_te,k in
the test template T_te from each quad and find the nearest neighbor
s_ta,j of each in the target template T_ta. Their shift offset is recorded
as a candidate registration shift factor Δs_k. The final registration
offset Δs_optim is the candidate with the smallest standard deviation
among these candidate offsets.
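A simplified sketch of this shift search follows. The per-quad sampling is omitted, and picking the candidate closest to the mean offset stands in for the smallest-standard-deviation criterion; descriptors are reduced to (x, y) centers.

```python
import math, random, statistics

def shift_search(test_desc, target_desc, n_samples=8, seed=1):
    """Shift-parameter search (Algorithm 2 sketch).

    Randomly sample descriptors from the test template, pair each with
    its nearest neighbor in the target template, record the offsets as
    candidate shifts, and keep the candidate closest to the mean offset
    (a simplified stand-in for the minimum-deviation rule in the text).
    """
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_samples):
        x, y = rng.choice(test_desc)
        u, v = min(target_desc, key=lambda p: math.hypot(p[0] - x, p[1] - y))
        candidates.append((u - x, v - y))
    mx = statistics.mean(dx for dx, _ in candidates)
    my = statistics.mean(dy for _, dy in candidates)
    return min(candidates, key=lambda c: math.hypot(c[0] - mx, c[1] - my))

test = [(0, 0), (10, 5), (20, 12), (35, 30)]
target = [(x + 4, y + 2) for x, y in test]   # pure shift of (4, 2)
print(shift_search(test, target))  # (4, 2)
```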
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the
sclera patterns in the matching step. The affine transform algorithm is
shown in Algorithm 3. The shift value in the parameter set is obtained by
randomly selecting a descriptor s_te,i(it) and calculating the offset from
its nearest neighbor s_ta,j in T_ta. We transform the test template by the
matrix in (7). At the end of each iteration, we count the number of
matched descriptor pairs between the transformed template and the target
template. The factor β determines whether a pair of descriptors is
matched; we set it to 20 pixels in our experiment. After N iterations, the
optimized transform parameter set is determined by selecting the one with
the maximum number of matches m(it). Here s_te,i, T_te, s_ta,j, and T_ta
are defined as in Algorithm 2; tr(it)_shift, θ(it), and tr(it)_scale are
the shift, rotation, and scale parameters generated in the it-th
iteration; and R(θ(it)), T(tr(it)_shift), and S(tr(it)_scale) are the
transform matrices defined in (7). To search for the optimized transform
parameters, we iterate N times; in our experiment we set the iteration
count to 512.
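A simplified sketch of this parameter search follows, with β = 20 as in the text; the rotation and scale ranges are assumptions, and descriptors are reduced to (x, y) centers relative to the pupil center.

```python
import math, random

def affine_search(test_desc, target_desc, n_iter=512, beta=20.0, seed=0):
    """Affine-parameter search (Algorithm 3 sketch).

    Each iteration draws a random rotation and scale (assumed ranges)
    plus a shift candidate from a random nearest-neighbor pair,
    transforms the test descriptors, and counts pairs whose transformed
    center lies within beta pixels of some target descriptor. The
    parameter set with the most matches wins.
    """
    rng = random.Random(seed)
    best, best_m = None, -1
    for _ in range(n_iter):
        theta = rng.uniform(-0.2, 0.2)
        scale = rng.uniform(0.8, 1.2)
        # Shift candidate from a random nearest-neighbor pair.
        x, y = rng.choice(test_desc)
        u, v = min(target_desc, key=lambda p: math.hypot(p[0] - x, p[1] - y))
        tx, ty = u - x, v - y
        c, s = math.cos(theta), math.sin(theta)
        m = 0
        for px, py in test_desc:
            qx = scale * (c * px - s * py) + tx
            qy = scale * (s * px + c * py) + ty
            if any(math.hypot(qx - ax, qy - ay) < beta for ax, ay in target_desc):
                m += 1
        if m > best_m:
            best_m, best = m, (theta, scale, tx, ty)
    return best, best_m

test = [(0, 0), (50, 10), (80, 60), (30, 90)]
target = [(1.05 * x + 6, 1.05 * y - 4) for x, y in test]  # mild scale + shift
params, matches = affine_search(test, target)
print(matches)  # 4: all descriptors matched under the best parameter set
```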
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te_i, T_te, s_ta_j, and T_ta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)_shift, tr(optm)_scale, and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; R(θ(optm)), T(tr(optm)_shift), and S(tr(optm)_scale) form the descriptor transform matrix defined in Algorithm 3; ϕ is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ϕ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test and target templates.
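One reading of the score and orientation checks above, sketched in Python (function names and the normalization by the smaller template size are our interpretation, not the report's exact formula):

```python
ALPHA = 5.0  # max allowed orientation difference (in the report's units)

def orientation_ok(phi_test, phi_target, alpha=ALPHA):
    """Gate a nearest-neighbour pair on the absolute difference of the
    two descriptor orientations, as Algorithm 4 does with alpha."""
    return abs(phi_test - phi_target) <= alpha

def matching_score(matches_shift, matches_affine, n_test, n_target):
    """Smaller match count of the two transformed results, normalized by
    the smaller of the two template sizes."""
    return min(matches_shift, matches_affine) / min(n_test, n_target)
```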
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system that works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program must be partitioned by the programmer into threads and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip and take little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip and accessible by all threads, but accessing them is very time consuming.
Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA for efficient processing is not a trivial task; there are several challenges in CUDA programming.
If threads in a warp take different control paths, all branches are executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this latency, on-chip memory should be preferred over global memory; when global memory access does occur, threads in the same warp should access consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into equally sized banks. If two memory requests from different threads within a warp fall in the same bank, the accesses are serialized. For maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computational density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and each thread performs one pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU, with the thread and block counts each set to 1024; this means we can match our test template against up to 1024 × 1024 target templates at the same time.
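The coarse-grained one-comparison-per-thread mapping can be mimicked on the CPU with a thread pool (a sketch only; `compare` is a toy stand-in for the report's template comparison, and the pool models GPU threads, not the real kernel):

```python
from concurrent.futures import ThreadPoolExecutor

def compare(test, target):
    """Toy pair comparison standing in for one GPU thread's work:
    count descriptors shared by the two templates."""
    return len(set(test) & set(target))

def match_all(test, database, workers=8):
    """One logical thread per target template; every comparison is
    independent, so they can all run in parallel. Returns the index
    of the best-matching template."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda t: compare(test, t), database))
    return max(range(len(database)), key=scores.__getitem__)
```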
Algorithms 2-4 are partitioned into fine-grained subtasks, with each thread processing a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, each thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results must be computed or compared. A parallel prefix sum algorithm, shown on the right of Figure 11, is used to calculate this sum: first, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum of the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
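The pairwise tree reduction described above can be modeled sequentially (a sketch of the in-place combining pattern, where in the GPU version each stride level runs in parallel; the total lands at index 0, "the first address"):

```python
def tree_sum(values):
    """In-place tree reduction: stride 1 sums consecutive pairs, then
    strides 2, 4, 8, ... combine the partial sums until vals[0] holds
    the total. Mirrors the block-level reduction in Figure 11."""
    vals = list(values)
    n = len(vals)
    stride = 1
    while stride < n:
        # in the GPU kernel, this inner loop is one parallel step
        for i in range(0, n, 2 * stride):
            if i + stride < n:
                vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]
```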
252 MAPPING INSIDE BLOCK
In the shift-argument search there are two schemes for mapping the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single candidate shift offset to each thread, so that all threads compute independently and only the final results need to be compared across candidate offsets.
Because of the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are distributed across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. For generating the rotation and scale registration parameters we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
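A modern way to get statistically independent per-thread streams (assuming NumPy is available; the report itself used the offline dynamic-creation tool for per-thread Mersenne Twister parameters, described next) is to spawn child seed sequences rather than hand-pick "different" seeds:

```python
import numpy as np

def per_thread_generators(n, root_seed=1234):
    """Create n independent Mersenne Twister streams, one per logical
    thread. SeedSequence.spawn derives decorrelated child entropy, which
    avoids the correlated-sequences pitfall that plain 'different seeds'
    with identical MT parameters do not."""
    root = np.random.SeedSequence(root_seed)
    return [np.random.Generator(np.random.MT19937(child))
            for child in root.spawn(n)]
```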
The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state update across several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters must run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization at every query step; our solution is to use a single thread in a block to process the matching.
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can incur long latency. As shown in Figure 11, we load the entire target template set from the database up front, without considering when each template will be processed; therefore no host-to-device transfer occurs during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not arise. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as a feature for human recognition. In the sclera region the vein patterns are the edges of the image, so HOG is used to determine the gradient and edge orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of its pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a larger block and using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y) as m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y)/dx(x, y)).
Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell contributes a weight to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image has illumination or contrast changes, the gradient strength must be locally normalized; for that, cells are grouped into larger blocks. These blocks overlap, so each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
FIG
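The cell-histogram steps above can be sketched with NumPy (a minimal illustration assuming NumPy is available; block normalization and the Gaussian window are omitted, and cell size and bin count are illustrative defaults):

```python
import numpy as np

def hog_cells(img, cell=8, bins=9):
    """Per-pixel gradients, then a magnitude-weighted orientation
    histogram over 0-180 degrees for each cell (opposite directions
    fall in the same bin, as in the text)."""
    img = img.astype(float)
    dy, dx = np.gradient(img)                       # y- and x-direction gradients
    mag = np.hypot(dx, dy)                          # m(x, y)
    ang = np.rad2deg(np.arctan2(dy, dx)) % 180.0    # theta(x, y), unsigned
    h, w = img.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            idx = np.minimum((a / (180.0 / bins)).astype(int), bins - 1)
            for k, weight in zip(idx, m):
                hist[i, j, k] += weight             # magnitude as the vote weight
    return hist
```

On a vertical step edge, all the gradient energy lands in the 0-degree bin, matching the intuition that sclera veins show up as oriented edges.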
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, giving access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop; when code is entered there, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, either as a script or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be called directly from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, enabling fast development and execution. With the MATLAB language you can program and develop algorithms faster than with traditional languages, because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries; for general-purpose scalar computations, it generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
GUIDE (Graphical User Interface Development Environment) is an interactive tool for laying out, designing, and editing user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process from acquiring
data from external devices and databases through preprocessing
visualization and numerical analysis to producing presentation-quality
output
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific formats such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format, and additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; and adding annotations, LaTeX equations, legends, and drawn shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
Matlab is a program that was originally designed to simplify the
implementation of numerical linear algebra routines It has since grown into
something much bigger and it is used to implement numerical algorithms
for a wide range of applications The basic language used is very similar to
standard linear algebra notation but there are a few extensions that will
likely cause you some problems at first
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
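The binarization step shown in the snapshots can be illustrated with a NumPy-only sketch of Otsu's thresholding (the report's actual implementation is in MATLAB; this Python version, assuming NumPy is available, shows only the grayscale-to-binary step):

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the threshold that maximizes between-class variance
    of the 8-bit grayscale histogram (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                        # class-0 probability
    mu = np.cumsum(p * np.arange(256))          # class-0 cumulative mean
    mu_t = mu[-1]                               # global mean
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)            # ignore empty classes
    return int(np.argmax(sigma_b))

def binarize(gray):
    """Grayscale image -> binary image, as in the second snapshot."""
    return (gray > otsu_threshold(gray)).astype(np.uint8)
```

On a cleanly bimodal image the threshold lands between the two modes, separating sclera from background in one pass.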
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Privium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed here can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase matching efficiency; it is a new feature extraction method that takes advantage of the GPU structure. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU-GPU system, down to the individual GPU threads. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
LIST OF FIGURES
FIG NO FIG NAME
1.1 Fundamental blocks of digital image processing
1.2 Gray scale image
1.3 The additive model of RGB
1.4 The colors created by the subtractive model of CMYK
2.1 The diagram of a typical sclera vein recognition approach
2.2 Steps of segmentation
2.3 Glare area detection
2.4 Detection of the sclera area
2.5 Pattern of veins
2.6 Sclera region and its vein patterns
2.7 Filtering can take place simultaneously on different parts of the iris image
2.8 The sketch of parameters of segment descriptor
2.9 The weighting image
2.10 The module of sclera template matching
2.11 The Y shape vessel branch in sclera
2.12 The rotation and scale invariant character of Y shape vessel branch
2.13 The line descriptor of the sclera vessel pattern
2.14 The key elements of descriptor vector
2.15 Simplified sclera matching steps on GPU
2.16 Two-stage matching scheme
2.17 Example image from the UBIRIS database
2.18 Occupancy on various thread numbers per block
2.19 The task assignment inside and outside the GPU
2.20 HOG features
4.1 Original sclera image
4.2 Binarised sclera image
4.3 Edge map subtracted image
4.4 Cropping ROI
4.5 ROI mask
4.6 ROI finger sclera image
4.7 Enhanced sclera image
4.8 Feature extracted sclera image
4.9 Matching with images in database
4.10 Result
ABSTRACT
Sclera vein recognition is shown to be a promising method for human identification. However, its matching speed is slow, which could impact its use in real-time applications. To improve the matching efficiency, we propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching. First, we designed a rotation- and scale-invariant Y-shape-descriptor-based feature extraction method to efficiently eliminate most unlikely matches. Second, we developed a weighted polar line (WPL) sclera descriptor structure that incorporates mask information to reduce GPU memory cost. Third, we designed a coarse-to-fine two-stage matching method. Finally, we developed a mapping scheme to map the subtasks to GPU processing units. The experimental results show that our proposed method achieves a dramatic processing speed improvement without compromising recognition accuracy.
CHAPTER 1
INTRODUCTION
11GENERAL
Digital image processing is the use of computer algorithms to perform image processing on digital images. The 2-D continuous image is divided into N rows and M columns; the intersection of a row and a column is called a pixel. The image can also be a function of other variables, including depth, color, and time. An image given in the form of a transparency, slide, photograph, or X-ray is first digitized and stored as a matrix of binary digits in computer memory. This digitized image can then be processed and/or displayed on a high-resolution television monitor. For display, the image is stored in a rapid-access buffer memory which refreshes the monitor at a rate of 25 frames per second to produce a visually continuous display.
12 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
The field of "digital image processing" refers to processing digital images by means of a digital computer. In a broader sense, it can be considered the processing of any two-dimensional data, where an image (optical information) is represented as an array of real or complex numbers encoded by a definite number of bits. An image is represented as a two-dimensional function f(x, y), where x and y are spatial (plane) coordinates and the amplitude of f at any pair of coordinates (x, y) represents the intensity or gray level of the image at that point.
A digital image is one for which both the coordinates and the amplitude values of f are finite, discrete quantities. Hence a digital image is composed of a finite number of elements, each of which has a particular location and value; these elements are called "pixels". A digital image is discrete in both spatial coordinates and brightness, and it can be considered a matrix whose row and column indices identify a point on the image and whose corresponding element value identifies the gray level at that point.
One of the first applications of digital images was in the newspaper industry, when pictures were first sent by submarine cable between London and New York. The introduction of the Bartlane cable picture transmission system in the early 1920s reduced the time required to transport a picture across the Atlantic from more than a week to less than three hours.
FIG
121 PREPROCESSING
In imaging science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible; the general techniques described here apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging.
Image processing refers to processing of a 2-D picture by a computer. Basic definitions:
An image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g., brightness) of the image at the real coordinate position (x, y). Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:
Image processing (image in → image out)
Image analysis (image in → measurements out)
Image understanding (image in → high-level description out)
An image may be considered to contain sub-images, sometimes referred
to as regions of interest (ROIs), or simply regions. This concept reflects the
fact that images frequently contain collections of objects, each of which can
be the basis for a region. In a sophisticated image processing system, it
should be possible to apply specific image processing operations to selected
regions. Thus one part of an image (region) might be processed to suppress
motion blur, while another part might be processed to improve colour
rendition.
Most usually, image processing systems require that the images be
available in digitized form, that is, as arrays of finite-length binary words. For
digitization, the given image is sampled on a discrete grid and each sample,
or pixel, is quantized using a finite number of bits. The digitized image is
then processed by a computer. To display a digital image, it is first converted
into an analog signal, which is scanned onto a display. Closely related to
image processing are computer graphics and computer vision. In computer
graphics, images are manually made from physical models of objects,
environments, and lighting, instead of being acquired (via imaging devices
such as cameras) from natural scenes, as in most animated movies.
Computer vision, on the other hand, is often considered high-level image
processing, out of which a machine/computer/software intends to decipher
the physical contents of an image or a sequence of images (e.g., videos or
3D full-body magnetic resonance scans).
In modern sciences and technologies, images also gain much
broader scope due to the ever-growing importance of scientific
visualization (of often large-scale, complex scientific/experimental data).
Examples include microarray data in genetic research or real-time multi-
asset portfolio trading in finance. Before processing, an image is
converted into a digital form. Digitization includes sampling of the image and
quantization of the sampled values. After converting the image into bit
information, processing is performed. This processing technique may be
image enhancement, image restoration, or image compression.
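The sampling-then-quantization pipeline can be sketched as follows; the 1-D signal, step size, and number of levels below are illustrative assumptions, not values from this report:

```python
import numpy as np

def digitize(signal, step, levels):
    """Sample a 1-D 'analog' signal on a discrete grid, then quantize
    the sampled values to a finite number of levels."""
    sampled = signal[::step]                             # sampling: keep every step-th value
    q = np.round(sampled * (levels - 1)) / (levels - 1)  # quantization to `levels` values
    return q

analog = np.linspace(0.0, 1.0, 9)   # stand-in for a continuous intensity profile
digital = digitize(analog, step=2, levels=4)
print(digital)
```

Note that quantization is lossy: 0.75 here snaps to the nearest of the four representable levels.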
1.2.2 IMAGE ENHANCEMENT
Image enhancement refers to the accentuation, or sharpening, of image
features such as boundaries or contrast to make a graphic display more
useful for display and analysis. This process does not increase the inherent
information content of the data. It includes gray level and contrast
manipulation, noise reduction, edge crispening and sharpening, filtering,
interpolation and magnification, pseudo-coloring, and so on.
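A minimal sketch of one such enhancement, a linear contrast stretch; the tiny test image is made up for illustration:

```python
import numpy as np

def stretch_contrast(img):
    """Linear contrast stretch: map [min, max] of the image onto [0, 255]."""
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:                      # flat image: nothing to stretch
        return img.copy()
    out = (img.astype(np.float64) - lo) * 255.0 / (hi - lo)
    return out.astype(np.uint8)

# A low-contrast image whose values span only [100, 130]
dull = np.array([[100, 110], [120, 130]], dtype=np.uint8)
print(stretch_contrast(dull))   # [[  0  85] [170 255]]
```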
1.2.3 IMAGE RESTORATION
Image restoration is concerned with filtering the observed image to minimize
the effect of degradations. The effectiveness of image restoration depends on the
extent and accuracy of the knowledge of the degradation process as well as on the
filter design. Image restoration differs from image enhancement in that the
latter is concerned with the extraction or accentuation of image features.
1.2.4 IMAGE COMPRESSION
Image compression is concerned with minimizing the number of bits required to represent
an image. Applications of compression include broadcast TV, remote sensing
via satellite, military communication via aircraft, radar, teleconferencing,
facsimile transmission of educational and business documents, medical
images that arise in computed tomography, magnetic resonance imaging
and digital radiology, motion pictures, satellite images, weather maps,
geological surveys, and so on.
Text compression - CCITT GROUP3 & GROUP4
Still image compression - JPEG
Video image compression - MPEG
1.2.5 SEGMENTATION
In computer vision, image segmentation is the process of
partitioning a digital image into multiple segments (sets of pixels, also
known as superpixels). The goal of segmentation is to simplify and/or
change the representation of an image into something that is more
meaningful and easier to analyze. Image segmentation is typically used to
locate objects and boundaries (lines, curves, etc.) in images. More precisely,
image segmentation is the process of assigning a label to every pixel in an
image such that pixels with the same label share certain visual
characteristics.
The result of image segmentation is a set of segments that
collectively cover the entire image, or a set of contours extracted from the
image (see edge detection). Each of the pixels in a region is similar with
respect to some characteristic or computed property, such as
colour, intensity, or texture. Adjacent regions are significantly different
with respect to the same characteristic(s). When applied to a stack of
images, typical in medical imaging, the resulting contours after image
segmentation can be used to create 3D reconstructions with the help of
interpolation algorithms like marching cubes.
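The label-assignment idea can be sketched with the simplest possible scheme, global thresholding; the threshold and image values here are illustrative:

```python
import numpy as np

def threshold_segment(img, t):
    """Assign label 1 to pixels brighter than threshold t, label 0 otherwise.
    Pixels sharing a label share the same visual characteristic (brightness)."""
    return (img > t).astype(np.uint8)

img = np.array([[10, 200],
                [30, 250]], dtype=np.uint8)
labels = threshold_segment(img, t=128)
print(labels)   # [[0 1] [0 1]]
```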
1.2.6 IMAGE RESTORATION
Image restoration, like enhancement, improves the qualities of an image,
but all the operations are mainly based on known, measured, or estimated
degradations of the original image. Image restoration is used to restore
images with problems such as geometric distortion, improper focus,
repetitive noise, and camera motion. It is used to correct images for known
degradations.
1.2.7 FUNDAMENTAL STEPS
Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the
chances of success of the other processes.
Image segmentation: to partition an input image into its constituent parts or
objects.
Image representation: to convert the input data to a form suitable for
computer processing.
Image description: to extract features that result in some quantitative
information of interest, or features that are basic for differentiating one
class of objects from another.
Image recognition: to assign a label to an object based on the
information provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized
objects.
Knowledge about a problem domain is coded into an image processing
system in the form of a knowledge database.
1.3 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x, y) must be digitized
both spatially and in amplitude.
Digitization of the spatial coordinates (x, y) is called image sampling;
amplitude digitization is called gray-level quantization.
The storage and processing requirements increase rapidly with the spatial
resolution and the number of gray levels.
Example: a 256 gray-level image of size 256x256 occupies 64K bytes of
memory.
Images of very low spatial resolution produce a checkerboard effect.
The use of an insufficient number of gray levels in smooth areas of a digital
image results in false contouring.
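A small sketch of both points above: requantizing a smooth gradient to too few gray levels (the cause of false contouring), and checking the storage figure from the example:

```python
import numpy as np

def requantize(img, levels):
    """Reduce an 8-bit image to `levels` gray levels. Too few levels in a
    smooth region produces visible banding ("false contouring")."""
    step = 256 // levels
    return (img // step) * step

ramp = np.arange(0, 256, dtype=np.uint8).reshape(16, 16)  # smooth gradient
coarse = requantize(ramp, levels=4)       # only 4 gray levels survive
print(np.unique(coarse))                  # [  0  64 128 192]

# Storage check for the example: 256x256 pixels, 256 gray levels = 1 byte/pixel
print(256 * 256 * 1)                      # 65536 bytes = 64K
```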
1.4 IMAGE FILE FORMATS
There are two general groups of 'images': vector graphics (or line art)
and bitmaps (pixel-based 'images'). Some of the most common file
formats are:
GIF - Graphics Interchange Format. An 8-bit (256 colour), non-
destructively compressed bitmap format. Mostly used for the web. Has several
sub-standards, one of which is the animated GIF.
JPEG - Joint Photographic Experts Group. A very efficient (i.e., much
information per byte), destructively compressed, 24-bit (16 million colours)
bitmap format. Widely used, especially for the web and Internet (bandwidth-
limited).
TIFF - Tagged Image File Format. The standard 24-bit publication bitmap
format. Compresses non-destructively with, for instance, Lempel-Ziv-
Welch (LZW) compression.
PS - PostScript. A standard vector format. Has numerous sub-standards
and can be difficult to transport across platforms and operating systems.
PSD - Adobe Photoshop Document. A dedicated Photoshop format that
keeps all the information in an image, including all the layers.
BMP - bitmap file format.
1.5 TYPES OF IMAGES
Images are of four types:
1. Binary image
2. Gray scale image
3. Color image
4. Indexed image
1.5.1 BINARY IMAGES
A binary image is a digital image that has only two possible values for
each pixel. Typically the two colors used for a binary image are black and
white, though any two colors can be used. Binary images are also called bi-
level or two-level. This means that each pixel is stored as a single bit, i.e.,
a 0 or 1. Such images are also known as black-and-white (B&W) images.
1.5.2 GRAY SCALE IMAGE
In an (8-bit) grayscale image, each picture element has an assigned intensity
that ranges from 0 to 255. A grey scale image is what people normally call
a black-and-white image, but the name emphasizes that such an image will
also include many shades of grey.
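A hedged sketch of producing such an 8-bit grey scale value from RGB; the BT.601 luminance weights used here are a common convention, not something specified in this report:

```python
import numpy as np

def rgb_to_gray(rgb):
    """Weighted luminance conversion (ITU-R BT.601 weights),
    mapping RGB triples to 8-bit intensities in [0, 255]."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.round(rgb @ weights).astype(np.uint8)

pixels = np.array([[255, 255, 255],   # white
                   [0,   0,   0]],    # black
                  dtype=np.float64)
print(rgb_to_gray(pixels))   # [255   0]
```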
FIG
1.5.3 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive
colour, with the r, g, and b receptors in our retinas. RGB uses additive colour
mixing and is the basic colour model used in television or any other
medium that projects colour with light. It is the basic colour model used in
computers and for web graphics, but it cannot be used for print production.
The secondary colours of RGB (cyan, magenta, and yellow) are formed
by mixing two of the primary colours (red, green, or blue) and excluding the
third colour. Red and green combine to make yellow, green and blue to
make cyan, and blue and red form magenta. The combination of red, green,
and blue at full intensity makes white.
In Photoshop, using the "screen" mode for the different layers in an
image will make the intensities mix together according to the additive
colour mixing model. This is analogous to stacking slide images on top of
each other and shining light through them.
FIG
CMYK: The 4-colour CMYK model used in printing lays down
overlapping layers of varying percentages of transparent cyan (C), magenta
(M), and yellow (Y) inks. In addition, a layer of black (K) ink can be added.
The CMYK model uses subtractive colour mixing.
1.5.4 INDEXED IMAGE
FIG
An indexed image consists of an array and a color map matrix. The
pixel values in the array are direct indices into the color map. By convention,
this documentation uses the variable name X to refer to the array and map
to refer to the color map. In computing, indexed color is a technique to
manage digital image colors in a limited fashion, in order to save
computer memory and file storage while speeding up display refresh and
file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, color information is not
directly carried by the image pixel data, but is stored in a separate piece of
data called a palette: an array of color elements, in which every element, a
color, is indexed by its position within the array. The image pixels do not
contain the full specification of their color, but only their index into the palette.
This technique is sometimes referred to as pseudocolor or indirect color, as
colors are addressed indirectly.
Perhaps the first device that supported palette colors was a random-
access frame buffer, described in 1975 by Kajiya, Sutherland, and Cheadle.
This supported a palette of 256 36-bit RGB colors.
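The array-plus-color-map arrangement can be sketched directly; the palette and index array below are made up for illustration:

```python
import numpy as np

# Indexed color: pixels store palette indices, and the palette (color map)
# holds the actual RGB triples.
palette = np.array([
    [0,   0,   0],     # index 0: black
    [255, 0,   0],     # index 1: red
    [255, 255, 255],   # index 2: white
], dtype=np.uint8)

X = np.array([[0, 1],
              [1, 2]], dtype=np.uint8)   # the index array

rgb = palette[X]          # expand indices into full RGB for display
print(rgb.shape)          # (2, 2, 3)
print(rgb[1, 1])          # [255 255 255]
```

Storing X plus the tiny palette takes far less space than storing a full RGB triple per pixel, which is the compression the text describes.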
1.6 APPLICATIONS OF IMAGE PROCESSING
Interest in digital image processing methods stems from two principal
application areas:
1) Improvement of pictorial information for human interpretation.
2) Processing of scene data for autonomous machine perception.
In the second application area, interest focuses on procedures for
extracting from an image information in a form suitable for computer
processing. Examples include automatic character recognition, industrial
machine vision for product assembly and inspection, military reconnaissance,
automatic processing of fingerprints, etc.
1.7 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Of these three methods,
the SURF method achieves the best accuracy, but it takes an average of 1.5
seconds to perform a one-to-one matching. Zhou et al. proposed a line
descriptor-based method for sclera vein recognition. The matching step
(including registration) is the most time-consuming step in this sclera vein
recognition system, costing about 1.2 seconds to perform a one-to-one
matching. Both speeds were measured on a PC with an Intel Core 2 Duo
2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition
algorithms are designed using central processing unit (CPU)-based systems.
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1. Mask files are used to calculate the valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large in size, and will preoccupy the GPU memory and slow
down the data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance.
LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns
for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no.
14, pp. 1860-1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the
most challenging problems that has been actively researched in recent
years. It is well known that many state-of-the-art still-face recognition
algorithms perform well when constrained (frontal, well-illuminated, high-
resolution, sharp, and full) face images are acquired. However, their
performance degrades significantly when the test images contain variations
that are not present in the training images. In this paper, we highlight some
of the key issues in remote face recognition. We define remote face
recognition as one where faces are several tens of meters (10-250 m) from
the cameras. We then describe a remote face database which has been
acquired in an unconstrained outdoor maritime environment. The recognition
performance of a subset of existing still image-based face recognition
algorithms is evaluated on the remote face data set. Further, we define the
remote re-identification problem as matching a subject at one location with
candidate sets acquired at a different location and over time, in remote
conditions. We provide preliminary experimental results on remote re-
identification. It is demonstrated that, in addition to applying a good
classification algorithm, finding features that are robust to the variations
mentioned above and developing statistical models which can account for
these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N.
Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics
Security.
With the rapidly expanding biometric data collected by various sectors
of government and industry for identification and verification purposes,
how to manage and process such big data draws great concern. Even
though modern processors are equipped with more cores and memory
capacity, careful design is still required in order to utilize the hardware
resources effectively and the power consumption efficiently. This research
addresses this issue by investigating the workload characteristics of a
biometric application. Taking Daugman's iris matching algorithm, which
has been proven to be the most reliable iris matching method, as a case
study, we conduct performance profiling and binary instrumentation on the
benchmark to capture its execution behavior. The results show that data
loading and memory access incur great performance overhead, which
motivates us to move the biometrics computation to high-performance
architecture.
Modern iris recognition algorithms can be computationally intensive,
yet they are designed for traditional sequential processing elements, such as a
personal computer. However, a parallel processing alternative using field-
programmable gate arrays (FPGAs) offers an opportunity to speed up iris
recognition. Within the means of this project, iris template generation with
directional filtering, which is a computationally expensive yet parallel
portion of a modern iris recognition algorithm, is parallelized on an FPGA
system. We present a performance comparison of the parallelized
algorithm on the FPGA system to a traditional CPU-based version. The
parallelized template generation outperforms an optimized C++ code
version, determining the information content of an iris approximately 324
times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric
modality based on conjunctival vasculature," in Proc. Artif. Neural
Netw. Eng., 2006, pp. 1-8.
A new biometric indicator based on the patterns of conjunctival
vasculature is proposed. Conjunctival vessels can be observed on the visible
part of the sclera that is exposed to the outside world. These vessels
demonstrate rich and specific details in visible light and can be easily
photographed using a regular digital camera. In this paper, we discuss
methods for conjunctival imaging, preprocessing, and feature extraction in
order to derive a suitable conjunctival vascular template for biometric
authentication. Commensurate classification methods, along with the
observed accuracy, are discussed. Experimental results suggest the potential
of using conjunctival vasculature as a biometric measure. Identification of
a person based on some unique set of features is an important task.
Human identification is possible with several biometric systems, and sclera
recognition is one of the promising biometrics. The sclera is the white
portion of the human eye. The vein pattern seen in the sclera region is
unique to each person. Thus, the sclera vein pattern is a well-suited
biometric technology for human identification. The existing methods used
for sclera recognition have some drawbacks: only frontal-looking
images are preferred for matching, and rotation variance is another problem.
These problems are completely eliminated in the proposed system by using
two feature extraction techniques: the Histogram of Oriented
Gradients (HOG) and conversion of the image into polar form using the
bilinear interpolation technique. These two features help the proposed
system become illumination invariant and rotation invariant. The
experimentation is done with the help of the UBIRIS database. The
experimental results show that the proposed sclera recognition method can
achieve better accuracy than the previous methods.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J.
C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899,
May 2008.
The graphics processing unit (GPU) has become an integral part of
today's mainstream computing systems. Over the past six years, there has
been a marked increase in the performance and capabilities of GPUs. The
modern GPU is not only a powerful graphics engine but also a highly
parallel programmable processor featuring peak arithmetic and memory
bandwidth that substantially outpaces its CPU counterpart. The GPU's
rapid increase in both programmability and capability has spawned a
research community that has successfully mapped a broad range of
computationally demanding, complex problems to the GPU. This effort in
general-purpose computing on the GPU, also known as GPU computing,
has positioned the GPU as a compelling alternative to traditional
microprocessors in high-performance computer systems of the future. We
describe the background, hardware, and programming model for GPU
computing, summarize the state of the art in tools and techniques, and
present four GPU computing successes in game physics and computational
biophysics that deliver order-of-magnitude performance gains over
optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image
database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-
977.
This paper proposes algorithms for iris segmentation, quality
enhancement, match score fusion, and indexing to improve both the
accuracy and the speed of iris recognition. A curve evolution approach is
proposed to effectively segment a nonideal iris image using the modified
Mumford-Shah functional. Different enhancement algorithms are
concurrently applied on the segmented iris image to produce multiple
enhanced versions of the iris image. A support-vector-machine-based
learning algorithm selects locally enhanced regions from each globally
enhanced image and combines these good-quality regions to create a single
high-quality iris image. Two distinct features are extracted from the high-
quality iris image. The global textural feature is extracted using the 1-D log-
polar Gabor transform, and the local topological feature is extracted using
Euler numbers. An intelligent fusion algorithm combines the textural and
topological matching scores to further improve the iris recognition
performance and reduce the false rejection rate, whereas an indexing
algorithm enables fast and accurate iris identification. The verification and
identification performance of the proposed algorithms is validated and
compared with other algorithms using the CASIA Version 3, ICE 2005, and
UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-
stage parallel approach for registration and matching: a parallel sclera
matching solution for sclera vein recognition based on our sequential line-
descriptor method, using the CUDA GPU architecture. CUDA is a highly
parallel, multithreaded, many-core processor with tremendous
computational power.
It supports not only a traditional graphics pipeline but also computation
on non-graphical data. It is relatively straightforward to implement our C
program for CUDA on an AMD-based GPU using OpenCL. Our CUDA
kernels can be directly converted to OpenCL kernels by accounting for the
different syntax of various keywords and built-in functions. The mapping
strategy is also effective in OpenCL if we regard the thread and block in
CUDA as the work-item and work-group in OpenCL. Most of our optimization
techniques, such as coalesced memory access and prefix sum, can work in
OpenCL too. Moreover, since CUDA is a data-parallel architecture, the
implementation of our approach in OpenCL should be programmed in the
data-parallel model.
In this research, we first discuss why the naïve parallel approach would
not work. We then propose a new sclera descriptor, the Y-shape sclera
feature-based efficient registration method, to speed up the mapping scheme;
introduce the "weighted polar line (WPL) descriptor", which is better
suited for parallel computing, to mitigate the mask size issue; and develop
our coarse-to-fine two-stage matching process to dramatically improve the
matching speed. These new approaches make the parallel processing
possible and efficient.
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve efficiency, we propose a new descriptor,
the Y-shape descriptor, which can greatly improve the efficiency of
the coarse registration of two images and can be used to filter out some
non-matching pairs before refined matching.
2. We propose a coarse-to-fine two-stage matching process. In the first
stage, we match two images coarsely using the Y-shape descriptors,
which is very fast because no registration is needed. The
matching result in this stage helps filter out image pairs with low
similarities.
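The coarse-to-fine idea can be sketched schematically. The scoring functions below are hypothetical stand-ins, not the report's Y-shape or WPL descriptors; the point is only the control flow, in which a cheap coarse score filters pairs before the expensive fine match:

```python
# Hypothetical coarse score: cheap, registration-free comparison.
def coarse_score(a, b):
    return 1.0 - abs(len(a) - len(b)) / max(len(a), len(b))

# Hypothetical fine score: the expensive, detailed comparison.
def fine_score(a, b):
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

def two_stage_match(query, gallery, coarse_threshold=0.8):
    results = []
    for name, template in gallery:
        if coarse_score(query, template) < coarse_threshold:
            continue                       # filtered out: fine matching skipped
        results.append((name, fine_score(query, template)))
    return sorted(results, key=lambda r: -r[1])

gallery = [("A", "yyxyxxyy"), ("B", "yx"), ("C", "yyxyxxyx")]
print(two_stage_match("yyxyxxyy", gallery))
```

Template "B" never reaches the fine stage, which is exactly how the first stage narrows the search range.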
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each person,
so it can be used for human identification. Several researchers have
designed different sclera vein recognition methods and have shown that it
is promising to use sclera vein recognition for human identification.
Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Of these three methods,
the SURF method achieves the best accuracy, but it takes an average of 1.5
seconds to perform a one-to-one matching. Zhou
et al. proposed a line descriptor-based method for sclera vein recognition.
The matching step (including registration) is the most time-consuming step
in this sclera vein recognition system, costing about 1.2 seconds to
perform a one-to-one matching. Both speeds were measured on a PC with
an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently,
sclera vein recognition algorithms are designed using central processing
unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential
processing devices, which may not be efficient in data processing where the
data can be parallelized. Because of the large time consumption in the matching
step, sclera vein recognition using a sequential-based method would be very
challenging to implement in a real-time biometric system, especially
when there is a large number of templates in the database for matching. GPUs
(general-purpose graphics processing units, GPGPUs)
are now popularly used for parallel computing to improve
computational processing speed and efficiency. The highly parallel
structure of GPUs makes them more effective than CPUs for data
processing where processing can be performed in parallel. GPUs have been
widely used in biometric recognition, such as speech recognition, text
detection, handwriting recognition, and face recognition. In iris
recognition, the GPU has been used to extract features, construct descriptors,
and match templates.
GPUs are also used for object retrieval and image search. Park et al.
designed a performance evaluation of image processing algorithms, such
as linear feature extraction and multi-view stereo matching, on GPUs.
However, these approaches were designed for their specific biometric
recognition applications and feature searching methods. Therefore, they may
not be efficient for sclera vein recognition. The Compute Unified Device
Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in
this research. CUDA is a highly parallel, multithreaded, many-core
processor with tremendous computational power. It supports not only a
traditional graphics pipeline but also computation on non-graphical data.
More importantly, it offers an easier programming platform which
outperforms its CPU counterparts in terms of peak arithmetic intensity and
memory bandwidth. In this research, the goal is not to develop a unified
strategy to parallelize all sclera matching methods, because each method is
quite different from the others and would need a customized design. To
develop an efficient parallel computing scheme, different
strategies would be needed for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
sclera vein recognition based on our sequential line-descriptor method, using
the CUDA GPU architecture. However, the parallelization strategies
developed in this research can be applied to design parallel approaches for
other sclera vein recognition methods and help parallelize general pattern
recognition methods. Based on the matching approach, there are three
challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large in size, and will preoccupy the GPU memory and slow
down the data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance. New designs are
necessary to help narrow down the search range. In summary, a naïve
implementation of the algorithms in parallel would not work efficiently.
Note that it is relatively straightforward to implement our C program for
CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be
directly converted to OpenCL kernels by accounting for the different syntax of
various keywords and built-in functions. The mapping strategy is also
effective in OpenCL if we regard the thread and block in CUDA as the work-item
and work-group in OpenCL. Most of our optimization techniques, such as
coalesced memory access and prefix sum, can work in OpenCL too.
Moreover, since CUDA is a data-parallel architecture, the implementation
of our approach in OpenCL should be programmed in the data-parallel model.
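The prefix sum mentioned above is a classic parallel primitive. Below is a sketch of the logarithmic-step (Hillis-Steele) inclusive scan, written sequentially in NumPy purely to show the data flow; on a GPU, each step would be executed by one thread per element:

```python
import numpy as np

def inclusive_scan(x):
    """Hillis-Steele inclusive prefix sum: log2(n) steps, each of which
    adds to every element its neighbor `offset` positions to the left."""
    x = np.asarray(x, dtype=np.int64).copy()
    offset = 1
    while offset < len(x):
        shifted = np.concatenate([np.zeros(offset, dtype=x.dtype), x[:-offset]])
        x = x + shifted       # all additions in a step are independent
        offset *= 2
    return x

data = [3, 1, 7, 0, 4, 1, 6, 3]
print(inclusive_scan(data).tolist())   # [3, 4, 11, 11, 15, 16, 22, 25]
```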
In this research, we first discuss why the naïve parallel approach would not
work (Section 3). We then propose the new sclera descriptor, the Y-shape
sclera feature-based efficient registration method, to speed up the mapping
scheme (Section 4); introduce the "weighted polar line (WPL) descriptor",
which is better suited for parallel computing, to mitigate the mask size
issue (Section 5); and develop our coarse-to-fine two-stage matching
process to dramatically improve the matching speed (Section 6). These new
approaches make the parallel processing possible and efficient. However, it
is non-trivial to implement these algorithms in CUDA, so we then develop
the implementation schemes to map our algorithms onto CUDA (Section 7).
In Section 2, we give a brief introduction to sclera vein recognition. In
Section 8, we report experiments using the proposed system.
In Section 9, we draw some conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation, feature enhancement, feature extraction, and feature
matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition,
and several methods have been designed for it. Crihalmeanu
et al. presented a semi-automated system for sclera segmentation. They
used a clustering algorithm to classify color eye images into three
clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross
designed a segmentation approach based on a normalized sclera index
measure, which includes coarse sclera segmentation, pupil region
segmentation, and fine sclera segmentation. Zhou et al. developed a skin-
tone plus "white color"-based voting method for sclera segmentation in
color images, and an Otsu's-thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera
features, since the sclera vein patterns often lack contrast and are hard to
detect. Zhou et al. used a bank of multi-directional Gabor filters for
vascular pattern enhancement. Derakhshani et al. used contrast-limited
adaptive histogram equalization (CLAHE) to enhance the green color plane
of the RGB image, and a multi-scale region-growing approach to identify
the sclera veins from the image background. Crihalmeanu and Ross applied
a selective enhancement filter for blood vessels to extract features from the
green component of a color image. In the feature matching step,
Crihalmeanu and Ross proposed
three registration and matching approaches: Speeded Up Robust
Features (SURF), which is based on interest-point detection; minutiae
detection, which is based on minutiae points on the vasculature structure;
and direct correlation matching, which relies on image registration. Zhou et
al. designed a line-descriptor-based feature registration and matching
method.
The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient and edge orientations of the vein pattern in the sclera region of an eye image. To improve computational efficiency, the image data are converted to polar form, which is mainly suited to circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified. This comparison is done in the feature matching step, which ultimately makes the matching decision. With the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are drawn out.
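The two features described above can be sketched in a few lines. The following toy example (not the authors' implementation; bin count, kernel and image are illustrative assumptions) computes a HOG-style orientation histogram from finite-difference gradients, plus the Cartesian-to-polar conversion of a pixel about an assumed iris center:

```python
import math

def gradient_orientation_histogram(img, bins=9):
    """Toy HOG-style sketch: central-difference gradients, then a
    magnitude-weighted histogram of unsigned orientations (0-180 deg)."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(ang // (180.0 / bins)) % bins] += mag
    return hist

def to_polar(x, y, cx, cy):
    """Cartesian pixel -> (radius, angle) about an assumed iris center."""
    return math.hypot(x - cx, y - cy), math.atan2(y - cy, x - cx)

# A vertical edge: gradients point along x (0 degrees), so bin 0 dominates.
edge = [[0, 0, 10, 10]] * 4
hist = gradient_orientation_histogram(edge)
r, a = to_polar(3, 4, 0, 0)
```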
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure below shows the steps of segmentation.
[Figure: Steps of sclera segmentation]
Glare area detection: the glare area is a small bright area near the pupil or iris, and it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. The filter operates only on grayscale images, so a color image must first be converted to grayscale before the Sobel filter is applied. Fig. 4 shows the result of glare area detection.
[Figure 4: Result of glare area detection]
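A minimal sketch of this step, assuming a tiny synthetic grayscale patch and an illustrative magnitude threshold (the report does not give one), applies the standard 3x3 Sobel kernels and flags the strong responses around a bright glare pixel:

```python
def sobel_magnitude(img):
    """Gradient magnitude with the 3x3 Sobel kernels; border pixels
    are left at zero for simplicity."""
    KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(KX[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(KY[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

# Synthetic patch: a dark iris region with one bright glare pixel.
patch = [[20] * 5 for _ in range(5)]
patch[2][2] = 250
mag = sobel_magnitude(patch)
glare_edges = [(x, y) for y in range(5) for x in range(5) if mag[y][x] > 100]
```

The Sobel response rings the glare pixel (the uniform surroundings, and the flat center itself, give zero gradient), which is why a magnitude threshold localizes the glare boundary.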
Sclera area estimation: to estimate the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should lie in the right and center positions, and the correct right sclera area should lie in the left and center positions; in this way, non-sclera areas are eliminated.
2.2.3 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the sclera area. The upper eyelid, lower eyelid and iris boundaries are then refined; all of these are unwanted portions for recognition. To eliminate their effects, refinement is performed after the detection of the sclera area. The figure below shows the right sclera area detected after Otsu's thresholding and iris and eyelid refinement; the left sclera area is detected in the same way.
[Figure: Detected right sclera area after Otsu's thresholding and refinement]
In the segmentation process, not all images are perfectly segmented; hence feature extraction and matching are needed to reduce the impact of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has been studied in the context of the fingers (Miura et al., 2004), palm (Lin and Fan, 2004) and retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).
[Figure: Vasculature of the ocular conjunctiva]
We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have a vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:
Improving accuracy through the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This makes sclera vasculature a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, each element of the filtering can itself be parallelized. A detailed discussion of our proposed parallelization is outside the scope of this paper.
[Figure: Decomposition of directional filtering into parallel elements]
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching stage of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description. Each segment is described by three quantities: the segment's angle to a reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as follows.
[Equations defining θ, r and ɸ of the line descriptor]
Here f_line(x) is the polynomial approximation of the line segment, (x_l, y_l) is the center point of the line segment, (x_i, y_i) is the center of the detected iris, and S is the line descriptor. To register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points - one from the test template and one from the target template - and randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.
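The descriptor components above can be sketched directly. In this hedged example, the segment's dominant orientation ɸ comes from a least-squares line fit, a stand-in for the paper's polynomial approximation f_line; the coordinates are invented for illustration:

```python
import math

def line_descriptor(segment, iris_center):
    """Sketch of S = (theta, r, phi): theta and r locate the segment's
    center relative to the iris center; phi is the segment's dominant
    orientation from a least-squares line fit."""
    xs = [p[0] for p in segment]
    ys = [p[1] for p in segment]
    n = len(segment)
    cx, cy = sum(xs) / n, sum(ys) / n            # segment center (x_l, y_l)
    ix, iy = iris_center
    theta = math.atan2(cy - iy, cx - ix)         # angle about iris center
    r = math.hypot(cx - ix, cy - iy)             # distance to iris center
    sxx = sum((x - cx) ** 2 for x in xs)
    sxy = sum((x - cx) * (y - cy) for x, y in segment)
    phi = math.atan2(sxy, sxx)                   # dominant orientation
    return theta, r, phi

# Segment lying on the line y = x, with the iris center at the origin:
theta, r, phi = line_descriptor([(2, 2), (3, 3), (4, 4)], (0, 0))
```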
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. To reduce the effect of segmentation errors, we created a weighting image (Figure 3) from the sclera mask by setting interior pixels of the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as follows, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching-distance threshold, and the corresponding angle threshold bounds the orientation difference. The total matching score M is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
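The scoring rule just described can be sketched as follows. This is a hedged reading, not the paper's exact equations: the thresholds, the min-weight contribution for a matched pair, and the test descriptors are illustrative assumptions; each descriptor is (x, y, ɸ, w) with w from the weighting image:

```python
import math

def segment_match_score(si, sj, d_match=5.0, phi_match=0.3):
    """Pairwise score m(Si, Sj): descriptors match when their centers
    and orientations are close enough; a matched pair contributes the
    smaller of the two weights (an illustrative choice)."""
    (xi, yi, phii, wi), (xj, yj, phij, wj) = si, sj
    d = math.hypot(xi - xj, yi - yj)
    if d < d_match and abs(phii - phij) < phi_match:
        return min(wi, wj)
    return 0.0

def template_match_score(test, target):
    """Total score M: best pairwise scores summed, normalized by the
    maximum attainable score of the lighter template."""
    total = sum(max(segment_match_score(s, t) for t in target) for s in test)
    max_score = min(sum(s[3] for s in test), sum(t[3] for t in target))
    return total / max_score if max_score else 0.0

# Identical templates (weight 1 interior, 0.5 near the mask boundary):
tmpl = [(10.0, 4.0, 0.1, 1.0), (12.0, 9.0, 0.8, 0.5)]
score = template_match_score(tmpl, tmpl)
miss = template_match_score(tmpl, [(100.0, 100.0, 0.1, 1.0)])
```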
Y-shape branches are observed to be a stable feature under eye movement and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line-segment set, the set may be inferred to be a Y-shape structure, and the line-segment angles are recorded as a new feature of the sclera.
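A small sketch of the resulting branch-angle feature (coordinates invented for illustration): each branch's angle is taken relative to the radial direction from the pupil center, which makes the feature insensitive to rotating the whole eye about that center:

```python
import math

def branch_angles(center, branch_tips, pupil_center):
    """Angles phi_k between each Y-branch and the radial direction from
    the pupil center through the branch point."""
    px, py = pupil_center
    cx, cy = center
    radial = math.atan2(cy - py, cx - px)
    angles = []
    for bx, by in branch_tips:
        a = math.atan2(by - cy, bx - cx) - radial
        angles.append(math.atan2(math.sin(a), math.cos(a)))  # wrap to (-pi, pi]
    return sorted(angles)

def rotate(p, ang, about=(0.0, 0.0)):
    """Rotate point p by ang radians about a fixed point."""
    ox, oy = about
    x, y = p[0] - ox, p[1] - oy
    c, s = math.cos(ang), math.sin(ang)
    return (ox + c * x - s * y, oy + s * x + c * y)

# A Y-branch point with three tips; rotating the whole eye about the
# pupil center leaves the radial branch angles unchanged.
pupil = (0.0, 0.0)
center = (10.0, 0.0)
tips = [(12.0, 1.0), (12.0, -1.0), (8.0, 0.5)]
a1 = branch_angles(center, tips, pupil)
rot = 0.7
a2 = branch_angles(rotate(center, rot), [rotate(t, rot) for t in tips], pupil)
```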
There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angle of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves or the camera zooms during image acquisition, ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branch as auxiliary parameters. Our rotation-, shift- and scale-invariant feature vector is thus defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris center, and it is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments, and for each segment a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of limited segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid
and/or eyelashes. To tolerate such errors, a mask file is designed to indicate whether a line segment belongs to the edge of the sclera or not.
[Figure 7: The line descriptor of the sclera vessel pattern. (a) An eye image. (b) Vessel patterns in the sclera. (c) Enhanced sclera vessel patterns. (d) Centers of line segments of the vessel patterns.]
However, in a GPU application, using the mask is challenging: the mask files are large, occupy GPU memory and slow down data transfer. When matching, a RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extract the geometric relationships of the descriptors and store them as a new descriptor. We use a weighting image created by setting weight values according to position: the weights of descriptors outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights are calculated on their own mask by the CPU, and only once.
The calculated result is saved as a component of the descriptor, which becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the value 0, 0.5 or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates have similar reference points: if we use the center of the iris as the reference point, two templates being compared are automatically aligned to each other, since they share a similar reference point. Every feature vector of the template is a set of line-segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize GPU computation, we also convert the descriptor values from polar to rectangular coordinates during CPU preprocessing.
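The CPU-side preprocessing described above can be sketched as follows; the `near` margin used to decide boundary weights is an assumed value, since the text only says "near the sclera boundary":

```python
import math

def wpl_descriptor(r, theta, phi, dist_to_boundary, near=3.0):
    """Sketch of building s(x, y, r, theta, phi, w): polar coordinates
    about the iris center are pre-converted to rectangular on the CPU,
    and the mask is folded into a weight w in {0, 0.5, 1}."""
    x = r * math.cos(theta)
    y = r * math.sin(theta)
    if dist_to_boundary < 0:          # outside the sclera mask
        w = 0.0
    elif dist_to_boundary < near:     # close to the mask boundary
        w = 0.5
    else:                             # interior descriptor
        w = 1.0
    return (x, y, r, theta, phi, w)

d = wpl_descriptor(10.0, math.pi / 2, 0.3, dist_to_boundary=8.0)
```

Folding the weight into the descriptor is what lets the GPU skip the mask file entirely, as the text goes on to explain.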
The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as the eyeball moves left, the left-part sclera patterns may be compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved them at contiguous addresses; this meets the requirement for coalesced memory access on the GPU.
After reorganizing the descriptor structure and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as in Figure 9.
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
GENERAL-PURPOSE COMPUTING ON THE GPU
Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as a texture on future passes through the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step we compute the next state of the fluid at each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest The rasterizer generates a fragment at each
pixel location covered by that geometry (In our example our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation)
Each fragment is shaded by an SPMD general-purpose fragment
program (Each grid point runs the same program to update the state of its
fluid)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step when computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes (The current state of the fluid will be used on the next time
step)
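The gather-only discipline described in these steps can be illustrated with a toy grid update (the stencil and weights are invented; a real fluid solver would be more involved): a per-point "fragment program" reads neighbors from a read-only input buffer and writes to a separate output buffer, since in this model a buffer could not be read and written in the same pass:

```python
def step(grid):
    """One 'time step' in the old GPGPU style: run a fragment-like
    function per grid point, gathering the previous state of the
    4-neighbors (with wraparound) from the input buffer and writing
    to a separate output buffer."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            nbrs = [grid[(y + dy) % h][(x + dx) % w]
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))]
            # Keep half the old value, diffuse the rest from neighbors.
            out[y][x] = 0.5 * grid[y][x] + 0.5 * sum(nbrs) / 4.0
    return out

state = [[1.0, 0.0], [0.0, 0.0]]
state = step(state)
```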
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units accessible only as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads
An SPMD general-purpose program computes the value of each thread
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in
future computation
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result of this stage helps filter out image pairs with low similarity, although some false positive matches may remain after this step. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transformation, affine-matrix generation and final WPL descriptor matching. Overall, we partition the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine-matrix generation and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
2.4.1 STAGE I: MATCHING WITH THE Y-SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
[Algorithm 1: Y-shape descriptor matching]
Here ytei and ytaj are the Y-shape descriptors of the test template Tte and target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of the two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and their center distances, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the jth branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2), where α is a factor used to fuse the matching score, set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by a threshold t: if a sclera's matching score is lower than t, the sclera is discarded; scleras with high matching scores are passed to the next, more precise matching process.
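As a hedged sketch only: Eq. (2) is not reproduced in this copy, so the fusion below is one plausible reading of the description (more matched pairs raise the score, larger average center distance lowers it, normalized by the smaller template size), with α = 30 as in the text but an otherwise invented functional form:

```python
def stage1_score(n_matched, dists, n_i, n_j, alpha=30.0):
    """Illustrative Stage-I fusion of match count and average center
    distance; NOT the paper's exact Eq. (2)."""
    if n_matched == 0:
        return 0.0
    avg_d = sum(dists) / len(dists)
    # Normalize by the smaller template; penalize large average distance.
    return n_matched / (min(n_i, n_j) * (1.0 + avg_d / alpha))

good = stage1_score(8, [2.0] * 8, 10, 12)   # many close matches
bad = stage1_score(2, [20.0] * 2, 10, 12)   # few, distant matches
```

A template pair is kept for Stage II only if its score exceeds the decision threshold t.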
2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel-structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When an eye image is acquired at a different gaze angle, the vessel structure appears nonlinearly shrunk or extended, because the eyeball is spherical in shape.
The sclera is made up of four layers - episclera, stroma, lamina fusca and endothelium - and there are slight differences among the movements of these layers.
Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation and scale.
1) SHIFT PARAMETER SEARCH: As discussed before, segmentation may not be accurate, and as a result the detected iris center may not be accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template, stei is the ith WPL descriptor of Tte, Tta is the target template, staj is the jth WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance between descriptors stek and staj.
Δsk is the shift value between two descriptors, defined as follows.
We first randomly select an equal number of segment descriptors stek of the test template Tte from each quad and find each one's nearest neighbor staj in the target template Tta. Their shift offset is recorded as a candidate registration shift factor Δsk. The final offset registration factor is Δsoptim, the candidate with the smallest standard deviation among these candidate offsets.
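The shift search can be sketched as follows. This is a simplified stand-in for Algorithm 2: instead of the smallest-standard-deviation rule, the candidate closest to the mean offset wins (an assumption), and the point sets are invented:

```python
import math

def shift_offset(test_pts, target_pts):
    """Each test descriptor votes with the offset to its nearest target
    neighbor; the candidate closest to the mean offset is returned."""
    candidates = []
    for tx, ty in test_pts:
        gx, gy = min(target_pts,
                     key=lambda p: math.hypot(p[0] - tx, p[1] - ty))
        candidates.append((gx - tx, gy - ty))
    mx = sum(c[0] for c in candidates) / len(candidates)
    my = sum(c[1] for c in candidates) / len(candidates)
    return min(candidates, key=lambda c: math.hypot(c[0] - mx, c[1] - my))

test = [(0.0, 0.0), (6.0, 1.0), (2.0, 7.0)]
target = [(x + 2.0, y + 3.0) for x, y in test]  # pure shift by (2, 3)
off = shift_offset(test, target)
```

When the deformation really is a pure shift, every descriptor votes for the same offset and the true shift is recovered exactly.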
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step; the algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor stei and calculating the distance to its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj and Tta are defined as in Algorithm 2; tr(it)shift, θ(it) and tr(it)scale are the shift, rotation and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift) and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment we set the iteration count to 512.
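The parameter search can be sketched with a deterministic grid in place of the paper's 512 random trials (an assumption made so the example is reproducible; the demo also uses a tighter match tolerance than the paper's β = 20 pixels, since the toy point sets are small):

```python
import math

def count_matches(pts_a, pts_b, beta=20.0):
    """Points in pts_a having a neighbor in pts_b closer than beta."""
    return sum(1 for x, y in pts_a
               if any(math.hypot(x - bx, y - by) < beta for bx, by in pts_b))

def search_affine(test_pts, target_pts, angles, scales, shifts):
    """Try every (rotation, scale, shift) candidate, transform the test
    points, and keep the parameters with the most matches."""
    best = (None, -1)
    for th in angles:
        c, s = math.cos(th), math.sin(th)
        for sc in scales:
            for sx, sy in shifts:
                moved = [(sc * (c * x - s * y) + sx, sc * (s * x + c * y) + sy)
                         for x, y in test_pts]
                n = count_matches(moved, target_pts, beta=2.0)
                if n > best[1]:
                    best = ((th, sc, (sx, sy)), n)
    return best

test = [(10.0, 0.0), (0.0, 10.0), (30.0, 30.0)]
th_true = 0.3
c, s = math.cos(th_true), math.sin(th_true)
target = [(c * x - s * y, s * x + c * y) for x, y in test]  # rotated copy
params, n = search_affine(test, target,
                          angles=[0.0, 0.1, 0.2, 0.3, 0.4],
                          scales=[1.0], shifts=[(0.0, 0.0)])
```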
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching procedure is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; R(θ(optm)), T(tr(optm)shift) and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3; ɸ is the angle between the segment descriptor and the radial direction; and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have similar orientations, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and target template.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), onto which the parallel part of the program, partitioned into threads by the programmer, is mapped. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory and texture memory. Registers, local memory and shared memory are on-chip, and accessing them takes little time; of these, only shared memory can be accessed by other threads within the same block, and its availability is limited. Global memory, constant memory and texture memory are off-chip memories accessible by all threads, and accessing them is very time-consuming.
Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches are executed serially; to improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this latency, we should preferentially use on-chip memory rather than global memory, and when global memory access does occur, threads in the same warp should access consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized; to get maximum performance, memory requests should be scheduled to minimize bank conflicts.
2.5.1 MAPPING THE ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a separate kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partitioning among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks: we create as many threads in this kernel as there are templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one template-pair comparison. In our work we use an NVIDIA C2070 as our GPU, with the thread and block counts set to 1024; this means we can match our test template against up to 1024×1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which each
thread processes a section of descriptors. As the lower portion of the
middle column of Figure 11 shows, we assign a target template to one
block. Inside a block, one thread corresponds to a set of descriptors in
this template. This partition lets every block execute independently,
with no data exchange required between different blocks. When all
threads complete their corresponding descriptor fractions, the sum of
the intermediate results needs to be computed or compared. A parallel
prefix sum algorithm is used to calculate this sum, as shown on the
right of Figure 11. First, all odd-numbered threads compute the sum of
consecutive pairs of the results. Then, recursively, every i-th thread
(i = 4, 8, 16, 32, 64, ...) computes the prefix sum on the new results.
The final result is saved in the first address, which has the same
variable name as the first intermediate result.
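The stepwise pairing described above can be sketched on the CPU as follows; this is an illustrative Python model of the interleaved reduction, not the CUDA kernel itself. Each pass over the inner loop stands in for one synchronized parallel step.

```python
# CPU sketch of the interleaved parallel reduction described above
# (not CUDA): at step s = 1, 2, 4, ..., every element whose index is a
# multiple of 2*s accumulates its neighbour s positions away, so the
# total ends up at index 0 (the "first address").

def parallel_sum(values):
    data = list(values)          # simulate the shared-memory buffer
    n = len(data)
    s = 1
    while s < n:
        # In CUDA all of these additions run concurrently in one step,
        # followed by a block-wide synchronization.
        for i in range(0, n - s, 2 * s):
            data[i] += data[i + s]
        s *= 2
    return data[0]

print(parallel_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```

For n intermediate results this takes about log2(n) synchronized steps instead of n-1 sequential additions.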
252 MAPPING INSIDE BLOCK
In shift argument searching, there are two schemes we can choose to map
the task:
1) Map one pair of templates to all the threads in a block, so that
every thread takes charge of a fraction of the descriptors and
cooperates with the other threads.
2) Assign a single possible shift offset to each thread, so that all
threads compute independently and only the final results need to be
compared across the possible offsets.
Due to the great number of sum and synchronization operations in every
nearest-neighbor searching step, we choose the second method to
parallelize the shift searching. In the affine matrix generator, we map
an entire parameter-set search to a thread: every thread randomly
generates a set of parameters and tries them independently, and the
generated iterations are assigned to all threads. The challenge of this
step is that the randomly generated numbers might be correlated among
threads. In the rotation and scale registration step, we used the
Mersenne Twister pseudorandom number generator because it uses bitwise
arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative.
Therefore, it is hard to parallelize a single twister state update step
among several execution threads. To make sure that the thousands of
threads in the launch grid generate uncorrelated random sequences, many
simultaneous Mersenne Twisters need to run with different initial states
in parallel. But even "very different" (by any definition) initial state
values do not prevent the emission of correlated sequences by generators
sharing identical parameters. To solve this problem, and to enable an
efficient implementation of the Mersenne Twister on parallel
architectures, we used a special offline tool for the dynamic creation
of Mersenne Twister parameters, modified from the algorithm developed by
Makoto Matsumoto and Takuji Nishimura. In the registration and matching
step, when searching for the nearest neighbor, a line segment that has
already been matched should not be used again. In our approach, a flag
variable denoting whether the line has been matched is stored in shared
memory. To share the flags, all the threads in a block would have to
wait on a synchronization operation at every query step. Our solution is
to use a single thread in a block to process the matching.
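A minimal CPU sketch of the second mapping scheme (one shift offset per thread, with a single final comparison) is given below. The matching cost used here is a hypothetical stand-in for the real descriptor distance in the paper; only the mapping structure is the point.

```python
# CPU sketch of the second mapping scheme (one shift offset per thread,
# not CUDA). The cost function is a hypothetical stand-in for the real
# descriptor-distance computation.

def match_cost(test, target, offset):
    """Hypothetical cost: sum of absolute differences after a circular
    shift of the target by `offset` positions."""
    n = len(test)
    return sum(abs(test[i] - target[(i + offset) % n]) for i in range(n))

def best_shift(test, target, max_offset):
    # Each iteration of this loop corresponds to one independent thread;
    # no synchronization is needed until the final comparison.
    costs = [(match_cost(test, target, k), k) for k in range(max_offset)]
    return min(costs)  # the final cross-thread comparison step

test = [3, 1, 4, 1, 5]
target = [5, 3, 1, 4, 1]   # = test rotated by one position
cost, shift = best_shift(test, target, 5)
print(cost, shift)  # 0 1
```

Because each offset is evaluated independently, the only shared step is the final minimum, which avoids the per-step synchronization cost of the first scheme.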
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth
between host memory and device memory, and data transfer between host
and device can lead to long latency. As shown in Figure 11, we load the
entire target template set from the database without considering when
the templates will be processed. Therefore, there is no data transfer
from host to device during the matching procedure. In global memory, the
components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ,
w) are stored separately. This guarantees that consecutive kernels of
Algorithms 2 to 4 can access their data at successive addresses.
Although such coalesced access reduces the latency, frequent global
memory access is still a slow way to get data. In our kernel we load the
test template into shared memory to accelerate memory access. Because
Algorithms 2 to 4 execute different numbers of iterations on the same
data, bank conflicts do not occur. To maximize our texture memory space,
we set the system cache to its lowest value and bound our target
descriptors to texture memory. Using this cacheable memory, our data
access was accelerated further.
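Storing the descriptor components separately amounts to a structure-of-arrays layout, sketched below as an illustrative Python model (the field names are placeholders for the descriptor components above). Consecutive threads reading the same field then touch consecutive addresses, which is what makes the access coalesced.

```python
# Sketch of the descriptor layout used for coalesced access (Python
# model, not CUDA): descriptor components are stored separately
# ("structure of arrays") so consecutive threads read consecutive
# addresses of the same component.

# Array-of-structures: each descriptor's fields are interleaved.
aos = [(0.1, 0.2, 0.3), (0.4, 0.5, 0.6), (0.7, 0.8, 0.9)]  # (phi1, phi2, phi3)

# Structure-of-arrays: one contiguous array per field.
soa = {
    "phi1": [d[0] for d in aos],
    "phi2": [d[1] for d in aos],
    "phi3": [d[2] for d in aos],
}

# A kernel that only needs phi1 now touches one contiguous run of
# addresses instead of every third element:
print(soa["phi1"])  # [0.1, 0.4, 0.7]
```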
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
Histogram of oriented gradients (HOG) is a feature descriptor primarily
applied in target detection. In this paper it is applied as the feature
for human recognition. In the sclera region, the vein patterns are the
edges of the image, so HOG is used to determine the gradient and edge
orientations of the vein pattern in the sclera region of an eye image.
To apply this technique, first divide the image into small connected
regions called cells. For each cell, compute the histogram of gradient
directions or edge orientations of the pixels. The combination of the
histograms of the different cells then represents the descriptor. To
improve accuracy, the histograms can be contrast-normalized by
calculating the intensity over a block and then using this value to
normalize all cells within the block. This normalization makes the
result invariant to geometric and photometric changes. The gradient
magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and
y-direction gradients dx(x, y) and dy(x, y), as m(x, y) =
sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y) / dx(x, y)).
Orientation binning is the second step of HOG. This method is used to
create the cell histograms. Each pixel within the cell gives a weighted
vote to the orientation bin found in the gradient computation, with the
gradient magnitude used as the weight. The cells are rectangular. The
bins of the gradient orientation are spread over 0 to 180 degrees, with
opposite directions counting as the same. Fig. 8 depicts the edge
orientations of the picture elements. If the image has illumination and
contrast changes, then the gradient strengths must be locally
normalized. For that, cells are grouped together into larger blocks.
These blocks overlap, so that each cell contributes more than once to
the final descriptor. Here rectangular HOG (R-HOG) blocks are applied,
which are mainly square grids. The performance of HOG is improved by
applying a Gaussian window to each block.
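The per-cell computation described above can be sketched as follows: an illustrative pure-Python version assuming 9 bins over the unsigned 0-180 degree range and simple central-difference gradients (the exact gradient operator and bin count are implementation choices, not fixed by the text).

```python
import math

# Illustrative sketch of HOG orientation binning for a single cell
# (pure Python, unsigned gradients over 0-180 degrees, 9 bins).

def cell_histogram(cell, bins=9):
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = cell[y][x + 1] - cell[y][x - 1]   # x-direction gradient
            dy = cell[y + 1][x] - cell[y - 1][x]   # y-direction gradient
            m = math.hypot(dx, dy)                 # gradient magnitude
            theta = math.degrees(math.atan2(dy, dx)) % 180.0  # unsigned
            b = int(theta / (180.0 / bins)) % bins
            hist[b] += m                           # magnitude-weighted vote
    return hist

# A vertical edge: intensity rises left to right, so every gradient
# points along +x (theta = 0) and all votes land in the first bin.
cell = [[c * 10 for c in range(8)] for _ in range(8)]
print(cell_histogram(cell))
```

Concatenating such histograms over all cells, after block normalization, yields the final descriptor vector.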
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and
fourth-generation programming language. Developed by MathWorks, MATLAB
allows matrix manipulations, plotting of functions and data,
implementation of algorithms, creation of user interfaces, and
interfacing with programs written in other languages, including C, C++,
Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an
optional toolbox uses the MuPAD symbolic engine, allowing access to
symbolic computing capabilities. An additional package, Simulink, adds
graphical multi-domain simulation and Model-Based Design for dynamic and
embedded systems.
In 2004, MATLAB had around one million users across industry and
academia. MATLAB users come from various backgrounds of engineering,
science, and economics. MATLAB is widely used in academic and research
institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners in control
engineering (Little's specialty) but quickly spread to many other
domains. It is now also used in education, in particular the teaching of
linear algebra and numerical analysis, and is popular amongst scientists
involved in image processing. The MATLAB application is built around the
MATLAB language. The simplest way to execute MATLAB code is to type it
in the Command Window, which is one of the elements of the MATLAB
Desktop. When code is entered in the Command Window, MATLAB can be used
as an interactive mathematical shell. Sequences of commands can be saved
in a text file, typically using the MATLAB Editor, as a script or
encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your
work. You can integrate your MATLAB code with other languages and
applications, and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external
applications and languages, such as C, C++, Fortran, Java, COM,
and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image
processing, communications, control design, test and measurement,
financial modeling and analysis, and computational biology. Add-on
toolboxes (collections of special-purpose MATLAB functions) extend the
MATLAB environment to solve particular classes of problems in these
application areas.
MATLAB can be used on personal computers and powerful server systems,
including the Cheaha compute cluster. With the addition of the Parallel
Computing Toolbox, the language can be extended with parallel
implementations of common computational functions, including for-loop
unrolling. Additionally, this toolbox supports offloading
computationally intensive workloads to Cheaha, the campus compute
cluster. MATLAB is one of a few languages in which each variable is a
matrix (broadly construed) that knows how big it is. Moreover, the
fundamental operators (e.g., addition, multiplication) are programmed to
deal with matrices when required, and the MATLAB environment handles
much of the bothersome housekeeping that makes all this possible. Since
so many of the procedures required for macro-investment analysis involve
matrices, MATLAB proves to be an extremely efficient language for both
communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or FORTRAN A wrapper function is created
allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed
MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from
MATLAB, and many MATLAB libraries (for example, XML or SQL support) are
implemented as wrappers around Java or ActiveX libraries. Calling MATLAB
from Java is more complicated but can be done with a MATLAB extension,
which is sold separately by MathWorks, or using an undocumented
mechanism called JMI (Java-to-MATLAB Interface), which should not be
confused with the unrelated Java Metadata Interface that is also called
JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from
MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems It enables fast
development and execution With the MATLAB language you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables, specifying data types, and allocating memory. In
many cases MATLAB eliminates the need for 'for' loops. As a result, one
line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
MATLAB lets you execute commands or groups of commands one at a time,
without compiling and linking, enabling you to quickly iterate to the
optimal solution. For fast execution of heavy matrix and vector
computations, MATLAB uses processor-optimized libraries. For
general-purpose scalar computations, MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology. This
technology, which is available on most platforms, provides execution
speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency, file
differences, file dependencies, and code coverage
Designing Graphical User Interfaces
By using the interactive tool GUIDE (Graphical User Interface
Development Environment) to layout design and edit user interfaces
GUIDE lets you include list boxes, pull-down menus, push buttons, radio
buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX
controls. Alternatively, you can create GUIs programmatically using
MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process from acquiring
data from external devices and databases through preprocessing
visualization and numerical analysis to producing presentation-quality
output
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from
files other applications databases and external devices You can read data
from popular file formats such as Microsoft Excel, ASCII text or binary
files, image, sound, and video files, and scientific files such as HDF
and HDF5. Low-level binary file I/O functions let you work with data
files in any format. Additional functions let you read data from Web
pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB These include 2-D and 3-D
plotting functions 3-D volume visualization functions tools for
interactively creating plots, and the ability to export results to all
popular graphics formats. You can customize plots by adding multiple
axes, changing line colors and markers, adding annotations, LaTeX
equations, and legends, and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar
data, and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting
effect, light source locations, and transparency.
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to
support all common engineering and science operations. These functions,
developed by experts in mathematics, are the foundation of the MATLAB
language. The core math functions use the LAPACK and BLAS linear algebra
subroutine libraries and the FFTW discrete Fourier transform library.
Because these processor-dependent libraries are optimized to the
different platforms that MATLAB supports, they execute faster than
equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types
including doubles singles and integers
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown
into something much bigger, and it is used to implement numerical
algorithms for a wide range of applications. The basic language used is
very similar to standard linear algebra notation, but there are a few
extensions that will likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications: the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition method
which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel
sclera matching solution for the sequential line-descriptor method using
the CUDA GPU architecture, the parallel strategies developed in this
research can be applied to design parallel solutions for other sclera
vein recognition methods and for general pattern recognition methods. We
designed the Y-shape descriptor to narrow the search range and increase
the matching efficiency; it is a new feature extraction method that
takes advantage of the GPU structure. We developed the WPL descriptor to
incorporate mask information and make the data more suitable for
parallel computing, which can dramatically reduce data transfer and
computation. We then carefully mapped our algorithms to GPU threads and
blocks, an important step in achieving parallel computation efficiency
on a GPU. A work flow with high arithmetic intensity, designed to hide
the memory access latency, partitions the computation task across the
heterogeneous CPU-GPU system, down to the individual threads on the GPU.
The proposed method dramatically improves the matching efficiency
without compromising recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland,
MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object
detection for real-time augmented reality applications in a GPGPU,"
IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber,
"Deep, big, simple neural nets for handwritten digit recognition,"
Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris
recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput.
Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M.
Hauta-Kasari, "Nonnegative tensor factorization accelerated using
GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp.
1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks,"
Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics
processors for the fast computation of acoustic likelihoods in speech
recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology
of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N.
Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics
Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for
ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no.
14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang,
"Robust spatial matching for object retrieval and its parallel
implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp.
1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for
real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq.
Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
multimodal eye recognition," Signal Image Video Process., vol. 7, no.
4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
approach for sclera image quality measure," Int. J. Biometrics, vol. 5,
no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human
identification method: Sclera recognition," IEEE Trans. Syst., Man,
Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
parts of the iris image
2.8 The sketch of parameters of segment descriptor
2.9 The weighting image
2.10 The module of sclera template matching
2.11 The Y shape vessel branch in sclera
2.12 The rotation and scale invariant character of Y shape vessel branch
2.13 The line descriptor of the sclera vessel pattern
2.14 The key elements of descriptor vector
2.15 Simplified sclera matching steps on GPU
2.16 Two-stage matching scheme
2.17 Example image from the UBIRIS database
2.18 Occupancy on various thread numbers per block
2.19 The task assignment inside and outside the GPU
2.20 HOG features
4.1 Original sclera image
4.2 Binarised sclera image
4.3 Edge map subtracted image
4.4 Cropping ROI
4.5 ROI mask
4.6 ROI finger sclera image
4.7 Enhanced sclera image
4.8 Feature extracted sclera image
4.9 Matching with images in database
4.10 Result
ABSTRACT
Sclera vein recognition is shown to be a promising method for human
identification. However, its matching speed is slow, which could limit
its use in real-time applications. To improve the matching efficiency,
we propose a new parallel sclera vein recognition method using a
two-stage parallel approach for registration and matching. First, we
designed a rotation- and scale-invariant Y-shape-descriptor-based
feature extraction method to efficiently eliminate the most unlikely
matches. Second, we developed a weighted polar line (WPL) sclera
descriptor structure that incorporates mask information to reduce GPU
memory cost. Third, we designed a coarse-to-fine two-stage matching
method. Finally, we developed a mapping scheme to map the subtasks to
GPU processing units. The experimental results show that our proposed
method achieves a dramatic processing speed improvement without
compromising recognition accuracy.
CHAPTER 1
INTRODUCTION
11 GENERAL
Digital image processing is the use of computer algorithms to perform
image processing on digital images. A 2D continuous image is divided
into N rows and M columns; the intersection of a row and a column is
called a pixel. The image can also be a function of other variables,
including depth, color, and time. An image given in the form of a
transparency, slide, photograph, or X-ray is first digitized and stored
as a matrix of binary digits in computer memory. This digitized image
can then be processed and/or displayed on a high-resolution television
monitor. For display, the image is stored in a rapid-access buffer
memory, which refreshes the monitor at a rate of 25 frames per second to
produce a visually continuous display.
12 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
The field of "digital image processing" refers to processing digital
images by means of a digital computer. In a broader sense, it can be
considered the processing of any two-dimensional data, where an image
(optical information) is represented as an array of real or complex
numbers encoded by a definite number of bits. An image is represented as
a two-dimensional function f(x, y), where 'x' and 'y' are spatial
(plane) coordinates and the amplitude of f at any pair of coordinates
(x, y) represents the intensity or gray level of the image at that
point.
A digital image is one for which both the coordinates and the amplitude
values of f are finite, discrete quantities. Hence a digital image is
composed of a finite number of elements, each of which has a particular
location and value. These elements are called "pixels". A digital image
is discrete in both spatial coordinates and brightness, and it can be
considered a matrix whose row and column indices identify a point in the
image and whose corresponding element value identifies the gray level at
that point.
One of the first applications of digital images was in the newspaper
industry, when pictures were first sent by submarine cable between
London and New York. Introduction of the Bartlane cable picture
transmission system in the early 1920s reduced the time required to
transport a picture across the Atlantic from more than a week to less
than three hours.
FIG
121 PREPROCESSING
In imaging science, image processing is any form of signal processing
for which the input is an image, such as a photograph or video frame;
the output of image processing may be either an image or a set of
characteristics or parameters related to the image. Most
image-processing techniques involve treating the image as a
two-dimensional signal and applying standard signal-processing
techniques to it. Image processing usually refers to digital image
processing, but optical and analog image processing are also possible.
The acquisition of images (producing the input image in the first place)
is referred to as imaging.
Image processing refers to the processing of a 2D picture by a computer.
Basic definitions:
An image defined in the "real world" is considered to be a function of
two real variables, for example a(x, y), with a as the amplitude (e.g.,
brightness) of the image at the real coordinate position (x, y). Modern
digital technology has made it possible to manipulate multi-dimensional
signals with systems that range from simple digital circuits to advanced
parallel computers. The goal of this manipulation can be divided into
three categories:
Image processing (image in -> image out)
Image analysis (image in -> measurements out)
Image understanding (image in -> high-level description out)
An image may be considered to contain sub-images, sometimes referred to
as regions of interest (ROIs), or simply regions. This concept reflects
the fact that images frequently contain collections of objects, each of
which can be the basis for a region. In a sophisticated image processing
system, it should be possible to apply specific image processing
operations to selected regions. Thus one part of an image (region) might
be processed to suppress motion blur while another part might be
processed to improve colour rendition.
Usually, image processing systems require that the images be available
in digitized form, that is, as arrays of finite-length binary words. For
digitization, the given image is sampled on a discrete grid and each
sample, or pixel, is quantized using a finite number of bits. The
digitized image is then processed by a computer. To display a digital
image, it is first converted into an analog signal, which is scanned
onto a display. Closely related to image processing are computer
graphics and computer vision. In computer graphics, images are manually
made from physical models of objects, environments, and lighting,
instead of being acquired (via imaging devices such as cameras) from
natural scenes, as in most animated movies. Computer vision, on the
other hand, is often considered high-level image processing, out of
which a machine/computer/software intends to decipher the physical
contents of an image or a sequence of images (e.g., videos or 3D
full-body magnetic resonance scans).
In modern sciences and technologies, images also gain much broader
scopes due to the ever growing importance of scientific visualization
(of often large-scale complex scientific/experimental data). Examples
include microarray data in genetic research, or real-time multi-
asset portfolio trading in finance. Before processing, an image is
converted into digital form. Digitization includes sampling of the image
and quantization of the sampled values. After converting the image into
bit information, processing is performed. This processing technique may
be image enhancement, image restoration, or image compression.
122 IMAGE ENHANCEMENT
It refers to accentuation or sharpening of image features, such as
boundaries or contrast, to make a graphic display more useful for
display and analysis. This process does not increase the inherent
information content in the data. It includes gray level and contrast
manipulation, noise reduction, edge crispening and sharpening,
filtering, interpolation and magnification, pseudo-coloring, and so on.
123 IMAGE RESTORATION
It is concerned with filtering the observed image to minimize the
effect of degradations. The effectiveness of image restoration depends
on the extent and accuracy of the knowledge of the degradation process
as well as on the filter design. Image restoration differs from image
enhancement in that the latter is concerned with more extraction or
accentuation of image features.
124 IMAGE COMPRESSION
It is concerned with minimizing the number of bits required to represent
an image. Applications of compression include broadcast TV, remote
sensing via satellite, military communication via aircraft, radar,
teleconferencing, facsimile transmission of educational and business
documents, medical images that arise in computer tomography, magnetic
resonance imaging and digital radiology, motion pictures, satellite
images, weather maps, geological surveys, and so on.
Text compression - CCITT Group 3 and Group 4
Still image compression - JPEG
Video image compression - MPEG
125 SEGMENTATION
In computer vision, image segmentation is the process of partitioning a
digital image into multiple segments (sets of pixels, also known as
superpixels). The goal of segmentation is to simplify and/or change the
representation of an image into something that is more meaningful and
easier to analyze. Image segmentation is typically used to locate
objects and boundaries (lines, curves, etc.) in images. More precisely,
image segmentation is the process of assigning a label to every pixel in
an image such that pixels with the same label share certain visual
characteristics.
The result of image segmentation is a set of segments that
collectively cover the entire image, or a set of contours extracted from the
image (see edge detection). Each of the pixels in a region is similar with
respect to some characteristic or computed property, such as
color, intensity or texture, while adjacent regions differ significantly
with respect to the same characteristic(s). When applied to a stack of
images, as is typical in medical imaging, the contours resulting from image
segmentation can be used to create 3D reconstructions with the help of
interpolation algorithms such as marching cubes.
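The labeling idea can be sketched with the simplest possible criterion, a global intensity threshold (a minimal illustration; the function name and list-based image are assumptions, not from the report):

```python
def segment_by_threshold(image, threshold):
    """Assign label 1 to pixels at or above the threshold and 0 to the
    rest: the simplest case of 'pixels with the same label share
    certain visual characteristics' (here, similar intensity)."""
    return [[1 if p >= threshold else 0 for p in row] for row in image]

img = [
    [10, 12, 200, 210],
    [11, 14, 205, 220],
    [ 9, 13, 198, 215],
]
labels = segment_by_threshold(img, 100)
# The dark left half becomes one segment, the bright right half another.
```

Real segmenters replace the fixed threshold with richer similarity measures (color, texture, spatial continuity), but the output format, a label per pixel, is the same.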
1.2.6 IMAGE RESTORATION
Image restoration, like enhancement, improves the quality of an image,
but all its operations are based on known, measured, or estimated
degradations of the original image. Image restoration is used to restore
images with problems such as geometric distortion, improper focus,
repetitive noise and camera motion; it corrects images for known
degradations.
1.2.7 FUNDAMENTAL STEPS
Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the
chances for success of the other processes.
Image segmentation: to partition an input image into its constituent parts or
objects.
Image representation: to convert the input data into a form suitable for
computer processing.
Image description: to extract features that result in some quantitative
information of interest, or features that are basic for differentiating one
class of objects from another.
Image recognition: to assign a label to an object based on the
information provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized
objects.
Knowledge about a problem domain is coded into an image processing
system in the form of a knowledge database.
1.3 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x, y) must be digitized
both spatially and in amplitude.
Digitization of the spatial coordinates (x, y) is called image sampling, and
amplitude digitization is called gray-level quantization.
The storage and processing requirements increase rapidly with the spatial
resolution and the number of gray levels.
Example: a 256 gray-level image of size 256x256 occupies 64K bytes of
memory.
Images of very low spatial resolution produce a checkerboard effect, and
the use of an insufficient number of gray levels in smooth areas of a digital
image results in false contouring.
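The storage figure quoted in the example follows directly from the sampling and quantization parameters; a small helper (illustrative, not from the report) makes the calculation explicit:

```python
import math

def image_storage_bytes(width, height, gray_levels):
    """Uncompressed storage for a sampled, quantized image:
    width x height pixels, each needing ceil(log2(levels)) bits."""
    bits_per_pixel = math.ceil(math.log2(gray_levels))
    return width * height * bits_per_pixel // 8

print(image_storage_bytes(256, 256, 256))  # 65536 bytes = 64K
print(image_storage_bytes(256, 256, 2))    # 8192 bytes for a binary image
```

Doubling the resolution in each direction quadruples the storage, which is the "rapid increase" noted above.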
1.4 IMAGE FILE FORMATS
There are two general groups of images: vector graphics (or line art)
and bitmaps (pixel-based images). Some of the most common file
formats are:
GIF (Graphics Interchange Format): an 8-bit (256 color), non-
destructively compressed bitmap format, mostly used for the web. It has
several sub-standards, one of which is the animated GIF.
JPEG (Joint Photographic Experts Group): a very efficient (i.e., much
information per byte) destructively compressed 24-bit (16 million colors)
bitmap format, widely used, especially for the bandwidth-limited web and
Internet.
TIFF (Tagged Image File Format): the standard 24-bit publication bitmap
format, compressed non-destructively with, for instance, Lempel-Ziv-
Welch (LZW) compression.
PS (PostScript): a standard vector format with numerous sub-standards;
it can be difficult to transport across platforms and operating systems.
PSD (Adobe Photoshop Document): a dedicated Photoshop format that
keeps all the information in an image, including all the layers.
BMP: bitmap file format.
1.5 TYPES OF IMAGES
Images are of four types:
1. Binary image
2. Grayscale image
3. Color image
4. Indexed image
1.5.1 BINARY IMAGES
A binary image is a digital image that has only two possible values for
each pixel. Typically the two colors used for a binary image are black and
white, though any two colors can be used. Binary images are also called bi-
level or two-level, meaning that each pixel is stored as a single bit, i.e.,
a 0 or 1. They are also known as black-and-white (B&W) images.
1.5.2 GRAYSCALE IMAGE
In an 8-bit grayscale image, each picture element has an assigned intensity
that ranges from 0 to 255. A grayscale image is what people normally call
a black-and-white image, but the name emphasizes that such an image will
also include many shades of grey.
FIG
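The mapping from a color pixel to a gray value can be sketched with the widely used ITU-R BT.601 luma weights (one common convention; other weightings exist):

```python
def rgb_to_gray(r, g, b):
    """Convert an RGB triple to an 8-bit gray value using the common
    ITU-R BT.601 luma weights; green dominates because the eye is
    most sensitive to it."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

print(rgb_to_gray(255, 255, 255))  # 255 (white)
print(rgb_to_gray(0, 0, 0))        # 0 (black)
```

Applying this per pixel turns a 24-bit color image into the 8-bit grayscale image described above.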
1.5.3 COLOR IMAGE
The RGB color model relates very closely to the way we perceive
color, with the R, G and B receptors in our retinas. RGB uses additive color
mixing and is the basic color model used in television or any other
medium that projects color with light. It is the basic color model used in
computers and for web graphics, but it cannot be used for print production.
The secondary colors of RGB (cyan, magenta and yellow) are formed
by mixing two of the primary colors (red, green or blue) and excluding the
third color: red and green combine to make yellow, green and blue to
make cyan, and blue and red form magenta. The combination of red, green
and blue at full intensity makes white.
In Photoshop, using the "screen" mode for the different layers in an
image makes the intensities mix together according to the additive
color mixing model. This is analogous to stacking slide images on top of
each other and shining light through them.
FIG
CMYK: the four-color CMYK model used in printing lays down
overlapping layers of varying percentages of transparent cyan (C), magenta
(M) and yellow (Y) inks, to which a layer of black (K) ink can be added.
The CMYK model uses subtractive color mixing.
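The subtractive model can be illustrated with the common naive RGB-to-CMYK conversion (a sketch only; real print workflows use ICC color profiles rather than this formula):

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB (0-255) to CMYK (0-1) conversion illustrating the
    subtractive model: K carries the darkness no colored ink needs
    to reproduce."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    k = 1 - max(r, g, b)
    if k == 1.0:                  # pure black: avoid division by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return c, m, y, k

print(rgb_to_cmyk(255, 0, 0))  # red -> (0.0, 1.0, 1.0, 0.0)
```

Red paper-white light minus magenta and yellow inks is exactly the complement relationship described in the RGB section above.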
1.5.4 INDEXED IMAGE
FIG
An indexed image consists of an array and a color map matrix. The
pixel values in the array are direct indices into the color map. By convention,
this documentation uses the variable name X to refer to the array and map
to refer to the color map. In computing, indexed color is a technique to
manage digital image colors in a limited fashion, in order to save
computer memory and file storage while speeding up display refresh and
file transfers; it is a form of vector quantization compression.
When an image is encoded in this way, color information is not
carried directly by the image pixel data but is stored in a separate piece of
data called a palette: an array of color elements in which every element, a
color, is indexed by its position within the array. The image pixels do not
contain the full specification of their color, but only its index in the palette.
This technique is sometimes referred to as pseudocolor or indirect color, as
colors are addressed indirectly.
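A minimal sketch of the palette lookup just described (variable names are illustrative; the text's X is `indexed` here and its map is `palette`):

```python
# Indexed color: pixel values are indices into a small palette
# (color map) rather than full RGB triples.
palette = [
    (0, 0, 0),        # index 0: black
    (255, 255, 255),  # index 1: white
    (255, 0, 0),      # index 2: red
]

indexed = [
    [0, 1],
    [2, 0],
]

def decode(indexed_image, color_map):
    """Expand an indexed image to full RGB by palette lookup."""
    return [[color_map[i] for i in row] for row in indexed_image]

rgb = decode(indexed, palette)
# Storage: 4 one-byte indices plus the palette, versus 12 bytes of raw RGB.
```

The saving is the vector-quantization effect mentioned above: many pixels share a few palette entries.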
Perhaps the first device that supported palette colors was a random-
access frame buffer described in 1975 by Kajiya, Sutherland and Cheadle,
which supported a palette of 256 36-bit RGB colors.
1.6 APPLICATIONS OF IMAGE PROCESSING
Interest in digital image processing methods stems from two principal
application areas:
1) improvement of pictorial information for human interpretation;
2) processing of scene data for autonomous machine perception.
In the second application area, interest focuses on procedures for
extracting from an image information in a form suitable for computer
processing. Examples include automatic character recognition, industrial
machine vision for product assembly and inspection, military reconnaissance,
automatic processing of fingerprints, etc.
1.7 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches for feature registration
and matching: a Speeded Up Robust Features (SURF)-based method,
minutiae detection, and direct correlation matching. Among these three
methods, the SURF method achieves the best accuracy; it takes an average
of 1.5 seconds using the SURF method to perform a one-to-one matching.
Zhou et al. proposed a line-descriptor-based method for sclera vein
recognition, in which the matching step (including registration) is the most
time-consuming step, costing about 1.2 seconds per one-to-one matching.
Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz
processor and 4 GB DRAM. Currently, sclera vein recognition algorithms
are designed using central processing unit (CPU)-based systems.
1.7.1 DISADVANTAGES OF THE EXISTING SYSTEM
1. Mask files are used to calculate the valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large and preoccupy the GPU memory, slowing down the
data transfer. Also, some of the processing on the mask files involves
convolution, whose performance is difficult to improve on the scalar
processing units of CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements;
there is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance.
LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns
for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no.
14, pp. 1860-1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the
most challenging problems that has been actively researched in recent
years. It is well known that many state-of-the-art still-image face recognition
algorithms perform well when constrained (frontal, well illuminated, high-
resolution, sharp and full) face images are acquired. However, their
performance degrades significantly when the test images contain variations
that are not present in the training images. In this paper we highlight some
of the key issues in remote face recognition. We define remote face
recognition as one where faces are several tens of meters (10-250 m) from
the cameras. We then describe a remote face database which has been
acquired in an unconstrained outdoor maritime environment. The recognition
performance of a subset of existing still-image-based face recognition
algorithms is evaluated on the remote face data set. Further, we define the
remote re-identification problem as matching a subject at one location with
candidate sets acquired at a different location and over time in remote
conditions. We provide preliminary experimental results on remote re-
identification. It is demonstrated that, in addition to applying a good
classification algorithm, finding features that are robust to the variations
mentioned above and developing statistical models which can account for
these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives and N.
Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics
Security.
With the rapidly expanding biometric data collected by various sectors
of government and industry for identification and verification purposes,
how to manage and process such Big Data draws great concern. Even
though modern processors are equipped with more cores and memory
capacity, careful design is still required in order to utilize the hardware
resources effectively and the power consumption efficiently. This research
addresses the issue by investigating the workload characteristics of a
biometric application. Taking Daugman's iris matching algorithm, which
has been proven to be the most reliable iris matching method, as a case
study, we conduct performance profiling and binary instrumentation on the
benchmark to capture its execution behavior. The results show that data
loading and memory access incur great performance overhead, which
motivates us to move the biometrics computation to a high-performance
architecture.
Modern iris recognition algorithms can be computationally intensive,
yet they are designed for traditional sequential processing elements such as a
personal computer. However, a parallel processing alternative using field-
programmable gate arrays (FPGAs) offers an opportunity to speed up iris
recognition. Within this project, iris template generation with
directional filtering, which is a computationally expensive yet parallel
portion of a modern iris recognition algorithm, is parallelized on an FPGA
system. We present a performance comparison of the parallelized
algorithm on the FPGA system to a traditional CPU-based version. The
parallelized template generation outperforms an optimized C++ code
version, determining the information content of an iris approximately 324
times faster.
3. R. Derakhshani, A. Ross and S. Crihalmeanu, "A new biometric
modality based on conjunctival vasculature," in Proc. Artif. Neural
Netw. Eng., 2006, pp. 1-8.
A new biometric indicator based on the patterns of conjunctival
vasculature is proposed. Conjunctival vessels can be observed on the visible
part of the sclera that is exposed to the outside world. These vessels
demonstrate rich and specific details in visible light and can easily be
photographed using a regular digital camera. In this paper we discuss
methods for conjunctival imaging, preprocessing and feature extraction in
order to derive a suitable conjunctival vascular template for biometric
authentication. Commensurate classification methods, along with the
observed accuracy, are discussed. Experimental results suggest the potential
of using conjunctival vasculature as a biometric measure. Identification of
a person based on some unique set of features is an important task.
Human identification is possible with several biometric systems, and sclera
recognition is one of the promising biometrics. The sclera is the white
portion of the human eye, and the vein pattern seen in the sclera region is
unique to each person; thus the sclera vein pattern is a well-suited
biometric for human identification. The existing methods used
for sclera recognition have some drawbacks: only frontal-looking
images are preferred for matching, and rotation variance is another problem.
These problems are eliminated in the proposed system by using
two feature extraction techniques: Histogram of Oriented
Gradients (HOG) and conversion of the image into polar form using the
bilinear interpolation technique. These two features help the proposed
system become illumination invariant and rotation invariant. The
experimentation is done with the help of the UBIRIS database. The
experimental results show that the proposed sclera recognition method can
achieve better accuracy than the previous methods.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone and J.
C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899,
May 2008.
The graphics processing unit (GPU) has become an integral part of
today's mainstream computing systems. Over the past six years there has
been a marked increase in the performance and capabilities of GPUs. The
modern GPU is not only a powerful graphics engine but also a highly
parallel programmable processor featuring peak arithmetic and memory
bandwidth that substantially outpaces its CPU counterpart. The GPU's
rapid increase in both programmability and capability has spawned a
research community that has successfully mapped a broad range of
computationally demanding, complex problems to the GPU. This effort in
general-purpose computing on the GPU, also known as GPU computing,
has positioned the GPU as a compelling alternative to traditional
microprocessors in high-performance computer systems of the future. We
describe the background, hardware and programming model for GPU
computing, summarize the state of the art in tools and techniques, and
present four GPU computing successes in game physics and computational
biophysics that deliver order-of-magnitude performance gains over
optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image
database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-
977.
This paper proposes algorithms for iris segmentation, quality
enhancement, match score fusion and indexing to improve both the
accuracy and the speed of iris recognition. A curve evolution approach is
proposed to effectively segment a nonideal iris image using the modified
Mumford-Shah functional. Different enhancement algorithms are
concurrently applied on the segmented iris image to produce multiple
enhanced versions of the iris image. A support-vector-machine-based
learning algorithm selects locally enhanced regions from each globally
enhanced image and combines these good-quality regions to create a single
high-quality iris image. Two distinct features are extracted from the high-
quality iris image: the global textural feature is extracted using the 1-D log-
polar Gabor transform, and the local topological feature is extracted using
Euler numbers. An intelligent fusion algorithm combines the textural and
topological matching scores to further improve the iris recognition
performance and reduce the false rejection rate, whereas an indexing
algorithm enables fast and accurate iris identification. The verification and
identification performance of the proposed algorithms is validated and
compared with other algorithms using the CASIA Version 3, ICE 2005 and
UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method that uses a two-
stage parallel approach for registration and matching: a parallel sclera
matching solution for our sequential line-descriptor method, built on the
CUDA GPU architecture. CUDA is a highly parallel, multithreaded,
many-core processor architecture with tremendous computational power.
It supports not only a traditional graphics pipeline but also computation
on non-graphical data. It is relatively straightforward to implement our C
program for CUDA on an AMD-based GPU using OpenCL: our CUDA
kernels can be converted directly to OpenCL kernels by accounting for the
different syntax of various keywords and built-in functions. The mapping
strategy is also effective in OpenCL if we regard the thread and block in
CUDA as the work-item and work-group in OpenCL. Most of our optimization
techniques, such as coalesced memory access and prefix sum, work in
OpenCL too. Moreover, since CUDA is a data-parallel architecture, the
implementation of our approach in OpenCL should be programmed in the
data-parallel model.
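Prefix sum, mentioned above as one of the optimization techniques, is a standard parallel primitive. The sequential sketch below (illustrative, in Python rather than CUDA C) defines the result that the O(log n) parallel scan computes on the GPU:

```python
def exclusive_prefix_sum(values):
    """Exclusive scan: output[i] is the sum of values[0..i-1].
    On the GPU the same result is produced in O(log n) parallel
    steps; this loop is the sequential reference definition."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out

print(exclusive_prefix_sum([3, 1, 4, 1, 5]))  # [0, 3, 4, 8, 9]
```

Scans of this kind are typically used on the GPU to compute write offsets so that many threads can compact or gather data without contention.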
In this research, we first discuss why the naive parallel approach would
not work. We then propose a new sclera descriptor, the Y-shape sclera
feature, for an efficient registration method that speeds up the mapping
scheme; introduce the weighted polar line (WPL) descriptor, which is better
suited for parallel computing and mitigates the mask size issue; and develop
a coarse-to-fine two-stage matching process that dramatically improves the
matching speed. These new approaches make parallel processing
possible and efficient.
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve the efficiency, we propose a new descriptor,
the Y-shape descriptor, which greatly helps improve the efficiency of
the coarse registration of two images and can be used to filter out some
non-matching pairs before refined matching.
2. We propose a coarse-to-fine two-stage matching process. In the first
stage, we match two images coarsely using the Y-shape descriptors,
which is very fast because no registration is needed. The
matching result in this stage helps filter out image pairs with low
similarity.
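The coarse-to-fine idea can be sketched generically (function names and the toy scores below are illustrative assumptions, not the report's actual descriptors):

```python
def two_stage_match(query, gallery, coarse_score, fine_score, tau):
    """Coarse-to-fine matching sketch: a cheap coarse score filters
    out low-similarity candidates, and only the survivors are passed
    to the expensive fine matcher."""
    survivors = [t for t in gallery if coarse_score(query, t) >= tau]
    return max(survivors, key=lambda t: fine_score(query, t), default=None)

# Toy demo: "templates" are numbers and similarity is negative distance.
gallery = [10, 50, 52, 90]
coarse = lambda q, t: -abs(q - t)       # fast, approximate score
fine = lambda q, t: -abs(q - t) ** 2    # slower, refined score
best = two_stage_match(53, gallery, coarse, fine, tau=-5)
print(best)  # 52: only 50 and 52 pass the coarse filter; 52 is closer
```

The speedup comes from the filter ratio: if the coarse stage rejects most of the gallery, the expensive fine stage runs on only a small fraction of the templates.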
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each person,
so it can be used for human identification. Several researchers have
designed different sclera vein recognition methods and have shown that it
is promising to use sclera vein recognition for human identification.
Crihalmeanu and Ross proposed three approaches for feature registration
and matching: a Speeded Up Robust Features (SURF)-based method,
minutiae detection, and direct correlation matching. Among these three
methods, the SURF method achieves the best accuracy; it takes an average
of 1.5 seconds using the SURF method to perform a one-to-one matching.
Zhou et al. proposed a line-descriptor-based method for sclera vein
recognition, in which the matching step (including registration) is the most
time-consuming step, costing about 1.2 seconds per one-to-one matching.
Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz
processor and 4 GB DRAM. Currently, sclera vein recognition algorithms
are designed using central processing unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential
processing devices, which may not be efficient in data processing where the
data can be parallelized. Because of the large time consumption of the matching
step, sclera vein recognition using a sequential method would be very
challenging to implement in a real-time biometric system, especially
when there is a large number of templates in the database for matching. GPUs
(an abbreviation of general-purpose graphics processing units, GPGPUs)
are now popularly used for parallel computing to improve
computational processing speed and efficiency. The highly parallel
structure of GPUs makes them more effective than CPUs for data
processing where the processing can be performed in parallel. GPUs have been
widely used in biometric recognition, such as speech recognition, text
detection, handwriting recognition and face recognition. In iris
recognition, the GPU has been used to extract features, construct descriptors
and match templates.
GPUs are also used for object retrieval and image search. Park et al.
presented a performance evaluation of image processing algorithms, such
as linear feature extraction and multi-view stereo matching, on GPUs.
However, these approaches were designed for their specific biometric
recognition applications and feature searching methods; therefore, they may
not be efficient for sclera vein recognition. Compute Unified Device
Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in
this research. CUDA is a highly parallel, multithreaded, many-core
processor architecture with tremendous computational power. It supports not only a
traditional graphics pipeline but also computation on non-graphical data.
More importantly, it offers an easier programming platform which
outperforms its CPU counterparts in terms of peak arithmetic intensity and
memory bandwidth. In this research, the goal is not to develop a unified
strategy to parallelize all sclera matching methods, because each method is
quite different from the others and would need a customized design; an
efficient parallel computing scheme would require different
strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
sclera vein recognition using our sequential line-descriptor method and
the CUDA GPU architecture. However, the parallelization strategies
developed in this research can be applied to design parallel approaches for
other sclera vein recognition methods and to help parallelize general pattern
recognition methods. Based on the matching approach, there are three
challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large and preoccupy the GPU memory, slowing down the
data transfer. Also, some of the processing on the mask files involves
convolution, whose performance is difficult to improve on the scalar
processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements;
there is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance. New designs are
necessary to help narrow down the search range. In summary, a naive
parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to implement our C program for
CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be
converted directly to OpenCL kernels by accounting for the different syntax of
various keywords and built-in functions. The mapping strategy is also
effective in OpenCL if we regard the thread and block in CUDA as the work-item
and work-group in OpenCL. Most of our optimization techniques, such as
coalesced memory access and prefix sum, work in OpenCL too.
Moreover, since CUDA is a data-parallel architecture, the implementation
of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naive parallel approach would not
work (Section 3). We then propose the new sclera descriptor, the Y-shape
sclera feature, for an efficient registration method that speeds up the mapping
scheme (Section 4); introduce the weighted polar line (WPL) descriptor,
which is better suited for parallel computing and mitigates the mask size
issue (Section 5); and develop our coarse-to-fine two-stage matching
process to dramatically improve the matching speed (Section 6). These new
approaches make parallel processing possible and efficient. However, it
is non-trivial to implement these algorithms in CUDA, so we developed
implementation schemes to map our algorithms onto CUDA (Section 7).
In Section 2 we give a brief introduction to sclera vein recognition, in
Section 8 we present experiments using the proposed system, and
in Section 9 we draw conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation, feature enhancement, feature extraction and feature
matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition,
and several methods have been designed for it. Crihalmeanu
et al. presented a semi-automated system for sclera segmentation, using
a clustering algorithm to classify color eye images into three
clusters: sclera, iris and background. Later, Crihalmeanu and Ross
designed a segmentation approach based on a normalized sclera index
measure, which includes coarse sclera segmentation, pupil region
segmentation and fine sclera segmentation. Zhou et al. developed a skin-
tone plus "white color"-based voting method for sclera segmentation in
color images, and an Otsu's-thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera
features, since the sclera vein patterns often lack contrast and are hard to
detect. Zhou et al. used a bank of multi-directional Gabor filters for
vascular pattern enhancement. Derakhshani et al. used contrast-limited
adaptive histogram equalization (CLAHE) to enhance the green color plane
of the RGB image, and a multi-scale region growing approach to identify
the sclera veins from the image background. Crihalmeanu and Ross applied
a selective enhancement filter for blood vessels to extract features from the
green component of a color image. In the feature matching step,
Crihalmeanu and Ross proposed
three registration and matching approaches: Speeded Up Robust
Features (SURF), which is based on interest-point detection; minutiae
detection, which is based on minutiae points on the vasculature structure;
and direct correlation matching, which relies on image registration. Zhou et
al. designed a line-descriptor-based feature registration and matching
method.
The proposed sclera recognition consists of five steps: sclera
segmentation, vein pattern enhancement, feature extraction, feature
matching and matching decision. Fig. 2 shows the block diagram of sclera
recognition. Two types of feature extraction are used in the proposed
method to achieve good identification accuracy. The characteristics
elicited from the blood vessel structure seen in the sclera region are the
Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-
polar conversion. HOG is used to determine the gradient and
edge orientations of the vein pattern in the sclera region of an eye image. To
become more computationally efficient, the image data are converted
to polar form, which is mainly used for circular or quasi-circular
objects. These two characteristics are extracted from all the images in the
database and compared with the features of the query image to determine whether the
person is correctly identified. This comparison is done in the feature
matching step, which ultimately makes the matching decision. By using the
proposed feature extraction methods and matching techniques, human
identification is more accurate than in existing studies. In the proposed
method, two features of an image are extracted.
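The gradient-orientation binning at the core of HOG can be sketched as follows (a bare orientation histogram only; full HOG additionally groups pixels into cells and blocks and normalizes the result):

```python
import math

def orientation_histogram(gx, gy, bins=9):
    """Minimal HOG-style sketch: bin gradient orientations (unsigned,
    0-180 degrees) weighted by gradient magnitude. gx and gy hold the
    per-pixel horizontal and vertical gradients."""
    hist = [0.0] * bins
    for dx, dy in zip(gx, gy):
        magnitude = math.hypot(dx, dy)
        angle = math.degrees(math.atan2(dy, dx)) % 180.0
        hist[int(angle // (180.0 / bins)) % bins] += magnitude
    return hist

# Two pixels with horizontal gradients and one with a vertical gradient:
h = orientation_histogram([1.0, 2.0, 0.0], [0.0, 0.0, 3.0])
# Mass collects in the 0-degree bin and the 90-degree bin.
```

Vein edges at a given orientation accumulate in the same bin, which is what makes the descriptor characterize the direction of the vascular pattern.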
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists
of three steps: glare area detection, sclera area estimation, and iris and eyelid
detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare area detection: the glare area is a small bright area near the
pupil or iris, an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. The filter runs
only on grayscale images; if the image is in color, it must first be
converted to grayscale before the Sobel filter is applied to
detect the glare area. Fig. 4 shows the result of glare area detection.
FIG
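The Sobel filtering step can be sketched in pure Python (illustrative only; a real implementation would use an optimized image processing library):

```python
def sobel_magnitude(img):
    """Apply the 3x3 Sobel operator to a grayscale image (list of
    lists) and return the gradient magnitude; strong responses mark
    sharp transitions such as the boundary of a bright glare spot."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

img = [[0] * 5 for _ in range(5)]
img[2][2] = 255          # a single bright "glare" pixel
edges = sobel_magnitude(img)
# Responses surround the bright pixel; the flat background stays zero.
```

Thresholding the magnitude map then isolates the glare boundary described above.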
Sclera area estimation: for the estimation of the sclera area, Otsu's
thresholding method is applied. The stages of sclera area detection are
selection of the region of interest (ROI), Otsu's thresholding, and sclera area
detection. The left and right sclera areas are selected based on the iris boundaries.
Once the region of interest is selected, Otsu's thresholding is applied to
obtain the potential sclera areas. The correct left sclera area should be
placed in the right and center positions, and the correct right sclera area should
be placed in the left and center; in this way, non-sclera areas are eliminated.
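Otsu's thresholding, used above to isolate potential sclera areas, picks the gray level that maximizes the between-class variance of the histogram. A compact sketch (an illustrative implementation of the standard algorithm):

```python
def otsu_threshold(gray_values):
    """Otsu's method: choose the threshold that maximizes between-
    class variance of the gray-level histogram, separating the bright
    sclera from the darker surrounding regions."""
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    total_sum = sum(i * hist[i] for i in range(256))
    best_t, best_var = 0, -1.0
    weight_b = sum_b = 0
    for t in range(256):
        weight_b += hist[t]            # background: levels <= t
        if weight_b == 0:
            continue
        weight_f = total - weight_b    # foreground: levels > t
        if weight_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / weight_b
        mean_f = (total_sum - sum_b) / weight_f
        between = weight_b * weight_f * (mean_b - mean_f) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Two well-separated intensity clusters (dark skin/iris vs. bright sclera):
pixels = [20] * 50 + [30] * 50 + [200] * 40 + [220] * 40
t = otsu_threshold(pixels)   # lands between the dark and bright clusters
```

Because the method adapts to the histogram of each ROI, no manual threshold has to be tuned per image.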
2.2.3 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the
sclera area. Then the upper eyelid, lower eyelid and iris boundaries are
refined, since all of these are unwanted portions for recognition. To
eliminate these effects, refinement is performed after the detection
of the sclera area. Fig. shows the result after Otsu's thresholding and iris and
eyelid refinement to detect the right sclera area; the left sclera
area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the effect of
segmentation faults. The vein patterns in the sclera area are not clearly
visible after segmentation, so vein pattern
enhancement is performed to make them more visible.
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of the fingers (Miura et al., 2004), the palm (Lin and
Fan, 2004) and the retina (Hill, 1999). In the case of retinal biometrics, a
special optical device for imaging the back of the eyeball is needed (Hill,
1999). Due to its perceived invasiveness and the required degree of subject
cooperation, the use of retinal biometrics may not be acceptable to some
individuals. The conjunctiva is a thin, transparent and moist tissue that
covers the outer surface of the eye. The part of the conjunctiva that covers
the inner lining of the eyelids is called the palpebral conjunctiva, and the part
that covers the outer surface of the eye is called the ocular (or bulbar)
conjunctiva, which is the focus of this study. The ocular conjunctiva is very
thin and clear; thus the vasculature (including that of the episclera) is
easily visible through it. The visible microcirculation of the conjunctiva offers a
rich and complex network of veins and fine microcirculation (Fig. 1). The
apparent complexity and specificity of these vascular patterns motivated us
to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have a vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as those of the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus the conjunctival vascular
structure is expected to have reasonable permanence (Joussen, 2001).
Practicality Conjunctival vasculature can be captured with commercial off
the shelf digital cameras under normal lighting conditions making this
modality highly practical
ACCEPTABILITY Since the subject is not required to stare directly into
the camera lens and given the possibility of capturing the conjunctival
vasculature from several feet away this modality is non-intrusive and thus
more acceptable
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (eg due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, each element of the filtering can itself be parallelized. A detailed discussion of our proposed parallelization is outside the scope of this paper.
FIG
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching segment of the line-descriptor based method is a bottleneck with regard to matching speed. In this section we briefly describe the line-descriptor based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle θ to some reference angle at the iris center, the segment's distance r to the iris center, and the dominant angular orientation ɸ of the line segment. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. To register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. To reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as follows, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eq 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
FIG
FIG
FIG
FIG
Y-shape branches are observed to be stable with respect to eye movement and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the set of nearest neighbors of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in a line-segment set, the set may be inferred to be a Y-shape structure, and the line-segment angles are recorded as a new feature of the sclera.
There are two ways to measure both the orientation and the relationship of every branch of the Y-shape vessels: one is to use the angle of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves or the camera zooms at the image acquisition step, ϕ1, ϕ2 and ϕ3 remain quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also recorded the center position (x, y) of the Y-shape branches as auxiliary parameters. Our rotation-, shift- and scale-invariant feature vector is therefore defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris center, and it is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 22, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments, and for each segment a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes. To be tolerant of such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel
patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line
segments of vessel patterns
is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application using the mask is challenging, since the mask files are large, occupy GPU memory, and slow down data transfer. In matching, a RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.
To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting weight values according to position: the weights of descriptors beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU, only once.
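The one-time CPU weight assignment can be sketched as follows. The border width and the crude neighborhood-based erosion are illustrative choices, not the report's exact procedure:

```python
import numpy as np

def descriptor_weights(mask, centers, border=2):
    """Assign WPL weights from the sclera mask: 1 for interior
    descriptors, 0.5 within `border` pixels of the mask boundary,
    0 outside the mask. `centers` holds (row, col) descriptor
    positions. Computed once on the CPU and stored in the descriptor."""
    mask = mask.astype(bool)
    # crude erosion: a pixel is "interior" if the whole
    # (2*border+1) x (2*border+1) neighborhood lies inside the mask
    interior = mask.copy()
    for dr in range(-border, border + 1):
        for dc in range(-border, border + 1):
            interior &= np.roll(np.roll(mask, dr, 0), dc, 1)
    weights = []
    for r, c in centers:
        if not mask[r, c]:
            weights.append(0.0)
        elif interior[r, c]:
            weights.append(1.0)
        else:
            weights.append(0.5)
    return np.array(weights)

mask = np.zeros((20, 20), bool)
mask[5:15, 5:15] = True
w = descriptor_weights(mask, [(10, 10), (5, 5), (0, 0)])
```

A descriptor deep inside the mask gets weight 1, one on the mask edge gets 0.5, and one outside gets 0, matching the three-level weighting described above.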
The calculated result was saved as a component of the descriptor, so the sclera descriptor becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the values 0, 0.5 or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all descriptors of that template are transformed. Alignment is faster if the two templates share a similar reference point: if we use the center of the iris as the reference point, the correspondences are automatically aligned when two templates are compared. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize GPU computation, we also convert the descriptor values from polar to rectangular coordinates in CPU preprocessing.
The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as the eyeball moves left, the left-part sclera patterns may be compressed while the right-part patterns are stretched.
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations (the multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps). We reorganized the descriptors from the same side and saved
FIG
FIG
them in contiguous addresses, which meets the requirement of coalesced memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. With the new descriptor, the shift parameter generator in Figure 4 simplifies to Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
GENERAL-PURPOSE COMPUTING ON THE GPU
Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as a texture on future passes through the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of the fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
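The grid-update pattern just described can be sketched in NumPy, where every grid point gathers its neighbors' previous state and writes into a separate output buffer, mirroring the gather-only model above. The 5-point averaging stencil is an illustrative stand-in for a real fluid update rule:

```python
import numpy as np

def step(state):
    """One SPMD-style time step: every grid point computes its next
    value from its own previous state and its four neighbors' states,
    all gathered from the previous buffer (the output is a new array,
    matching the old GPGPU model where reads and writes use separate
    buffers)."""
    up    = np.roll(state, -1, axis=0)
    down  = np.roll(state,  1, axis=0)
    left  = np.roll(state, -1, axis=1)
    right = np.roll(state,  1, axis=1)
    return 0.2 * (state + up + down + left + right)

grid = np.zeros((8, 8))
grid[4, 4] = 1.0          # a single "drop" of fluid
nxt = step(grid)
```

After one step the drop has spread equally to itself and its four neighbors, and the total mass is conserved.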
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity, although some false positive matches may remain after this step. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here y_te^i and y_ta^j are the Y-shape descriptors of the test template Tte and the target template Tta respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and their center distances respectively; tϕ is a distance threshold, and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar line from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise matching process.
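The Stage-I comparison can be sketched as below. The exact fusion formula of Eq (2) is not reproduced in the text, so the score form here (matched count normalized by template sizes, discounted by mean center distance via α) is an assumption for illustration; the thresholds reuse the values quoted above:

```python
import numpy as np

ALPHA = 30.0   # fusion factor from the text
T_PHI = 30.0   # branch-angle distance threshold
T_XY = 675.0   # search-area threshold

def y_match_score(test_desc, target_desc):
    """Coarse matching of Y-shape descriptors y = (phi1, phi2, phi3, x, y).

    A pair matches when the branch-angle distance is below T_PHI and
    the centers are within T_XY; the score fuses the matched count
    with the mean center distance (assumed fusion form).
    """
    matched, dists = 0, []
    for yt in test_desc:
        for yg in target_desc:
            dphi = np.linalg.norm(np.array(yt[:3]) - np.array(yg[:3]))
            dxy = np.hypot(yt[3] - yg[3], yt[4] - yg[4])
            if dphi < T_PHI and dxy < T_XY:
                matched += 1
                dists.append(dxy)
                break
    if matched == 0:
        return 0.0
    n_norm = np.sqrt(len(test_desc) * len(target_desc))
    return (matched / n_norm) * ALPHA / (ALPHA + np.mean(dists))

a = [(10.0, 20.0, 30.0, 100.0, 50.0)]
score = y_match_score(a, a)
```

Comparing a template with itself yields the maximum score of 1, while templates with few matches or large center offsets are pushed toward 0 and can be discarded by the threshold t.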
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel-structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca and endothelium, and there are slight differences among the movements of these layers.
Considering these factors, our registration employed both a single shift transform and a multi-parameter transform that combines shift, rotation and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil-center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and s_te^i is the i-th WPL descriptor of Tte; Tta is the target template and s_ta^i is the i-th WPL descriptor of Tta; and d(s_te^k, s_ta^j) is the Euclidean distance of descriptors s_te^k and s_ta^j.
Δs_k is the shift value of the two descriptors, defined as
We first randomly select an equal number of segment descriptors s_te^k in the test template Tte from each quadrant and find their nearest neighbors s_ta^j in the target template Tta. Their shift offsets are recorded as candidate registration shift factors Δs_k. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.
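The shift search can be sketched as follows. For simplicity the quadrant-stratified sampling is replaced by uniform sampling, and "smallest standard deviation" is approximated by keeping the candidate offset closest to the mean offset; both are illustrative simplifications of Algorithm 2:

```python
import numpy as np

rng = np.random.default_rng(1)

def shift_search(test_centers, target_centers, n_samples=16):
    """Sample descriptors from the test template, pair each with its
    nearest neighbor in the target template, record the offsets, and
    keep the candidate that deviates least from the mean offset."""
    idx = rng.integers(0, len(test_centers), n_samples)
    offsets = []
    for i in idx:
        d = np.linalg.norm(target_centers - test_centers[i], axis=1)
        j = int(d.argmin())
        offsets.append(target_centers[j] - test_centers[i])
    offsets = np.array(offsets)
    mean = offsets.mean(axis=0)
    return offsets[np.linalg.norm(offsets - mean, axis=1).argmin()]

pts = np.array([[0.0, 0.0], [300.0, 0.0], [0.0, 300.0],
                [300.0, 300.0], [150.0, 150.0]])
shift = shift_search(pts, pts + np.array([7.0, -3.0]))
```

When the target template really is a shifted copy of the test template, every sampled offset agrees and the recovered shift equals the true translation.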
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance from its nearest neighbor s_ta^j in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here s_te^i, Tte, s_ta^j and Tta are defined as in Algorithm 2; tr_shift^(it), θ^(it) and tr_scale^(it) are the shift, rotation and scale parameters generated in the it-th iteration; and R(θ^(it)), T(tr_shift^(it)) and S(tr_scale^(it)) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters; in our experiment we set the iteration count to 512.
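Composing the rotation, scale and shift into one matrix and applying it to descriptor centers can be sketched as below. The homogeneous 3x3 form is a generic stand-in for the matrix in Eq (7), whose exact layout is not reproduced in the text:

```python
import numpy as np

def affine_matrix(theta, shift, scale):
    """Compose rotation R(theta) and uniform scale S with a
    translation T(shift) into one 3x3 homogeneous matrix."""
    c, s = np.cos(theta), np.sin(theta)
    M = np.eye(3)
    M[:2, :2] = scale * np.array([[c, -s], [s, c]])
    M[:2, 2] = shift
    return M

def apply(M, pts):
    """Apply a homogeneous 3x3 transform to an (N, 2) point array."""
    h = np.hstack([pts, np.ones((len(pts), 1))])
    return (h @ M.T)[:, :2]

M = affine_matrix(np.pi / 2, shift=(5.0, 0.0), scale=2.0)
moved = apply(M, np.array([[1.0, 0.0]]))
```

Each randomly generated parameter set in Algorithm 3 corresponds to one such matrix; the matched-pair count after applying it to the test template is the fitness used to pick the best set.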
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te^i, Tte, s_ta^j and Tta are defined as in Algorithms 2 and 3; θ(optm), tr_shift(optm), tr_scale(optm) and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; R(θ(optm)) T(tr_shift(optm)) S(tr_scale(optm)) is the descriptor transform matrix defined in Algorithm 3; ɸ is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test and target templates.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory and texture memory. Registers and shared memory are on-chip and fast to access; only shared memory can be accessed by other threads within the same block, but its capacity is limited. Global memory, constant memory and texture memory are off-chip and accessible by all threads, but accessing them is very time consuming.
Constant memory and texture memory are read-only and cached. Mapping algorithms onto CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:
If threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this latency, we should prefer on-chip memory over global memory, and when global memory access does occur, threads in the same warp should access consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall into the same bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one template-pair comparison. In our work we use an NVIDIA C2070 as our GPU; the thread and block numbers are set to 1024, which means we can match our test template with up to 1024×1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in that template. This partition lets every block execute independently, with no data-exchange requirements between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate this sum, as shown at the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
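The doubling-stride reduction described above is the classic parallel scan pattern. A Hillis-Steele-style sketch, with NumPy array operations standing in for the lock-step GPU threads:

```python
import numpy as np

def parallel_prefix_sum(values):
    """Inclusive prefix (scan) sum: at step d, every element adds the
    element 2**d positions to its left; all "threads" update
    simultaneously, so log2(n) steps suffice."""
    x = np.array(values, dtype=float)
    d = 1
    while d < len(x):
        shifted = np.concatenate([np.zeros(d), x[:-d]])
        x = x + shifted      # one lock-step update of every element
        d *= 2
    return x

out = parallel_prefix_sum([1, 2, 3, 4, 5])
```

The last element of the scan is the total sum, which is the quantity the matching kernels need after each block finishes its descriptor fractions.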
252 MAPPING INSIDE BLOCK
In shift-argument searching there are two schemes we can choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to a thread, so that all threads compute independently, except that the final results must be compared across the possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the iterations are distributed across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is to use a single thread in a block to process the matching.
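The problem of giving each thread an uncorrelated stream has a modern stdlib-style analogue: NumPy's `SeedSequence.spawn` derives statistically independent child states from one root seed, playing a role similar to the offline dynamic-creation tool mentioned above (this sketch simulates per-thread generators on the CPU; it is not the report's GPU implementation):

```python
import numpy as np

# Spawn independent child generators for four simulated "threads".
# Each child Mersenne Twister (MT19937) starts from a distinct,
# well-separated state, so the streams are uncorrelated even though
# all derive from a single root seed.
root = np.random.SeedSequence(42)
children = root.spawn(4)
streams = [np.random.Generator(np.random.MT19937(s)) for s in children]

draws = np.array([g.random(3) for g in streams])
```

Each row of `draws` comes from a different generator, and the rows differ, which is the property the parallel parameter search depends on.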
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this work it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of its pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG; this method is used to create the cell histograms. Each pixel within the cell contributes a weight to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient-orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig 8 depicts the edge orientations of the picture elements. If the image has illumination or contrast changes, the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
FIG
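The cell-histogram construction described above can be sketched as follows. This is a minimal illustration in Python/NumPy rather than the project's MATLAB code; the function name, cell size, and bin count are our own illustrative choices.

```python
import numpy as np

def hog_cell_histograms(image, cell_size=8, n_bins=9):
    """Per-cell histograms of oriented gradients (unsigned, 0-180 degrees)."""
    img = image.astype(np.float64)
    # y- and x-direction gradients dy(x, y), dx(x, y)
    dy, dx = np.gradient(img)
    # gradient magnitude m(x, y) and orientation theta(x, y)
    mag = np.hypot(dx, dy)
    # fold into [0, 180): opposite directions count as the same
    theta = np.rad2deg(np.arctan2(dy, dx)) % 180.0
    h, w = img.shape
    cells_y, cells_x = h // cell_size, w // cell_size
    hist = np.zeros((cells_y, cells_x, n_bins))
    bin_width = 180.0 / n_bins
    for cy in range(cells_y):
        for cx in range(cells_x):
            sl = (slice(cy * cell_size, (cy + 1) * cell_size),
                  slice(cx * cell_size, (cx + 1) * cell_size))
            bins = np.minimum((theta[sl] / bin_width).astype(int), n_bins - 1)
            # each pixel votes into its orientation bin, weighted by magnitude
            for b, m in zip(bins.ravel(), mag[sl].ravel()):
                hist[cy, cx, b] += m
    return hist
```

Block-level contrast normalization and the Gaussian window would be applied on top of these raw cell histograms.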
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical
computing environment and fourth-generation programming language.
Developed by MathWorks, MATLAB allows matrix manipulations,
plotting of functions and data, implementation of algorithms, creation
of user interfaces, and interfacing with programs written in other languages,
including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an
optional toolbox uses the MuPAD symbolic engine, allowing access
to symbolic computing capabilities. An additional package, Simulink, adds
graphical multi-domain simulation and Model-Based
Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry
and academia. MATLAB users come from various backgrounds
in engineering, science, and economics. MATLAB is widely used in
academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners
in control engineering, Little's specialty, but quickly spread to many other
domains. It is now also used in education, in particular for the teaching
of linear algebra and numerical analysis, and is popular amongst scientists
involved in image processing. The MATLAB application is built around the
MATLAB language. The simplest way to execute MATLAB code is to type
it in the Command Window, which is one of the elements of the MATLAB
Desktop. When code is entered in the Command Window, MATLAB can
be used as an interactive mathematical shell. Sequences of commands can
be saved in a text file, typically using the MATLAB Editor, as a script or
encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and
sharing your work. You can integrate your MATLAB code with other
languages and applications, and distribute your MATLAB algorithms and
applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis,
filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external
applications and languages, such as C, C++, Fortran, Java, COM,
and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image
processing, communications, control design, test and measurement,
financial modeling and analysis, and computational biology. Add-on toolboxes
(collections of special-purpose MATLAB functions) extend the MATLAB
environment to solve particular classes of problems in these application
areas.
MATLAB can be used on personal computers and powerful
server systems, including the Cheaha compute cluster. With the addition of
the Parallel Computing Toolbox, the language can be extended with parallel
implementations for common computational functions, including for-loop
unrolling. Additionally, this toolbox supports offloading computationally
intensive workloads to Cheaha, the campus compute cluster. MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) that knows how big it is. Moreover, the fundamental operators
(e.g., addition, multiplication) are programmed to deal with matrices when
required, and the MATLAB environment handles much of the bothersome
housekeeping that makes all this possible. Since so many of the procedures
required for macro-investment analysis involve matrices, MATLAB
proves to be an extremely efficient language for both communication and
implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or Fortran. A wrapper function is created,
allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed MEX-
files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called
from MATLAB, and many MATLAB libraries (for
example, XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries. Calling MATLAB from Java is more complicated, but
can be done with a MATLAB extension, which is sold separately by
MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB
Interface), which should not be confused with the unrelated Java Metadata
Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox
available from MathWorks, MATLAB can be connected
to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on
Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems. It enables fast
development and execution. With the MATLAB language, you can
program and develop algorithms faster than with traditional languages,
because you do not need to perform low-level administrative tasks such as
declaring variables, specifying data types, and allocating memory. In many
cases MATLAB eliminates the need for 'for' loops. As a result, one line of
MATLAB code can often replace several lines of C or C++ code.
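The point about eliminating 'for' loops can be illustrated outside MATLAB as well. The same vectorized style in Python/NumPy (an analogy under our own variable names, not the project's code) collapses an element-wise loop into one line:

```python
import numpy as np

x = np.arange(5.0)

# Loop style: what a C-like 'for' loop would do element by element
y_loop = np.empty_like(x)
for i in range(x.size):
    y_loop[i] = 3.0 * x[i] ** 2 + 1.0

# Vectorized style: one array expression replaces the whole loop
y_vec = 3.0 * x ** 2 + 1.0

print(np.allclose(y_loop, y_vec))  # -> True
```

The vectorized form is both shorter and faster, since the loop runs inside the optimized array library instead of the interpreter.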
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
MATLAB lets you execute commands or groups of commands one
at a time, without compiling and linking, enabling you to quickly iterate to
the optimal solution. For fast execution of heavy matrix and vector
computations, MATLAB uses processor-optimized libraries. For general-
purpose scalar computations, MATLAB generates machine-code
instructions using its JIT (just-in-time) compilation technology.
This technology, which is available on most platforms, provides
execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithms efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting
breakpoints and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file
differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
Using the interactive tool GUIDE (Graphical User Interface
Development Environment), you can lay out, design, and edit user interfaces.
GUIDE lets you include list boxes, pull-down menus, push buttons, radio
buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring
data from external devices and databases, through preprocessing,
visualization, and numerical analysis, to producing presentation-quality
output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
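As a small illustration of the curve-fitting item above (sketched in Python/NumPy rather than MATLAB; the sample data are our own), a least-squares line fit recovers slope and intercept from data:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0  # noiseless line, so the fit should recover slope 2, intercept 1

# Degree-1 least-squares polynomial fit
slope, intercept = np.polyfit(x, y, 1)
print(round(slope, 6), round(intercept, 6))  # -> 2.0 1.0
```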
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats such as Microsoft Excel; ASCII text or binary
files; image, sound, and video files; and scientific files such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format. Additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features required to visualize engineering
and scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes;
changing line colors and markers; adding annotations, LaTeX equations, and
legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-
D scalar data, and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting effects,
light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW discrete Fourier
transform library. Because these processor-dependent libraries are
optimized for the different platforms that MATLAB supports, they execute
faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language used is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
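The first steps of the pipeline shown in these snapshots, grey-scale conversion and Otsu thresholding, can be sketched as follows. This is an illustrative Python/NumPy re-implementation under our own function names, not the project's MATLAB code.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an RGB image to grey scale using the usual luminance weights."""
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

def otsu_threshold(gray):
    """Return the threshold maximizing between-class variance (Otsu's method)."""
    hist, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))
    total = gray.size
    cum_n = np.cumsum(hist)                     # pixel count below each level
    cum_sum = np.cumsum(hist * np.arange(256))  # intensity mass below each level
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        n0 = cum_n[t - 1]          # background pixels (value < t)
        n1 = total - n0            # foreground pixels (value >= t)
        if n0 == 0 or n1 == 0:
            continue
        mu0 = cum_sum[t - 1] / n0
        mu1 = (cum_sum[-1] - cum_sum[t - 1]) / n1
        var_between = n0 * n1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

A binary image then follows from `gray > otsu_threshold(gray)`; the subsequent ROI selection, enhancement, and Gabor filtering stages operate on that mask.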
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition
method, which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions for other sclera vein recognition methods
and for general pattern recognition methods. We designed the Y shape
descriptor to narrow the search range and increase the matching efficiency;
it is a new feature extraction method that takes advantage of the GPU
structures. We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing, which can
dramatically reduce data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, which is an important
step in achieving parallel computation efficiency on a GPU. A work flow
with high arithmetic intensity to hide the memory access latency was
designed to partition the computation task across the heterogeneous system of
CPU and GPU, and even across the threads within the GPU. The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
ABSTRACT
Sclera vein recognition is shown to be a promising method for human
identification. However, its matching speed is slow, which could impact its
use in real-time applications. To improve the matching efficiency,
we propose a new parallel sclera vein recognition method using a two-
stage parallel approach for registration and matching. First, we designed a
rotation- and scale-invariant Y shape descriptor based feature extraction
method to efficiently eliminate most unlikely matches. Second, we
developed a weighted polar line (WPL) sclera descriptor structure that incorporates
mask information to reduce GPU memory cost. Third, we designed a
coarse-to-fine two-stage matching method. Finally, we developed a
mapping scheme to map the subtasks to GPU processing units. The
experimental results show that our proposed method can achieve a dramatic
processing speed improvement without compromising the recognition
accuracy.
CHAPTER 1
INTRODUCTION
1.1 GENERAL
Digital image processing is the use of computer algorithms to
perform image processing on digital images. The 2D continuous image is
divided into N rows and M columns; the intersection of a row and a
column is called a pixel. The image can also be a function of other variables,
including depth, color, and time. An image given in the form of a
transparency, slide, photograph, or X-ray is first digitized and stored as a
matrix of binary digits in computer memory. This digitized image can then
be processed and/or displayed on a high-resolution television monitor. For
display, the image is stored in a rapid-access buffer memory, which
refreshes the monitor at a rate of 25 frames per second to produce a visually
continuous display.
1.2 OVERVIEW OF DIGITAL IMAGE PROCESSING
The field of digital image processing refers to processing digital
images by means of a digital computer. In a broader sense, it can be
considered the processing of any two-dimensional data, where an image
(optical information) is represented as an array of real or complex numbers
encoded by a definite number of bits. An image is represented as a two-
dimensional function f(x, y), where x and y are spatial (plane)
coordinates and the amplitude of f at any pair of coordinates (x, y)
represents the intensity or gray level of the image at that point.
A digital image is one for which both the coordinates and the
amplitude values of f are all finite, discrete quantities. Hence a digital
image is composed of a finite number of elements, each of which has a
particular location and value. These elements are called pixels. A digital
image is discrete in both spatial coordinates and brightness, and it can be
considered as a matrix whose row and column indices identify a point in
the image and whose corresponding element value identifies the gray
level at that point.
One of the first applications of digital images was in the newspaper
industry, when pictures were first sent by submarine cable between London
and New York. Introduction of the Bartlane cable picture transmission
system in the early 1920s reduced the time required to transport a picture
across the Atlantic from more than a week to less than three hours.
FIG
1.2.1 PREPROCESSING
In imaging science, image processing is any form of signal
processing for which the input is an image, such as a photograph or video
frame; the output of image processing may be either an image or a set of
characteristics or parameters related to the image. Most image-processing
techniques involve treating the image as a two-dimensional signal and
applying standard signal-processing techniques to it. Image processing
usually refers to digital image processing, but optical and analog image
processing are also possible; this section covers general techniques that
apply to all of them. The acquisition of images (producing the input image
in the first place) is referred to as imaging.
Image processing refers to the processing of a 2D picture by a
computer. Basic definitions:
An image defined in the "real world" is considered to be a function
of two real variables, for example a(x, y), with a as the amplitude (e.g.,
brightness) of the image at the real coordinate position (x, y). Modern digital
technology has made it possible to manipulate multi-dimensional signals
with systems that range from simple digital circuits to advanced parallel
computers. The goal of this manipulation can be divided into three
categories:
Image processing (image in -> image out)
Image analysis (image in -> measurements out)
Image understanding (image in -> high-level description out)
An image may be considered to contain sub-images, sometimes referred
to as regions of interest (ROIs), or simply regions. This concept reflects the
fact that images frequently contain collections of objects, each of which can
be the basis for a region. In a sophisticated image processing system, it
should be possible to apply specific image processing operations to selected
regions. Thus one part of an image (region) might be processed to suppress
motion blur while another part might be processed to improve colour
rendition.
Most image processing systems require that the images be
available in digitized form, that is, as arrays of finite-length binary words. For
digitization, the given image is sampled on a discrete grid and each sample,
or pixel, is quantized using a finite number of bits. The digitized image is
processed by a computer. To display a digital image, it is first converted
into an analog signal, which is scanned onto a display. Closely related to
image processing are computer graphics and computer vision. In computer
graphics, images are manually made from physical models of objects,
environments, and lighting, instead of being acquired (via imaging devices
such as cameras) from natural scenes, as in most animated movies.
Computer vision, on the other hand, is often considered high-level image
processing, out of which a machine/computer/software intends to decipher
the physical contents of an image or a sequence of images (e.g., videos or
3D full-body magnetic resonance scans).
In modern sciences and technologies, images also gain much
broader scope due to the ever-growing importance of scientific
visualization (of often large-scale, complex scientific/experimental data).
Examples include microarray data in genetic research and real-time multi-
asset portfolio trading in finance. Before processing, an image is
converted into a digital form. Digitization includes sampling of the image and
quantization of the sampled values. After converting the image into bit
information, processing is performed. This processing technique may be
image enhancement, image restoration, or image compression.
1.2.2 IMAGE ENHANCEMENT
Image enhancement refers to the accentuation, or sharpening, of image features such as
boundaries or contrast to make a graphic display more useful for display and
analysis. This process does not increase the inherent information content in
the data. It includes gray level and contrast manipulation, noise reduction, edge
crispening and sharpening, filtering, interpolation and magnification,
pseudo-coloring, and so on.
1.2.3 IMAGE RESTORATION
Image restoration is concerned with filtering the observed image to minimize the
effect of degradations. The effectiveness of image restoration depends on the
extent and accuracy of the knowledge of the degradation process as well as on
the filter design. Image restoration differs from image enhancement in that the
latter is concerned with the extraction or accentuation of image features.
1.2.4 IMAGE COMPRESSION
Image compression is concerned with minimizing the number of bits required to represent
an image. Applications of compression include broadcast TV; remote sensing
via satellite; military communication via aircraft; radar; teleconferencing;
facsimile transmission of educational and business documents; medical
images that arise in computer tomography, magnetic resonance imaging,
and digital radiology; motion pictures; satellite images; weather maps;
geological surveys; and so on.
Text compression - CCITT GROUP3 & GROUP4
Still image compression - JPEG
Video image compression - MPEG
1.2.5 SEGMENTATION
In computer vision, image segmentation is the process of
partitioning a digital image into multiple segments (sets of pixels, also
known as superpixels). The goal of segmentation is to simplify and/or
change the representation of an image into something that is more
meaningful and easier to analyze. Image segmentation is typically used to
locate objects and boundaries (lines, curves, etc.) in images. More precisely,
image segmentation is the process of assigning a label to every pixel in an
image such that pixels with the same label share certain visual
characteristics.
The result of image segmentation is a set of segments that
collectively cover the entire image, or a set of contours extracted from the
image (see edge detection). Each of the pixels in a region is similar with
respect to some characteristic or computed property, such as
colour, intensity, or texture. Adjacent regions are significantly different
with respect to the same characteristic(s). When applied to a stack of
images, typical in medical imaging, the resulting contours after image
segmentation can be used to create 3D reconstructions with the help of
interpolation algorithms like marching cubes.
1.2.6 IMAGE RESTORATION
Image restoration, like enhancement, improves the quality of an image,
but all of its operations are based on known, measured, or estimated
degradations of the original image. Image restoration is used to restore
images with problems such as geometric distortion, improper focus,
repetitive noise, and camera motion, and to correct images for known
degradations.
1.2.7 FUNDAMENTAL STEPS
Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the
chances for success of the other processes.
Image segmentation: to partition an input image into its constituent parts or
objects.
Image representation: to convert the input data to a form suitable for
computer processing.
Image description: to extract features that result in some quantitative
information of interest, or features that are basic for differentiating one
class of objects from another.
Image recognition: to assign a label to an object based on the
information provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized
objects.
Knowledge about a problem domain is coded into an image processing
system in the form of a knowledge database.
1.3 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x, y) must be digitized
both spatially and in amplitude.
Digitization of the spatial coordinates (x, y) is called image sampling;
amplitude digitization is called gray-level quantization.
The storage and processing requirements increase rapidly with the spatial
resolution and the number of gray levels.
Example: A 256 gray-level image of size 256x256 occupies 64K bytes of
memory.
Images of very low spatial resolution produce a checkerboard effect.
The use of an insufficient number of gray levels in smooth areas of a digital
image results in false contouring.
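The storage figure in the example above can be checked with a few lines (an illustrative Python sketch; the function name is our own):

```python
import math

def image_storage_bytes(width, height, gray_levels):
    """Uncompressed storage: each pixel needs ceil(log2(gray_levels)) bits."""
    bits_per_pixel = math.ceil(math.log2(gray_levels))
    return width * height * bits_per_pixel // 8

# 256 gray levels -> 8 bits per pixel; a 256x256 image takes 65536 bytes = 64K
print(image_storage_bytes(256, 256, 256))  # -> 65536
```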
1.4 IMAGE FILE FORMATS
There are two general groups of 'images': vector graphics (or line art)
and bitmaps (pixel-based 'images'). Some of the most common file
formats are:
GIF - Graphics Interchange Format: an 8-bit (256 colour), non-
destructively compressed bitmap format. Mostly used for the web. Has several
sub-standards, one of which is the animated GIF.
JPEG - Joint Photographic Experts Group: a very efficient (i.e., much
information per byte), destructively compressed 24-bit (16 million colours)
bitmap format. Widely used, especially for the web and Internet (bandwidth-
limited).
TIFF - Tagged Image File Format: the standard 24-bit publication bitmap
format. Compresses non-destructively with, for instance, Lempel-Ziv-
Welch (LZW) compression.
PS - PostScript: a standard vector format. Has numerous sub-standards
and can be difficult to transport across platforms and operating systems.
PSD - Adobe Photoshop Document: a dedicated Photoshop format that
keeps all the information in an image, including all the layers.
BMP - bitmap file format.
1.5 TYPES OF IMAGES
Images are of four types:
1. Binary image
2. Gray scale image
3. Color image
4. Indexed image
1.5.1 BINARY IMAGES
A binary image is a digital image that has only two possible values for
each pixel. Typically the two colors used for a binary image are black and
white, though any two colors can be used. Binary images are also called bi-
level or two-level. Each pixel is stored as a single bit, i.e.,
a 0 or 1; such images are often referred to as black-and-white (B&W).
1.5.2 GRAY SCALE IMAGE
In an (8-bit) grayscale image, each picture element has an assigned intensity
that ranges from 0 to 255. A grey scale image is what people normally call
a black and white image, but the name emphasizes that such an image also
includes many shades of grey.
FIG
1.5.3 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive
colour, with the r, g, and b receptors in our retinas. RGB uses additive colour
mixing and is the basic colour model used in television or any other
medium that projects colour with light. It is the basic colour model used in
computers and for web graphics, but it cannot be used for print production.
The secondary colours of RGB - cyan, magenta, and yellow - are formed
by mixing two of the primary colours (red, green, or blue) and excluding the
third colour. Red and green combine to make yellow, green and blue to
make cyan, and blue and red form magenta. The combination of red, green,
and blue at full intensity makes white.
In Photoshop, using the "screen" mode for the different layers in an
image will make the intensities mix together according to the additive
colour mixing model. This is analogous to stacking slide images on top of
each other and shining light through them.
FIG
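The "screen" mixing just described can be sketched numerically. This is a small Python illustration assuming the standard screen formula, result = 1 - (1 - a)(1 - b), on intensities normalized to [0, 1]:

```python
import numpy as np

def screen_blend(a, b):
    """Additive 'screen' mixing of two layers, intensities in [0, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - (1.0 - a) * (1.0 - b)

# Screening with black leaves a layer unchanged; screening with white gives white
print(screen_blend(0.0, 0.5))  # -> 0.5
print(screen_blend(1.0, 0.3))  # -> 1.0
```

The result is always at least as bright as either input, matching the slide-stacking analogy.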
CMYK: The 4-colour CMYK model used in printing lays down
overlapping layers of varying percentages of transparent cyan (C), magenta
(M), and yellow (Y) inks. In addition, a layer of black (K) ink can be added.
The CMYK model uses the subtractive colour model.
1.5.4 INDEXED IMAGE
FIG
An indexed image consists of an array and a colormap matrix. The pixel values in the array are direct indices into the colormap. By convention, this documentation uses the variable name X to refer to the array and map to refer to the colormap. In computing, indexed colour is a technique to manage digital image colours in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, colour information is not directly carried by the image pixel data but is stored in a separate piece of data called a palette: an array of colour elements, in which every element (a colour) is indexed by its position within the array. The image pixels do not contain the full specification of their colour, but only their index in the palette. This technique is sometimes referred to as pseudocolour or indirect colour, as colours are addressed indirectly.
Perhaps the first device that supported palette colours was a random-access frame buffer described in 1975 by Kajiya, Sutherland, and Cheadle, which supported a palette of 256 36-bit RGB colours.
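A minimal numpy sketch of the X/map convention described above: the image array holds only indices, and the palette is consulted to expand them to full RGB when the image is displayed (the colours here are arbitrary):

```python
import numpy as np

# Palette ("map"): each entry is an RGB colour; the image stores only indices.
palette = np.array([[0, 0, 0],        # index 0: black
                    [255, 255, 255],  # index 1: white
                    [255, 0, 0]],     # index 2: red
                   dtype=np.uint8)

X = np.array([[0, 1],
              [2, 1]], dtype=np.uint8)   # the index array ("X" in the text)

rgb = palette[X]          # expand indices to full RGB for display
print(rgb.shape)          # (2, 2, 3)
```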
1.6 APPLICATIONS OF IMAGE PROCESSING
Interest in digital image processing methods stems from two principal application areas:
1) Improvement of pictorial information for human interpretation.
2) Processing of scene data for autonomous machine perception.
In the second application area, interest focuses on procedures for extracting, from an image, information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, and automatic processing of fingerprints.
1.7 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches, Speeded-Up Robust Features (SURF)-based matching, minutiae detection, and direct correlation matching, for feature registration and matching. Among these three methods, the SURF-based method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system; it costs about 1.2 seconds to perform a one-to-one matching. Both speeds were measured using a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
1.7.1 DISADVANTAGES OF THE EXISTING SYSTEM
1. Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.
LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-face recognition algorithms perform well when constrained (frontal, well-illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. Recognition performance of a subset of existing still-image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time, in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.
With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such Big Data draws great concern. Even though modern processors are equipped with more cores and memory capacity, it still requires careful design to utilize the hardware resources effectively and the power consumption efficiently. This research addresses the issue by investigating the workload characteristics of biometric applications. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to high-performance architectures.
Modern iris recognition algorithms can be computationally intensive, yet are designed for traditional sequential processing elements such as a personal computer. However, a parallel processing alternative using field-programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the means of this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ version, determining the information content of an iris approximately 324 times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.
A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye. The vein pattern seen in the sclera region is
unique to each person; thus, the sclera vein pattern is a well-suited biometric for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are suitable for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using bilinear interpolation. These two features help the proposed system become illumination and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than previous methods.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.
The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor, featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.
This paper proposes algorithms for iris segmentation quality
enhancement match score fusion and indexing to improve both the
accuracy and the speed of iris recognition A curve evolution approach is
proposed to effectively segment a nonideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are
concurrently applied on the segmented iris image to produce multiple
enhanced versions of the iris image A support-vector-machine-based
learning algorithm selects locally enhanced regions from each globally
enhanced image and combines these good-quality regions to create a single
high-quality iris image Two distinct features are extracted from the high-
quality iris image The global textural feature is extracted using the 1-D log
polar Gabor transform and the local topological feature is extracted using
Euler numbers An intelligent fusion algorithm combines the textural and
topological matching scores to further improve the iris recognition
performance and reduce the false rejection rate whereas an indexing
algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and
compared with other algorithms using the CASIA Version 3 ICE 2005 and
UBIRIS iris databases
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method, built on the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.
It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, also work in OpenCL. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not work. We then propose a new sclera descriptor, the Y-shape sclera feature, with an efficient registration method to speed up the mapping scheme; introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing, to mitigate the mask-size issue; and develop a coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make parallel processing possible and efficient.
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve efficiency, in this research we propose a new descriptor, the Y-shape descriptor, which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.
2. We propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result of this stage helps filter out image pairs with low similarities.
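The two-stage flow above can be sketched as follows; `coarse_sim`, `fine_sim`, and the threshold value are hypothetical placeholders standing in for the Y-shape comparison and the refined matcher, not the report's actual similarity functions:

```python
def two_stage_match(test, gallery, coarse_sim, fine_sim, coarse_threshold=0.3):
    """Coarse-to-fine matching sketch: a cheap filter first, the expensive
    matcher only on the survivors."""
    # Stage 1: coarse matching (no registration) prunes low-similarity pairs.
    candidates = [g for g in gallery if coarse_sim(test, g) >= coarse_threshold]
    # Stage 2: refined (expensive) matching only on the surviving candidates.
    return [(g, fine_sim(test, g)) for g in candidates]

# Toy usage: "templates" are numbers, and similarity is closeness.
coarse = lambda a, b: 1.0 - min(abs(a - b) / 10.0, 1.0)
results = two_stage_match(5, [4, 6, 50], coarse, coarse)
print([g for g, s in results])  # [4, 6], 50 was pruned at the coarse stage
```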
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches, Speeded-Up Robust Features (SURF)-based matching, minutiae detection, and direct correlation matching, for feature registration and matching. Among these three methods, the SURF-based method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system; it costs about 1.2 seconds to perform a one-to-one matching. Both speeds were measured using a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient in data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (also known as general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, GPUs have been used to extract features, construct descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al. evaluated the performance of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform that outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design; an efficient parallel computing scheme needs different strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition using our sequential line-descriptor method and the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and can help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, also work in OpenCL. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose a new sclera descriptor, the Y-shape sclera feature, with an efficient registration method to speed up the mapping scheme (Section 4); introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing, to mitigate the mask-size issue (Section 5); and develop a coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then develop implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we present experiments using the proposed system. In Section 9, we draw conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation. They used a clustering algorithm to classify color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's-thresholding-based method for grayscale images. After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region-growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded-Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line-descriptor-based feature registration and matching method.
The proposed sclera recognition method consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and a Cartesian-to-polar conversion with bilinear interpolation. HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form; this is mainly useful for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are extracted.
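The Cartesian-to-polar conversion with bilinear interpolation can be sketched as follows; the grid sizes, the toy image, and the center are illustrative, not the report's actual parameters:

```python
import numpy as np

def to_polar(img, center, n_r=32, n_theta=64):
    """Resample an image onto a polar (r, theta) grid with bilinear interpolation."""
    h, w = img.shape
    cy, cx = center
    r_max = min(cy, cx, h - 1 - cy, w - 1 - cx)   # stay inside the image
    rs = np.linspace(0, r_max, n_r)
    thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    out = np.zeros((n_r, n_theta))
    for i, r in enumerate(rs):
        for j, t in enumerate(thetas):
            y = cy + r * np.sin(t)
            x = cx + r * np.cos(t)
            y0 = min(max(int(np.floor(y)), 0), h - 2)
            x0 = min(max(int(np.floor(x)), 0), w - 2)
            dy, dx = y - y0, x - x0
            # Bilinear: blend the four surrounding pixels.
            out[i, j] = ((1 - dy) * (1 - dx) * img[y0, x0]
                         + (1 - dy) * dx * img[y0, x0 + 1]
                         + dy * (1 - dx) * img[y0 + 1, x0]
                         + dy * dx * img[y0 + 1, x0 + 1])
    return out

img = np.arange(49, dtype=float).reshape(7, 7)
polar = to_polar(img, center=(3, 3), n_r=4, n_theta=8)
print(polar.shape)  # (4, 8)
```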
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare area detection: The glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images: if the image is in color, it must first be converted to grayscale, and the Sobel filter is then applied to detect the glare area. Fig. 4 shows the result of the glare area detection.
FIG
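A minimal numpy sketch of the Sobel step, run on a hypothetical grayscale patch containing one bright glare pixel (the patch values are made up):

```python
import numpy as np

def sobel_magnitude(gray):
    """Gradient magnitude via the 3x3 Sobel operator (interior pixels only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = gray.shape
    mag = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = gray[y - 1:y + 2, x - 1:x + 2]
            gx = np.sum(win * kx)       # horizontal gradient
            gy = np.sum(win * ky)       # vertical gradient
            mag[y, x] = np.hypot(gx, gy)
    return mag

# Toy grayscale eye patch with a bright "glare" pixel in the middle.
patch = np.full((5, 5), 50.0)
patch[2, 2] = 255.0
edges = sobel_magnitude(patch)
print(edges[2, 1] > 0)  # True: strong gradient next to the glare spot
```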
Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are eliminated.
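Otsu's method picks the grey level that maximizes the between-class variance of the intensity histogram; a self-contained numpy sketch on a toy bimodal "image":

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the threshold maximizing between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0.0
    for t in range(256):
        w0 += hist[t]                      # pixels at or below t (class 0)
        if w0 == 0:
            continue
        w1 = total - w0                    # pixels above t (class 1)
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Bimodal toy image: dark background pixels and a bright "sclera" cluster.
img = np.array([10, 12, 11, 13, 200, 205, 210, 198], dtype=np.uint8)
t = otsu_threshold(img)
print(10 < t < 198)  # True: the threshold separates the two clusters
```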
2.2.3 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; these are all unwanted portions for recognition. In order to eliminate their effects, refinement is done after the detection of the sclera area. Fig. shows the result after Otsu's thresholding and iris and eyelid refinement to detect the right sclera area; the left sclera area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement must be performed to make them more visible.
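The multi-directional Gabor enhancement used for this purpose can be sketched as a small filter bank whose per-pixel maximum response is kept; the kernel size and parameters here are illustrative, not the report's tuned values:

```python
import numpy as np

def gabor_kernel(theta, ksize=9, sigma=2.0, lambd=4.0):
    """Real part of a Gabor kernel oriented at angle theta."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * xr / lambd)

def conv_same(img, kern):
    """Naive same-size correlation (the Gabor real part is symmetric)."""
    kh, kw = kern.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(img.shape)
    for yy in range(img.shape[0]):
        for xx in range(img.shape[1]):
            out[yy, xx] = np.sum(padded[yy:yy + kh, xx:xx + kw] * kern)
    return out

def enhance(img, n_orientations=4):
    """Keep the maximum response over a bank of directional Gabor filters."""
    out = np.full(img.shape, -np.inf)
    for k in range(n_orientations):
        out = np.maximum(out, conv_same(img, gabor_kernel(np.pi * k / n_orientations)))
    return out

img = np.zeros((15, 15))
img[:, 7] = 1.0                         # a vertical "vein"
enhanced = enhance(img)
print(enhanced[7, 7] > enhanced[7, 1])  # True: the vein responds strongly
```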
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has been studied in the context of the fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact. Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.
FIG
FIG
2.2.5 OVERVIEW OF THE LINE-DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching stage of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, φ. Thus the descriptor is S = (θ, r, φ)T. The individual components of the line descriptor are calculated as follows:
FIG
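The descriptor S = (θ, r, φ)T can be computed from the segment center, its orientation, and the iris center; a small sketch with made-up coordinates (here φ is taken as given, whereas in the report it comes from the polynomial fit f_line):

```python
import numpy as np

def line_descriptor(seg_center, seg_angle, iris_center):
    """(theta, r, phi): polar position of the segment center about the iris
    center, plus the segment's dominant orientation."""
    xl, yl = seg_center
    xi, yi = iris_center
    theta = np.arctan2(yl - yi, xl - xi)  # angle to the reference at the iris center
    r = np.hypot(xl - xi, yl - yi)        # distance to the iris center
    return np.array([theta, r, seg_angle])

S = line_descriptor(seg_center=(13, 14), seg_angle=0.5, iris_center=(10, 10))
print(S[1])  # 5.0 (a 3-4-5 triangle from the iris center)
```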
Here, fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value, based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.
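The RANSAC-style registration can be sketched as follows; for brevity this version hypothesizes only a translation (the report also samples a scaling factor and a rotation), and the fitness value is a simple inlier count:

```python
import math
import random

def ransac_register(test_pts, target_pts, iters=200, seed=0):
    """RANSAC-style sketch: sample a point pair, hypothesize the shift that
    aligns them, and keep the hypothesis with the best fitness (inlier count)."""
    rng = random.Random(seed)
    best = (0, (0.0, 0.0))
    for _ in range(iters):
        p = rng.choice(test_pts)
        q = rng.choice(target_pts)
        dx, dy = q[0] - p[0], q[1] - p[1]   # hypothesized translation
        score = sum(
            1 for (x, y) in test_pts
            if any(math.hypot(x + dx - u, y + dy - v) < 1.0 for (u, v) in target_pts)
        )
        if score > best[0]:
            best = (score, (dx, dy))
    return best

test = [(0, 0), (1, 0), (0, 1)]
target = [(5, 5), (6, 5), (5, 6)]     # the same shape shifted by (5, 5)
score, shift = ransac_register(test, target)
print(score, shift)
```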
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we create a weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as follows, where Si and Sj are two segment descriptors; m(Si, Sj) is the matching score between segments Si and Sj; d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8); Dmatch is the matching distance threshold; and φmatch is the matching angle threshold. The total matching score, M, is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
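A sketch of this scoring, using center-plus-orientation descriptors s(x, y, φ); the threshold values are illustrative, and the per-segment weights from the weighting image are folded into the `wi`/`wj` parameters:

```python
import numpy as np

# Illustrative thresholds; the report leaves D_match and the angle
# threshold as tunable parameters.
D_MATCH, PHI_MATCH = 5.0, 0.2

def segment_match(si, sj, wi=1.0, wj=1.0):
    """Score one descriptor pair: a weighted hit if the centers are close and
    the orientations agree, 0 otherwise (a sketch of m(Si, Sj))."""
    d = np.hypot(si[0] - sj[0], si[1] - sj[1])       # center distance
    dphi = abs(si[2] - sj[2])                        # orientation difference
    return wi * wj if (d <= D_MATCH and dphi <= PHI_MATCH) else 0.0

def total_score(test, target):
    """M: sum of best per-segment scores over the smaller template,
    normalized by that template's maximum attainable score."""
    small, big = (test, target) if len(test) <= len(target) else (target, test)
    got = sum(max(segment_match(s, t) for t in big) for s in small)
    return got / len(small)

test = [(0, 0, 0.0), (10, 0, 1.0)]
target = [(1, 0, 0.05), (30, 30, 2.0), (10, 1, 1.1)]
print(total_score(test, target))  # 1.0: both test segments find a match
```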
FIG
FIG
FIG
FIG
Under movement of the eye, Y-shape branches are observed to be a stable
feature and can be used as a sclera feature descriptor. To detect the Y-shape
branches in the original template, we search for the nearest-neighbor set of
every line segment within a regular distance and classify the angles among
these neighbors. If there are two types of angle values in the line segment
set, this set may be inferred as a Y-shape structure, and the line segment
angles are recorded as a new feature of the sclera.
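The neighbor-angle test described above can be sketched as follows. This is an illustrative Python sketch, not the report's implementation; the segment format (x, y, angle in degrees), the search radius, and the angle tolerance are assumptions.

```python
import math

def find_y_shapes(segments, radius=20.0, angle_tol=15.0):
    """segments: list of (x, y, angle_deg); returns indices of Y-shape candidates."""
    candidates = []
    for i, (xi, yi, ai) in enumerate(segments):
        # collect neighbors within the regular search distance
        neigh = [a for (x, y, a) in segments
                 if 0.0 < math.hypot(x - xi, y - yi) <= radius]
        if not neigh:
            continue
        # group neighbor angles; a gap larger than angle_tol starts a new group
        groups = []
        for a in sorted(neigh):
            if groups and abs(a - groups[-1][-1]) <= angle_tol:
                groups[-1].append(a)
            else:
                groups.append([a])
        # two distinct angle groups around a segment suggests a Y-shape branch
        if len(groups) == 2:
            candidates.append(i)
    return candidates
```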
There are two ways to measure both the orientation and the relationship of
every branch of Y-shape vessels: one is to use the angle of every branch to
the x-axis; the other is to use the angles between each branch and the iris
radial direction. The first method needs an additional rotation operation to
align the template. In our approach we employed the second method. As Figure 6
shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius
from the pupil center. Even when the head tilts, the eye moves, or the camera
zooms at the image acquisition step, ϕ1, ϕ2, and ϕ3 are quite stable. To
tolerate errors from the pupil center calculation in the segmentation step,
we also recorded the center position (x, y) of the Y-shape branches as
auxiliary parameters. So our rotation-, shift-, and scale-invariant feature
vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated
with reference to the iris center; therefore it is automatically aligned to
the iris centers. It is a rotation- and scale-invariant descriptor.
V WPL SCLERA DESCRIPTOR
As we discussed in Section 22, the line descriptor is extracted from the
skeleton of the vessel structure in binary images (Figure 7). The skeleton is
then broken into smaller segments. For each segment, a line descriptor is
created to record the center and orientation of the segment. This descriptor
is expressed as s(x, y, ϕ), where (x, y) is the position of the center and ϕ
is its orientation. Because of the limitation of segmentation accuracy, the
descriptors at the boundary of the sclera area might not be accurate and may
contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be
tolerant of such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel
patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line
segments of vessel patterns
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is challenging,
since the mask files are large and would occupy GPU memory and slow down data
transfer. When matching, a RANSAC-type registration algorithm was used to
randomly select corresponding descriptors, and the transform parameters
between them were used to generate the template transform affine matrix.
After every template transform, the mask data should also be transformed and a
new boundary should be calculated to evaluate the weight of the transformed
descriptor. This results in too many convolutions in the processor unit.
To reduce heavy data transfer and computation, we designed the weighted polar
line (WPL) descriptor structure, which includes the mask information and can
be automatically aligned. We extracted the geometric relationships of the
descriptors and stored them as a new descriptor. We use a weighted image
created by setting various weight values according to position. The weights of
descriptors that lie outside the sclera are set to 0, those near the sclera
boundary are set to 0.5, and interior descriptors are set to 1. In our work,
descriptor weights were calculated from their own mask by the CPU, only once.
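The one-time CPU weighting step can be sketched as below. The mask representation (a 2D list of 0/1 values) and the border distance are assumptions, since the report does not fix them.

```python
def descriptor_weight(mask, x, y, border=2):
    """Weight of a descriptor centered at (x, y): 0 outside the sclera mask,
    0.5 near the mask boundary, 1 in the interior."""
    h, w = len(mask), len(mask[0])
    if not (0 <= y < h and 0 <= x < w) or mask[y][x] == 0:
        return 0.0
    # near-boundary test: any background (or out-of-image) pixel within `border`
    for dy in range(-border, border + 1):
        for dx in range(-border, border + 1):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w) or mask[ny][nx] == 0:
                return 0.5
    return 1.0
```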
The calculated result was saved as a component of the descriptor. The sclera
descriptor then becomes s(x, y, ϕ, w), where w denotes the weight of the point
and may take the value 0, 0.5, or 1. To align two templates, when a template
is shifted to another location along the line connecting their centers, all
the descriptors of that template will be transformed. It would be faster if
the two templates had similar reference points. If we use the center of the
iris as the reference point, then when two templates are compared, the
correspondences will automatically be aligned to each other, since they share
the same reference point. Every feature vector of the template is a set of
line segment descriptors composed of three variables (Figure 8): the segment
angle to the reference line through the iris center, denoted θ; the distance
between the segment center and the pupil center, denoted r; and the dominant
angular orientation of the segment, denoted ϕ. To minimize the GPU
computation, we also convert the descriptor values from polar coordinates to
rectangular coordinates in a CPU preprocess.
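The polar-to-rectangular preprocessing can be sketched as follows; the exact field order of the descriptor tuple is an assumption.

```python
import math

def to_rectangular(theta_deg, r, phi_deg):
    """Convert a descriptor's polar position (r, theta) relative to the iris
    center into rectangular coordinates, done once on the CPU so GPU kernels
    can use (x, y) directly. Weight w is filled in elsewhere."""
    t = math.radians(theta_deg)
    x = r * math.cos(t)
    y = r * math.sin(t)
    # descriptor becomes s(x, y, r, theta, phi, w)
    return (x, y, r, theta_deg, phi_deg)
```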
The descriptor vector becomes s(x, y, r, θ, ϕ, w). The left and right parts of
the sclera in an eye may have different registration parameters. For example,
as an eyeball moves left, the left-part sclera patterns of the eye may be
compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in different
warps to allow different deformations. The multiprocessor in CUDA manages
threads in groups of 32 parallel threads called warps. We reorganized the
descriptors from the same side and saved
FIG
FIG
them in contiguous addresses. This meets the requirement of coalesced memory
access on the GPU.
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no longer
needed on the GPU. Matching with this feature is very fast because the
templates do not need to be re-registered every time after shifting. Thus the
cost of data transfer and computation on the GPU is reduced. Matching on the
new descriptor, the shift parameter generator in Figure 4 is then simplified
as in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
more complicated shading and lighting operations that are essential for
complex effects The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment Over the past six years these vertex programs and
fragment programs have become increasingly more capable with larger
limits on their size and resource consumption with more fully featured
instruction sets and with more flexible control-flow operations After many
years of separate instruction sets for vertex and fragment operations current
GPUs support the unified Shader Model 4.0 on both vertex and fragment
shaders
The hardware must support shader programs of at least 65k static instructions
and unlimited dynamic instructions.
The instruction set for the first time supports both 32-bit integers and 32-
bit floating-point numbers
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture)
Finally dynamic flow control in the form of loops and branches must be
supported
As the shader model has evolved and become more powerful, and GPU applications
of all types have increased vertex and fragment program complexity, GPU
architectures have increasingly focused on the programmable parts of the
graphics pipeline. Indeed, while previous generations of GPUs could best be
described as additions of programmability to a fixed-function pipeline,
today's GPUs are better characterized as a programmable engine surrounded by
supporting fixed-function units.
GENERAL-PURPOSE COMPUTING ON THE GPU
Mapping general-purpose computation onto the GPU uses the graphics hardware in
much the same way as any standard graphics application. Because of this
similarity, it is both easier and more difficult to explain the process: on
one hand, the actual operations are the same and easy to follow; on the other
hand, the terminology differs between graphics and general-purpose use. Harris
provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show
how the same steps are used in a general-purpose way to author GPGPU
applications, and finally use the same steps to show the more simple and
direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II
concentrating on the programmable aspects of this pipeline
The programmer specifies geometry that covers a region on the screen
The rasterizer generates a fragment at each pixel location covered by that
geometry
Each fragment is shaded by the fragment program
The fragment program computes the value of the fragment by a combination of
math operations and global memory reads from a global 'texture' memory.
The resulting image can then be used as texture on future passes through
the graphics pipeline
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves the
exact same steps but different terminology. A motivating example is a fluid
simulation computed over a grid: at each time step, we compute the next state
of the fluid for each grid point from the current state at its grid point and
at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest The rasterizer generates a fragment at each
pixel location covered by that geometry (In our example our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation)
Each fragment is shaded by an SPMD general-purpose fragment
program (Each grid point runs the same program to update the state of its
fluid)
The fragment program computes the value of the fragment by a combination of
math operations and 'gather' accesses from global memory (Each grid point can
access the state of its neighbors from the previous time step in computing its
current value)
The resulting buffer in global memory can then be used as an input on
future passes (The current state of the fluid will be used on the next time
step)
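The four steps above can be illustrated with a small CPU-side analogue of such a gather-only fragment program. This is a sketch, not GPU code, and the neighborhood-averaging rule is a stand-in for whatever update the fluid simulation actually uses; the key point is that each output cell is computed only from the previous buffer.

```python
def step(grid):
    """One 'pass': every grid point gathers its neighbors' previous-step
    values and writes its new value into a separate output buffer."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # gather: the cell itself plus its 4-neighborhood
            total, count = grid[y][x], 1
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    total += grid[ny][nx]
                    count += 1
            out[y][x] = total / count
    return out
```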
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been
that, despite their general-purpose tasks having nothing to do with graphics,
the applications still had to be programmed using graphics APIs.
In addition the program had to be structured in terms of the graphics
pipeline with the programmable units only accessible as an intermediate
step in that pipeline when the programmer would almost certainly prefer to
access the programmable units directly The programming environments we
describe in detail in Section IV are solving this difficulty by providing a
more natural direct non-graphics interface to the hardware and
specifically the programmable units Today GPU computing applications
are structured in the following way
The programmer directly defines the computation domain of interest as a
structured grid of threads
An SPMD general-purpose program computes the value of each thread
The value for each thread is computed by a combination of math operations and
both 'gather' (read) accesses from and 'scatter' (write) accesses to global
memory. Unlike in the previous two
methods the same buffer can be used for both reading and writing
allowing more flexible algorithms (for example in-place algorithms that
use less memory)
The resulting buffer in global memory can then be used as an input in
future computation
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage
matching process. In the first stage, we match two images coarsely using the
Y-shape descriptors, which is very fast because no registration is needed. The
matching result in this stage helps filter out image pairs with low
similarity; after this step, some false positive matches may still remain. In
the second stage, we use the WPL descriptor to register the two images for
more detailed descriptor matching, including scale and translation invariance.
This stage includes shift transform, affine matrix generation, and final WPL
descriptor matching. Overall, we partitioned the registration and matching
processing into four kernels in CUDA (Figure 10): matching on the Y-shape
descriptor, shift transformation, affine matrix generation, and final WPL
descriptor matching. Combining these two stages, the matching program can run
faster and achieve a more accurate score.
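The coarse-to-fine control flow can be sketched as below; `coarse_score`, `fine_score`, and the threshold are placeholders standing in for the report's Algorithms 1-4.

```python
def two_stage_match(test, gallery, coarse_score, fine_score, t=0.3):
    """Stage I: cheap Y-shape comparison filters the gallery.
    Stage II: expensive registration + WPL matching runs only on survivors."""
    survivors = [g for g in gallery if coarse_score(test, g) >= t]
    if not survivors:
        return None
    # fine stage: detailed descriptor matching on the filtered candidates only
    return max(survivors, key=lambda g: fine_score(test, g))
```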
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features,
registration is unnecessary before matching on the Y-shape descriptor. The
whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte and
the target template Tta, respectively; dϕ is the Euclidean distance of the
angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean
distance of two descriptor centers, defined in (4); ni and di are the matched
descriptor pairs' number and their centers' distance, respectively; tϕ is a
distance threshold and txy is the threshold that restricts the search area. We
set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we searched the areas near all the Y-shape
branches. The search area is limited to the corresponding left or right half
of the sclera in order to reduce the search range and time. The distance
between two branches is defined in (3), where ϕij is the angle between the jth
branch and the polar line from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y-shape branch centers
di are stored as the matching result. We fuse the number of matched branches
and the average distance between matched branch centers as in (2). Here α is a
factor to fuse the matching score, which was set to 30 in our study. Ni and Nj
are the total numbers of feature vectors in templates i and j, respectively.
The decision is regulated by the threshold t: if the sclera's matching score
is lower than t, the sclera will be discarded. A sclera with a high matching
score is passed to the next, more precise matching process.
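A sequential sketch of this Stage-I comparison is shown below. Descriptors are assumed to be (ϕ1, ϕ2, ϕ3, x, y) tuples, and since the text does not reproduce fusion formula (2), the final score here is a plausible stand-in that rewards many matched pairs and small average center distance.

```python
import math

def y_match_score(T1, T2, t_phi=30.0, t_xy=67.5, alpha=30.0):
    """Greedy Y-shape matching: a pair matches when both the branch-angle
    distance and the center distance fall under their thresholds."""
    matches, dists = 0, []
    for (p1, q1, r1, x1, y1) in T1:
        for (p2, q2, r2, x2, y2) in T2:
            dxy = math.hypot(x1 - x2, y1 - y2)
            dphi = math.sqrt((p1 - p2) ** 2 + (q1 - q2) ** 2 + (r1 - r2) ** 2)
            if dxy <= t_xy and dphi <= t_phi:
                matches += 1
                dists.append(dxy)
                break
    if matches == 0:
        return 0.0
    avg = sum(dists) / matches
    # stand-in fusion of match count and average center distance
    return matches / (min(len(T1), len(T2)) * (1.0 + avg / alpha))
```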
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of the
sclera than the Y-shape descriptor. The variation of the sclera vessel pattern
is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure
will appear to shrink or extend nonlinearly, because the eyeball is spherical
in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and
endothelium. There are slight differences among the movements of these layers.
Considering these factors, our registration employed both a single shift
transform and a multi-parameter transform that combines shift, rotation, and
scale together.
1) SHIFT PARAMETER SEARCH As we discussed before, segmentation may not be
accurate; as a result, the detected iris center may not be very accurate. The
shift transform is designed to tolerate possible errors in pupil center
detection in the segmentation step. If there is no deformation, or only very
minor deformation, registration with the shift transform alone would be
adequate to achieve an accurate result. We designed Algorithm 2 to obtain the
optimized shift parameter, where Tte is the test template and stei is the ith
WPL descriptor of Tte; Tta is the target template and staj is the jth WPL
descriptor of Tta; d(stek, staj) is the Euclidean distance of descriptors stek
and staj.
Δsk is the shift value of the two descriptors, defined as
We first randomly select an equal number of segment descriptors stek in the
test template Tte from each quad and find each one's nearest neighbor staj in
the target template Tta. The shift offset between them is recorded as a
possible registration shift factor Δsk. The final offset registration factor
is Δsoptim, which has the smallest standard deviation among these candidate
offsets.
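A sequential sketch of this shift search follows. Descriptors are reduced to center points here, and the "smallest standard deviation" selection is approximated by keeping the candidate offset closest to the candidates' mean; both simplifications are assumptions.

```python
import math
import random

def find_shift(test, target, samples=8, seed=0):
    """Sample test descriptors, pair each with its nearest target descriptor,
    record the center offsets, and keep the most consistent candidate."""
    rng = random.Random(seed)
    picks = [rng.choice(test) for _ in range(samples)]
    offsets = []
    for (tx, ty) in picks:
        nx, ny = min(target, key=lambda p: math.hypot(p[0] - tx, p[1] - ty))
        offsets.append((nx - tx, ny - ty))
    # keep the candidate offset that deviates least from the candidates' mean
    mx = sum(o[0] for o in offsets) / len(offsets)
    my = sum(o[1] for o in offsets) / len(offsets)
    return min(offsets, key=lambda o: math.hypot(o[0] - mx, o[1] - my))
```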
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera
patterns in the matching step. The affine transform algorithm is shown in
Algorithm 3. The shift value in the parameter set is obtained by randomly
selecting a descriptor stei and calculating the distance from its nearest
neighbor staj in Tta. We transform the test template by the matrix in (7). At
the end of each iteration, we count the number of matched descriptor pairs
between the transformed template and the target template. The factor β is used
to determine whether a pair of descriptors is matched; we set it to 20 pixels
in our experiment. After N iterations, the optimized transform parameter set
is determined by selecting the maximum matching number m(it). Here stei, Tte,
staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and
tr(it)scale are the parameters of the shift, rotation, and scale transforms
generated in the itth iteration; R(θ(it)), T(tr(it)shift), and S(tr(it)scale)
are the transform matrices defined in (7). To search for the optimal transform
parameters, we iterated N times to generate these parameters. In our
experiment, we set the iteration count to 512.
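The random parameter search can be sketched sequentially as below. The draw ranges for rotation, scale, and shift are assumptions, since the report does not state them; only the match threshold β and the iteration count come from the text.

```python
import math
import random

def affine_search(test, target, n_iter=512, beta=20.0, seed=1):
    """Each iteration draws rotation/scale/shift parameters, transforms the
    test points, and counts points landing within beta of a target point;
    the best-scoring parameter set wins."""
    rng = random.Random(seed)
    best, best_matches = None, -1
    for _ in range(n_iter):
        th = rng.uniform(-0.2, 0.2)                            # rotation (rad)
        sc = rng.uniform(0.9, 1.1)                             # scale
        dx, dy = rng.uniform(-10, 10), rng.uniform(-10, 10)    # shift
        c, s = math.cos(th), math.sin(th)
        m = 0
        for (x, y) in test:
            tx = sc * (c * x - s * y) + dx
            ty = sc * (s * x + c * y) + dy
            if any(math.hypot(tx - gx, ty - gy) <= beta for (gx, gy) in target):
                m += 1
        if m > best_matches:
            best_matches, best = m, (th, sc, dx, dy)
    return best, best_matches
```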
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined by Algorithms 2 and 3, the test
template is registered and matched simultaneously. The registration and
matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are
defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and
Δsoptim are the registration parameters attained from Algorithms 2 and 3;
R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor
transform matrix defined in Algorithm 3; ϕ is the angle between the segment
descriptor and the radial direction; w is the weight of the descriptor, which
indicates whether the descriptor is at the edge of the sclera or not. To
ensure that the nearest descriptors have a similar orientation, we used a
constant factor α to check the absolute difference of two ϕ values. In our
experiment we set α to 5. The total matching score is the minimal score of the
two transformed results divided by the minimal matching score for the test
template and target template.
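A sequential sketch of this weighted matching step follows. The descriptor layout (x, y, ϕ, w) and the greedy one-to-one pairing are assumptions; the distance threshold β, the orientation check against α, and the weight normalization mirror the description above.

```python
import math

def wpl_score(test, target, beta=20.0, alpha=5.0):
    """Match registered test descriptors against target descriptors with
    similar orientation; normalise the matched weight by the smaller
    template's total weight."""
    used = [False] * len(target)
    score = 0.0
    for (x, y, phi, w) in test:
        for j, (gx, gy, gphi, gw) in enumerate(target):
            if used[j]:
                continue  # a segment already matched must not be used again
            if math.hypot(x - gx, y - gy) <= beta and abs(phi - gphi) < alpha:
                used[j] = True
                score += min(w, gw)
                break
    denom = min(sum(d[3] for d in test), sum(d[3] for d in target))
    return score / denom if denom else 0.0
```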
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and works as a
coprocessor with a CPU. A CUDA device consists of many streaming
multiprocessors (SMs), and the parallel part of the program should be
partitioned into threads by the programmer and mapped onto them. There are
multiple memory spaces in the CUDA memory hierarchy: registers, local memory,
shared memory, global memory, constant memory, and texture memory. Registers,
local memory, and shared memory are on-chip, and it takes little time to
access these memories. Only shared memory can be accessed by other threads
within the same block; however, shared memory is of limited size. Global
memory, constant memory, and texture memory are off-chip and accessible by all
threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only and cacheable. Mapping
algorithms to CUDA to achieve efficient processing is not a trivial task.
There are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be
executed serially. To improve performance, branch divergence within a warp
should be avoided.
Global memory is slower than on-chip memory in terms of access latency. To
hide this latency, we should use on-chip memory preferentially rather than
global memory. When global memory access occurs, threads in the same warp
should access words in sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it
is organized into banks of equal size. If two memory requests from different
threads within a warp fall in the same memory bank, the accesses are
serialized. To get maximum performance, memory requests should be scheduled to
minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent
modules, all the modules are converted to different kernels on the GPU. These
kernels differ in computation density; thus we map them to the GPU with
various mapping strategies to fully utilize the computing power of CUDA.
Figure 11 shows our scheme of CPU-GPU task distribution and the partition
among blocks and threads. Algorithm 1 is partitioned into coarse-grained
parallel subtasks.
We create a number of threads in this kernel equal to the number of templates
in the database. As the upper middle column of Figure 11 shows, each target
template is assigned to one thread, and one thread performs one
pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU.
The thread and block numbers are set to 1024, which means we can match our
test template with up to 1024×1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread
processes a section of descriptors. As the lower portion of the middle column
of Figure 11 shows, we assigned a target template to one block. Inside a
block, one thread corresponds to a set of descriptors in this template. This
partition makes every block execute independently, with no data exchange
required between different blocks. When all threads complete their
corresponding descriptor fractions, the sum of the intermediate results needs
to be computed or compared. A parallel prefix sum algorithm is used to
calculate the sum of intermediate results, as shown on the right of Figure 11.
First, all odd-numbered threads compute the sum of consecutive pairs of
results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads
computes the prefix sum on the new results. The final result is saved at the
first address, which has the same variable name as the first intermediate
result.
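The stride-doubling reduction described above can be simulated sequentially as follows; each pass of the while loop corresponds to one synchronized step of the threads in a block, and the total ends up in the first element.

```python
def block_reduce(vals):
    """Simulate the in-block reduction: pairs are summed first, then every
    stride-th partial sum is accumulated with doubled stride until the
    total sits at index 0."""
    vals = list(vals)
    n = len(vals)
    stride = 1
    while stride < n:
        # threads at multiples of 2*stride add their right neighbor's partial
        for i in range(0, n, 2 * stride):
            if i + stride < n:
                vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]
```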
252 MAPPING INSIDE BLOCK
In the shift argument search, there are two schemes we can choose to map the
task:
Mapping one pair of templates to all the threads in a block, so that every
thread takes charge of a fraction of the descriptors and cooperates with the
other threads.
Assigning a single possible shift offset to a thread, so that all threads
compute independently, except that the final result must be compared with the
other possible offsets.
Due to the great number of sum and synchronization operations in every
nearest-neighbor search step, we chose the second method to parallelize the
shift search. In the affine matrix generator, we mapped an entire
parameter-set search to a thread, and every thread randomly generated a set of
parameters and tried them independently. The generated iterations were
assigned to all threads. The challenge of this step is that the randomly
generated numbers might be correlated among threads. In the rotation and scale
registration generation step, we used the Mersenne Twister pseudorandom number
generator because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative;
therefore it is hard to parallelize a single twister state update step among
several execution threads. To make sure that thousands of threads in the
launch grid generate uncorrelated random sequences, many simultaneous Mersenne
Twisters need to process different initial states in parallel. But even 'very
different' (by any definition) initial state values do not prevent the
emission of correlated sequences by generators sharing identical parameters.
To solve this problem, and to enable efficient implementation of the Mersenne
Twister on parallel architectures, we used a special offline tool for the
dynamic creation of Mersenne Twister parameters, modified from the algorithm
developed by Makoto Matsumoto and Takuji Nishimura. In the registration and
matching step, when searching for the nearest neighbor, a line segment that
has already been matched with another should not be used again. In our
approach a flag
FIG
FIG
variable denoting whether the line has been matched is stored in shared
memory. To share the flags, all the threads in a block would have to
synchronize at every query step. Our solution is to use a single thread in a
block to process the matching.
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between
host memory and device memory, and data transfer between host and device can
lead to long latency. As shown in Figure 11, we load the entire target
template set from the database without considering when it will be processed;
therefore there is no data transfer from host to device during the matching
procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3,
x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that
consecutive kernels of Algorithms 2 to 4 can access their data at successive
addresses. Although such coalesced access reduces latency, frequent global
memory access was still a slow way to get data. In our kernels we loaded the
test template into shared memory to accelerate memory access. Because
Algorithms 2 to 4 execute different numbers of iterations on the same data,
bank conflicts do not occur. To maximize our texture memory space, we set the
system cache to the lowest value and bound our target descriptors to texture
memory. Using this cacheable memory, our data access was accelerated further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily
applied in target detection. In this paper it is applied as the feature for
human recognition. In the sclera region, the vein patterns are the edges of an
image, so HOG is used to determine the gradient orientations and edge
orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small connected
regions called cells. For each cell, compute the histogram of gradient
directions or edge orientations of the pixels. The combination of the
histograms of the different cells then represents the descriptor. To improve
accuracy, histograms can be contrast-normalized by calculating the intensity
over a block and using this value to normalize all cells within the block.
This normalization makes the descriptor invariant to geometric and photometric
changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated
using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used to create
the cell histograms: each pixel within the cell gives a weighted vote to the
orientation bin found in the gradient computation, with the gradient magnitude
used as the weight. The cells are rectangular, and the gradient orientation
bins are spread over 0 to 180 degrees, with opposite directions counting as
the same. Fig 8 depicts the edge orientations of the picture elements. If the
image has illumination and contrast changes, then the gradient strengths must
be locally normalized; for that, cells are grouped together into larger
blocks. These blocks overlap, so that each cell contributes more than once to
the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which
are mainly square grids. The performance of HOG is improved by applying a
Gaussian window to each block.
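The gradient, orientation-binning, and cell-histogram steps can be sketched in pure Python as below; the cell size and the 9 unsigned bins over 0-180 degrees are conventional HOG choices, not values fixed by the text.

```python
import math

def hog_cell_histograms(img, cell=4, bins=9):
    """Central-difference gradients, magnitude-weighted voting into unsigned
    orientation bins (0-180 degrees), one histogram per cell."""
    h, w = len(img), len(img[0])
    hists = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = img[y][x + 1] - img[y][x - 1]
            dy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(dx, dy)
            ang = math.degrees(math.atan2(dy, dx)) % 180.0  # unsigned orientation
            b = min(int(ang / (180.0 / bins)), bins - 1)
            key = (y // cell, x // cell)
            hist = hists.setdefault(key, [0.0] * bins)
            hist[b] += mag  # gradient magnitude is the vote weight
    return hists
```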
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and
fourth-generation programming language. Developed by MathWorks, MATLAB allows
matrix manipulations, plotting of functions and data, implementation of
algorithms, creation of user interfaces, and interfacing with programs written
in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional
toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing
capabilities. An additional package, Simulink, adds graphical multi-domain
simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia.
MATLAB users come from various backgrounds of engineering, science, and
economics. MATLAB is widely used in academic and research institutions as well
as industrial enterprises. MATLAB was first adopted by researchers and
practitioners in control engineering, Little's specialty, but quickly spread
to many other domains. It is now also used in education, in particular the
teaching of linear algebra and numerical analysis, and is popular amongst
scientists involved in image processing. The MATLAB application is built
around the MATLAB language. The simplest way to execute MATLAB code is to type
it in the Command Window, one of the elements of the MATLAB Desktop. When code
is entered in the Command Window, MATLAB can be used as an interactive
mathematical shell. Sequences of commands can be saved in a text file,
typically using the MATLAB Editor, as a script or encapsulated into a
function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work.
You can integrate your MATLAB code with other languages and applications, and
distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external applications
and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image
processing, communications, control design, test and measurement, and
financial modeling and analysis. Add-on toolboxes (collections of
special-purpose MATLAB functions) extend the MATLAB environment to solve
particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful
server systems including the Cheaha compute cluster With the addition of
the Parallel Computing Toolbox the language can be extended with parallel
implementations for common computational functions including for-loop
unrolling Additionally this toolbox supports offloading computationally
intensive workloads to Cheaha the campus compute cluster MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) and knows how big it is Moreover the fundamental operators
(eg addition multiplication) are programmed to deal with matrices when
required And the MATLAB environment handles much of the bothersome
housekeeping that makes all this possible. Since so many of the procedures
required for Macro-Investment Analysis involve matrices, MATLAB proves to be
an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming
language or Fortran. A wrapper function is created, allowing MATLAB data types
to be passed and returned. The dynamically loadable object files created by
compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from
MATLAB, and many MATLAB libraries (for example, XML or SQL support) are
implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from
Java is more complicated, but can be done with a MATLAB extension, which is
sold separately by MathWorks, or using an undocumented mechanism called JMI
(Java-to-MATLAB Interface), which should not be confused with the unrelated
Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from
MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on
Windows XP, and for network installations.
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data.
Readability and navigation improvements to warning and error messages in
the MATLAB command window.
Automatic variable and function renaming in the MATLAB Editor.
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems, and it
enables fast development and execution. With the MATLAB language, you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables, specifying data types, and allocating memory. In many
cases MATLAB eliminates the need for 'for' loops; as a result, one line of
MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
MATLAB lets you execute commands or groups of commands one
at a time, without compiling and linking, enabling you to iterate quickly
to the optimal solution. For fast execution of heavy matrix and vector
computations, MATLAB uses processor-optimized libraries. For general-
purpose scalar computations, MATLAB generates machine-code instructions
using its JIT (Just-In-Time) compilation technology.
This technology, which is available on most platforms, provides
execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
The interactive tool GUIDE (Graphical User Interface
Development Environment) is used to lay out, design, and edit user
interfaces. GUIDE lets you include list boxes, pull-down menus, push
buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring
data from external devices and databases, through preprocessing,
visualization, and numerical analysis, to producing presentation-quality
output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
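Two of the operations listed above, smoothing (here by moving average) and thresholding, can be sketched in a few lines. The Python fragment below is an illustrative stand-in for the corresponding MATLAB built-ins:

```python
def moving_average(x, w):
    """Smooth a 1-D sequence with a sliding window of size w (box filter)."""
    return [sum(x[i:i + w]) / w for i in range(len(x) - w + 1)]

def threshold(x, t):
    """Binarize: 1 where the sample exceeds t, else 0."""
    return [1 if v > t else 0 for v in x]

signal = [0, 1, 0, 8, 9, 8, 1, 0]
smooth = moving_average(signal, 3)   # approx [0.33, 3.0, 5.67, 8.33, 6.0, 3.0]
mask = threshold(signal, 5)          # [0, 0, 0, 1, 1, 1, 0, 0]
```

In MATLAB the same results come from vectorized one-liners (e.g., a convolution for the average and a logical comparison for the threshold); the loop-free style carries over directly.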
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read
data from popular file formats such as Microsoft Excel, ASCII text or
binary files, image, sound, and video files, and scientific formats such
as HDF and HDF5. Low-level binary file I/O functions let you work with
data files in any format, and additional functions let you read data from
Web pages and XML.
Visualizing Data
All the graphics features required to visualize engineering
and scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all
popular graphics formats. You can customize plots by adding multiple axes;
changing line colors and markers; adding annotations, LaTeX equations, and
legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D
scalar data, and 3-D vector data. You can use these functions to
visualize and understand large, often complex, multidimensional data,
specifying plot characteristics such as camera viewing angle, perspective,
lighting effects, light-source locations, and transparency.
3-D plotting functions include
Surface, contour, and mesh
Image plots
Cone, slice, stream, and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW discrete Fourier
transform library. Because these processor-dependent libraries are
optimized for the different platforms that MATLAB supports, they execute
faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
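As an illustration of arithmetic across these numeric classes, the following sketch mirrors MATLAB's double, single, and integer types using NumPy analogues (Python is used here only for illustration):

```python
import numpy as np

# The same addition carried out in three numeric classes:
d = np.float64(0.1) + np.float64(0.2)   # double precision (MATLAB's default)
s = np.float32(0.1) + np.float32(0.2)   # single precision
i = np.int16(300) + np.int16(400)       # 16-bit integer arithmetic

assert d.dtype == np.float64 and s.dtype == np.float32
assert int(i) == 700
# Single precision carries less accuracy than double:
assert abs(float(s) - 0.3) > abs(float(d) - 0.3)
```

The point is that the class of the operands determines both the storage cost and the rounding behaviour of the result, in MATLAB exactly as in this sketch.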
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown
into something much bigger, and it is used to implement numerical
algorithms for a wide range of applications. The basic language is very
similar to standard linear algebra notation, but there are a few
extensions that will likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
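The grey-scale-to-binary step in the snapshots above relies on Otsu's thresholding, which picks the threshold that maximizes the between-class variance of the intensity histogram. The pure-Python sketch below illustrates the idea only; the report's actual implementation is in MATLAB, and the synthetic two-valued "image" is a stand-in for a real sclera image:

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) maximizing between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(g * h for g, h in enumerate(hist))
    sum_b = 0.0   # cumulative sum of background intensities
    w_b = 0       # background weight (pixel count)
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b                # background mean
        m_f = (sum_all - sum_b) / w_f    # foreground mean
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A bimodal "image": dark vein pixels near 40, bright sclera near 200.
img = [40] * 50 + [200] * 50
t = otsu_threshold(img)               # lands between the two modes
binary = [1 if p > t else 0 for p in img]
```

MATLAB wraps this same computation in its graythresh function; the sketch just makes the between-class-variance search explicit.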
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition
method, which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can
be applied to design parallel solutions to other sclera vein recognition
methods and to general pattern recognition methods. We designed the
Y-shape descriptor to narrow the search range and increase the matching
efficiency; it is a new feature extraction method that takes advantage of
GPU structures. We developed the WPL descriptor to incorporate mask
information and make the method more suitable for parallel computing,
which can dramatically reduce data transfer and computation. We then
carefully mapped our algorithms to GPU threads and blocks, an important
step in achieving parallel computation efficiency on a GPU. A work flow
with high arithmetic intensity, designed to hide memory access latency,
was used to partition the computation task across the heterogeneous
CPU-GPU system, down to the individual threads on the GPU. The proposed
method dramatically improves matching efficiency without compromising
recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," in Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
CHAPTER 1
INTRODUCTION
11GENERAL
Digital image processing is the use of computer algorithms to
perform image processing on digital images. The 2-D continuous image is
divided into N rows and M columns; the intersection of a row and a column
is called a pixel. The image can also be a function of other variables,
including depth, colour, and time. An image given in the form of a
transparency, slide, photograph, or X-ray is first digitized and stored
as a matrix of binary digits in computer memory. This digitized image can
then be processed and/or displayed on a high-resolution television
monitor. For display, the image is stored in a rapid-access buffer
memory, which refreshes the monitor at a rate of 25 frames per second to
produce a visually continuous display.
12 OVERVIEW ABOUT DIGITAL IMAGE PROCESSING
The field of "Digital Image Processing" refers to processing digital
images by means of a digital computer. In a broader sense, it can be
considered the processing of any two-dimensional data, where an image
(optical information) is represented as an array of real or complex
numbers represented by a definite number of bits. An image is represented
as a two-dimensional function f(x, y), where 'x' and 'y' are spatial
(plane) coordinates, and the amplitude of f at any pair of coordinates
(x, y) represents the intensity, or gray level, of the image at that
point.
A digital image is one for which both the coordinates and the
amplitude values of f are finite, discrete quantities. Hence a digital
image is composed of a finite number of elements, each of which has a
particular location and value. These elements are called "pixels". A
digital image is discrete in both spatial coordinates and brightness, and
it can be considered as a matrix whose row and column indices identify a
point on the image and whose corresponding matrix element value
identifies the gray level at that point.
One of the first applications of digital images was in the newspaper
industry, when pictures were first sent by submarine cable between London
and New York. Introduction of the Bartlane cable picture transmission
system in the early 1920s reduced the time required to transport a
picture across the Atlantic from more than a week to less than three
hours.
FIG
121 PREPROCESSING
In imaging science, image processing is any form of signal
processing for which the input is an image, such as a photograph or video
frame; the output of image processing may be either an image or a set of
characteristics or parameters related to the image. Most image-processing
techniques involve treating the image as a two-dimensional signal and
applying standard signal-processing techniques to it. Image processing
usually refers to digital image processing, but optical and analog image
processing are also possible. The acquisition of images (producing the
input image in the first place) is referred to as imaging.
Image processing refers to the processing of a 2-D picture by a
computer. Basic definitions:
An image defined in the "real world" is considered to be a function
of two real variables, for example a(x, y), with a as the amplitude
(e.g., brightness) of the image at the real coordinate position (x, y).
Modern digital technology has made it possible to manipulate
multi-dimensional signals with systems that range from simple digital
circuits to advanced parallel computers. The goal of this manipulation
can be divided into three categories:
Image processing (image in -> image out)
Image analysis (image in -> measurements out)
Image understanding (image in -> high-level description out)
An image may be considered to contain sub-images, sometimes referred
to as regions of interest (ROIs), or simply regions. This concept
reflects the fact that images frequently contain collections of objects,
each of which can be the basis for a region. In a sophisticated image
processing system, it should be possible to apply specific image
processing operations to selected regions. Thus one part of an image
(region) might be processed to suppress motion blur while another part
might be processed to improve colour rendition.
Most usually, image processing systems require that the images be
available in digitized form, that is, as arrays of finite-length binary
words. For digitization, the given image is sampled on a discrete grid
and each sample, or pixel, is quantized using a finite number of bits.
The digitized image is then processed by a computer. To display a digital
image, it is first converted into an analog signal, which is scanned onto
a display. Closely related to image processing are computer graphics and
computer vision. In computer graphics, images are manually made from
physical models of objects, environments, and lighting, instead of being
acquired (via imaging devices such as cameras) from natural scenes, as in
most animated movies. Computer vision, on the other hand, is often
considered high-level image processing, out of which a
machine/computer/software intends to decipher the physical contents of an
image or a sequence of images (e.g., videos or 3-D full-body magnetic
resonance scans).
In modern sciences and technologies, images also gain much
broader scope, due to the ever-growing importance of scientific
visualization (of often large-scale, complex scientific/experimental
data). Examples include microarray data in genetic research and real-time
multi-asset portfolio trading in finance. Before processing, an image is
converted into a digital form: digitization includes sampling of the
image and quantization of the sampled values. After converting the image
into bit information, processing is performed. This processing may be
image enhancement, image restoration, or image compression.
122 IMAGE ENHANCEMENT
It refers to accentuation, or sharpening, of image features such as
boundaries or contrast to make a graphic display more useful for display
and analysis. This process does not increase the inherent information
content in the data. It includes gray-level and contrast manipulation,
noise reduction, edge crispening and sharpening, filtering, interpolation
and magnification, pseudo-colouring, and so on.
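One of the contrast-manipulation operations mentioned above, a linear contrast stretch, can be sketched as follows (an illustrative Python fragment, not the report's MATLAB code):

```python
def contrast_stretch(pixels, lo=0, hi=255):
    """Linearly map the image's min..max intensity range onto lo..hi."""
    pmin, pmax = min(pixels), max(pixels)
    if pmax == pmin:
        return [lo] * len(pixels)  # flat image: nothing to stretch
    scale = (hi - lo) / (pmax - pmin)
    return [round(lo + (p - pmin) * scale) for p in pixels]

dim = [100, 110, 120, 130]              # low-contrast image
assert contrast_stretch(dim) == [0, 85, 170, 255]
```

Note that, as the section says, no information is added: the stretch only spreads the existing gray levels over the full display range.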
123 IMAGE RESTORATION
It is concerned with filtering the observed image to minimize the
effect of degradations. The effectiveness of image restoration depends on
the extent and accuracy of the knowledge of the degradation process as
well as on the filter design. Image restoration differs from image
enhancement in that the latter is concerned with the extraction or
accentuation of image features.
124 IMAGE COMPRESSION
It is concerned with minimizing the number of bits required to represent
an image. Applications of compression are in broadcast TV, remote sensing
via satellite, military communication via aircraft, radar,
teleconferencing, facsimile transmission of educational and business
documents, medical images that arise in computed tomography, magnetic
resonance imaging, and digital radiology, motion pictures, satellite
images, weather maps, geological surveys, and so on.
Text compression - CCITT Group 3 & Group 4
Still image compression - JPEG
Video image compression - MPEG
125 SEGMENTATION
In computer vision, image segmentation is the process of
partitioning a digital image into multiple segments (sets of pixels, also
known as superpixels). The goal of segmentation is to simplify and/or
change the representation of an image into something that is more
meaningful and easier to analyze. Image segmentation is typically used to
locate objects and boundaries (lines, curves, etc.) in images. More
precisely, image segmentation is the process of assigning a label to
every pixel in an image such that pixels with the same label share
certain visual characteristics.
The result of image segmentation is a set of segments that
collectively cover the entire image, or a set of contours extracted from
the image (see edge detection). Each of the pixels in a region is similar
with respect to some characteristic or computed property, such as colour,
intensity, or texture; adjacent regions are significantly different with
respect to the same characteristic(s). When applied to a stack of images,
typical in medical imaging, the resulting contours after image
segmentation can be used to create 3-D reconstructions with the help of
interpolation algorithms like marching cubes.
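The idea that same-label pixels share a characteristic can be made concrete with a minimal 4-connected component labeling routine for a binary image. This is a Python sketch using an explicit stack; production systems use optimized library routines:

```python
def label_components(img):
    """4-connected component labeling of a binary image (list of lists).
    All 1-pixels in the same connected region receive the same label."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] == 1 and labels[sy][sx] == 0:
                next_label += 1                 # start a new region
                labels[sy][sx] = next_label
                stack = [(sy, sx)]
                while stack:                    # flood-fill the region
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x),
                                   (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            stack.append((ny, nx))
    return labels, next_label

img = [[1, 1, 0, 0],
       [0, 1, 0, 1],
       [0, 0, 0, 1]]
labels, n = label_components(img)   # two separate regions, n == 2
```

Each returned label corresponds to one segment; pixels with the same label are connected and share the "foreground" characteristic, exactly as the definition above requires.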
126 IMAGE RESTORATION
Image restoration, like enhancement, improves the qualities of an
image, but all the operations are mainly based on known or measured
degradations of the original image. Image restoration is used to restore
images with problems such as geometric distortion, improper focus,
repetitive noise, and camera motion, and to correct images for known
degradations.
127 FUNDAMENTAL STEPS
Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the
chances for success of the other processes.
Image segmentation: to partition an input image into its constituent
parts or objects.
Image representation: to convert the input data to a form suitable for
computer processing.
Image description: to extract features that result in some quantitative
information of interest, or features that are basic for differentiating
one class of objects from another.
Image recognition: to assign a label to an object based on the
information provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized
objects.
Knowledge about a problem domain is coded into an image processing
system in the form of a knowledge database.
13 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x, y) must be
digitized both spatially and in amplitude.
Digitization of the spatial coordinates (x, y) is called image sampling;
amplitude digitization is called gray-level quantization.
The storage and processing requirements increase rapidly with the spatial
resolution and the number of gray levels.
Example: A 256 gray-level image of size 256x256 occupies 64K bytes of
memory.
Images of very low spatial resolution produce a checkerboard effect, and
the use of an insufficient number of gray levels in smooth areas of a
digital image results in false contouring.
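Both the storage figure in the example and the false-contouring effect can be checked numerically. The Python sketch below reproduces the 64K-byte calculation and shows how coarse quantization collapses a smooth gray ramp into a few bands:

```python
def quantize(pixels, levels):
    """Reduce 0-255 gray values to the given number of evenly spaced levels."""
    step = 256 // levels
    return [(p // step) * step for p in pixels]

# Storage for the example above: 256x256 pixels, 256 gray levels,
# so 8 bits (1 byte) per pixel.
size_bytes = 256 * 256 * 1
assert size_bytes == 65536            # = 64K bytes, as stated

ramp = list(range(0, 256, 16))        # a smooth gray ramp
coarse = quantize(ramp, 4)            # only 4 levels: visible banding
assert sorted(set(coarse)) == [0, 64, 128, 192]
```

The sixteen distinct ramp values collapse to four output values: those abrupt jumps in a smooth region are exactly the false contours described above.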
14 IMAGE FILE FORMATS
There are two general groups of 'images': vector graphics (or line art)
and bitmaps (pixel-based 'images'). Some of the most common file formats
are:
GIF - Graphics Interchange Format. An 8-bit (256 colour),
non-destructively compressed bitmap format. Mostly used for the web. Has
several sub-standards, one of which is the animated GIF.
JPEG - Joint Photographic Experts Group. A very efficient (i.e., much
information per byte), destructively compressed, 24-bit (16 million
colours) bitmap format. Widely used, especially for the web and the
Internet (bandwidth-limited).
TIFF - Tagged Image File Format. The standard 24-bit publication bitmap
format. Compresses non-destructively with, for instance,
Lempel-Ziv-Welch (LZW) compression.
PS - PostScript. A standard vector format. Has numerous sub-standards and
can be difficult to transport across platforms and operating systems.
PSD - Adobe Photoshop Document. A dedicated Photoshop format that keeps
all the information in an image, including all the layers.
BMP - bitmap file format.
15 TYPE OF IMAGES
Images are of four types:
1 Binary image
2 Gray scale image
3 Color image
4 Indexed image
151 BINARY IMAGES
A binary image is a digital image that has only two possible values for
each pixel. Typically, the two colors used for a binary image are black
and white, though any two colors can be used. Binary images are also
called bi-level or two-level. Each pixel is stored as a single bit, i.e.,
a 0 or 1; hence the names black-and-white and B&W.
152 GRAY SCALE IMAGE
In an 8-bit grayscale image, each picture element has an assigned
intensity that ranges from 0 to 255. A grayscale image is what people
normally call a black-and-white image, but the name emphasizes that such
an image will also include many shades of grey.
FIG
153 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive
colour, with the r, g, and b receptors in our retinas. RGB uses additive
colour mixing and is the basic colour model used in television or any
other medium that projects colour with light. It is the basic colour
model used in computers and for web graphics, but it cannot be used for
print production. The secondary colours of RGB (cyan, magenta, and
yellow) are formed by mixing two of the primary colours (red, green, or
blue) and excluding the third colour: red and green combine to make
yellow, green and blue make cyan, and blue and red form magenta. The
combination of red, green, and blue at full intensity makes white.
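The additive mixing rules above can be stated as a tiny computation (an illustrative Python sketch; colours are 8-bit RGB triples):

```python
def add_mix(c1, c2):
    """Additive mixing of two RGB colours, clipped to the 8-bit range."""
    return tuple(min(a + b, 255) for a, b in zip(c1, c2))

RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)

assert add_mix(RED, GREEN) == (255, 255, 0)                   # yellow
assert add_mix(GREEN, BLUE) == (0, 255, 255)                  # cyan
assert add_mix(BLUE, RED) == (255, 0, 255)                    # magenta
assert add_mix(add_mix(RED, GREEN), BLUE) == (255, 255, 255)  # white
```

Each assertion restates one sentence of the paragraph above: two primaries give a secondary, and all three at full intensity give white.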
In Photoshop, using the "screen" mode for the different layers in an
image will make the intensities mix together according to the additive
colour mixing model. This is analogous to stacking slide images on top of
each other and shining light through them.
FIG
CMYK: The four-colour CMYK model used in printing lays down
overlapping layers of varying percentages of transparent cyan (C),
magenta (M), and yellow (Y) inks; in addition, a layer of black (K) ink
can be added. The CMYK model uses subtractive colour mixing.
154 INDEXED IMAGE
FIG
An indexed image consists of an array and a colour map matrix. The
pixel values in the array are direct indices into the colour map. By
convention, this documentation uses the variable name X to refer to the
array and map to refer to the colour map. In computing, indexed colour is
a technique to manage digital image colours in a limited fashion, in
order to save computer memory and file storage while speeding up display
refresh and file transfers. It is a form of vector quantization
compression.
When an image is encoded in this way, colour information is not
directly carried by the image pixel data, but is stored in a separate
piece of data called a palette: an array of colour elements, in which
every element, a colour, is indexed by its position within the array. The
image pixels do not contain the full specification of their colour, but
only their index into the palette. This technique is sometimes referred
to as pseudocolour or indirect colour, as colours are addressed
indirectly.
Perhaps the first device that supported palette colours was a
random-access frame buffer described in 1975 by Kajiya, Sutherland, and
Cheadle. This supported a palette of 256 36-bit RGB colours.
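The array-plus-colour-map relationship described above can be sketched directly. This is a Python illustration; note that MATLAB's own indexed images index the map starting from 1, while this sketch indexes from 0:

```python
# A tiny palette (colour map): each entry is an RGB triple.
palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]  # black, white, red

# The indexed image X stores palette positions, not colours.
X = [[0, 1],
     [1, 2]]

# Display expands each index through the palette.
rgb = [[palette[i] for i in row] for row in X]
assert rgb[0][1] == (255, 255, 255)   # index 1 -> white
assert rgb[1][1] == (255, 0, 0)       # index 2 -> red
```

With a 256-entry palette, each pixel needs only one byte regardless of the palette's colour depth, which is exactly the memory saving the section describes.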
16 Applications of image processing
Interest in digital image processing methods stems from two principal
application areas:
1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception
In the second application area, interest focuses on procedures for
extracting, from an image, information in a form suitable for computer
processing. Examples include automatic character recognition, industrial
machine vision for product assembly and inspection, military
reconnaissance, automatic processing of fingerprints, etc.
17 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Among these three
methods, the SURF method achieves the best accuracy; it takes an average
of 15 seconds using the SURF method to perform a one-to-one matching.
Zhou et al. proposed a line-descriptor-based method for sclera vein
recognition. The matching step (including registration) is the most
time-consuming step in this sclera vein recognition system, which costs
about 12 seconds to perform a one-to-one matching. Both speeds were
measured using a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB
DRAM. Currently, sclera vein recognition algorithms are designed using
central processing unit (CPU)-based systems.
171 DISADVANTAGES OF EXISTING SYSTEM
1. Mask files are used to calculate valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But
the mask files are large, will preoccupy the GPU memory, and slow down
the data transfer. Also, some of the processing of the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units in CUDA.
2. The procedure of sclera feature matching consists of a pipeline of
several computational stages with different memory and processing
requirements; there is no uniform mapping scheme applicable to all these
stages.
3. When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable
to satisfy the requirement of real-time performance.
LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular
biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14,
pp. 1860-1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the
most challenging problems that has been actively researched in recent
years. It is well known that many state-of-the-art still-image face
recognition algorithms perform well when constrained (frontal,
well-illuminated, high-resolution, sharp, and full) face images are
acquired. However, their
performance degrades significantly when the test images contain variations
that are not present in the training images In this paper we highlight some
of the key issues in remote face recognition We define the remote face
recognition as one where faces are several tens of meters (10-250m) from
the cameras We then describe a remote face database which has been
acquired in an unconstrained outdoor maritime environment Recognition
performance of a subset of existing still image-based face recognition
algorithms is evaluated on the remote face data set Further we define the
remote re-identification problem as matching a subject at one location with
candidate sets acquired at a different location and over time in remote
conditions. We provide preliminary experimental results on remote
re-identification. It is demonstrated that, in addition to applying a
good classification algorithm, finding features that are robust to the
variations mentioned above and developing statistical models that can
account for these variations are very important for remote face
recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner,
"Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.
With the rapidly expanded biometric data collected by various sectors
of government and industry for identification and verification purposes,
how to manage and process such Big Data draws great concern. Even
though modern processors are equipped with more cores and memory
capacity it still requires careful design in order to utilize the hardware
resource effectively and the power consumption efficiently This research
addresses this issue by investigating the workload characteristics of
biometric applications. Taking Daugman's iris matching algorithm, which
has been proven to be the most reliable iris matching method as a case
study we conduct performance profiling and binary instrumentation on the
benchmark to capture its execution behavior. The results show that data
loading and memory access incur great performance overhead, which
motivates us to move the biometrics computation to a high-performance
architecture.
Modern iris recognition algorithms can be computationally intensive
yet are designed for traditional sequential processing elements such as a
personal computer However a parallel processing alternative using field
programmable gate arrays (FPGAs) offers an opportunity to speed up iris
recognition. Within the scope of this project, iris template generation
with directional filtering, which is a computationally expensive yet
parallel portion of a modern iris recognition algorithm, is parallelized
on an FPGA
system We will present a performance comparison of the parallelized
algorithm on the FPGA system to a traditional CPU-based version The
parallelized template generation outperforms an optimized C++ code
version determining the information content of an iris approximately 324
times faster
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality
based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng.,
2006, pp. 1-8.
A new biometric indicator based on the patterns of conjunctival
vasculature is proposed. Conjunctival vessels can be observed on the visible
part of the sclera that is exposed to the outside world. These vessels
demonstrate rich and specific details in visible light and can be easily
photographed using a regular digital camera. In this paper, we discuss
methods for conjunctival imaging, preprocessing, and feature extraction in
order to derive a suitable conjunctival vascular template for biometric
authentication. Commensurate classification methods, along with the
observed accuracy, are discussed. Experimental results suggest the potential
of using conjunctival vasculature as a biometric measure. Identification of
a person based on some unique set of features is an important task. Human
identification is possible with several biometric systems, and sclera
recognition is one of the promising biometrics. The sclera is the white
portion of the human eye. The vein pattern seen in the sclera region is
unique to each person; thus, the sclera vein pattern is a well-suited
biometric for human identification. The existing methods used
for sclera recognition have some drawbacks: only frontal-looking
images are preferred for matching, and rotation variance is another problem.
These problems are eliminated in the proposed system by using
two feature extraction techniques: Histogram of Oriented
Gradients (HOG) and conversion of the image into polar form using the
bilinear interpolation technique. These two features help the proposed
system become illumination invariant and rotation invariant. The
experimentation is done with the help of the UBIRIS database. The
experimental results show that the proposed sclera recognition method can
achieve better accuracy than the previous methods.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.
The graphics processing unit (GPU) has become an integral part of
today's mainstream computing systems. Over the past six years, there has
been a marked increase in the performance and capabilities of GPUs. The
modern GPU is not only a powerful graphics engine but also a highly
parallel programmable processor featuring peak arithmetic and memory
bandwidth that substantially outpaces its CPU counterpart. The GPU's
rapid increase in both programmability and capability has spawned a
research community that has successfully mapped a broad range of
computationally demanding, complex problems to the GPU. This effort in
general-purpose computing on the GPU, also known as GPU computing,
has positioned the GPU as a compelling alternative to traditional
microprocessors in high-performance computer systems of the future. We
describe the background, hardware, and programming model for GPU
computing, summarize the state of the art in tools and techniques, and
present four GPU computing successes in game physics and computational
biophysics that deliver order-of-magnitude performance gains over
optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.
This paper proposes algorithms for iris segmentation, quality
enhancement, match score fusion, and indexing to improve both the
accuracy and the speed of iris recognition. A curve evolution approach is
proposed to effectively segment a nonideal iris image using the modified
Mumford–Shah functional. Different enhancement algorithms are
concurrently applied on the segmented iris image to produce multiple
enhanced versions of the iris image. A support-vector-machine-based
learning algorithm selects locally enhanced regions from each globally
enhanced image and combines these good-quality regions to create a single
high-quality iris image. Two distinct features are extracted from the high-quality
iris image: the global textural feature is extracted using the 1-D log-polar
Gabor transform, and the local topological feature is extracted using
Euler numbers. An intelligent fusion algorithm combines the textural and
topological matching scores to further improve the iris recognition
performance and reduce the false rejection rate, whereas an indexing
algorithm enables fast and accurate iris identification. The verification and
identification performance of the proposed algorithms is validated and
compared with other algorithms using the CASIA Version 3, ICE 2005, and
UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-stage
parallel approach for registration and matching: a parallel sclera
matching solution for sclera vein recognition based on our sequential line-descriptor
method, using the CUDA GPU architecture. CUDA is a highly
parallel, multithreaded, many-core processor architecture with tremendous
computational power.
It supports not only a traditional graphics pipeline but also computation
on non-graphical data. It is relatively straightforward to port our C
program for CUDA to an AMD-based GPU using OpenCL: our CUDA
kernels can be directly converted to OpenCL kernels by accounting for the
different syntax of various keywords and built-in functions. The mapping
strategy is also effective in OpenCL if we regard the thread and block in
CUDA as the work-item and work-group in OpenCL. Most of our optimization
techniques, such as coalesced memory access and prefix sum, work in
OpenCL too. Moreover, since CUDA is a data-parallel architecture, the
implementation of our approach in OpenCL should be programmed in a
data-parallel model.
In this research, we first discuss why the naïve parallel approach would
not work. We then propose a new sclera descriptor, the Y-shape sclera
feature, for efficient registration to speed up the mapping scheme;
introduce the weighted polar line (WPL) descriptor, which is better
suited for parallel computing and mitigates the mask size issue; and develop
a coarse-to-fine two-stage matching process to dramatically improve the
matching speed. These new approaches make parallel processing
possible and efficient.
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve efficiency, we propose a new descriptor, the Y-shape
descriptor, which greatly helps improve the efficiency of the coarse
registration of two images and can be used to filter out
non-matching pairs before refined matching.
2. We propose a coarse-to-fine two-stage matching process. In the first
stage, we match two images coarsely using the Y-shape descriptors,
which is very fast because no registration is needed. The
matching result of this stage helps filter out image pairs with low
similarity.
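The two-stage flow described above can be sketched as follows. This is a minimal illustration: `coarse_score`, `fine_score`, and the threshold value are hypothetical stand-ins for the Y-shape comparison and the registered fine matching, not the report's actual routines.

```python
# Sketch of the coarse-to-fine two-stage matching flow.
# coarse_score / fine_score are hypothetical stand-ins for the Y-shape
# comparison (stage 1, no registration) and the refined matching (stage 2).

def two_stage_match(test, target, coarse_score, fine_score, coarse_threshold=0.3):
    """Stage 1: cheap coarse comparison; pairs scoring below the threshold
    are rejected early, and only survivors pay for stage-2 fine matching."""
    if coarse_score(test, target) < coarse_threshold:
        return 0.0                          # filtered out: low-similarity pair
    return fine_score(test, target)         # stage 2: refined matching

def identify(test, database, coarse_score, fine_score):
    # Rank all enrolled templates by their two-stage score.
    scores = {tid: two_stage_match(test, tpl, coarse_score, fine_score)
              for tid, tpl in database.items()}
    return max(scores, key=scores.get)
```

The early-exit in stage 1 is what makes large-database identification tractable: most candidates never reach the expensive registration step.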
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each person,
so it can be used for human identification. Several researchers have
designed different sclera vein recognition methods and have shown that it
is promising to use sclera vein recognition for human identification.
Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Among these three methods,
the SURF method achieves the best accuracy; it takes an average of 1.5
seconds using the SURF method to perform a one-to-one matching. Zhou
et al. proposed a line descriptor-based method for sclera vein recognition.
The matching step (including registration) is the most time-consuming step
in this sclera vein recognition system, costing about 1.2 seconds to
perform a one-to-one matching. Both speeds were measured on a PC with
an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently,
sclera vein recognition algorithms are designed using central processing
unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential
processing devices, which may not be efficient for data processing where the
data can be parallelized. Because of the large time consumption of the matching
step, sclera vein recognition using a sequential method would be very
challenging to implement in a real-time biometric system, especially
when there is a large number of templates in the database for matching. GPGPUs
(general-purpose graphics processing units, commonly abbreviated GPUs)
are now popularly used for parallel computing to improve
computational processing speed and efficiency. The highly parallel
structure of GPUs makes them more effective than CPUs for data
processing where processing can be performed in parallel. GPUs have been
widely used in biometric recognition, such as speech recognition, text
detection, handwriting recognition, and face recognition. In iris
recognition, GPUs have been used to extract features, construct descriptors,
and match templates.
GPUs are also used for object retrieval and image search. Park et al.
presented a performance evaluation of image processing algorithms, such
as linear feature extraction and multi-view stereo matching, on GPUs.
However, these approaches were designed for their specific biometric
recognition applications and feature searching methods; therefore, they may
not be efficient for sclera vein recognition. Compute Unified Device
Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in
this research. CUDA is a highly parallel, multithreaded, many-core
processor architecture with tremendous computational power. It supports not only a
traditional graphics pipeline but also computation on non-graphical data.
More importantly, it offers an easier programming platform and
outperforms its CPU counterparts in terms of peak arithmetic intensity and
memory bandwidth. In this research, the goal is not to develop a unified
strategy to parallelize all sclera matching methods, because each method is
quite different from the others and would need a customized design; an
efficient parallel computing scheme would require different
strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
sclera vein recognition based on our sequential line-descriptor method, using
the CUDA GPU architecture. However, the parallelization strategies
developed in this research can be applied to design parallel approaches for
other sclera vein recognition methods and to help parallelize general pattern
recognition methods. Based on the matching approach, there are three
challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large in size and will preoccupy the GPU memory and slow
down the data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance. New designs are
necessary to help narrow down the search range. In summary, a naïve
parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to port our C program for
CUDA to an AMD-based GPU using OpenCL: our CUDA kernels can be
directly converted to OpenCL kernels by accounting for the different syntax of
various keywords and built-in functions. The mapping strategy is also
effective in OpenCL if we regard the thread and block in CUDA as the work-item
and work-group in OpenCL. Most of our optimization techniques, such as
coalesced memory access and prefix sum, work in OpenCL too.
Moreover, since CUDA is a data-parallel architecture, the implementation
of our approach in OpenCL should be programmed in a data-parallel model.
In this research, we first discuss why the naïve parallel approach would not
work (Section 3). We then propose the new sclera descriptor, the Y-shape
sclera feature, for efficient registration to speed up the mapping
scheme (Section 4); introduce the weighted polar line (WPL) descriptor,
which is better suited for parallel computing and mitigates the mask size
issue (Section 5); and develop our coarse-to-fine two-stage matching
process to dramatically improve the matching speed (Section 6). These new
approaches make parallel processing possible and efficient. However, it
is non-trivial to implement these algorithms in CUDA, so we developed
implementation schemes to map our algorithms onto CUDA (Section 7).
In Section 2, we give a brief introduction to sclera vein recognition. In
Section 8, we present experiments using the proposed system.
In Section 9, we draw conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation, feature enhancement, feature extraction, and feature
matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition,
and several methods have been designed for it. Crihalmeanu
et al. presented a semi-automated system for sclera segmentation; they
used a clustering algorithm to classify the color eye images into three
clusters: sclera, iris, and background. Later, Crihalmeanu and Ross
designed a segmentation approach based on a normalized sclera index
measure, which includes coarse sclera segmentation, pupil region
segmentation, and fine sclera segmentation. Zhou et al. developed a skin-tone
plus "white color"-based voting method for sclera segmentation in
color images, and an Otsu's-thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera
features, since the sclera vein patterns often lack contrast and are hard to
detect. Zhou et al. used a bank of multi-directional Gabor filters for
vascular pattern enhancement. Derakhshani et al. used contrast-limited
adaptive histogram equalization (CLAHE) to enhance the green color plane
of the RGB image, and a multi-scale region-growing approach to identify
the sclera veins from the image background. Crihalmeanu and Ross applied
a selective enhancement filter for blood vessels to extract features from the
green component of a color image. In the feature matching step,
Crihalmeanu and Ross proposed
three registration and matching approaches: Speeded Up Robust
Features (SURF), which is based on interest-point detection; minutiae
detection, which is based on minutiae points on the vasculature structure;
and direct correlation matching, which relies on image registration. Zhou et
al. designed a line-descriptor-based feature registration and matching
method.
The proposed sclera recognition consists of five steps:
sclera segmentation, vein pattern enhancement, feature extraction, feature
matching, and matching decision. Fig. 2 shows the block diagram of sclera
recognition. Two types of feature extraction are used in the proposed
method to achieve good identification accuracy. The characteristics
elicited from the blood vessel structure seen in the sclera region are the
Histogram of Oriented Gradients (HOG) and a bilinear-interpolated
Cartesian-to-polar conversion. HOG is used to determine the gradient and
edge orientations of the vein pattern in the sclera region of an eye image. To
become more computationally efficient, the image data are converted
to polar form; this is mainly useful for circular or quasi-circular
objects. These two characteristics are extracted from all the images in the
database and compared with the features of the query image to decide whether the
person is correctly identified. This is done in the feature
matching step, which ultimately makes the matching decision. By using the
proposed feature extraction methods and matching techniques, human
identification is more accurate than in existing studies. In the proposed
method, two features of an image are extracted.
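The Cartesian-to-polar conversion with bilinear interpolation mentioned above can be sketched as follows; the grid sizes and sampling choices are illustrative assumptions, and the sampled radius is assumed to stay inside the image.

```python
import math

def bilinear(img, x, y):
    """Bilinearly interpolate a grayscale image (list of rows) at float (x, y)."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, len(img[0]) - 1), min(y0 + 1, len(img) - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def to_polar(img, cx, cy, n_r, n_theta, r_max):
    """Resample the image onto an (n_r x n_theta) polar grid centred on (cx, cy).
    Each output row is one radius; each column is one angle."""
    out = []
    for ri in range(n_r):
        r = r_max * ri / (n_r - 1) if n_r > 1 else 0.0
        row = []
        for ti in range(n_theta):
            theta = 2 * math.pi * ti / n_theta
            row.append(bilinear(img, cx + r * math.cos(theta),
                                cy + r * math.sin(theta)))
        out.append(row)
    return out
```

Because rotation of the eye becomes a column shift in the polar grid, features computed on this representation are easier to make rotation invariant.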
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists of
three steps: glare area detection, sclera area estimation, and iris and eyelid
detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare area detection: The glare area is a small bright area near the
pupil or iris; this is an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. It runs
only on grayscale images, so a color image must first be converted to
grayscale before applying the Sobel filter to
detect the glare area. Fig. 4 shows the result of the glare area detection.
FIG
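A crude stand-in for this glare detection step, assuming a plain 3×3 Sobel operator and illustrative thresholds (the report does not give specific values):

```python
def sobel_magnitude(gray):
    """Approximate gradient magnitude |Gx| + |Gy| with 3x3 Sobel kernels
    (border pixels are left at 0)."""
    h, w = len(gray), len(gray[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out

def glare_mask(gray, edge_thresh=200, bright_thresh=230):
    """Mark a pixel as glare when it is very bright and lies next to a strong
    Sobel edge (glare blobs have sharp borders). Thresholds are illustrative
    assumptions, not values from the report."""
    h, w = len(gray), len(gray[0])
    mag = sobel_magnitude(gray)
    def near_edge(y, x):
        return any(mag[j][i] >= edge_thresh
                   for j in range(max(0, y - 1), min(h, y + 2))
                   for i in range(max(0, x - 1), min(w, x + 2)))
    return [[1 if gray[y][x] >= bright_thresh and near_edge(y, x) else 0
             for x in range(w)] for y in range(h)]
```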
Sclera area estimation: For the estimation of the sclera area, Otsu's
thresholding method is applied. The steps of sclera area detection are:
selection of the region of interest (ROI), Otsu's thresholding, and sclera area
detection. The left and right sclera areas are selected based on the iris boundaries.
Once the region of interest is selected, Otsu's thresholding is applied to
obtain the potential sclera areas. The correct left sclera area should be
placed in the right and center positions, and the correct right sclera area should
be placed in the left and center; in this way, non-sclera areas are eliminated.
2.2.3 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the
sclera area. The upper eyelid, lower eyelid, and iris boundaries are then
refined, as these are all unwanted portions for recognition. To
eliminate their effects, refinement is done following the detection
of the sclera area. Fig. shows the result after Otsu's thresholding and iris and
eyelid refinement to detect the right sclera area; the left sclera
area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the effect of
segmentation faults. The vein patterns in the sclera area are not clearly
visible after segmentation, so vein pattern enhancement is performed to
make them more visible.
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of fingers (Miura et al., 2004), the palm (Lin and
Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a
special optical device for imaging the back of the eyeball is needed (Hill,
1999). Due to its perceived invasiveness and the required degree of subject
cooperation, the use of retinal biometrics may not be acceptable to some
individuals. The conjunctiva is a thin, transparent, and moist tissue that
covers the outer surface of the eye. The part of the conjunctiva that covers
the inner lining of the eyelids is called the palpebral conjunctiva, and the part
that covers the outer surface of the eye is called the ocular (or bulbar)
conjunctiva, which is the focus of this study. The ocular conjunctiva is very
thin and clear; thus, the vasculature (including that of the episclera) is
easily visible through it. The visible microcirculation of the conjunctiva offers a
rich and complex network of veins and fine microcirculation (Fig. 1). The
apparent complexity and specificity of these vascular patterns motivated us
to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have a vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus, the conjunctival vascular
structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial
off-the-shelf digital cameras under normal lighting conditions, making this
modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact. Besides being a
stand-alone biometric modality, we anticipate that the addition of
conjunctival biometrics will enhance the performance of current iris-based
biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portion of the eye, the sclera
vein patterns will be further exposed. This feature makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g.,
due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
availability of simultaneous computation. The figure below demonstrates the
possibility of parallel directional filtering. Since the filter is computed over
different portions of the input image, the computation can be performed in
parallel (denoted by Elements below). In addition, individual parallelization
of each element of the filtering can also be performed. A detailed discussion of
our proposed parallelization is outside the scope of this paper.
FIG
FIG
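The tile-level parallelism sketched in the figures can be illustrated with a thread pool. This is only an analogy: the simple horizontal mean filter stands in for one directional-filter element, and the row-tiling scheme is an assumption (on an FPGA or GPU the tiles would map to hardware elements rather than threads).

```python
from concurrent.futures import ThreadPoolExecutor

def box_filter_rows(gray, y0, y1):
    """Filter one horizontal tile (rows y0..y1-1) independently of the rest.
    A 3x1 horizontal mean stands in for one directional-filter element."""
    w = len(gray[0])
    return [[(gray[y][max(x - 1, 0)] + gray[y][x] + gray[y][min(x + 1, w - 1)]) / 3
             for x in range(w)] for y in range(y0, y1)]

def parallel_filter(gray, n_workers=4):
    """Split the image into row tiles ('Elements') and filter each in
    parallel; results are concatenated back in row order."""
    h = len(gray)
    step = max(1, -(-h // n_workers))            # ceil division
    tiles = [(y, min(y + step, h)) for y in range(0, h, step)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda t: box_filter_rows(gray, *t), tiles))
    out = []
    for p in parts:
        out.extend(p)
    return out
```

The decomposition works because each output tile depends only on its own input rows, so no tile needs another tile's result.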
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN
RECOGNITION METHOD
The matching stage of the line-descriptor-based method is a
bottleneck with regard to matching speed. In this section, we briefly
describe the line-descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and remove the branch
points. The line descriptor is used to describe the segments in the vein
structure; Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle to some
reference angle at the iris center, θ; the segment's distance to the iris center, r;
and the dominant angular orientation of the line segment, ɸ. Thus the
descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor
are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. In order to register the segments of the
vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit
parameters for registration between the two sclera vascular patterns. The
registration algorithm randomly chooses two points, one from the
test template and one from the target template, and randomly chooses a
scaling factor and a rotation value based on a priori knowledge of the
database. Using these values, it calculates a fitness value for the registration
under these parameters.
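The RANSAC-style registration loop described above might look like the following sketch; the iteration count, inlier distance, and a-priori scale/rotation ranges are assumptions, and the fitness measure here is simply the inlier count.

```python
import math
import random

def scale_rot(pt, ref, scale, rot):
    """Scale and rotate a point about a reference (e.g. the iris centre)."""
    x, y = pt[0] - ref[0], pt[1] - ref[1]
    c, s = math.cos(rot), math.sin(rot)
    return (ref[0] + scale * (x * c - y * s), ref[1] + scale * (x * s + y * c))

def ransac_register(test_pts, target_pts, ref, n_iter=300, inlier_dist=3.0, seed=0):
    """RANSAC-style search: randomly pair one test point with one target
    point, draw scale and rotation from assumed a-priori ranges, and keep
    the transform with the best fitness (here, the number of test points
    landing near some target point)."""
    rng = random.Random(seed)
    best, best_fit = None, -1
    for _ in range(n_iter):
        p, q = rng.choice(test_pts), rng.choice(target_pts)
        scale = rng.uniform(0.9, 1.1)        # illustrative a-priori ranges
        rot = rng.uniform(-0.2, 0.2)
        px, py = scale_rot(p, ref, scale, rot)
        dx, dy = q[0] - px, q[1] - py        # translation aligning p with q
        fit = 0
        for t in test_pts:
            tx, ty = scale_rot(t, ref, scale, rot)
            tx, ty = tx + dx, ty + dy
            if min(math.hypot(tx - u[0], ty - u[1]) for u in target_pts) <= inlier_dist:
                fit += 1
        if fit > best_fit:
            best_fit, best = fit, (dx, dy, scale, rot)
    return best, best_fit
```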
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. In order to reduce the effect of segmentation errors, we created a
weighting image (Figure 3) from the sclera mask by setting interior pixels
in the sclera mask to 1, pixels within some distance of the boundary of the
mask to 0.5, and pixels outside the mask to 0.
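A sketch of building such a weighting image from a binary sclera mask; the border width is an illustrative parameter:

```python
def weighting_image(mask, border=2):
    """Weight map as described above: 1 for interior sclera pixels, 0.5
    within `border` pixels of the mask boundary, 0 outside the mask."""
    h, w = len(mask), len(mask[0])
    def near_boundary(y, x):
        # near the mask boundary if any nearby pixel is outside the mask
        for j in range(max(0, y - border), min(h, y + border + 1)):
            for i in range(max(0, x - border), min(w, x + border + 1)):
                if mask[j][i] == 0:
                    return True
        # pixels at the image edge also count as boundary
        return y < border or y >= h - border or x < border or x >= w - border
    return [[0.0 if mask[y][x] == 0 else (0.5 if near_boundary(y, x) else 1.0)
             for x in range(w)] for y in range(h)]
```

Down-weighting boundary descriptors is what makes the matcher tolerant of spur edges from the iris, eyelids, and eyelashes.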
The matching score for two segment descriptors is calculated as follows:
Si and Sj are two segment descriptors, m(Si, Sj) is the matching
score between segments Si and Sj, d(Si, Sj) is the Euclidean distance
between the segment descriptors' center points (from Eqs. 6-8), Dmatch is
the matching distance threshold, and ɸmatch is the matching angle threshold.
The total matching score, M, is the sum of the individual matching scores
divided by the maximum matching score for the minimal set between the
test and target templates; that is, one of the test or target templates has fewer
points, and the sum of its descriptors' weights sets the maximum score
that can be attained.
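One plausible reading of this scoring rule (the report does not reproduce the exact weighting formula, so the weight product below is an assumption) is thresholded distance-and-angle agreement:

```python
import math

def segment_match(si, sj, d_match=5.0, phi_match=0.2):
    """A plausible form of m(Si, Sj): the descriptors match when their
    centre points are within Dmatch and their orientations within
    phi_match; the score is then the product of their mask weights.
    Thresholds and the weighting are illustrative assumptions."""
    d = math.hypot(si["x"] - sj["x"], si["y"] - sj["y"])
    if d <= d_match and abs(si["phi"] - sj["phi"]) <= phi_match:
        return si["w"] * sj["w"]
    return 0.0

def template_match(test, target, **kw):
    """Total score M: sum of the best per-segment scores over the smaller
    template, normalised by that template's attainable weight."""
    small, large = (test, target) if len(test) <= len(target) else (target, test)
    max_score = sum(s["w"] for s in small)
    if max_score == 0:
        return 0.0
    total = sum(max((segment_match(s, t, **kw) for t in large), default=0.0)
                for s in small)
    return total / max_score
```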
FIG
FIG
FIG
FIG
Even with the movement of the eye, Y-shape branches are observed to be a
stable feature and can be used as a sclera feature descriptor. To detect the
Y-shape branches in the original template, we search for the nearest-neighbor
set of every line segment within a regular distance and classify the angles
among these neighbors. If there are two types of angle values in the line
segment set, the set may be inferred to be a Y-shape structure, and the line
segment angles are recorded as a new feature of the sclera.
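A hypothetical sketch of this Y-shape search; the neighborhood radius, the angle tolerance, and the grouping rule are illustrative:

```python
import math

def find_y_branches(segments, radius=10.0, angle_tol=0.15):
    """For each line segment, collect neighbours within `radius` and group
    their orientations; a neighbourhood whose orientations fall into exactly
    two distinct groups is treated as a Y-shape branch point. A sketch of
    the search described above, with illustrative parameters."""
    branches = []
    for i, s in enumerate(segments):
        neigh = [t for j, t in enumerate(segments) if j != i
                 and math.hypot(t["x"] - s["x"], t["y"] - s["y"]) <= radius]
        groups = []                      # cluster neighbour orientations
        for t in neigh:
            for g in groups:
                if abs(g[0] - t["phi"]) <= angle_tol:
                    g.append(t["phi"])
                    break
            else:
                groups.append([t["phi"]])
        if len(groups) == 2:             # two branch orientations -> Y shape
            branches.append({"x": s["x"], "y": s["y"],
                             "angles": sorted(g[0] for g in groups)})
    return branches
```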
There are two ways to measure both the orientation and the relationship of
every branch of a Y-shape vessel: one is to use the angle of every branch to
the x-axis; the other is to use the angles between each branch and the iris radial
direction. The first method needs an additional rotation operation to align the
template. In our approach, we employed the second method. As Figure 6
shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius
from the pupil center. Even when the head tilts, the eye moves, or the camera
zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite stable.
To tolerate errors from the pupil center calculation in the segmentation step,
we also record the center position (x, y) of the Y-shape branches as
auxiliary parameters. So our rotation-, shift-, and scale-invariant feature
vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated
with reference to the iris center; therefore, it is automatically aligned to the
iris center. It is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary images
(Figure 7). The skeleton is then broken into smaller segments. For each
segment, a line descriptor is created to record the center and orientation of
the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the
position of the center and ɸ is its orientation. Because of the limitation of
segmentation accuracy, the descriptors at the boundary of the sclera area might
not be accurate and may contain spur edges resulting from the iris, eyelid,
and/or eyelashes. To be tolerant of such error, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel
patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line
segments of vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is challenging,
since the mask files are large in size and will occupy GPU memory and
slow down the data transfer. During matching and registration, a RANSAC-type
algorithm is used to randomly select corresponding descriptors,
and the transform parameters between them are used to generate the
template-transform affine matrix. After every template transform, the mask
data must also be transformed and a new boundary calculated to
evaluate the weight of the transformed descriptor. This results in too many
convolutions in the processing unit.
To reduce the heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the
mask information and can be automatically aligned. We extracted the
geometric relationships of the descriptors and stored them as a new
descriptor. We use a weighted image created by setting various weight
values according to position: the weight of descriptors beyond
the sclera is set to 0, those near the sclera
boundary to 0.5, and interior descriptors to 1. In our work,
descriptor weights are calculated on their own mask by the CPU, only
once.
The calculated result is saved as a component of the descriptor, which
becomes s(x, y, ɸ, w), where w denotes the weight
of the point and may be 0, 0.5, or 1. To align two templates, when a
template is shifted to another location along the line connecting their
centers, all the descriptors of that template are transformed. This is
faster if the two templates have similar reference points. If we use the center of
the iris as the reference point, then when two templates are compared, the
correspondences will automatically be aligned to each other, since they have
the same reference point. Every feature vector of the template is a set of
line segment descriptors composed of three variables (Figure 8): the
segment's angle to the reference line through the iris center,
denoted as θ; the distance between the segment's center and the pupil center,
denoted as r; and the dominant angular orientation of the segment,
denoted as ɸ. To minimize the GPU computation, we also convert the
descriptor values from polar coordinates to rectangular coordinates in a CPU
preprocessing step.
The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters; for
example, as an eyeball moves left, the left-part sclera patterns may be
compressed while the right-part sclera patterns are stretched.
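Building one WPL descriptor with both the polar and the precomputed rectangular forms could look like this sketch (field names are illustrative):

```python
import math

def make_wpl(x, y, phi, iris_center, weight):
    """Build one WPL descriptor s(x, y, r, theta, phi, w). The polar
    coordinates (r, theta) are taken relative to the iris centre, so two
    templates sharing that reference point are aligned automatically; the
    rectangular (x, y) form is precomputed on the CPU so GPU kernels
    avoid per-comparison trigonometry."""
    dx, dy = x - iris_center[0], y - iris_center[1]
    return {"x": dx, "y": dy,
            "r": math.hypot(dx, dy),
            "theta": math.atan2(dy, dx),
            "phi": phi,
            "w": weight}
```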
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads called warps. We
reorganized the descriptors from the same side and saved them at contiguous
addresses; this meets the requirement of coalesced memory access on the GPU.
FIG
FIG
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no longer
needed on the GPU. Matching with this feature is very fast because the
templates do not need to be re-registered every time after shifting; thus the
cost of data transfer and computation on the GPU is reduced. With the
new descriptor, the shift parameter generator in Figure 4 is simplified
as shown in Figure 9.
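The side-wise reorganization into contiguous arrays (a structure-of-arrays layout, which is what lets the threads of one warp read neighbouring addresses and thus coalesce their memory accesses) can be sketched as follows; the field names are illustrative:

```python
def pack_by_side(descriptors):
    """Reorganise descriptors so that all left-side entries, then all
    right-side entries, sit in contiguous arrays (structure-of-arrays
    layout). On the GPU, consecutive threads of a warp would then read
    consecutive elements of each array -- the coalesced-access pattern
    discussed above."""
    packed = {"x": [], "y": [], "phi": [], "w": []}
    sides = []
    for side in ("left", "right"):
        group = [d for d in descriptors if d["side"] == side]
        sides.append((side, len(group)))      # offsets for warp assignment
        for d in group:
            for k in packed:
                packed[k].append(d[k])
    return packed, sides
```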
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
the more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger
limits on their size and resource consumption, with more fully featured
instruction sets, and with more flexible control-flow operations. After many
years of separate instruction sets for vertex and fragment operations, current
GPUs support the unified Shader Model 4.0 on both vertex and fragment
shaders:
The hardware must support shader programs of at least 65 k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit
floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units. General-purpose computing on the GPU maps general-
purpose computation onto the GPU using the graphics hardware in much the
same way as any standard graphics application. Because of this similarity, it
is both easier and more difficult to explain the process. On one hand, the
actual operations are the same and are easy to follow; on the other hand, the
terminology differs between graphics and general-purpose use. Harris
provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology,
then show how the same steps are used in a general-purpose way to author
GPGPU applications, and finally use the same steps to show the simpler
and more direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II,
concentrating on the programmable aspects of this pipeline.
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
texture memory.
The resulting image can then be used as a texture on future passes through
the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation
involves exactly the same steps but different terminology. A motivating
example is a fluid simulation computed over a grid: at each time step, we
compute the next state of the fluid for each grid point from the current state
at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest. The rasterizer generates a fragment at each
pixel location covered by that geometry. (In our example, our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation.)
Each fragment is shaded by an SPMD general-purpose fragment
program. (Each grid point runs the same program to update the state of its
fluid.)
The fragment program computes the value of the fragment by a
combination of math operations and gather accesses from global
memory. (Each grid point can access the state of its neighbors from the
previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes. (The current state of the fluid will be used on the next time
step.)
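The gather pattern of this model can be sketched with a toy grid update in Python (a stand-in for the fragment program; the averaging rule here is invented purely for illustration):

```python
import numpy as np

def step(state):
    """One time step of a toy grid 'simulation': each cell gathers the
    values of its four neighbors and itself and averages them -- the
    same gather-from-neighbors pattern a GPGPU fragment program
    performs over a texture holding the previous time step."""
    padded = np.pad(state, 1, mode="edge")   # replicate border values
    return (padded[1:-1, 1:-1] + padded[:-2, 1:-1] + padded[2:, 1:-1]
            + padded[1:-1, :-2] + padded[1:-1, 2:]) / 5.0

grid = np.zeros((4, 4))
grid[1, 1] = 5.0          # a single disturbance
nxt = step(grid)          # the "resulting buffer" used as next input
```

As in the pipeline description, the output buffer `nxt` would be fed back as the input texture of the next pass.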
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications
has been that, despite their general-purpose tasks having nothing to do with
graphics, the applications still had to be programmed using graphics APIs.
In addition, the program had to be structured in terms of the graphics
pipeline, with the programmable units only accessible as an intermediate
step in that pipeline, when the programmer would almost certainly prefer to
access the programmable units directly. The programming environments we
describe in detail in Section IV solve this difficulty by providing a
more natural, direct, non-graphics interface to the hardware and,
specifically, the programmable units. Today, GPU computing applications
are structured in the following way.
The programmer directly defines the computation domain of interest as a
structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math
operations and both gather (read) accesses from and scatter (write)
accesses to global memory. Unlike in the previous two
methods, the same buffer can be used for both reading and writing,
allowing more flexible algorithms (for example, in-place algorithms that
use less memory).
The resulting buffer in global memory can then be used as an input in
future computation.
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because
no registration is needed. The matching result in this stage helps filter
out image pairs with low similarity. After this step, some false positive
matches are still possible. In the second stage, we use the WPL descriptor
to register the two images for more detailed descriptor matching, including
scale and translation invariance. This stage includes the shift transform, affine
matrix generation, and final WPL descriptor matching. Overall, we
partitioned the registration and matching processing into four kernels in
CUDA (Figure 10): matching on the Y-shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster
and achieves a more accurate score.
2.4.1 STAGE I: MATCHING WITH THE Y-SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features,
registration is unnecessary before matching on the Y-shape descriptor. The
whole matching procedure is listed as Algorithm 1.
FIG
Here y_te,i and y_ta,j are the Y-shape descriptors of the test template T_te
and the target template T_ta, respectively. d_ϕ is the Euclidean distance of
the angle elements of the descriptor vectors, defined in (3); d_xy is the
Euclidean distance of two descriptor centers, defined in (4); n_i and d_i are
the number of matched descriptor pairs and the distance of their centers,
respectively; t_ϕ is a distance threshold, and t_xy is the threshold that
restricts the search area. We set t_ϕ to 30 and t_xy to 675 in our experiment.
To match two sclera templates, we search the areas near all
the Y-shape branches. The search area is limited to the corresponding left or
right half of the sclera in order to reduce the search range and time. The
distance of two branches is defined in (3), where ϕ_i,j is the angle between
the j-th branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs n_i and the distance between Y-shape
branch centers d_i are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch
centers as in (2). Here α is a factor used to fuse the matching score, which
was set to 30 in our study, and N_i and N_j are the total numbers of feature
vectors in templates i and j, respectively. The decision is regulated by the
threshold t: if a sclera's matching score is lower than t, the sclera is
discarded. A sclera with a high matching score is passed to the next, more
precise matching process.
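A minimal sketch of this coarse stage, assuming a plausible form for the score fusion in equation (2), which is not reproduced in the text (the function names and the exact formula are illustrative):

```python
import math

def branch_distance(phi_i, phi_j):
    """Euclidean distance between the branch-angle vectors of two
    Y-shape descriptors (the role of d_phi in the text)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(phi_i, phi_j)))

def match_score(n_matched, avg_center_dist, n_i, n_j, alpha=30.0):
    """Hypothetical fusion of matched-pair count and average center
    distance: more matched branches and smaller center distances give
    a higher score, normalized by template size. The exact equation
    (2) is not shown in the text, so this form is an assumption."""
    if n_matched == 0:
        return 0.0
    return n_matched / (min(n_i, n_j) * (1.0 + avg_center_dist / alpha))
```

A pair is then kept for Stage II only when `match_score(...)` exceeds the threshold t, matching the filtering role described above.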
2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of
the sclera than the Y-shape descriptor. The variation of the sclera vessel
pattern is nonlinear because:
When an eye image is acquired at a different gaze angle, the vessel
structure appears to shrink or extend nonlinearly, because the eyeball is
spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and
endothelium. There are slight differences among the movements of these
layers.
Considering these factors, our registration employs both a single
shift transform and a multi-parameter transform that combines shift,
rotation, and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before,
segmentation may not be accurate, so the detected iris center may not be
very accurate either. The shift transform is designed to tolerate possible
errors in pupil center detection in the segmentation step. If there is no
deformation, or only very minor deformation, registration with the shift
transform alone would be adequate to achieve an accurate result. We
designed Algorithm 2 to obtain the optimized shift parameter, where T_te is
the test template, s_te,i is the i-th WPL descriptor of T_te, T_ta is the target
template, s_ta,i is the i-th WPL descriptor of T_ta, and d(s_te,k, s_ta,j) is
the Euclidean distance of descriptors s_te,k and s_ta,j.
Δs_k is the shift value of the two descriptors.
We first randomly select an equal number of segment descriptors
s_te,k in the test template T_te from each quad and find their nearest
neighbors s_ta,j in the target template T_ta. Their shift offset is recorded as
a possible registration shift factor Δs_k. The final offset registration factor
is Δs_optim, which has the smallest standard deviation among these
candidate offsets.
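Algorithm 2's shift search can be sketched as follows (a simplified stand-in: the quad-wise sampling and smallest-standard-deviation selection are approximated by picking the offset closest to the mean offset; all names are illustrative):

```python
import random

def nearest(point, candidates):
    """Index of the candidate center nearest to point (squared Euclidean)."""
    return min(range(len(candidates)),
               key=lambda i: (candidates[i][0] - point[0]) ** 2
                           + (candidates[i][1] - point[1]) ** 2)

def search_shift(test_centers, target_centers, n_samples=8, seed=0):
    """Sample descriptor centers from the test template, pair each with
    its nearest neighbor in the target template, record the offsets, and
    keep the offset deviating least from the mean -- a stand-in for the
    smallest-standard-deviation selection in Algorithm 2."""
    rng = random.Random(seed)
    picks = rng.sample(range(len(test_centers)),
                       min(n_samples, len(test_centers)))
    offsets = []
    for i in picks:
        j = nearest(test_centers[i], target_centers)
        offsets.append((target_centers[j][0] - test_centers[i][0],
                        target_centers[j][1] - test_centers[i][1]))
    mx = sum(o[0] for o in offsets) / len(offsets)
    my = sum(o[1] for o in offsets) / len(offsets)
    return min(offsets, key=lambda o: (o[0] - mx) ** 2 + (o[1] - my) ** 2)
```

For a target that is a pure translation of the test template (with descriptors well separated relative to the shift), the recovered offset is exactly that translation.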
2) AFFINE TRANSFORM PARAMETER SEARCH:
The affine transform is designed to tolerate some deformation of sclera
patterns in the matching step. The affine transform algorithm is shown in
Algorithm 3. The shift value in the parameter set is obtained by randomly
selecting a descriptor s_te^(it) and calculating the distance from its nearest
neighbor s_ta,j in T_ta. We transform the test template by the matrix in (7).
At the end of each iteration, we count the number of matched descriptor
pairs between the transformed template and the target template. The factor
β determines whether a pair of descriptors is matched; we set it to 20 pixels
in our experiment. After N iterations, the optimized transform parameter set
is determined by selecting the maximum matching number m(it). Here
s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithm 2; tr_shift^(it),
θ^(it), and tr_scale^(it) are the shift, rotation, and scale parameters
generated in the it-th iteration; and R(θ^(it)), T(tr_shift^(it)), and
S(tr_scale^(it)) are the transform matrices defined in (7). To search for the
optimized transform parameters, we iterate N times to generate these
parameters. In our experiment, we set the number of iterations to 512.
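The random parameter search of Algorithm 3 can be sketched like this (the parameter ranges, the transform order assumed for (7), and all names are illustrative assumptions, not values from the paper):

```python
import math
import random

def transform(points, theta, scale, shift):
    """Apply scale, rotation by theta, then shift to 2-D points (the
    roles of S, R, and T in equation (7); composition order assumed)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(scale * (c * x - s * y) + shift[0],
             scale * (s * x + c * y) + shift[1]) for x, y in points]

def search_affine(test_pts, target_pts, iters=200, beta=2.0, seed=1):
    """Try random (theta, scale, shift) parameter sets and keep the one
    that matches the most point pairs within beta pixels -- the
    maximum-matching-number selection described in the text."""
    rng = random.Random(seed)
    best, best_count = (0.0, 1.0, (0.0, 0.0)), -1
    for _ in range(iters):
        params = (rng.uniform(-0.3, 0.3),          # rotation (illustrative range)
                  rng.uniform(0.9, 1.1),           # scale
                  (rng.uniform(-5, 5), rng.uniform(-5, 5)))  # shift
        moved = transform(test_pts, *params)
        count = sum(1 for (mx, my) in moved
                    if any((mx - tx) ** 2 + (my - ty) ** 2 <= beta ** 2
                           for tx, ty in target_pts))
        if count > best_count:
            best, best_count = params, count
    return best, best_count
```

On the GPU, each iteration of this loop is what gets handed to an independent thread, as Section 2.5.2 describes.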
3) REGISTRATION AND MATCHING ALGORITHM:
Using the optimized parameter sets determined by Algorithms 2
and 3, the test template is registered and matched simultaneously. The
registration and matching algorithm is listed in Algorithm 4. Here s_te,i,
T_te, s_ta,j, and T_ta are defined as in Algorithms 2 and 3; θ^(optm),
tr_shift^(optm), tr_scale^(optm), and Δs_optim are the registration
parameters obtained from Algorithms 2 and 3; and R(θ^(optm))
T(tr_shift^(optm)) S(tr_scale^(optm)) is the descriptor transform matrix
defined in Algorithm 3. ϕ is the angle between the segment descriptor and
the radius direction, and w is the weight of the descriptor, which indicates
whether or not the descriptor is at the edge of the sclera. To ensure that the
nearest descriptors have a similar orientation, we use a constant factor α to
check the absolute difference of two ϕ values; in our experiment we set α to
5. The total matching score is the minimal score of the two transformed
results divided by the minimal matching score of the test template and the
target template.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system and
works as a coprocessor with a CPU. A CUDA device consists of many
streaming multiprocessors (SMs); the parallel part of a program must be
partitioned into threads by the programmer and mapped onto those
multiprocessors. There are multiple memory spaces in the CUDA memory
hierarchy: registers, local memory, shared memory, global memory,
constant memory, and texture memory. Registers, local memory, and shared
memory are on-chip resources. Only shared memory can be accessed by
other threads within the same block; however, shared memory is available
only in limited amounts. Global memory, constant memory, and texture
memory are off-chip memories accessible by all threads, and accessing
them is very time consuming.
Constant memory and texture memory are read-only, cacheable
memories. Mapping algorithms to CUDA to achieve efficient processing is
not a trivial task. There are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be
executed serially. To improve performance, branch divergence within a
warp should be avoided.
Global memory is slower than on-chip memory in terms of access time.
To completely hide the latency of the small instruction set, we should use
on-chip memory preferentially rather than global memory. When global
memory access occurs, threads in the same warp should access words in
sequence to achieve coalescence.
Shared memory is much faster than the local and global memory spaces,
but shared memory is organized into banks that are equal in size. If two
memory requests from different threads within a warp fall in the
same memory bank, the accesses are serialized. To get maximum
performance, memory requests should be scheduled to minimize bank
conflicts.
2.5.1 MAPPING THE ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four
independent modules, each module is converted to a different kernel
on the GPU. These kernels differ in computation density, so we
map them to the GPU with various mapping strategies to fully utilize the
computing power of CUDA. Figure 11 shows our scheme of CPU-GPU
task distribution and the partition among blocks and threads. Algorithm 1 is
partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of
templates in the database. As the upper middle column of Figure 11 shows,
each target template is assigned to one thread, and one thread performs one
pair-of-templates comparison. In our work we use an NVIDIA C2070 as
our GPU, and the numbers of threads and blocks are both set to 1024. That
means we can match our test template with up to 1024 × 1024 target
templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one
thread processes a section of descriptors. As the lower portion of the
middle column of Figure 11 shows, we assign a target template to one
block; inside a block, one thread corresponds to a set of descriptors in this
template. This partition makes every block execute independently, with no
data exchange required between different blocks. When all threads
complete their corresponding descriptor fractions, the sum of the
intermediate results needs to be computed or compared. A parallel prefix-
sum algorithm is used to calculate this sum, as shown on the right of
Figure 11. First, all odd-numbered threads compute the sums of consecutive
pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...)
threads computes the prefix sum on the new results. The final result is
saved at the first address, which has the same variable name as the first
intermediate result.
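The tree-style reduction described above can be simulated sequentially (a sketch of the thread cooperation pattern, not the CUDA kernel itself):

```python
def tree_sum(values):
    """Simulate the block-level reduction described above: at stride
    1, 2, 4, ... the 'first thread' of each group adds in its
    neighbour's partial sum, so after log2(n) rounds the total sits at
    index 0 -- the same address as the first intermediate result."""
    vals = list(values)
    n = len(vals)
    stride = 1
    while stride < n:
        # Each loop body below would run in a separate thread on the GPU.
        for i in range(0, n, 2 * stride):
            if i + stride < n:
                vals[i] += vals[i + stride]   # absorb neighbour's partial sum
        stride *= 2
    return vals[0]
```

On the GPU the inner loop iterations execute concurrently, with a barrier between stride levels; here they are serialized for clarity.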
2.5.2 MAPPING INSIDE A BLOCK
In the shift argument search, there are two schemes we can choose for
mapping the task:
Map one pair of templates to all the threads in a block; every thread then
takes charge of a fraction of the descriptors and cooperates with the other
threads.
Assign a single possible shift offset to a thread; all the threads then
compute independently, except that the final result must be compared with
the other possible offsets.
Due to the great number of sum and synchronization operations in every
nearest-neighbor search step, we chose the second method to parallelize the
shift search. In the affine matrix generator, we mapped an entire parameter-
set search to a thread: every thread randomly generates a set of parameters
and tries them independently. The generated iterations are assigned to all
threads. The challenge of this step is that the randomly generated numbers
might be correlated among threads. In the rotation and scale registration
generation step, we used the Mersenne Twister pseudorandom number
generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative;
therefore it is hard to parallelize a single twister state-update step among
several execution threads. To make sure that the thousands of threads in the
launch grid generate uncorrelated random sequences, many simultaneous
Mersenne Twisters need to run with different initial states in parallel.
But even "very different" (by any definition) initial state values do not
prevent the emission of correlated sequences by generators sharing
identical parameters. To solve this problem and enable an efficient
implementation of the Mersenne Twister on parallel architectures, we used
a special offline tool for the dynamic creation of Mersenne Twister
parameters, modified from the algorithm developed by Makoto Matsumoto
and Takuji Nishimura.
In the registration and matching step, when searching for the nearest
neighbor, a line segment that has already been matched should not be used
again. In our approach, a flag variable denoting whether the line has been
matched is stored in shared memory. To share the flags, all the threads in a
block would have to wait on a synchronization operation at every query
step. Our solution is to use a single thread in a block to process the
matching.
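A CPU-side analogy for the uncorrelated per-thread streams discussed above: NumPy's SeedSequence spawning plays the role that dynamically created Mersenne Twister parameter sets play on the GPU (this is an analogy, not the authors' implementation):

```python
import numpy as np

# One root SeedSequence is spawned into independent children, one per
# "thread". NumPy guarantees spawned streams are statistically
# independent, which is the same goal the offline dynamic-creation
# tool serves for per-thread Mersenne Twister parameters on the GPU.
root = np.random.SeedSequence(42)
streams = [np.random.default_rng(child) for child in root.spawn(4)]
draws = [rng.random(3) for rng in streams]   # each stream draws its own numbers
```

Simply reseeding one generator with "different" seeds per thread would not carry this independence guarantee, which is exactly the correlated-sequences pitfall the text describes.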
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the
bandwidth between host memory and device memory, and data transfer
between host and device can lead to long latency. As shown in Figure 11,
we load the entire target template set from the database without considering
when the templates will be processed; therefore, there is no data transfer
from host to device during the matching procedure. In global memory, the
components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are
stored separately. This guarantees that consecutive kernels of Algorithms 2
to 4 can access their data at successive addresses. Although such coalesced
access reduces latency, frequent global memory access is still a slow way to
get data, so in our kernels we load the test template into shared memory to
accelerate memory access. Because Algorithms 2 to 4 execute different
numbers of iterations on the same data, bank conflicts do not occur. To
maximize our texture memory space, we set the system cache to the lowest
value and bound our target descriptors to texture memory; using this
cacheable memory, our data access was accelerated further.
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor
primarily applied in the design of target detection. In this paper it is applied
as the feature for human recognition. In the sclera region, the vein patterns
are the edges of an image, so HOG is used to determine the gradient
orientations and edge orientations of the vein pattern in the sclera region of
an eye image. To carry out this technique, first divide the image into small
connected regions called cells. For each cell, compute the histogram of
gradient directions or edge orientations of the pixels. The combination of
the histograms of the different cells then represents the descriptor. To
improve accuracy, the histograms can be contrast-normalized by calculating
the intensity over a block and then using this value to normalize all cells
within the block. This normalization makes the result invariant to geometric
and photometric changes. The gradient magnitude m(x, y) and orientation
θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and
dy(x, y).
Orientation binning is the second step of HOG. This method is utilized
to create the cell histograms: each pixel within the cell gives a weighted
vote for the orientation bin found in the gradient computation, with the
gradient magnitude used as the weight. The cells are rectangular, and the
binning of gradient orientation is spread over 0 to 180 degrees, with
opposite directions counting as the same. Fig. 8 depicts the edge
orientations of the picture elements. If the images have any illumination or
contrast changes, then the gradient strength must be locally normalized. For
that, cells are grouped together into larger blocks. These blocks overlap, so
that each cell contributes more than once to the final descriptor. Here
rectangular HOG (R-HOG) blocks are applied, which are mainly square
grids. The performance of HOG is improved by applying a Gaussian
window to each block.
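The gradient computation and magnitude-weighted orientation binning described above can be sketched for a single cell (the bin count of 9 and the central-difference gradient are common choices, assumed here rather than taken from the paper):

```python
import numpy as np

def hog_cell_histogram(cell, n_bins=9):
    """Gradient computation and orientation binning for one cell:
    central-difference gradients give magnitude m(x, y) and orientation
    theta(x, y); each pixel votes into an unsigned 0-180 degree bin,
    weighted by its gradient magnitude."""
    cell = cell.astype(np.float64)
    dy, dx = np.gradient(cell)
    magnitude = np.hypot(dx, dy)                      # m(x, y)
    # Unsigned orientation: opposite directions count as the same.
    angle = np.degrees(np.arctan2(dy, dx)) % 180.0    # theta(x, y)
    bins = np.minimum((angle / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())  # magnitude-weighted votes
    return hist

# A vertical step edge: all gradient energy falls in the bin for
# horizontal gradients (orientation 0 degrees).
cell = np.tile([0.0, 0.0, 1.0, 1.0], (4, 1))
hist = hog_cell_histogram(cell)
```

Block-level contrast normalization would then divide each cell histogram by the combined magnitude of its block, as the text describes.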
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical
computing environment and fourth-generation programming language.
Developed by MathWorks, MATLAB allows matrix manipulations,
plotting of functions and data, implementation of algorithms, creation
of user interfaces, and interfacing with programs written in other languages,
including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an
optional toolbox uses the MuPAD symbolic engine, allowing access
to symbolic computing capabilities. An additional package, Simulink, adds
graphical multi-domain simulation and Model-Based Design for dynamic
and embedded systems.
In 2004, MATLAB had around one million users across industry
and academia. MATLAB users come from various backgrounds
of engineering, science, and economics. MATLAB is widely used in
academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners
in control engineering, Little's specialty, but quickly spread to many other
domains. It is now also used in education, in particular the teaching
of linear algebra and numerical analysis, and is popular amongst scientists
involved in image processing. The MATLAB application is built around the
MATLAB language. The simplest way to execute MATLAB code is to type
it in the Command Window, which is one of the elements of the MATLAB
Desktop. When code is entered in the Command Window, MATLAB can
be used as an interactive mathematical shell. Sequences of commands can
be saved in a text file, typically using the MATLAB Editor, as a script or
encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and
sharing your work. You can integrate your MATLAB code with other
languages and applications, and distribute your MATLAB algorithms and
applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis,
filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external
applications and languages, such as C, C++, FORTRAN, Java, COM,
and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image
processing, communications, control design, test and measurement,
financial modeling and analysis, and computational biology. Add-on
toolboxes (collections of special-purpose MATLAB functions) extend the
MATLAB environment to solve particular classes of problems in these
application areas.
MATLAB can be used on personal computers and powerful
server systems, including the Cheaha compute cluster. With the addition of
the Parallel Computing Toolbox, the language can be extended with parallel
implementations of common computational functions, including for-loop
unrolling. Additionally, this toolbox supports offloading computationally
intensive workloads to Cheaha, the campus compute cluster. MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) that knows how big it is. Moreover, the fundamental operators
(e.g., addition, multiplication) are programmed to deal with matrices when
required, and the MATLAB environment handles much of the bothersome
housekeeping that makes all this possible. Since so many of the procedures
required for macro-investment analysis involve matrices, MATLAB
proves to be an extremely efficient language for both communication and
implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or FORTRAN. A wrapper function is created,
allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed MEX-
files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be called directly
from MATLAB, and many MATLAB libraries (for
example, XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries. Calling MATLAB from Java is more complicated, but
can be done with a MATLAB extension, which is sold separately by
MathWorks, or using an undocumented mechanism called JMI (Java-to-
MATLAB Interface), which should not be confused with the unrelated Java
Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox
available from MathWorks, MATLAB can be connected
to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on
Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems, enabling fast
development and execution. With the MATLAB language, you can
program and develop algorithms faster than with traditional languages,
because you do not need to perform low-level administrative tasks such as
declaring variables, specifying data types, and allocating memory. In many
cases, MATLAB eliminates the need for 'for' loops. As a result, one line of
MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
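The loop-elimination claim can be illustrated with an analogous NumPy sketch (Python standing in for MATLAB here; in MATLAB itself the vectorized form would simply be y = 2*x + 1):

```python
import numpy as np

# Loop version, as one might write it in C: explicit index bookkeeping.
x = np.arange(5, dtype=float)
y_loop = np.empty_like(x)
for i in range(len(x)):
    y_loop[i] = 2.0 * x[i] + 1.0

# Vectorized version -- one line replaces the whole loop, because the
# operators are defined over entire arrays.
y_vec = 2.0 * x + 1.0
```

Both versions produce identical results; the vectorized form is what makes "one line of MATLAB" replace several lines of C.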
MATLAB lets you execute commands or groups of commands one
at a time, without compiling and linking, enabling you to quickly iterate to
the optimal solution. For fast execution of heavy matrix and vector
computations, MATLAB uses processor-optimized libraries. For general-
purpose scalar computations, MATLAB generates machine-code
instructions using its JIT (just-in-time) compilation technology.
This technology, which is available on most platforms, provides
execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithms efficiently. These include the following.
MATLAB Editor: provides standard editing and debugging features, such
as setting breakpoints and single stepping.
Code Analyzer: checks your code for problems and recommends
modifications to maximize performance and maintainability.
MATLAB Profiler: records the time spent executing each line of code.
Directory Reports: scan all the files in a directory and report on code
efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
The interactive tool GUIDE (Graphical User Interface Development
Environment) lets you lay out, design, and edit user interfaces.
GUIDE lets you include list boxes, pull-down menus, push buttons, radio
buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring
data from external devices and databases, through preprocessing,
visualization, and numerical analysis, to producing presentation-quality
output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations, including:
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats, such as Microsoft Excel; ASCII text or binary
files; image, sound, and video files; and scientific files such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format. Additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features required to visualize engineering
and scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes;
changing line colors and markers; adding annotations, LaTeX equations,
and legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D
scalar data, and 3-D vector data. You can use these functions to visualize
and understand large, often complex, multidimensional data, specifying
plot characteristics such as camera viewing angle, perspective, lighting
effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW discrete Fourier
transform library. Because these processor-dependent libraries are
optimized for the different platforms that MATLAB supports, they execute
faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language used is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
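The snapshot sequence above (grayscale conversion, Otsu binarization, ROI selection) can be sketched end to end. The code below is a minimal Python/NumPy illustration, not the project's MATLAB code: a synthetic image stands in for the sclera photograph, and Otsu's threshold is computed from the grayscale histogram exactly as in the thresholding step.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: pick the threshold that maximizes
    the between-class variance of the grayscale histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, 0.0
    w0 = cum0 = 0.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        cum0 += t * hist[t]
        m0 = cum0 / w0                      # mean of class below t
        m1 = (sum_all - cum0) / (total - w0)  # mean of class above t
        var_between = w0 * (total - w0) * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Synthetic stand-in for a grayscale sclera image:
# bright sclera background with one dark "vein" stripe
gray = np.full((64, 64), 200, dtype=np.uint8)
gray[30:34, :] = 60

t = otsu_threshold(gray)     # thresholding step
binary = gray > t            # binarization step
roi = binary[20:44, :]       # region-of-interest selection
print(t, binary.mean())
```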
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition
method which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions for other sclera vein recognition methods
and for general pattern recognition methods. We designed the Y-shape
descriptor, a new feature extraction method that takes advantage of the GPU
structures, to narrow the search range and increase matching efficiency.
We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing, which can
dramatically reduce data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, an important
step in achieving parallel computation efficiency on a GPU. A workflow
with high arithmetic intensity, designed to hide memory access latency,
partitions the computation task across the heterogeneous system of
CPU and GPU, down to the individual threads on the GPU. The proposed method
dramatically improves matching efficiency without compromising
recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
dimensional function f(x, y), where x and y are spatial (plane)
coordinates, and the amplitude of f at any pair of coordinates (x, y)
represents the intensity, or gray level, of the image at that point.
A digital image is one for which both the coordinates and the
amplitude values of f are all finite, discrete quantities. Hence a digital
image is composed of a finite number of elements, each of which has a
particular location and value. These elements are called "pixels". A digital
image is discrete in both spatial coordinates and brightness, and it can be
considered as a matrix whose row and column indices identify a point on
the image and whose corresponding matrix element value identifies the gray
level at that point.
One of the first applications of digital images was in the newspaper
industry, when pictures were first sent by submarine cable between London
and New York. The introduction of the Bartlane cable picture transmission
system in the early 1920s reduced the time required to transport a picture
across the Atlantic from more than a week to less than three hours.
121 PREPROCESSING
In imaging science, image processing is any form of signal
processing for which the input is an image, such as a photograph or video
frame; the output of image processing may be either an image or a set of
characteristics or parameters related to the image. Most image-processing
techniques involve treating the image as a two-dimensional signal and
applying standard signal-processing techniques to it. Image processing
usually refers to digital image processing, but optical and analog image
processing are also possible; the general techniques described here
apply to all of them. The acquisition of images (producing the input image
in the first place) is referred to as imaging.
Image processing refers to the processing of a 2D picture by a
computer.
Basic definitions:
An image defined in the "real world" is considered to be a function
of two real variables, for example a(x, y), with a as the amplitude (e.g.,
brightness) of the image at the real coordinate position (x, y). Modern digital
technology has made it possible to manipulate multi-dimensional signals
with systems that range from simple digital circuits to advanced parallel
computers. The goal of this manipulation can be divided into three
categories:
Image processing (image in -> image out)
Image analysis (image in -> measurements out)
Image understanding (image in -> high-level description out)
An image may be considered to contain sub-images, sometimes referred
to as regions of interest (ROIs), or simply regions. This concept reflects the
fact that images frequently contain collections of objects, each of which can
be the basis for a region. In a sophisticated image processing system, it
should be possible to apply specific image processing operations to selected
regions. Thus one part of an image (region) might be processed to suppress
motion blur, while another part might be processed to improve colour
rendition.
Most image processing systems require that the images be
available in digitized form, that is, as arrays of finite-length binary words. For
digitization, the given image is sampled on a discrete grid and each sample,
or pixel, is quantized using a finite number of bits. The digitized image is
then processed by a computer. To display a digital image, it is first converted
into an analog signal, which is scanned onto a display. Closely related to
image processing are computer graphics and computer vision. In computer
graphics, images are manually made from physical models of objects,
environments, and lighting, instead of being acquired (via imaging devices
such as cameras) from natural scenes, as in most animated movies.
Computer vision, on the other hand, is often considered high-level image
processing, out of which a machine/computer/software intends to decipher
the physical contents of an image or a sequence of images (e.g., videos or
3D full-body magnetic resonance scans).
In modern sciences and technologies, images also gain much
broader scope, due to the ever-growing importance of scientific
visualization (of often large-scale, complex scientific/experimental data).
Examples include microarray data in genetic research and real-time multi-
asset portfolio trading in finance. Before processing, an image is
converted into a digital form. Digitization includes sampling of the image and
quantization of the sampled values. After converting the image into bit
information, processing is performed. This processing may be
image enhancement, image restoration, or image compression.
122 IMAGE ENHANCEMENT
It refers to the accentuation, or sharpening, of image features such as
boundaries or contrast, to make a graphic display more useful for display and
analysis. This process does not increase the inherent information content in the
data. It includes gray level and contrast manipulation, noise reduction, edge
crispening and sharpening, filtering, interpolation and magnification,
pseudo-coloring, and so on.
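A simple instance of the contrast manipulation mentioned above is linear contrast stretching, sketched here in Python/NumPy (an illustrative example with made-up pixel values, not code from the report):

```python
import numpy as np

def contrast_stretch(img, out_min=0, out_max=255):
    """Linearly map the image's intensity range onto [out_min, out_max]."""
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:                     # flat image: nothing to stretch
        return np.full_like(img, out_min)
    stretched = (img.astype(float) - lo) * (out_max - out_min) / (hi - lo) + out_min
    return stretched.astype(np.uint8)

# A low-contrast patch: intensities squeezed into [100, 130]
dim = np.array([[100, 110], [120, 130]], dtype=np.uint8)
print(contrast_stretch(dim))   # full [0, 255] range after stretching
```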
123 IMAGE RESTORATION
It is concerned with filtering the observed image to minimize the
effect of degradations. The effectiveness of image restoration depends on the
extent and accuracy of the knowledge of the degradation process, as well as on
the filter design. Image restoration differs from image enhancement in that the
latter is concerned with the extraction or accentuation of image features.
124 IMAGE COMPRESSION
It is concerned with minimizing the number of bits required to represent
an image. Applications of compression include broadcast TV, remote sensing
via satellite, military communication via aircraft, radar, teleconferencing,
facsimile transmission of educational and business documents, medical
images that arise in computed tomography, magnetic resonance imaging
and digital radiology, motion pictures, satellite images, weather maps,
geological surveys, and so on.
Text compression - CCITT Group 3 and Group 4
Still image compression - JPEG
Video compression - MPEG
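The standards listed above use far more sophisticated schemes, but the underlying goal, representing an image in fewer bits, can be illustrated with run-length encoding, which is also the core idea behind the CCITT fax standards for binary documents (Python sketch, illustrative only):

```python
def rle_encode(bits):
    """Run-length encode a sequence of 0/1 pixels as (value, run) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1        # extend the current run
        else:
            runs.append([b, 1])     # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Expand (value, run) pairs back into the pixel sequence."""
    return [v for v, n in runs for _ in range(n)]

row = [0] * 12 + [1] * 3 + [0] * 9   # one scanline of a binary fax image
encoded = rle_encode(row)
print(encoded)                        # [(0, 12), (1, 3), (0, 9)]
```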
125 SEGMENTATION
In computer vision, image segmentation is the process of
partitioning a digital image into multiple segments (sets of pixels, also
known as superpixels). The goal of segmentation is to simplify and/or
change the representation of an image into something that is more
meaningful and easier to analyze. Image segmentation is typically used to
locate objects and boundaries (lines, curves, etc.) in images. More precisely,
image segmentation is the process of assigning a label to every pixel in an
image such that pixels with the same label share certain visual
characteristics.
The result of image segmentation is a set of segments that
collectively cover the entire image, or a set of contours extracted from the
image (see edge detection). Each of the pixels in a region is similar with
respect to some characteristic or computed property, such as
colour, intensity, or texture. Adjacent regions are significantly different
with respect to the same characteristic(s). When applied to a stack of
images, typical in medical imaging, the contours resulting from image
segmentation can be used to create 3D reconstructions with the help of
interpolation algorithms like marching cubes.
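A minimal illustration of "assigning a label to every pixel" is connected-component labeling of a binary image: each 4-connected region of foreground pixels receives its own integer label. The Python sketch below uses a simple breadth-first flood fill (illustrative only, not the report's segmentation method):

```python
from collections import deque

def label_components(binary):
    """Assign an integer label to each 4-connected region of 1-pixels."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for i in range(h):
        for j in range(w):
            if binary[i][j] and not labels[i][j]:
                next_label += 1                   # new region found
                q = deque([(i, j)])
                labels[i][j] = next_label
                while q:                          # flood-fill the region
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            q.append((ny, nx))
    return labels, next_label

img = [[1, 1, 0, 0],
       [0, 1, 0, 1],
       [0, 0, 0, 1],
       [1, 0, 0, 0]]
labels, n = label_components(img)
print(n)   # three separate regions
```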
126 IMAGE RESTORATION
Image restoration, like enhancement, improves the qualities of an image,
but all the operations are mainly based on known or measured
degradations of the original image. Image restoration is used to restore
images with problems such as geometric distortion, improper focus,
repetitive noise, and camera motion, and to correct images for known
degradations.
127 FUNDAMENTAL STEPS
Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the
chances for success of the other processes.
Image segmentation: to partition an input image into its constituent parts or
objects.
Image representation: to convert the input data to a form suitable for
computer processing.
Image description: to extract features that result in some quantitative
information of interest, or features that are basic for differentiating one
class of objects from another.
Image recognition: to assign a label to an object based on the
information provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized
objects.
Knowledge about a problem domain is coded into an image processing
system in the form of a knowledge database.
13 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x, y) must be digitized
both spatially and in amplitude.
Digitization of the spatial coordinates (x, y) is called image sampling.
Amplitude digitization is called gray-level quantization.
The storage and processing requirements increase rapidly with the spatial
resolution and the number of gray levels.
Example: a 256 gray-level image of size 256x256 occupies 64K bytes of
memory.
Images of very low spatial resolution produce a checkerboard effect.
The use of an insufficient number of gray levels in smooth areas of a digital
image results in false contouring.
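The storage figure in the example above follows directly from the sampling parameters: 256 gray levels need log2(256) = 8 bits, i.e. one byte per pixel, so a 256x256 image takes 256 x 256 x 1 = 65536 bytes (64K). A small Python sketch of the arithmetic:

```python
import math

def image_storage_bytes(width, height, gray_levels):
    """Bytes needed for an uncompressed image with the given number of gray levels."""
    bits_per_pixel = math.ceil(math.log2(gray_levels))
    return width * height * bits_per_pixel // 8

print(image_storage_bytes(256, 256, 256))   # 65536 bytes = 64K
print(image_storage_bytes(256, 256, 2))     # binary image: 1 bit/pixel = 8192 bytes
```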
14 IMAGE FILE FORMATS
There are two general groups of 'images': vector graphics (or line art)
and bitmaps (pixel-based 'images'). Some of the most common file
formats are:
GIF - Graphics Interchange Format: an 8-bit (256 colour), non-
destructively compressed bitmap format. Mostly used for the web. Has several
sub-standards, one of which is the animated GIF.
JPEG - Joint Photographic Experts Group: a very efficient (i.e., much
information per byte), destructively compressed, 24-bit (16 million colours)
bitmap format. Widely used, especially for the web and Internet (bandwidth-
limited applications).
TIFF - Tagged Image File Format: the standard 24-bit publication bitmap
format. Compresses non-destructively with, for instance, Lempel-Ziv-
Welch (LZW) compression.
PS - PostScript: a standard vector format. Has numerous sub-standards
and can be difficult to transport across platforms and operating systems.
PSD - Adobe Photoshop Document: a dedicated Photoshop format that
keeps all the information in an image, including all the layers.
BMP - Bitmap file format.
15 TYPE OF IMAGES
There are four types of images:
1. Binary images
2. Gray scale images
3. Color images
4. Indexed images
151 BINARY IMAGES
A binary image is a digital image that has only two possible values for
each pixel. Typically the two colors used for a binary image are black and
white, though any two colors can be used. Binary images are also called bi-
level or two-level. Each pixel is stored as a single bit, i.e.,
a 0 or a 1; hence the names black-and-white and B&W.
152 GRAY SCALE IMAGE
In an (8-bit) grayscale image, each picture element has an assigned intensity
that ranges from 0 to 255. A grey scale image is what people normally call
a black-and-white image, but the name emphasizes that such an image will
also include many shades of grey.
FIG
153 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive
colour, with the r, g, and b receptors in our retinas. RGB uses additive colour
mixing and is the basic colour model used in television or any other
medium that projects colour with light. It is the basic colour model used in
computers and for web graphics, but it cannot be used for print production.
The secondary colours of RGB - cyan, magenta, and yellow - are formed
by mixing two of the primary colours (red, green, or blue) and excluding the
third colour. Red and green combine to make yellow, green and blue to
make cyan, and blue and red form magenta. The combination of red, green,
and blue at full intensity makes white.
In Photoshop, using the "screen" mode for the different layers in an
image will make the intensities mix together according to the additive
colour mixing model. This is analogous to stacking slide images on top of
each other and shining light through them.
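The additive mixing rules described above are easy to verify numerically: combining full-intensity channels reproduces the secondary colours and white (Python/NumPy sketch with 8-bit channel values):

```python
import numpy as np

red   = np.array([255, 0, 0])
green = np.array([0, 255, 0])
blue  = np.array([0, 0, 255])

def mix(*colors):
    """Additive mixing: combine light by saturating channel-wise addition."""
    return np.minimum(np.sum(colors, axis=0), 255)

print(mix(red, green))         # [255 255 0]   -> yellow
print(mix(green, blue))        # [0 255 255]   -> cyan
print(mix(blue, red))          # [255 0 255]   -> magenta
print(mix(red, green, blue))   # [255 255 255] -> white
```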
FIG
CMYK: The 4-colour CMYK model used in printing lays down
overlapping layers of varying percentages of transparent cyan (C), magenta
(M), and yellow (Y) inks; in addition, a layer of black (K) ink can be added.
The CMYK model uses subtractive colour mixing.
154 INDEXED IMAGE
FIG
An indexed image consists of an array and a color map matrix. The
pixel values in the array are direct indices into the color map. By convention,
this documentation uses the variable name X to refer to the array and map
to refer to the color map. In computing, indexed color is a technique to
manage a digital image's colors in a limited fashion, in order to save
computer memory and file storage while speeding up display refresh and
file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, color information is not
directly carried by the image pixel data, but is stored in a separate piece of
data called a palette: an array of color elements, in which every element (a
color) is indexed by its position within the array. The image pixels do not
contain the full specification of their color, but only their index into the palette.
This technique is sometimes referred to as pseudocolor or indirect color, as
colors are addressed indirectly.
Perhaps the first device that supported palette colors was a random-
access frame buffer described in 1975 by Kajiya, Sutherland, and Cheadle.
This supported a palette of 256 36-bit RGB colors.
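The index-to-palette lookup described above takes only a few lines: the image array X holds small indices, and the palette holds the actual RGB triples. The two-entry palette below is a hypothetical example, not from the report (Python/NumPy sketch):

```python
import numpy as np

# Palette: index -> RGB colour (only 2 entries here, so 1 bit/pixel would suffice)
palette = np.array([[255, 255, 255],    # index 0: white
                    [128,   0,   0]],   # index 1: dark red
                   dtype=np.uint8)

# Indexed image: each pixel stores only a palette index, not a full colour
X = np.array([[0, 0, 1],
              [1, 0, 0]], dtype=np.uint8)

rgb = palette[X]          # expand indices to full-colour pixels
print(rgb.shape)          # (2, 3, 3)
print(rgb[0, 2])          # [128 0 0]
```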
16 Applications of image processing
Interest in digital image processing methods stems from two principal
application areas:
1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception
In the second application area, interest focuses on procedures for
extracting image information in a form suitable for computer processing.
Examples include automatic character recognition, industrial machine
vision for product assembly and inspection, military reconnaissance,
automatic processing of fingerprints, etc.
17 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches - a Speeded Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching - for feature registration and matching. Among these three methods,
the SURF method achieves the best accuracy. It takes an average of 1.5
seconds using the SURF method to perform a one-to-one matching.
Zhou et al. proposed a line-descriptor-based method for sclera vein
recognition. The matching step (including registration) is the most time-
consuming step in this sclera vein recognition system, costing about 1.2
seconds to perform a one-to-one matching. Both speeds were measured using
a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM.
Currently, sclera vein recognition algorithms are designed using central
processing unit (CPU)-based systems.
171 DISADVANTAGES OF EXISTING SYSTEM
1. Mask files are used to calculate valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large, occupy GPU memory, and slow
down data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units in CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirements of real-time performance.
LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns
for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no.
14, pp. 1860-1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the
most challenging problems that has been actively researched in recent
years. It is well known that many state-of-the-art still-image face recognition
algorithms perform well when constrained (frontal, well-illuminated, high-
resolution, sharp, and full) face images are acquired. However, their
performance degrades significantly when the test images contain variations
that are not present in the training images. In this paper, we highlight some
of the key issues in remote face recognition. We define remote face
recognition as one where faces are several tens of meters (10-250 m) from
the cameras. We then describe a remote face database which has been
acquired in an unconstrained outdoor maritime environment. Recognition
performance of a subset of existing still-image-based face recognition
algorithms is evaluated on the remote face data set. Further, we define the
remote re-identification problem as matching a subject at one location with
candidate sets acquired at a different location and over time, in remote
conditions. We provide preliminary experimental results on remote re-
identification. It is demonstrated that, in addition to applying a good
classification algorithm, finding features that are robust to the variations
mentioned above and developing statistical models which can account for
these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N.
Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics
Security.
With the rapidly expanding biometric data collected by various sectors
of government and industry for identification and verification purposes,
how to manage and process such Big Data draws great concern. Even
though modern processors are equipped with more cores and memory
capacity, it still requires careful design to utilize the hardware
resources effectively and the power consumption efficiently. This research
addresses this issue by investigating the workload characteristics of a
biometric application. Taking Daugman's iris matching algorithm, which
has been proven to be the most reliable iris matching method, as a case
study, we conduct performance profiling and binary instrumentation on the
benchmark to capture its execution behavior. The results show that data
loading and memory access incur great performance overhead, and
motivate us to move the biometrics computation to a high-performance
architecture.
Modern iris recognition algorithms can be computationally intensive,
yet are designed for traditional sequential processing elements, such as a
personal computer. However, a parallel processing alternative using field-
programmable gate arrays (FPGAs) offers an opportunity to speed up iris
recognition. Within the means of this project, iris template generation with
directional filtering, which is a computationally expensive yet parallel
portion of a modern iris recognition algorithm, is parallelized on an FPGA
system. We present a performance comparison of the parallelized
algorithm on the FPGA system to a traditional CPU-based version. The
parallelized template generation outperforms an optimized C++ code
version, determining the information content of an iris approximately 324
times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric
modality based on conjunctival vasculature," in Proc. Artif. Neural
Netw. Eng., 2006, pp. 1-8.
A new biometric indicator based on the patterns of conjunctival
vasculature is proposed. Conjunctival vessels can be observed on the visible
part of the sclera that is exposed to the outside world. These vessels
demonstrate rich and specific details in visible light, and can be easily
photographed using a regular digital camera. In this paper, we discuss
methods for conjunctival imaging, preprocessing, and feature extraction in
order to derive a suitable conjunctival vascular template for biometric
authentication. Commensurate classification methods, along with the
observed accuracy, are discussed. Experimental results suggest the potential
of using conjunctival vasculature as a biometric measure. Identification of
a person based on some unique set of features is an important task. Human
identification is possible with several biometric systems, and sclera
recognition is one of the promising biometrics. The sclera is the white
portion of the human eye. The vein pattern seen in the sclera region is
unique to each person; thus the sclera vein pattern is a well-suited
biometric technology for human identification. The existing methods used
for sclera recognition have some drawbacks: only frontal-looking
images are preferred for matching, and rotation variance is another problem.
These problems are eliminated in the proposed system by using
two feature extraction techniques: Histogram of Oriented
Gradients (HOG) and conversion of the image into polar form using
bilinear interpolation. These two features help the proposed
system become illumination invariant and rotation invariant. The
experimentation is done with the help of the UBIRIS database. The
experimental results show that the proposed sclera recognition method can
achieve better accuracy than the previous methods.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J.
C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899,
May 2008.
The graphics processing unit (GPU) has become an integral part of
today's mainstream computing systems. Over the past six years, there has
been a marked increase in the performance and capabilities of GPUs. The
modern GPU is not only a powerful graphics engine, but also a highly
parallel programmable processor featuring peak arithmetic and memory
bandwidth that substantially outpaces its CPU counterpart. The GPU's
rapid increase in both programmability and capability has spawned a
research community that has successfully mapped a broad range of
computationally demanding, complex problems to the GPU. This effort in
general-purpose computing on the GPU, also known as GPU computing,
has positioned the GPU as a compelling alternative to traditional
microprocessors in high-performance computer systems of the future. We
describe the background, hardware, and programming model for GPU
computing, summarize the state of the art in tools and techniques, and
present four GPU computing successes in game physics and computational
biophysics that deliver order-of-magnitude performance gains over
optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image
database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-
977.
This paper proposes algorithms for iris segmentation, quality
enhancement, match score fusion, and indexing to improve both the
accuracy and the speed of iris recognition. A curve evolution approach is
proposed to effectively segment a nonideal iris image using the modified
Mumford-Shah functional. Different enhancement algorithms are
concurrently applied to the segmented iris image to produce multiple
enhanced versions of the iris image. A support-vector-machine-based
learning algorithm selects locally enhanced regions from each globally
enhanced image and combines these good-quality regions to create a single
high-quality iris image. Two distinct features are extracted from the high-
quality iris image. The global textural feature is extracted using the 1-D log-
polar Gabor transform, and the local topological feature is extracted using
Euler numbers. An intelligent fusion algorithm combines the textural and
topological matching scores to further improve iris recognition
performance and reduce the false rejection rate, whereas an indexing
algorithm enables fast and accurate iris identification. The verification and
identification performance of the proposed algorithms is validated and
compared with other algorithms using the CASIA Version 3, ICE 2005, and
UBIRIS iris databases.
18 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-
stage parallel approach for registration and matching: a parallel sclera
matching solution for our sequential line-descriptor method, using the
CUDA GPU architecture. CUDA is a highly parallel, multithreaded,
many-core processor architecture with tremendous computational power.
It supports not only a traditional graphics pipeline, but also computation
on non-graphical data. It is relatively straightforward to implement our C
program for CUDA on an AMD-based GPU using OpenCL. Our CUDA
kernels can be directly converted to OpenCL kernels by accounting for the
different syntax of various keywords and built-in functions. The mapping
strategy is also effective in OpenCL if we regard a thread and a block in
CUDA as a work-item and a work-group in OpenCL. Most of our optimization
techniques, such as coalesced memory access and prefix sum, work in
OpenCL too. Moreover, since CUDA is a data-parallel architecture, the
implementation of our approach in OpenCL should be programmed in the
data-parallel model.
In this research, we first discuss why a naive parallel approach would
not work. We then propose a new sclera descriptor - the Y-shape sclera
feature-based efficient registration method - to speed up the mapping scheme;
introduce the "weighted polar line (WPL) descriptor", which is better
suited for parallel computing, to mitigate the mask size issue; and develop
a coarse-to-fine two-stage matching process to dramatically improve the
matching speed. These new approaches make parallel processing
possible and efficient.
191 PROPOSED SYSTEM ADVANTAGES
1. To improve efficiency, we propose a new descriptor, the Y-shape
descriptor, which can greatly improve the efficiency of
the coarse registration of two images and can be used to filter out
non-matching pairs before refined matching.
2. We propose a coarse-to-fine two-stage matching process. In the first
stage, we match two images coarsely using the Y-shape descriptors,
which is very fast because no registration is needed. The
matching result in this stage helps filter out image pairs with low
similarity.
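The coarse-to-fine control flow described above can be sketched abstractly. The toy Python code below is NOT the report's Y-shape/WPL algorithm; the descriptors, scores, and threshold are hypothetical stand-ins. It only illustrates the two-stage idea: a cheap coarse score filters out low-similarity templates so that the expensive fine matcher runs on few candidates.

```python
def coarse_score(probe, template):
    """Cheap stand-in for coarse (Y-shape-style) matching; no registration."""
    c1, c2 = probe["coarse"], template["coarse"]
    return sum(a == b for a, b in zip(c1, c2)) / len(c1)

def fine_score(probe, template):
    """Expensive stand-in for refined (WPL-style) matching."""
    f1, f2 = probe["fine"], template["fine"]
    return sum(a == b for a, b in zip(f1, f2)) / len(f1)

def two_stage_match(probe, database, coarse_thresh=0.5):
    # Stage 1: coarse filter drops templates with low similarity
    candidates = [t for t in database if coarse_score(probe, t) >= coarse_thresh]
    # Stage 2: fine matching runs only on the surviving candidates
    return max(candidates, key=lambda t: fine_score(probe, t), default=None)

db = [
    {"id": "A", "coarse": [1, 1, 0, 0], "fine": [1, 0, 1, 0, 1, 0]},
    {"id": "B", "coarse": [0, 0, 1, 1], "fine": [1, 1, 1, 1, 1, 1]},
    {"id": "C", "coarse": [1, 1, 1, 0], "fine": [1, 0, 1, 0, 1, 1]},
]
probe = {"coarse": [1, 1, 1, 0], "fine": [1, 0, 1, 0, 1, 1]}
best = two_stage_match(probe, db)
print(best["id"])   # 'B' is filtered out coarsely; 'C' wins the fine stage
```

In the paper's setting the same structure pays off on the GPU: the coarse stage shrinks the candidate set, so the costly registered matching touches far fewer templates.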
CHAPTER 2
PROJECT DESCRIPTION
21 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each person,
so it can be used for human identification. Several researchers have
designed different sclera vein recognition methods and have shown that it
is promising to use sclera vein recognition for human identification.
Crihalmeanu and Ross proposed three approaches - a Speeded Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching - for feature registration and matching. Among these three methods,
the SURF method achieves the best accuracy. It takes an average of 1.5
seconds using the SURF method to perform a one-to-one matching. Zhou
et al. proposed a line-descriptor-based method for sclera vein recognition.
The matching step (including registration) is the most time-consuming step
in this sclera vein recognition system, costing about 1.2 seconds to
perform a one-to-one matching. Both speeds were measured using a PC with
an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently,
sclera vein recognition algorithms are designed using central processing
unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential
processing devices, which may not be efficient for data processing where the
data can be parallelized. Because of the large time consumption of the matching
step, sclera vein recognition using a sequential method would be very
challenging to implement in a real-time biometric system, especially
when there is a large number of templates in the database for matching.
General-purpose graphics processing units (GPGPUs, or simply GPUs)
are now popularly used for parallel computing to improve
computational speed and efficiency. The highly parallel
structure of GPUs makes them more effective than CPUs for data
processing where the processing can be performed in parallel. GPUs have been
widely used in biometric recognition, such as speech recognition, text
detection, handwriting recognition, and face recognition. In iris
recognition, the GPU was used to extract the features, construct descriptors,
and match templates.
GPUs are also used for object retrieval and image search. Park et al.
presented a performance evaluation of image processing algorithms, such
as linear feature extraction and multi-view stereo matching, on GPUs.
However, these approaches were designed for their specific biometric
recognition applications and feature searching methods; therefore, they may
not be efficient for sclera vein recognition. Compute Unified Device
Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in
this research. CUDA exposes a highly parallel, multithreaded, many-core
processor with tremendous computational power. It supports not only the
traditional graphics pipeline but also computation on non-graphical data.
More importantly, it offers an easier programming platform that
outperforms its CPU counterparts in terms of peak arithmetic intensity and
memory bandwidth. In this research, the goal is not to develop a unified
strategy to parallelize all sclera matching methods, because each method is
quite different from the others and would need a customized design; an
efficient parallel computing scheme needs different strategies for
different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
our sequential line-descriptor method using the CUDA GPU architecture.
However, the parallelization strategies developed in this research can be
applied to design parallel approaches for other sclera vein recognition
methods and can help parallelize general pattern recognition methods.
Based on the matching approach, there are three challenges in mapping
the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large, will preoccupy the GPU memory, and slow
down data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance. New designs are
necessary to help narrow down the search range. In summary, a naïve
parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to port our C program for
CUDA to an AMD-based GPU using OpenCL. Our CUDA kernels can be
directly converted to OpenCL kernels by accounting for the different syntax of
various keywords and built-in functions. The mapping strategy is also
effective in OpenCL if we regard the CUDA thread and block as the OpenCL
work-item and work-group. Most of our optimization techniques, such as
coalesced memory access and prefix sum, work in OpenCL too.
Moreover, since CUDA is a data-parallel architecture, an OpenCL
implementation of our approach should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not
work (Section 3). We then propose the new sclera descriptor, the Y-shape
sclera feature-based efficient registration method, to speed up the mapping
scheme (Section 4); introduce the "weighted polar line (WPL) descriptor",
which is better suited for parallel computing, to mitigate the mask size
issue (Section 5); and develop our coarse-to-fine two-stage matching
process to dramatically improve the matching speed (Section 6). These new
approaches make parallel processing possible and efficient. However, it
is non-trivial to implement these algorithms in CUDA, so we developed
implementation schemes to map our algorithms onto CUDA (Section 7).
In Section 2, we give a brief introduction to sclera vein recognition. In
Section 8, we present experiments using the proposed system.
In Section 9, we draw conclusions.
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation feature enhancement feature extraction and feature
matching (Figure 1)
FIG
Sclera image segmentation is the first step in sclera vein recognition,
and several methods have been designed for it. Crihalmeanu
et al. presented a semi-automated system for sclera segmentation; they
used a clustering algorithm to classify color eye images into three
clusters: sclera, iris, and background. Later, Crihalmeanu and Ross
designed a segmentation approach based on a normalized sclera index
measure, which includes coarse sclera segmentation, pupil region
segmentation, and fine sclera segmentation. Zhou et al. developed a skin
tone plus "white color"-based voting method for sclera segmentation in
color images, and an Otsu's thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera
features, since the sclera vein patterns often lack contrast and are hard to
detect. Zhou et al. used a bank of multi-directional Gabor filters for
vascular pattern enhancement. Derakhshani et al. used contrast-limited
adaptive histogram equalization (CLAHE) to enhance the green color plane
of the RGB image, and a multi-scale region-growing approach to identify
the sclera veins against the image background. Crihalmeanu and Ross applied
a selective enhancement filter for blood vessels to extract features from the
green component of a color image. In the feature matching step,
Crihalmeanu and Ross proposed
three registration and matching approaches: Speeded Up Robust
Features (SURF), which is based on interest-point detection; minutiae
detection, which is based on minutiae points of the vasculature structure;
and direct correlation matching, which relies on image registration. Zhou et
al. designed a line descriptor-based feature registration and matching
method.
The proposed sclera recognition consists of five steps: sclera
segmentation, vein pattern enhancement, feature extraction, feature
matching, and the matching decision. Fig. 2 shows the block diagram of sclera
recognition. Two types of feature extraction are used in the proposed
method to achieve good identification accuracy. The characteristics
elicited from the blood vessel structure seen in the sclera region are the
Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-
polar conversion. HOG is used to determine the gradient and
edge orientations of the vein pattern in the sclera region of an eye image. To
improve computational efficiency, the image data are converted
to polar form, which is mainly suited to circular or quasi-circular
objects. These two characteristics are extracted from all the images in the
database and compared with the features of the query image to decide
whether the person is correctly identified. This comparison is done in the
feature matching step, which ultimately makes the matching decision. By using
the proposed feature extraction methods and matching techniques, human
identification is more accurate than in existing studies. In the proposed
method, two features of an image are extracted.
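As an illustration of the Cartesian-to-polar step, the sketch below resamples a grayscale image along rays from the pupil center into a small polar grid. This is a minimal sketch, not the report's implementation; the nearest-neighbour sampling and the grid sizes (n_r, n_theta) are assumptions.

```python
import math

def to_polar(image, cx, cy, n_r=4, n_theta=8):
    """Resample a grayscale image (list of lists) around center (cx, cy)
    into an n_r x n_theta polar grid by nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    r_max = min(cx, cy, w - 1 - cx, h - 1 - cy)   # largest radius fully inside
    polar = [[0] * n_theta for _ in range(n_r)]
    for i in range(n_r):
        r = r_max * (i + 1) / n_r                 # ring radius
        for j in range(n_theta):
            t = 2 * math.pi * j / n_theta         # ray angle
            x = int(round(cx + r * math.cos(t)))
            y = int(round(cy + r * math.sin(t)))
            polar[i][j] = image[y][x]
    return polar
```

A uniform image maps to a uniform polar grid, which makes the resampling easy to sanity-check.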
222 SCLERA SEGMENTATION
Sclera segmentation is the first step in the sclera recognition It lets
in three steps glare area detection sclera area estimation and iris and eyelid
detection and refinement Fig shows the steps of segmentation
FIG
Glare Area Detection Glare area means a small bright area near
pupil or iris This is the unwanted portion on the eye image Sobel filter is
applied to detect the glare area present in the iris or pupil Simply it runs
only for the grayscale image If the image is color then it needs a
conversion to grayscale image and after that apply it to the Sobel filter to
detect the glare area Fig 4 shows the result of the glare area detection
FIG
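A minimal sketch of the Sobel-based glare check described above: compute the gradient magnitude, then flag pixels that are both very bright and on a strong edge. The thresholds and the bright-and-edge decision rule are assumptions for illustration; the report does not give the exact rule.

```python
def sobel_magnitude(img):
    """Gradient magnitude of a grayscale image (list of lists) via 3x3 Sobel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

def glare_mask(img, edge_thresh=200.0, bright_thresh=220):
    """Mark pixels that are both very bright and on a strong edge --
    a crude stand-in for the report's glare detector."""
    mag = sobel_magnitude(img)
    return [[1 if img[y][x] >= bright_thresh and mag[y][x] >= edge_thresh else 0
             for x in range(len(img[0]))] for y in range(len(img))]
```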
Sclera area estimation: To estimate the sclera area, Otsu's
thresholding method is applied. The steps of sclera area detection are:
selection of the region of interest (ROI), Otsu's thresholding, and sclera area
detection. The left and right sclera areas are selected based on the iris
boundaries. Once the region of interest is selected, Otsu's thresholding is
applied to obtain the potential sclera areas. The correct left sclera area should
lie in the right and center positions, and the correct right sclera area should
lie in the left and center. In this way, non-sclera areas are eliminated.
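Otsu's thresholding, used above for sclera area estimation, picks the gray level that maximizes the between-class variance of the image histogram. A generic sketch (not the report's code):

```python
def otsu_threshold(pixels):
    """Return the threshold maximizing between-class variance (Otsu's method)
    for a list of integer gray levels in 0..255."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * hist[i] for i in range(256))
    sum_b = 0.0          # cumulative intensity of the background class
    w_b = 0              # background pixel count
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b                   # background mean
        m_f = (sum_all - sum_b) / w_f       # foreground mean
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

For a clearly bimodal histogram the returned threshold separates the two modes, which is the behaviour relied on for separating sclera from non-sclera pixels.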
223 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the
sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are
refined, as these are all unwanted portions for recognition. To
eliminate their effects, refinement is performed following the detection
of the sclera area. Fig. shows the result after Otsu's thresholding and after
iris and eyelid refinement to detect the right sclera area. The left sclera
area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the impact of
segmentation faults. Moreover, the vein patterns in the sclera area are not
clearly visible after segmentation, so vein pattern
enhancement is performed to make them more visible.
224 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of fingers (Miura et al 2004) palm (Lin and
Fan 2004) and retina (Hill 1999) In the case of retinal biometrics an
especial optical device for imaging the back of the eyeball is needed (Hill
1999) Due to its perceived invasiveness and the required degree of subject
cooperation the use of retinal biometrics may not be acceptable to some
individuals The conjunctiva is a thin transparent and moist tissue that
covers the outer surface of the eye The part of the conjunctiva that covers
the inner lining of the eyelids is called palpebral conjunctiva and the part
that covers the outer surface of the eye is called ocular (or the bulbar)
conjunctiva which is the focus of this study The ocular conjunctiva is very
thin and clear thus the vasculature (including those of the episclera) is
easily visible through it The visible microcirculation of conjunctiva offers a
rich and complex network of veins and fine microcirculation (Fig 1) The
apparent complexity and specificity of these vascular patterns motivated us
to utilize them for personal identification (Derakhshani and Ross 2006)
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus, the conjunctival vascular
structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial
off-the-shelf digital cameras under normal lighting conditions, making this
modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the
addition of conjunctival biometrics will enhance the performance of current
iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portion of the eye, the sclera
vein patterns will be further exposed. This makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g.,
due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
opportunities for simultaneous computation. The figure below demonstrates the
possibility of parallel directional filtering. Since the filter is computed over
different portions of the input image, the computation can be performed in
parallel (denoted by Elements below). In addition, each element of the
filtering can itself be parallelized. A detailed discussion of
our proposed parallelization is outside the scope of this paper.
FIG
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching stage of the line descriptor-based method is a
bottleneck with regard to matching speed. In this section, we briefly
describe the line descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and to remove the branch
points. The line descriptor is used to describe the segments in the vein
structure; Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle to some
reference angle at the iris center, θ; the segment's distance to the iris center, r;
and the dominant angular orientation of the line segment, ɸ. Thus the
descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor
are calculated as
FIG
Here, fline(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. To register the segments of the
vascular patterns, a RANSAC-based algorithm is used to estimate the best-
fit parameters for registration between the two sclera vascular patterns. The
registration algorithm randomly chooses two points, one from the
test template and one from the target template, along with a
scaling factor and a rotation value based on a priori knowledge of the
database, and then calculates a fitness value for the registration
under these parameters.
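The (θ, r, ɸ) computation can be sketched directly from the definitions above: θ and r come from the segment center relative to the iris center, and ɸ from a line fit to the segment points. The least-squares fit below stands in for the report's polynomial approximation fline(x); the names are illustrative.

```python
import math

def line_descriptor(segment, iris_center):
    """Build the (theta, r, phi) line descriptor for one vessel segment.

    segment     -- list of (x, y) skeleton points of the segment
    iris_center -- (xi, yi), detected iris center
    """
    xi, yi = iris_center
    # segment center (xl, yl)
    xl = sum(p[0] for p in segment) / len(segment)
    yl = sum(p[1] for p in segment) / len(segment)
    theta = math.atan2(yl - yi, xl - xi)      # angle about the iris center
    r = math.hypot(xl - xi, yl - yi)          # distance to the iris center
    # dominant orientation phi via a first-order least-squares fit y = a*x + b
    n = len(segment)
    sx = sum(p[0] for p in segment); sy = sum(p[1] for p in segment)
    sxx = sum(p[0] ** 2 for p in segment); sxy = sum(p[0] * p[1] for p in segment)
    denom = n * sxx - sx * sx
    phi = math.pi / 2 if denom == 0 else math.atan((n * sxy - sx * sy) / denom)
    return theta, r, phi
```

A segment lying on the 45-degree line through the origin, with the iris center at the origin, yields theta = phi = pi/4, which makes the geometry easy to verify.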
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. To reduce the effect of segmentation errors, we created a
weighting image (Figure 3) from the sclera mask by setting interior pixels
of the sclera mask to 1, pixels within some distance of the boundary of the
mask to 0.5, and pixels outside the mask to 0.
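The weighting image can be sketched as below: 1 for interior mask pixels, 0.5 near the boundary, 0 outside. The boundary distance (border) is an assumed parameter standing in for the "some distance" in the text.

```python
def weight_image(mask, border=3):
    """Weight map from a binary sclera mask: interior pixels 1.0,
    pixels within `border` of the mask boundary 0.5, outside 0.0."""
    h, w = len(mask), len(mask[0])

    def near_boundary(y, x):
        # a pixel is near the boundary if any pixel in its (2*border+1)^2
        # neighbourhood is outside the image or outside the mask
        for dy in range(-border, border + 1):
            for dx in range(-border, border + 1):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or mask[ny][nx] == 0:
                    return True
        return False

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                out[y][x] = 0.5 if near_boundary(y, x) else 1.0
    return out
```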
The matching score for two segment descriptors is calculated as follows:
Si and Sj are two segment descriptors, m(Si, Sj) is the matching
score between segments Si and Sj, d(Si, Sj) is the Euclidean distance
between the segment descriptors' center points (from Eqs. 6-8), Dmatch is
the matching distance threshold, and ɸmatch is the matching angle threshold.
The total matching score M is the sum of the individual matching scores
divided by the maximum attainable matching score of the minimal set between the
test and target templates; that is, whichever of the test or target templates has
fewer points, the sum of its descriptors' weights sets the maximum score
that can be attained.
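A sketch of the scoring rule described above, assuming descriptors of the form (x, y, ɸ, w). The pairwise rule (match when centers and orientations are both within thresholds, credit the smaller weight) and the threshold values are assumptions standing in for Eqs. 6-8, which are not reproduced in this report.

```python
def match_score(s_i, s_j, d_match=5.0, phi_match=0.5):
    """Pairwise score for two segment descriptors (x, y, phi, w): the pair
    matches when centers are within d_match and orientations within
    phi_match; the score credited is the smaller weight."""
    xi, yi, pi_, wi = s_i
    xj, yj, pj, wj = s_j
    d = ((xi - xj) ** 2 + (yi - yj) ** 2) ** 0.5
    if d <= d_match and abs(pi_ - pj) <= phi_match:
        return min(wi, wj)
    return 0.0

def total_score(test, target, **kw):
    """M = sum of best pairwise scores / total weight of the smaller template."""
    matched = sum(max((match_score(s, t, **kw) for t in target), default=0.0)
                  for s in test)
    smaller = min(test, target, key=len)
    max_score = sum(s[3] for s in smaller) or 1.0
    return matched / max_score
```

Matching a template against itself yields M = 1, and disjoint templates yield M = 0, the two ends of the score range.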
FIG
FIG
FIG
FIG
Even with movement of the eye, Y-shape branches are observed to be a
stable feature and can be used as a sclera feature descriptor. To detect the
Y-shape branches in the original template, we search for the nearest-neighbor
set of every line segment within a regular distance and classify the angles
among these neighbors. If there are two types of angle values in a line
segment set, the set may be inferred to be a Y-shape structure, and the line
segment angles are recorded as a new feature of the sclera.
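The neighbourhood test just described can be sketched as follows: for each segment descriptor (x, y, ɸ), gather neighbours within a fixed radius and group their orientations; exactly two orientation groups suggests a Y-shaped branch point. The radius and angle tolerance are assumed values for illustration.

```python
def find_y_branches(segments, radius=10.0, angle_tol=0.2):
    """segments: list of (x, y, phi) line-segment descriptors.
    Flag segments whose neighbourhood orientations fall into exactly
    two distinct angle groups (a Y-shape candidate)."""
    branches = []
    for i, (x, y, phi) in enumerate(segments):
        neigh = [s for j, s in enumerate(segments) if j != i
                 and ((s[0] - x) ** 2 + (s[1] - y) ** 2) ** 0.5 <= radius]
        groups = []                      # greedy clustering of orientations
        for (_, _, a) in neigh:
            for g in groups:
                if abs(a - g[0]) <= angle_tol:
                    g.append(a)
                    break
            else:
                groups.append([a])
        if len(groups) == 2:             # two angle families -> Y shape
            branches.append((x, y, phi, groups[0][0], groups[1][0]))
    return branches
```

Collinear segments with a single shared orientation produce no branches, while a junction of two distinct orientations is flagged.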
There are two ways to measure both the orientation and the relationship of
every branch of a Y-shape vessel: one is to use the angle of each branch to
the x-axis; the other is to use the angle between each branch and the iris radial
direction. The first method needs an additional rotation operation to align the
template. In our approach we employed the second method. As Figure 6
shows, ϕ1, ϕ2 and ϕ3 denote the angle between each branch and the radius
from the pupil center. Even when the head tilts, the eye moves, or the camera
zooms during image acquisition, ϕ1, ϕ2 and ϕ3 remain quite stable.
To tolerate errors from the pupil center calculation in the segmentation step,
we also record the center position (x, y) of the Y-shape branch as an
auxiliary parameter. Our rotation-, shift- and scale-invariant feature
vector is thus defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated
with reference to the iris center; therefore, it is automatically aligned to the
iris center, and it is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary images
(Figure 7). The skeleton is then broken into smaller segments. For each
segment, a line descriptor is created to record the center and orientation of
the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the
position of the center and ɸ is its orientation. Because of the limits of
segmentation accuracy, descriptors at the boundary of the sclera area might
not be accurate and may contain spur edges resulting from the iris, eyelid,
and/or eyelashes. To tolerate such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel
patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of the
line segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the
sclera. However, using the mask in a GPU application is challenging,
since the mask files are large, occupy GPU memory, and
slow down data transfer. During matching and registration, a RANSAC-
type algorithm randomly selects the corresponding descriptors,
and the transform parameters between them are used to generate the
template-transform affine matrix. After every template transform, the mask
data must also be transformed and a new boundary calculated to
evaluate the weight of the transformed descriptor. This results in too many
convolutions in the processing unit.
To reduce the heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the
mask information and can be automatically aligned. We extract the
geometric relationships of the descriptors and store them as a new
descriptor. We use a weighted image created by setting various weight
values according to position: the weights of descriptors beyond the sclera
are set to 0, those near the sclera boundary to 0.5, and interior descriptors
to 1. In our work, descriptor weights are calculated on their own mask by
the CPU, and only once.
The result is saved as a component of the descriptor, which becomes
s(x, y, ɸ, w), where w denotes the weight of the point and may take the
values 0, 0.5, or 1. To align two templates, when a template is shifted to
another location along the line connecting their centers, all the
descriptors of that template are transformed. This is faster if the two
templates share a similar reference point: if we use the center of the iris
as the reference point, then when two templates are compared, the
correspondences are automatically aligned to each other. Every feature
vector of the template is a set of line segment descriptors composed of
three variables (Figure 8): the segment's angle to the reference line
through the iris center, denoted θ; the distance between the segment's
center and the pupil center, denoted r; and the dominant angular
orientation of the segment, denoted ɸ. To minimize the GPU computation,
we also convert the descriptor values from polar coordinates to
rectangular coordinates in CPU preprocessing.
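The CPU-side preprocessing step above amounts to caching the cosine/sine results so the GPU kernels work directly on rectangular coordinates. A one-line sketch with an assumed field order:

```python
import math

def wpl_descriptor(r, theta, phi, weight):
    """Precompute the rectangular coordinates of a polar (r, theta) segment
    center so the matcher can skip trigonometry: s = (x, y, r, theta, phi, w)."""
    x = r * math.cos(theta)
    y = r * math.sin(theta)
    return (x, y, r, theta, phi, weight)
```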
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters; for
example, as an eyeball moves left, the left-part sclera patterns may be
compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads called warps. We
reorganize the descriptors from the same side and save
FIG
FIG
them at contiguous addresses. This meets the requirement of coalesced
memory access on the GPU.
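The reorganization can be sketched on the host side as partitioning descriptors by sclera half into one contiguous buffer, so threads of a warp read neighbouring elements (coalesced access). The field order and the split rule on x are assumptions for illustration.

```python
def reorder_for_warps(descriptors, iris_x):
    """Group left-half and right-half sclera descriptors (first field: x)
    into two contiguous runs of one buffer, so each half maps to threads of
    different warps; returns the buffer and the offset of the right half."""
    left = [d for d in descriptors if d[0] < iris_x]
    right = [d for d in descriptors if d[0] >= iris_x]
    return left + right, len(left)
```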
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no
longer needed on the GPU. Matching with this feature is very fast because
the templates do not need to be re-registered every time after shifting. Thus
the cost of data transfer and computation on the GPU is reduced. With
matching on the new descriptor, the shift parameter generator in Figure 4 is
simplified as shown in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
more complicated shading and lighting operations that are essential for
complex effects The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment Over the past six years these vertex programs and
fragment programs have become increasingly more capable with larger
limits on their size and resource consumption with more fully featured
instruction sets and with more flexible control-flow operations After many
years of separate instruction sets for vertex and fragment operations current
GPUs support the unified Shader Model 40 on both vertex and fragment
shaders
The hardware must support shader programs of at least 65 k static
instructions and unlimited dynamic instructions
The instruction set for the first time supports both 32-bit integers and 32-
bit floating-point numbers
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture)
Finally dynamic flow control in the form of loops and branches must be
supported
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units. For general-purpose computing on the GPU, mapping general-
purpose computation onto the GPU uses the graphics hardware in much the
same way as any standard graphics application. Because of this similarity, it
is both easier and more difficult to explain the process: on one hand, the
actual operations are the same and are easy to follow; on the other hand, the
terminology differs between graphics and general-purpose use. Harris
provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology,
then show how the same steps are used in a general-purpose way to author
GPGPU applications, and finally use the same steps to show the simpler
and more direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II
concentrating on the programmable aspects of this pipeline
The programmer specifies geometry that covers a region on the screen
The rasterizer generates a fragment at each pixel location covered by that
geometry
Each fragment is shaded by the fragment program
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as texture on future passes through
the graphics pipeline
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation
involves exactly the same steps but different terminology. A motivating
example is a fluid simulation computed over a grid: at each time step, we
compute the next state of the fluid for each grid point from the current state
at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest The rasterizer generates a fragment at each
pixel location covered by that geometry (In our example our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation)
Each fragment is shaded by an SPMD general-purpose fragment
program (Each grid point runs the same program to update the state of its
fluid)
The fragment program computes the value of the fragment by a
combination of math operations and "gather" accesses from global
memory. (Each grid point can access the state of its neighbors from the
previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes (The current state of the fluid will be used on the next time
step)
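The fluid-grid example above maps naturally to a data-parallel stencil: every grid point runs the same "program" using its neighbours' previous values, which is exactly what a fragment (or CUDA thread) would do per grid point. A CPU sketch of one such SPMD-style step (the averaging rule is illustrative, not a real fluid solver):

```python
def step(grid):
    """One SPMD-style time step: every interior cell's next state is the
    average of its four neighbours' current states; border cells are kept.
    Each cell's update is independent, so all cells could run in parallel."""
    h, w = len(grid), len(grid[0])
    nxt = [row[:] for row in grid]            # separate output buffer
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            nxt[y][x] = 0.25 * (grid[y-1][x] + grid[y+1][x]
                                + grid[y][x-1] + grid[y][x+1])
    return nxt
```

Writing to a separate output buffer mirrors the "old" GPGPU restriction that a pass reads one buffer and writes another.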
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications
has been that, despite their general-purpose tasks having nothing to do with
graphics, the applications still had to be programmed using graphics APIs.
In addition, the program had to be structured in terms of the graphics
pipeline, with the programmable units only accessible as an intermediate
step in that pipeline, when the programmer would almost certainly prefer to
access the programmable units directly. The programming environments we
describe in detail in Section IV solve this difficulty by providing a
more natural, direct, non-graphics interface to the hardware, and
specifically to the programmable units. Today, GPU computing applications
are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads
An SPMD general-purpose program computes the value of each thread
The value for each thread is computed by a combination of math
operations and both "gather" (read) accesses from and "scatter" (write)
accesses to global memory. Unlike in the previous two
methods, the same buffer can be used for both reading and writing,
allowing more flexible algorithms (for example, in-place algorithms that
use less memory).
The resulting buffer in global memory can then be used as an input in
future computation
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose the coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because
no registration is needed. The matching result in this stage helps filter
out image pairs with low similarity; after this step, some false
positive matches may still remain. In the second stage, we use the WPL
descriptor to register the two images for more detailed descriptor matching,
including scale and translation invariance. This stage includes the shift
transform, affine matrix generation, and final WPL descriptor matching.
Overall, we partitioned the registration and matching processing into four
kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster
and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to scale- and rotation- invariance of the Y-shape features
registration is unnecessary before matching on Y shape descriptor The
whole matching algorithm is listed as algorithm 1
FIG
Here, ytei and ytaj are the Y-shape descriptors of the test template Tte
and the target template Tta, respectively; dϕ is the Euclidean distance of the
angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean
distance of two descriptor centers, defined in (4); ni and di are the number of
matched descriptor pairs and the distance of their centers, respectively; tϕ is
a distance threshold; and txy is the threshold restricting the search area. We
set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all
the Y-shape branches. The search area is limited to the corresponding left or
right half of the sclera in order to reduce the search range and time. The
distance between two branches is defined in (3), where ϕij is the angle between
the jth branch and the polar ray from the pupil center in descriptor i.
The number of matched pairs ni and the distance di between Y-shape
branch centers are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch
centers as in (2), where α is a factor to fuse the matching score, set
to 30 in our study, and Ni and Nj are the total numbers of feature vectors in
templates i and j, respectively. The decision is regulated by the threshold t: if
a sclera's matching score is lower than t, the sclera is discarded; scleras
with high matching scores are passed to the next, more precise
matching process.
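The Stage I search and score fusion can be sketched as below for Y-shape descriptors y = (ϕ1, ϕ2, ϕ3, x, y). The thresholds follow the values stated above; the exact fusion formula (Eq. 2) is not reproduced in this report, so the score expression here (more matches and smaller centre distance giving a higher score) is an assumed stand-in.

```python
def y_match_score(test, target, t_phi=30.0, t_xy=675.0, alpha=30.0):
    """Coarse Stage-I match on Y-shape descriptors (phi1, phi2, phi3, x, y).
    A pair matches when both the branch-angle distance and the centre
    distance fall under their thresholds; the fused score is a stand-in
    for the report's Eq. (2)."""
    n, dist_sum = 0, 0.0
    for yt in test:
        for yg in target:
            d_phi = sum((a - b) ** 2 for a, b in zip(yt[:3], yg[:3])) ** 0.5
            d_xy = ((yt[3] - yg[3]) ** 2 + (yt[4] - yg[4]) ** 2) ** 0.5
            if d_phi <= t_phi and d_xy <= t_xy:
                n += 1
                dist_sum += d_xy
    if n == 0:
        return 0.0
    avg_d = dist_sum / n
    # more matched branches and smaller average distance -> higher score
    return alpha * n / (len(test) + len(target) + avg_d)
```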
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line segment WPL descriptor reveals more detail of the sclera vessel
structure than the Y-shape descriptor. The variation of the sclera vessel
pattern is nonlinear because: when acquiring an eye image at a different
gaze angle, the vessel structure will appear to shrink or extend nonlinearly,
since the eyeball is spherical in shape; and the sclera is made up of four
layers (episclera, stroma, lamina fusca, and endothelium), with slight
differences in the movement of these layers. Considering these factors, our
registration employs both a single shift transform and a multi-parameter
transform that combines shift, rotation, and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate, so the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template, ssei is the i-th WPL descriptor of Tte, Tta is the target template, ssai is the i-th WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value of the two descriptors.
We first randomly select an equal number of segment descriptors stek in the test template Tte from each quadrant and find each one's nearest neighbor staj in the target template Tta. Their shift offset is recorded as a candidate registration shift factor Δsk. The final registration offset Δs_optim is the candidate with the smallest standard deviation among these candidate offsets.
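The shift parameter search of Algorithm 2 can be sketched as follows. The function and variable names are hypothetical, and the "smallest standard deviation" criterion is interpreted here as picking the candidate offset that deviates least from the other candidates:

```python
import statistics

def find_shift(test_centers, target_centers):
    """Sketch of Algorithm 2: estimate a global shift between two templates.
    Each candidate offset is the vector from a test descriptor to its nearest
    target neighbor; the consensus offset is the candidate whose squared
    deviation from the other candidates is smallest."""
    candidates = []
    for tx, ty in test_centers:
        nx, ny = min(target_centers, key=lambda c: (c[0] - tx)**2 + (c[1] - ty)**2)
        candidates.append((nx - tx, ny - ty))   # candidate registration shift Δs_k

    def spread(off):
        # mean squared deviation of this candidate from all candidates
        return statistics.fmean((off[0] - o[0])**2 + (off[1] - o[1])**2
                                for o in candidates)

    return min(candidates, key=spread)          # Δs_optim
```

When the target template is an exact shifted copy of the test template, every candidate agrees and the true offset is recovered.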
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters we iterate N times to generate these parameter sets; in our experiment we set the iteration count to 512.
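A minimal sketch of this randomized affine parameter search follows. The parameter ranges are illustrative assumptions; the report specifies only β = 20 pixels and N = 512 iterations:

```python
import math, random

BETA = 20.0   # match tolerance in pixels (value from the text)
N_ITER = 512  # iteration count used in the experiment

def search_affine(test_pts, target_pts, seed=7):
    """Sketch of Algorithm 3: randomized search over shift/rotation/scale.
    Each iteration draws a candidate parameter set, transforms the test
    points by scale*rotation + shift, and counts points landing within BETA
    pixels of some target point. The set with the maximum match count m(it)
    is returned together with that count."""
    rng = random.Random(seed)

    def count_matches(dx, dy, theta, s):
        c, si = math.cos(theta), math.sin(theta)
        n = 0
        for x, y in test_pts:
            xt, yt = s * (c * x - si * y) + dx, s * (si * x + c * y) + dy
            if any(math.dist((xt, yt), q) < BETA for q in target_pts):
                n += 1
        return n

    best, best_n = None, -1
    for _ in range(N_ITER):
        params = (rng.uniform(-30, 30), rng.uniform(-30, 30),   # shift (assumed range)
                  rng.uniform(-0.2, 0.2), rng.uniform(0.9, 1.1))  # rotation, scale (assumed)
        n = count_matches(*params)
        if n > best_n:
            best, best_n = params, n
    return best, best_n
```

On the GPU, each of the 512 iterations is assigned to its own thread (Section 252), so the whole loop body runs in parallel.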
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3. ϕ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have similar orientations, we use a constant factor α to check the absolute difference of the two ϕ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transform results divided by the minimum of the matching scores of the test template and the target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the programmer must partition the parallel part of the program into threads, which are mapped onto those multiprocessors. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and cost little time to access. Only shared memory can be accessed by other threads within the same block; however, the amount of shared memory available is limited. Global memory, constant memory, and texture memory are off-chip and accessible by all threads, but accessing these memories is very time consuming.
Constant memory and texture memory are read-only and cacheable
memory Mapping algorithms to CUDA to achieve efficient processing is
not a trivial task There are several challenges in CUDA programming
If threads in a warp take different control paths, all the branches are executed serially; to improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this latency, on-chip memory should be used preferentially over global memory, and when global memory access does occur, threads in the same warp should access consecutive words so that the accesses coalesce.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and each thread compares one pair of templates. In our work we use an NVIDIA C2070 as our GPU, with both the thread count and the block count set to 1024. That means we can match our test template with up to 1024 x 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, each thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their respective descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate this sum, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every i-th thread (i = 4, 8, 16, 32, 64, ...) computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
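The reduction pattern above can be mirrored sequentially in Python. On the GPU each pass of the inner loop runs in parallel across threads; this sketch only reproduces the access pattern and the fact that the total ends up at the first address:

```python
def tree_sum(vals):
    """Pairwise tree reduction as in Figure 11: partial results are summed
    in pairs, then with strides 4, 8, 16, ..., leaving the total at index 0."""
    a = list(vals)
    stride = 2
    while stride // 2 < len(a):
        for i in range(0, len(a), stride):
            if i + stride // 2 < len(a):
                a[i] += a[i + stride // 2]   # each add is one thread's work per pass
        stride *= 2
    return a[0]
```

The loop performs O(log n) passes, which is why it suits a block of threads that synchronize between passes.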
252 MAPPING INSIDE BLOCK
In the shift parameter search, there are two schemes we can choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single candidate shift offset to each thread, so that all threads compute independently, except that the final result must be compared across the candidate offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we choose the second method to parallelize the shift search. In the affine matrix generator, we map an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, with the iterations distributed across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait for a synchronization operation at every query step; our solution is instead to use a single thread in each block to process the matching.
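As an illustrative analogue of the per-thread generator setup discussed above (not the CUDA implementation itself), NumPy's SeedSequence mechanism spawns statistically independent child streams, serving the same purpose as the offline dynamic-creation tool: one uncorrelated Mersenne Twister per thread. The function name and seed are assumptions:

```python
import numpy as np

def make_thread_rngs(n_threads, root_seed=1234):
    """One independently seeded Mersenne Twister per simulated thread.
    SeedSequence.spawn derives child seeds that are designed to produce
    statistically independent streams, avoiding the correlation problem
    that naive "different seeds" cause for identically parameterized
    generators."""
    children = np.random.SeedSequence(root_seed).spawn(n_threads)
    return [np.random.Generator(np.random.MT19937(c)) for c in children]
```

Each returned generator can then be drawn from independently, as each CUDA thread would draw from its own twister state.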
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can incur long latency. As shown in Figure 11, we load the entire target template set from the database up front, without considering when each template will be processed; therefore there is no host-to-device data transfer during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image.
To implement this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of its pixels. The combination of the histograms of the different cells then forms the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block; this normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This step creates the cell histograms: each pixel within a cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination or contrast changes, the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
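The gradient and orientation-binning steps described above can be sketched as follows. Block normalization and the Gaussian window are omitted for brevity, and the cell size and bin count are conventional choices (8-pixel cells, 9 unsigned bins) rather than values given in the report:

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    """Minimal HOG sketch: per-pixel gradients, then per-cell histograms of
    unsigned orientation (0-180 degrees) weighted by gradient magnitude."""
    img = img.astype(float)
    dy, dx = np.gradient(img)                     # y- and x-direction gradients
    mag = np.hypot(dx, dy)                        # m(x, y)
    ang = np.rad2deg(np.arctan2(dy, dx)) % 180.0  # theta(x, y); opposite directions fold together
    h, w = img.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            idx = np.minimum((a / (180.0 / bins)).astype(int), bins - 1)
            for b in range(bins):
                hist[i, j, b] = m[idx == b].sum()  # magnitude-weighted vote per bin
    return hist
```

For a vertical step edge, all gradient energy falls in the 0-degree bin, matching the intuition that sclera veins show up as oriented edges.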
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and is popular among scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop; when code is entered there, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, either as a script or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and
sharing your work You can integrate your MATLAB code with other
languages and applications and distribute your MATLAB algorithms and
applications
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external
applications and languages such as C, C++, Fortran, Java, COM,
and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, and financial modeling and analysis. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface, also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, enabling fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
User interfaces can be laid out, designed, and edited using the interactive tool GUIDE (Graphical User Interface Development Environment).
GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format, and additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter and bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed here can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase matching efficiency; it is a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate mask information and make the method more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU-GPU system, down to the individual threads on the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
In imaging science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or a set of characteristics or parameters related to the image. Most image-processing techniques involve treating the image as a two-dimensional signal and applying standard signal-processing techniques to it. Image processing usually refers to digital image processing, but optical and analog image processing are also possible; this article is about general techniques that apply to all of them. The acquisition of images (producing the input image in the first place) is referred to as imaging.
Image processing refers to the processing of a 2D picture by a computer. Basic definitions:
An image defined in the "real world" is considered to be a function of two real variables, for example a(x, y), with a as the amplitude (e.g., brightness) of the image at the real coordinate position (x, y). Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:
Image processing (image in -> image out)
Image analysis (image in -> measurements out)
Image understanding (image in -> high-level description out)
An image may be considered to contain sub-images, sometimes referred to as regions of interest (ROIs), or simply regions. This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. In a sophisticated image processing system, it should be possible to apply specific image processing operations to selected regions; thus one part of an image (region) might be processed to suppress motion blur while another part might be processed to improve colour rendition.
Most image processing systems require that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid and each sample, or pixel, is quantized using a finite number of bits; the digitized image is then processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display. Closely related to image processing are computer graphics and computer vision. In computer graphics, images are manually made from physical models of objects, environments, and lighting, instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most animated movies. Computer vision, on the other hand, is often considered high-level image processing, in which a machine/computer/software intends to decipher the physical contents of an image or a sequence of images (e.g., videos or 3D full-body magnetic resonance scans).
In modern sciences and technologies, images also gain much broader scope due to the ever-growing importance of scientific visualization (of often large-scale, complex scientific/experimental data). Examples include microarray data in genetic research or real-time multi-asset portfolio trading in finance. Before processing, an image is converted into digital form. Digitization includes sampling of the image and quantization of the sampled values. After converting the image into bit information, processing is performed. This processing may be image enhancement, image restoration, or image compression.
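The digitization pipeline just described, sampling on a discrete grid followed by quantization of each sample to a finite number of bits, can be sketched as follows. The continuous scene function below is a hypothetical stand-in for a(x, y), not anything from this report:

```python
def scene(x, y):
    # Hypothetical continuous "real world" brightness a(x, y) in [0, 1).
    return (x * y) % 1.0

def digitize(rows, cols, bits):
    """Sample scene() on a rows x cols grid, then quantize each sample
    to one of 2**bits gray levels."""
    levels = 2 ** bits
    image = []
    for r in range(rows):
        row = []
        for c in range(cols):
            sample = scene(r / rows, c / cols)                 # sampling
            row.append(min(int(sample * levels), levels - 1))  # quantization
        image.append(row)
    return image

img = digitize(4, 4, 8)  # a 4x4 image with 256 gray levels per pixel
```

Fewer bits per pixel means fewer gray levels, which is what produces the false contouring discussed later in this chapter.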
1.2.2 IMAGE ENHANCEMENT
Image enhancement refers to the accentuation or sharpening of image features, such as boundaries or contrast, to make a graphic display more useful for display and analysis. This process does not increase the inherent information content of the data. It includes gray-level and contrast manipulation, noise reduction, edge crispening and sharpening, filtering, interpolation and magnification, pseudo-coloring, and so on.
1.2.3 IMAGE RESTORATION
Image restoration is concerned with filtering the observed image to minimize the effect of degradations. The effectiveness of image restoration depends on the extent and accuracy of the knowledge of the degradation process as well as on the filter design. Image restoration differs from image enhancement in that the latter is concerned with the extraction or accentuation of image features rather than the correction of degradations.
1.2.4 IMAGE COMPRESSION
Image compression is concerned with minimizing the number of bits required to represent an image. Applications of compression include broadcast TV; remote sensing via satellite; military communication via aircraft; radar; teleconferencing; facsimile transmission of educational and business documents; medical images that arise in computer tomography, magnetic resonance imaging, and digital radiology; motion pictures; satellite images; weather maps; geological surveys; and so on.
Text compression – CCITT Group 3 and Group 4
Still image compression – JPEG
Video image compression – MPEG
1.2.5 SEGMENTATION
In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain visual characteristics.
The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the contours resulting from segmentation can be used to create 3D reconstructions with the help of interpolation algorithms like marching cubes.
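As a minimal illustration of the label-assignment view of segmentation described above, the sketch below assigns each pixel one of two labels so that pixels sharing a label share an intensity characteristic. A fixed two-class threshold is assumed purely for illustration; real segmentation algorithms are far more elaborate:

```python
def segment_by_intensity(image, threshold):
    """Assign each pixel label 1 (bright region) or 0 (dark region):
    pixels with the same label share the intensity characteristic."""
    return [[1 if p >= threshold else 0 for p in row] for row in image]

# A tiny made-up 3x3 grayscale image.
img = [[10, 200, 210],
       [12, 205, 15],
       [11, 14, 220]]
labels = segment_by_intensity(img, 128)  # [[0,1,1],[0,1,0],[0,0,1]]
```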
1.2.6 IMAGE RESTORATION
Image restoration, like enhancement, improves the quality of an image, but all the operations are mainly based on known or measured degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise, and camera motion, and to correct images for known degradations.
1.2.7 FUNDAMENTAL STEPS
Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the chances for success of the other processes.
Image segmentation: to partition an input image into its constituent parts or objects.
Image representation: to convert the input data to a form suitable for computer processing.
Image description: to extract features that result in some quantitative information of interest, or features that are basic for differentiating one class of objects from another.
Image recognition: to assign a label to an object based on the information provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized objects.
Knowledge about a problem domain is coded into an image processing system in the form of a knowledge database.
1.3 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude. Digitization of the spatial coordinates (x, y) is called image sampling; amplitude digitization is called gray-level quantization. The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels.
Example: a 256 gray-level image of size 256x256 occupies 64K bytes of memory.
Images of very low spatial resolution produce a checkerboard effect. The use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.
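The storage figure in the example above follows from pixel count times bits per pixel. A short helper (illustrative only) makes the arithmetic explicit:

```python
import math

def storage_bytes(rows, cols, gray_levels):
    """Memory needed for an uncompressed image: bits per pixel is the
    number of bits required to represent the given gray-level count."""
    bits_per_pixel = math.ceil(math.log2(gray_levels))
    return rows * cols * bits_per_pixel // 8

# The example above: 256 gray levels (8 bits/pixel), size 256x256.
print(storage_bytes(256, 256, 256))  # 65536 bytes = 64K
```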
1.4 IMAGE FILE FORMATS
There are two general groups of 'images': vector graphics (or line art) and bitmaps (pixel-based 'images'). Some of the most common file formats are:
GIF – Graphics Interchange Format. An 8-bit (256-colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.
JPEG – Joint Photographic Experts Group. A very efficient (i.e. much information per byte), destructively compressed, 24-bit (16 million colours) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited).
TIFF – Tagged Image File Format. The standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.
PS – PostScript. A standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.
PSD – Adobe Photoshop Document. A dedicated Photoshop format that keeps all the information in an image, including all the layers.
BMP – Bitmap file format.
1.5 TYPES OF IMAGES
There are four types of images:
1. Binary image
2. Gray-scale image
3. Color image
4. Indexed image
1.5.1 BINARY IMAGES
A binary image is a digital image that has only two possible values for each pixel. Typically the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level. Each pixel is stored as a single bit, i.e. a 0 or 1. The names black-and-white (B&W) are also commonly used for this type of image.
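Since each binary pixel needs only a single bit, eight pixels can share one byte of storage. A small packing sketch (illustrative; MSB-first ordering is assumed, not specified by the report):

```python
def pack_binary_row(pixels):
    """Pack a row of 0/1 pixels into bytes, one bit per pixel, most
    significant bit first; a partial final byte is zero-padded."""
    out = bytearray()
    for i in range(0, len(pixels), 8):
        chunk = pixels[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        byte <<= (8 - len(chunk)) % 8  # pad a partial final byte
        out.append(byte)
    return bytes(out)

packed = pack_binary_row([1, 0, 1, 1, 0, 0, 0, 1])  # one byte: 0b10110001
```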
1.5.2 GRAY-SCALE IMAGE
In an 8-bit gray-scale image, each picture element has an assigned intensity that ranges from 0 to 255. A gray-scale image is what people normally call a black-and-white image, but the name emphasizes that such an image will also include many shades of gray.
FIG
1.5.3 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive colour, with the r, g, and b receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production. The secondary colours of RGB – cyan, magenta, and yellow – are formed by mixing two of the primary colours (red, green, or blue) and excluding the third colour. Red and green combine to make yellow, green and blue to make cyan, and blue and red form magenta. The combination of red, green, and blue at full intensity makes white.
In Photoshop, using the "screen" mode for the different layers in an image makes the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.
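The "screen" mode just mentioned has a well-known per-channel formula for 8-bit values: the result is the complement of the product of the complements, so it can only lighten, matching the slide-stacking analogy. A sketch with integer arithmetic:

```python
def screen_blend(a, b):
    """Screen blend of two 8-bit channel values (0-255):
    result = 255 - (255 - a) * (255 - b) / 255."""
    return 255 - ((255 - a) * (255 - b)) // 255

screen_blend(0, 0)     # 0: black over black stays black
screen_blend(255, 10)  # 255: screening anything with white gives white
```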
FIG
CMYK: The four-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M), and yellow (Y) inks. In addition, a layer of black (K) ink can be added. The CMYK model uses the subtractive colour model.
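A naive conversion from additive RGB to subtractive CMYK (the common textbook approximation, not a print-accurate colour profile) extracts black as the component shared by the three inks:

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB (0-255) to CMYK (each 0.0-1.0): invert to get the
    subtractive primaries, then pull out the common black component."""
    c, m, y = 1 - r / 255, 1 - g / 255, 1 - b / 255
    k = min(c, m, y)
    if k == 1.0:                 # pure black: all ink is K
        return 0.0, 0.0, 0.0, 1.0
    return tuple((x - k) / (1 - k) for x in (c, m, y)) + (k,)

rgb_to_cmyk(255, 0, 0)  # pure red prints as magenta + yellow, no cyan or black
```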
1.5.4 INDEXED IMAGE
FIG
An indexed image consists of an array and a color-map matrix. The pixel values in the array are direct indices into the color map. By convention, this documentation uses the variable name X to refer to the array and map to refer to the color map. In computing, indexed color is a technique to manage the colors of digital images in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, color information is not directly carried by the image pixel data but is stored in a separate piece of data called a palette: an array of color elements, in which every element (a color) is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only its index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.
Perhaps the first device that supported palette colors was a random-access frame buffer, described in 1975 by Kajiya, Sutherland, and Cheadle. This supported a palette of 256 36-bit RGB colors.
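The array-plus-color-map arrangement described above amounts to a simple lookup; the tiny three-entry palette below is made up for illustration:

```python
def decode_indexed(index_array, palette):
    """Expand an indexed image: each pixel stores only an index into the
    palette, which holds the actual RGB color elements."""
    return [[palette[i] for i in row] for row in index_array]

palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]  # tiny 3-entry color map
X = [[0, 1],
     [2, 1]]
rgb = decode_indexed(X, palette)  # full RGB triple per pixel
```

Note the saving: each pixel of X needs only enough bits to index the palette (here 2 bits), instead of 24 bits of full RGB.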
1.6 APPLICATIONS OF IMAGE PROCESSING
Interest in digital image processing methods stems from two principal application areas:
1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception
In the second application area, interest focuses on procedures for extracting information from an image in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.
1.7 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches – a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching – for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy. It takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system; it costs about 1.2 seconds to perform a one-to-one matching. Both speeds were measured using a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
1.7.1 DISADVANTAGES OF THE EXISTING SYSTEM
1. Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.
LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-image face recognition algorithms perform well when constrained (frontal, well-illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10–250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. Recognition performance of a subset of existing still-image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time, in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.
With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such big data draws great concern. Even though modern processors are equipped with more cores and memory capacity, it still requires careful design to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of a biometric application. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.
Modern iris recognition algorithms can be computationally intensive, yet are designed for traditional sequential processing elements such as a personal computer. However, a parallel processing alternative using field-programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the means of this project, iris template generation with directional filtering, which is a computationally expensive yet parallel portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ code version, determining the information content of an iris approximately 324 times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.
A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye. The vein pattern seen in the sclera region is unique to each person; thus the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: the Histogram of Oriented Gradients (HOG), and conversion of the image into polar form using the bilinear interpolation technique. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than the previous methods.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.
The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.
This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford–Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image. The global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution based on our sequential line-descriptor method, using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.
It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by taking into account the different syntax for various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard thread and block in CUDA as work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor – the Y-shape sclera feature-based efficient registration method – to speed up the mapping scheme, introduce the "weighted polar line (WPL) descriptor" that is better suited for parallel computing to mitigate the mask size issue, and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make parallel processing possible and efficient.
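Among the optimization techniques named above is the prefix sum, a classic data-parallel primitive. The sketch below shows the Hillis–Steele inclusive scan in sequential Python; each pass of the loop corresponds to one parallel step on the GPU, so the whole scan takes about log2(n) steps. This illustrates the primitive only, not the report's CUDA code:

```python
def hillis_steele_scan(xs):
    """Hillis-Steele inclusive prefix sum: log2(n) passes, where pass k
    adds to each element the element 2**k positions to its left. On a
    GPU, every element of a pass is computed by a separate thread."""
    xs = list(xs)
    step = 1
    while step < len(xs):
        xs = [xs[i] + (xs[i - step] if i >= step else 0)
              for i in range(len(xs))]
        step *= 2
    return xs

hillis_steele_scan([3, 1, 7, 0, 4])  # [3, 4, 11, 11, 15]
```

The sequential running sum gives the same result in one pass, but cannot be parallelized; the log-step formulation is what maps onto CUDA threads.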
1.9.1 ADVANTAGES OF THE PROPOSED SYSTEM
1. To improve the efficiency, in this research we propose a new descriptor – the Y-shape descriptor – which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.
2. We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities.
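The two-stage idea above can be sketched generically: a cheap coarse score filters candidates before the expensive fine (registered) match is run on the survivors. The scoring callables below are placeholders standing in for the report's Y-shape and WPL descriptor comparisons:

```python
def coarse_to_fine_match(query, templates, coarse_score, fine_score, tau):
    """Two-stage matching sketch: keep only templates whose cheap coarse
    score reaches tau, then return the best survivor under the expensive
    fine score (None if everything was filtered out)."""
    survivors = [t for t in templates if coarse_score(query, t) >= tau]
    if not survivors:
        return None
    return max(survivors, key=lambda t: fine_score(query, t))

# Toy stand-in scores over integers, purely for illustration.
coarse = lambda q, t: 1.0 if abs(q - t) < 5 else 0.0
fine = lambda q, t: -abs(q - t)
coarse_to_fine_match(10, [3, 12, 100], coarse, fine, 0.5)  # 12 survives and wins
```

The speed-up comes from how few templates reach the fine stage; the coarse threshold tau trades accuracy against matching time.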
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, which makes it usable for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches – a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching – for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy. It takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system; it costs about 1.2 seconds to perform a one-to-one matching. Both speeds were measured using a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient in data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU has been used to extract features, construct descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al. evaluated the performance of image processing algorithms such as linear feature extraction and multi-view stereo matching on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform which outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design; an efficient parallel computing scheme would require different strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method, using the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and to help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range.
In summary, a naïve implementation of the algorithms in parallel would not work efficiently. Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by taking into account the different syntax for various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard thread and block in CUDA as work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor – the Y-shape sclera feature-based efficient registration method – to speed up the mapping scheme (Section 4), introduce the "weighted polar line (WPL) descriptor" that is better suited for parallel computing to mitigate the mask size issue (Section 5), and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then develop implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we present experiments using the proposed system. In Section 9, we draw conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation. They used a clustering algorithm to classify color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin-tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's-thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region-growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line-descriptor-based feature registration and matching method.
The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and a Cartesian-to-polar conversion using interpolation. HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form; this is mainly useful for circular or quasi-circular shapes of objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to determine whether the person is correctly identified. This comparison is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in the existing studies. In the proposed method, two features of an image are extracted.
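The Cartesian-to-polar conversion with bilinear interpolation mentioned above can be sketched as follows. This is a generic resampling sketch assuming a square grayscale input centered on the image; the report's actual sampling grid may differ. A rotation of the eye then appears as a circular shift along the theta axis, which is the source of the rotation invariance:

```python
import math

def cartesian_to_polar(img, n_r, n_theta):
    """Resample a square grayscale image (list of lists) onto an
    (r, theta) grid, using bilinear interpolation between the four
    pixels surrounding each sampled point."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    r_max = min(cy, cx)
    out = []
    for ri in range(n_r):
        r = r_max * ri / max(n_r - 1, 1)
        row = []
        for ti in range(n_theta):
            th = 2 * math.pi * ti / n_theta
            y, x = cy + r * math.sin(th), cx + r * math.cos(th)
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            fy, fx = y - y0, x - x0
            # Bilinear interpolation between the four neighboring pixels.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out
```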
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure shows the steps of segmentation.
FIG
Glare area detection: The glare area is a small, bright area near the
pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. The Sobel filter
operates only on grayscale images, so a color image must first be converted
to grayscale before the filter is applied. Fig. 4 shows the result of the glare
area detection.
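The glare-detection step can be sketched as follows: a hand-rolled Sobel gradient magnitude, applied after an optional colour-to-grayscale conversion, with a threshold marking bright, high-gradient pixels. The threshold value is an assumption for illustration, not taken from the report.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
SOBEL_Y = SOBEL_X.T

def sobel_magnitude(gray):
    """2-D Sobel gradient magnitude via direct convolution (zero padding)."""
    padded = np.pad(gray.astype(float), 1)
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += SOBEL_X[i, j] * window
            gy += SOBEL_Y[i, j] * window
    return np.hypot(gx, gy)

def detect_glare(image, thresh=100.0):
    """Mark candidate glare pixels: strong local gradients around bright
    spots. `thresh` is an illustrative value."""
    if image.ndim == 3:                  # colour image -> grayscale first
        image = image.mean(axis=2)
    return sobel_magnitude(image) > thresh
```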
FIG
Sclera area estimation: For the estimation of the sclera area, Otsu's
thresholding method is applied. The steps of sclera area detection are
selection of the region of interest (ROI), Otsu's thresholding, and sclera
area detection. The left and right sclera areas are selected based on the iris
boundaries. Once the region of interest is selected, Otsu's thresholding is
applied to obtain the potential sclera areas. The correct left sclera area
should be placed in the right and center positions, and the correct right
sclera area should be placed in the left and center. In this way, non-sclera
areas are eliminated.
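Otsu's method itself is standard and can be sketched directly: choose the threshold that maximizes the between-class variance of the grayscale histogram.

```python
import numpy as np

def otsu_threshold(gray, bins=256):
    """Otsu's method: pick the cut maximizing between-class variance."""
    hist, edges = np.histogram(np.asarray(gray).ravel(), bins=bins)
    p = hist.astype(float) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)             # class-0 probability at each cut
    mu = np.cumsum(p * centers)   # cumulative mean
    mu_t = mu[-1]                 # global mean
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros_like(w0)
    # Between-class variance: (mu_t*w0 - mu)^2 / (w0*w1)
    between[valid] = (mu_t * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return centers[np.argmax(between)]
```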
223 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the
sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are
refined, since all of these are unwanted portions for recognition. To
eliminate their effects, refinement is performed after the detection of the
sclera area. Fig. shows the result after Otsu's thresholding and iris and
eyelid refinement to detect the right sclera area; the left sclera area is
detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the effect of
segmentation faults. The vein patterns in the sclera area are not clearly
visible after segmentation, so vein pattern enhancement is performed to
make them more visible.
224 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of fingers (Miura et al., 2004), the palm (Lin
and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics,
a special optical device for imaging the back of the eyeball is needed (Hill,
1999). Due to its perceived invasiveness and the required degree of subject
cooperation, the use of retinal biometrics may not be acceptable to some
individuals. The conjunctiva is a thin, transparent, and moist tissue that
covers the outer surface of the eye. The part of the conjunctiva that covers
the inner lining of the eyelids is called the palpebral conjunctiva, and the
part that covers the outer surface of the eye is called the ocular (or bulbar)
conjunctiva, which is the focus of this study. The ocular conjunctiva is very
thin and clear; thus the vasculature (including that of the episclera) is
easily visible through it. The visible microcirculation of the conjunctiva
offers a rich and complex network of veins and fine microcirculation (Fig.
1). The apparent complexity and specificity of these vascular patterns
motivated us to utilize them for personal identification (Derakhshani and
Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study of the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus the conjunctival vascular
structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with
commercial off-the-shelf digital cameras under normal lighting conditions,
making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact. Besides being a
stand-alone biometric modality, we anticipate that the addition of
conjunctival biometrics will enhance the performance of current iris-based
biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portion of the eye, the sclera
vein patterns will be further exposed. This feature makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable
(e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
opportunities for simultaneous computation. The figure below demonstrates
the possibility of parallel directional filtering. Since the filter is computed
over different portions of the input image, the computation can be
performed in parallel (denoted by Elements below). In addition, each
element of the filtering can itself be parallelized. A detailed discussion of
our proposed parallelization is outside the scope of this paper.
FIG
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN
RECOGNITION METHOD
The matching stage of the line-descriptor-based method is a
bottleneck with regard to matching speed. In this section we briefly
describe the line-descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and to remove the branch
points. The line descriptor is used to describe the segments in the vein
structure. Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle to some
reference angle at the iris center, θ; the segment's distance to the iris
center, r; and the dominant angular orientation of the line segment, ɸ. Thus
the descriptor is S = (θ, r, ɸ)^T. The individual components of the line
descriptor are calculated as
FIG
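In code, the three components of S = (θ, r, ɸ)^T can be computed from the segment centre, the iris centre, and the segment's fitted orientation roughly as follows; the polynomial line fitting that produces the orientation is omitted for brevity.

```python
import math

def line_descriptor(seg_center, iris_center, seg_orientation):
    """Compute the line descriptor S = (theta, r, phi)^T described above:
    theta - angle of the segment centre relative to the iris centre,
    r     - distance from segment centre to iris centre,
    phi   - dominant orientation of the segment itself (assumed given).
    A sketch; the report's polynomial approximation f_line is omitted."""
    xl, yl = seg_center
    xi, yi = iris_center
    theta = math.atan2(yl - yi, xl - xi)
    r = math.hypot(xl - xi, yl - yi)
    return theta, r, seg_orientation
```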
Here fline(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. To register the segments of the vascular
patterns, a RANSAC-based algorithm is used to estimate the best-fit
parameters for registration between the two sclera vascular patterns. The
registration algorithm randomly chooses two points, one from the test
template and one from the target template. It also randomly chooses a
scaling factor and a rotation value based on a priori knowledge of the
database, and then calculates a fitness value for the registration using these
parameters.
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. To reduce the effect of segmentation errors, we created a
weighting image (Figure 3) from the sclera mask by setting interior pixels
of the sclera mask to 1, pixels within some distance of the mask boundary
to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as
follows, where Si and Sj are two segment descriptors, m(Si, Sj) is the
matching score between segments Si and Sj, d(Si, Sj) is the Euclidean
distance between the segment descriptors' center points (from Eqs. 6-8),
Dmatch is the matching distance threshold, and ɸmatch is the matching
angle threshold. The total matching score M is the sum of the individual
matching scores divided by the maximum matching score of the minimal
set between the test and target templates. That is, one of the test or target
templates has fewer points, and the sum of its descriptors' weights sets the
maximum score that can be attained.
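A simplified version of this weighted matching rule might look as follows. Here a pair contributes the smaller of its two weights when both the centre distance and the orientation difference fall under their thresholds; this is one plausible reading of the scheme, since the report's exact score formula is not reproduced above, and the threshold defaults are illustrative.

```python
import math

def segment_match_score(si, sj, d_match, phi_match):
    """Score one pair of (x, y, phi, w) descriptors: a pair matches when
    the centre distance is below d_match and the orientation difference is
    below phi_match; a matched pair contributes min(wi, wj)."""
    xi, yi, pi, wi = si
    xj, yj, pj, wj = sj
    d = math.hypot(xi - xj, yi - yj)
    if d <= d_match and abs(pi - pj) <= phi_match:
        return min(wi, wj)
    return 0.0

def template_match_score(test, target, d_match=5.0, phi_match=0.2):
    """Total score M: sum of best per-segment scores, normalized by the
    maximum attainable score of the smaller (lighter) template."""
    total = sum(max((segment_match_score(s, t, d_match, phi_match)
                     for t in target), default=0.0) for s in test)
    max_score = min(sum(s[3] for s in test), sum(t[3] for t in target))
    return total / max_score if max_score > 0 else 0.0
```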
FIG
FIG
FIG
FIG
Even under movement of the eye, Y-shape branches are observed to be a
stable feature and can be used as a sclera feature descriptor. To detect the
Y-shape branches in the original template, we search the nearest-neighbor
set of every line segment within a regular distance and classify the angles
among these neighbors. If there are two types of angle values in the line
segment set, the set may be inferred to be a Y-shape structure, and the line
segment angles are recorded as a new feature of the sclera.
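The two-angle-cluster test described above can be sketched as a small clustering check; the tolerance used to group angles into one cluster is an assumed value, not from the report.

```python
def is_y_branch(neighbor_angles, tol=10.0):
    """Heuristic from the text: a neighbourhood of line segments is taken
    as a Y-shaped branch when its segment angles cluster into exactly two
    distinct values. `tol` (degrees) is an assumed grouping tolerance."""
    clusters = []
    for a in sorted(neighbor_angles):
        for c in clusters:
            if abs(a - c[0]) <= tol:   # close to an existing cluster
                c.append(a)
                break
        else:
            clusters.append([a])       # start a new angle cluster
    return len(clusters) == 2
```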
There are two ways to measure both the orientation and the relationship of
every branch of a Y-shape vessel: one is to use the angle of every branch
to the x-axis; the other is to use the angles between each branch and the
iris radial direction. The first method needs an additional rotation operation
to align the template, so in our approach we employed the second method.
As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch
and the radius from the pupil center. Even when the head tilts, the eye
moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are
quite stable. To tolerate errors from the pupil center calculation in the
segmentation step, we also recorded the center position (x, y) of the Y-
shape branches as auxiliary parameters. Our rotation-, shift-, and scale-
invariant feature vector is thus defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape
descriptor is generated with reference to the iris center; therefore it is
automatically aligned to the iris center and is a rotation- and scale-
invariant descriptor.
226 WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary
images (Figure 7). The skeleton is then broken into smaller segments. For
each segment, a line descriptor is created to record the center and
orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where
(x, y) is the position of the center and ɸ is its orientation. Because of the
limitations of segmentation accuracy, descriptors at the boundary of the
sclera area might not be accurate and may contain spur edges resulting
from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the
mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b)
vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d)
centers of line segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is
challenging, since the mask files are large and will occupy GPU memory
and slow down data transfer. During matching and registration, a
RANSAC-type algorithm is used to randomly select corresponding
descriptors, and the transform parameters between them are used to
generate the template-transform affine matrix. After every template
transform, the mask data must also be transformed and a new boundary
calculated to evaluate the weight of the transformed descriptor. This results
in too many convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the mask
information and can be automatically aligned. We extracted the geometric
relationships of the descriptors and stored them as a new descriptor. We
use a weighting image created by setting various weight values according
to position: the weights of descriptors beyond the sclera are set to 0, those
near the sclera boundary to 0.5, and interior descriptors to 1. In our work,
descriptor weights were calculated on their own mask by the CPU, and
only once.
The calculated result is saved as a component of the descriptor,
which becomes s(x, y, ɸ, w), where w denotes the weight of the point and
takes the value 0, 0.5, or 1. To align two templates, when a template is
shifted to another location along the line connecting their centers, all the
descriptors of that template are transformed. This is faster if the two
templates have similar reference points: if we use the center of the iris as
the reference point, then when two templates are compared the
correspondences are automatically aligned to each other, since they share
the same reference point. Every feature vector of the template is a set of
line segment descriptors composed of three variables (Figure 8): the
segment's angle to the reference line through the iris center, denoted θ; the
distance between the segment's center and the pupil center, denoted r; and
the dominant angular orientation of the segment, denoted ɸ. To minimize
GPU computation, we also convert the descriptor values from polar to
rectangular coordinates during CPU preprocessing.
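This CPU-side preprocessing step, converting each descriptor's polar coordinates about the iris centre into rectangular ones and attaching the weight, can be sketched as:

```python
import math

def wpl_descriptor(theta, r, phi, weight):
    """Build the extended WPL descriptor s = (x, y, r, theta, phi, w):
    the polar coordinates (r, theta) about the iris centre are converted
    to rectangular (x, y) once on the CPU, so the GPU kernels need no
    trigonometry at match time. A sketch of the preprocessing above."""
    x = r * math.cos(theta)
    y = r * math.sin(theta)
    return (x, y, r, theta, phi, weight)
```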
The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters. For
example, as an eyeball moves left, the left-part sclera patterns of the eye
may be compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in
different warps to allow for different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads called warps. We
reorganized the descriptors from the same sides and saved
FIG
FIG
them at contiguous addresses. This meets the requirement for coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no
longer needed on the GPU. Matching with this feature is very fast because
the templates do not need to be re-registered after every shift; thus the cost
of data transfer and computation on the GPU is reduced. Matching on the
new descriptor, the shift parameter generator of Figure 4 is simplified as
shown in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
the more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger limits
on their size and resource consumption, more fully featured instruction
sets, and more flexible control-flow operations. After many years of
separate instruction sets for vertex and fragment operations, current GPUs
support the unified Shader Model 4.0 on both vertex and fragment
shaders:
The hardware must support shader programs of at least 65 k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and
32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units.
General-purpose computing on the GPU: mapping general-purpose
computation onto the GPU uses the graphics hardware in much the same
way as any standard graphics application. Because of this similarity, the
process is both easier and more difficult to explain: on one hand, the actual
operations are the same and are easy to follow; on the other hand, the
terminology differs between graphics and general-purpose use. Harris
provides an excellent description of this mapping process. We begin by
describing GPU programming using graphics terminology, then show how
the same steps are used in a general-purpose way to author GPGPU
applications, and finally use the same steps to show the simpler and more
direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II,
concentrating on its programmable aspects:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as a texture on future passes through
the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation
involves exactly the same steps but different terminology. A motivating
example is a fluid simulation computed over a grid: at each time step, we
compute the next state of the fluid for each grid point from the current
state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest. The rasterizer generates a fragment at
each pixel location covered by that geometry. (In our example, the
primitive must cover a grid of fragments equal to the domain size of our
fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment
program. (Each grid point runs the same program to update the state of its
fluid.)
The fragment program computes the value of the fragment by a
combination of math operations and "gather" accesses from global
memory. (Each grid point can access the state of its neighbors from the
previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes. (The current state of the fluid will be used on the next time
step.)
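The grid update in this example can be mirrored on the CPU. The averaging stencil below is a stand-in for real fluid dynamics, but it has the same gather pattern: each cell reads its four neighbours' values from the previous time step, and the output is written to a separate buffer, just as a fragment program writes to a new render target.

```python
def fluid_step(grid):
    """One 'fragment-program pass' over a 2-D grid: every cell computes
    its next state from its own previous value and its four neighbours
    (periodic boundary). Averaging stencil only -- a stand-in for real
    fluid dynamics."""
    h, w = len(grid), len(grid[0])
    nxt = [[0.0] * w for _ in range(h)]   # separate output buffer
    for y in range(h):
        for x in range(w):
            north = grid[(y - 1) % h][x]
            south = grid[(y + 1) % h][x]
            west = grid[y][(x - 1) % w]
            east = grid[y][(x + 1) % w]
            nxt[y][x] = (grid[y][x] + north + south + west + east) / 5.0
    return nxt
```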
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications
has been that, despite their general-purpose tasks' having nothing to do
with graphics, the applications still had to be programmed using graphics
APIs. In addition, the program had to be structured in terms of the graphics
pipeline, with the programmable units only accessible as an intermediate
step in that pipeline, when the programmer would almost certainly prefer
to access the programmable units directly. The programming environments
we describe in detail in Section IV are solving this difficulty by providing
a more natural, direct, non-graphics interface to the hardware and,
specifically, to the programmable units. Today, GPU computing
applications are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math
operations and both "gather" (read) accesses from and "scatter" (write)
accesses to global memory. Unlike in the previous two methods, the same
buffer can be used for both reading and writing, allowing more flexible
algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in
future computation.
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because no
registration is needed. The matching result in this stage helps filter out
image pairs with low similarity, although some false positive matches may
remain after this step. In the second stage, we use the WPL descriptor to
register the two images for more detailed descriptor matching, including
scale and translation invariance. This stage includes the shift transform,
affine matrix generation, and final WPL descriptor matching. Overall, we
partitioned the registration and matching processing into four kernels in
CUDA (Figure 10): matching on the Y-shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster
and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features,
registration is unnecessary before matching on the Y-shape descriptor. The
whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte
and target template Tta, respectively. dϕ is the Euclidean distance of the
angle elements of the descriptor vectors, defined in (3); dxy is the
Euclidean distance of two descriptor centers, defined in (4). ni and di are
the number of matched descriptor pairs and the distance between their
centers, respectively. tϕ is a distance threshold, and txy is the threshold
that restricts the search area. We set tϕ to 30 and txy to 675 in our
experiment.
To match two sclera templates, we search the areas near all the
Y-shape branches. The search area is limited to the corresponding left or
right half of the sclera in order to reduce the search range and time. The
distance between two branches is defined in (3), where ϕij is the angle
between the j-th branch and the polar from the pupil center in descriptor i.
The number of matched pairs ni and the distance between the Y-shape
branch centers di are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch
centers as in (2). Here α is a factor used to fuse the matching score, which
was set to 30 in our study, and Ni and Nj are the total numbers of feature
vectors in templates i and j, respectively. The decision is regulated by the
threshold t: if a sclera's matching score is lower than t, the sclera is
discarded; a sclera with a high matching score is passed to the next, more
precise matching process.
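Stage I can be sketched as follows. The thresholds tϕ and txy and the factor α follow the text, but the fusion formula itself is an assumption (a match count with a distance penalty, normalized by the smaller template size), since Eq. (2) is not reproduced above. Descriptors are y = (ϕ1, ϕ2, ϕ3, x, y) tuples.

```python
import math

def branch_angle_distance(yi, yj):
    """d_phi in the spirit of (3): Euclidean distance between the three
    branch angles of two Y-shape descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(yi[:3], yj[:3])))

def y_shape_match_score(test, target, t_phi=30.0, t_xy=675.0, alpha=30.0):
    """Coarse Stage-I matching: count Y-descriptor pairs whose branch
    angles and centres are both close, then fuse count and total centre
    distance into one score. The fusion form here is an assumption."""
    n, dist_sum = 0, 0.0
    for yi in test:
        for yj in target:
            dxy = math.hypot(yi[3] - yj[3], yi[4] - yj[4])
            if branch_angle_distance(yi, yj) < t_phi and dxy < t_xy:
                n += 1
                dist_sum += dxy
                break                      # each test descriptor matches once
    if n == 0:
        return 0.0
    return (n - dist_sum / (alpha * n)) / min(len(test), len(target))
```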
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of
the sclera than the Y-shape descriptor. The variation of the sclera vessel
pattern is nonlinear because: (a) when acquiring an eye image at a different
gaze angle, the vessel structure appears to shrink or extend nonlinearly,
since the eyeball is spherical in shape; and (b) the sclera is made up of
four layers (episclera, stroma, lamina fusca, and endothelium), and there
are slight differences among the movements of these layers. Considering
these factors, our registration employs both a single shift transform and a
multi-parameter transform that combines shift, rotation, and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before,
segmentation may not be accurate; as a result, the detected iris center may
not be very accurate either. The shift transform is designed to tolerate
possible errors in pupil center detection in the segmentation step. If there
is no deformation, or only very minor deformation, registration with the
shift transform alone is adequate to achieve an accurate result. We
designed Algorithm 2 to obtain the optimized shift parameter, where Tte is
the test template and stei is the i-th WPL descriptor of Tte; Tta is the
target template and staj is the j-th WPL descriptor of Tta; and d(stek, staj)
is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value of two descriptors, defined as
We first randomly select an equal number of segment descriptors
stek in the test template Tte from each quad and find each one's nearest
neighbor staj in the target template Tta. Their shift offset is recorded as a
possible registration shift factor Δsk. The final offset registration factor is
Δsoptim, the candidate with the smallest standard deviation among these
candidate offsets.
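A sketch of Algorithm 2's shift search, under one reading of the "smallest standard deviation" rule (keep the candidate offset that deviates least from the candidates' mean); the sampling count and the per-quad sampling simplification are illustrative.

```python
import math
import random

def nearest(desc, template):
    """Nearest neighbour by descriptor centre (x, y) distance."""
    return min(template,
               key=lambda t: math.hypot(t[0] - desc[0], t[1] - desc[1]))

def shift_parameter_search(test, target, samples=8, seed=0):
    """Sketch of Algorithm 2: sample descriptors from the test template,
    pair each with its nearest neighbour in the target, record the offset
    as a candidate shift, and keep the candidate closest to the mean of
    all candidates."""
    rng = random.Random(seed)
    picks = rng.sample(test, min(samples, len(test)))
    shifts = [(t[0] - s[0], t[1] - s[1])
              for s, t in ((s, nearest(s, target)) for s in picks)]
    mx = sum(dx for dx, _ in shifts) / len(shifts)
    my = sum(dy for _, dy in shifts) / len(shifts)
    return min(shifts, key=lambda d: math.hypot(d[0] - mx, d[1] - my))
```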
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the
sclera patterns in the matching step. The affine transform algorithm is
shown in Algorithm 3. The shift value in the parameter set is obtained by
randomly selecting a descriptor s(it)te and calculating the distance from its
nearest neighbor staj in Tta. We transform the test template by the matrix
in (7). At the end of each iteration, we count the number of matched
descriptor pairs between the transformed template and the target template.
The factor β determines whether a pair of descriptors is matched; we set it
to 20 pixels in our experiment. After N iterations, the optimized transform
parameter set is determined by selecting the maximum number of matches
m(it). Here stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift,
θ(it), and tr(it)scale are the shift, rotation, and scale transform parameters
generated in the it-th iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale)
are the transform matrices defined in (7). To search for the optimal
transform parameters, we iterate N times to generate these parameters; in
our experiment, we set the iteration count to 512.
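Algorithm 3 can be sketched as a randomized parameter search. The β = 20-pixel match test follows the text; the rotation and scale ranges below are illustrative stand-ins for the a priori knowledge of the database mentioned earlier.

```python
import math
import random

def transform(desc, angle, scale, shift):
    """Apply rotation, scaling, and shift to a descriptor centre (x, y)."""
    x, y = desc[0], desc[1]
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * x - s * y) + shift[0],
            scale * (s * x + c * y) + shift[1])

def affine_parameter_search(test, target, iterations=512, beta=20.0, seed=0):
    """Sketch of Algorithm 3: per iteration, randomly draw rotation and
    scale, derive a shift from a random descriptor and its nearest target
    neighbour, transform the test template, and count centre pairs closer
    than beta pixels. The best-scoring parameter set wins."""
    rng = random.Random(seed)
    best, best_count = (0.0, 1.0, (0.0, 0.0)), -1
    for _ in range(iterations):
        angle = rng.uniform(-0.2, 0.2)     # assumed a-priori ranges
        scale = rng.uniform(0.9, 1.1)
        s = rng.choice(test)
        t = min(target, key=lambda d: math.hypot(d[0] - s[0], d[1] - s[1]))
        rx, ry = transform(s, angle, scale, (0.0, 0.0))
        shift = (t[0] - rx, t[1] - ry)     # align the chosen pair exactly
        count = 0
        for d in test:
            x, y = transform(d, angle, scale, shift)
            if any(math.hypot(x - u[0], y - u[1]) < beta for u in target):
                count += 1
        if count > best_count:
            best, best_count = (angle, scale, shift), count
    return best, best_count
```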
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined by Algorithms 2
and 3, the test template is registered and matched simultaneously. The
registration and matching algorithm is listed in Algorithm 4. Here stei, Tte,
staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift,
tr(optm)scale, and Δsoptim are the registration parameters obtained from
Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale)
form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle
between the segment descriptor and the radius direction, and w is the
weight of the descriptor, which indicates whether or not the descriptor is at
the edge of the sclera. To ensure that the nearest descriptors have a similar
orientation, we use a constant factor α to check the absolute difference of
two ɸ values; in our experiment we set α to 5. The total matching score is
the minimal score of the two transformed results divided by the minimal
matching score for the test template and target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system that
works as a coprocessor with a CPU. A CUDA device consists of many
streaming multiprocessors (SMs), and the parallel part of the program
should be partitioned into threads by the programmer and mapped onto
them. There are multiple memory spaces in the CUDA memory hierarchy:
registers, local memory, shared memory, global memory, constant
memory, and texture memory. Registers and shared memory are on-chip
and fast to access; only shared memory can be accessed by other threads
within the same block, and only a limited amount of it is available. Global
memory, constant memory, and texture memory are off-chip and
accessible by all threads, but accessing them can be very time consuming.
Constant memory and texture memory are read-only and cacheable.
Mapping algorithms to CUDA to achieve efficient processing is not a
trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches are
executed serially; to improve performance, branch divergence within a
warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this
latency, on-chip memory should be used preferentially; when global
memory accesses do occur, threads in the same warp should access
consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces,
but it is organized into banks of equal size. If two memory requests from
different threads within a warp fall in the same memory bank, the accesses
are serialized. For maximum performance, memory requests should be
scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four
independent modules, each module is converted to a different kernel on
the GPU. These kernels differ in computation density, so we map them to
the GPU with different strategies to fully utilize the computing power of
CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and
the partition among blocks and threads. Algorithm 1 is partitioned into
coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of
templates in the database. As the upper middle column of Figure 11
shows, each target template is assigned to one thread, and each thread
compares one pair of templates. In our work we use an NVIDIA C2070 as
our GPU, with the thread and block counts set to 1024; that means we can
match our test template with up to 1024 × 1024 target templates at the
same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which each
thread processes a section of descriptors. As the lower portion of the
middle column of Figure 11 shows, we assign a target template to one
block; inside a block, each thread corresponds to a set of descriptors in
this template. This partition lets every block execute independently, with
no data exchange required between different blocks. When all threads
complete their respective descriptor fractions, the sum of the intermediate
results must be computed or compared. A parallel prefix sum algorithm is
used to calculate this sum, as shown on the right of Figure 11. First, all
odd-numbered threads compute the sums of consecutive pairs of results;
then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes
the prefix sum on the new results. The final result is saved at the first
address, which has the same variable name as the first intermediate result.
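A sequential simulation of this pairwise reduction clarifies the indexing; each pass of the `while` loop stands in for one synchronized step of the threads, and the total ends up at index 0, "the first address".

```python
def tree_sum(values):
    """Sequential simulation of the parallel reduction described above:
    first adjacent pairs are summed, then strides of 4, 8, 16, ... combine
    partial results until index 0 holds the total."""
    vals = list(values)
    n = len(vals)
    stride = 2
    while stride // 2 < n:
        for i in range(0, n, stride):       # each i is "one thread"
            j = i + stride // 2
            if j < n:
                vals[i] += vals[j]
        stride *= 2                         # next synchronized step
    return vals[0]
```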
252 MAPPING INSIDE BLOCK
In shift parameter searching, there are two schemes we could use to
map the task:
Map one pair of templates to all the threads in a block, so that every
thread takes charge of a fraction of the descriptors and cooperates with
the other threads.
Assign a single possible shift offset to each thread, so that all threads
compute independently except that the final result must be compared with
the other possible offsets.
Due to the great number of sum and synchronization operations in every
nearest-neighbor search step, we chose the second method to parallelize
the shift search. In the affine matrix generator, we mapped an entire
parameter-set search to a thread: every thread randomly generates a set of
parameters and tries them independently, and the generated iterations are
distributed across all threads. The challenge of this step is that the
randomly generated numbers might be correlated among threads. In the
rotation and scale registration generation step, we used the Mersenne
Twister pseudorandom number generator because it can use bitwise
arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative;
it is therefore hard to parallelize a single twister state update step among
several execution threads. To make sure that the thousands of threads in
the launch grid generate uncorrelated random sequences, many
simultaneous Mersenne Twisters need to run with different initial states in
parallel. But even "very different" (by any definition) initial state values
do not prevent the emission of correlated sequences by generators sharing
identical parameters. To solve this problem, and to enable an efficient
implementation of the Mersenne Twister on parallel architectures, we used
a special offline tool for the dynamic creation of Mersenne Twister
parameters, modified from the algorithm developed by Makoto Matsumoto
and Takuji Nishimura. In the registration and matching step, when
searching for the nearest neighbor, a line segment that has already been
matched with another should not be used again. In our approach, a flag
FIG
FIG
Variable denoting whether the line has been matched is stored in
shared memory To share the flags all the threads in a block should wait
synchronic operation at every query step Our solution is to use a single
thread in a block to process the matching
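The dynamic-creation approach gives each thread a Mersenne Twister with its own generator parameters; reproducing that offline tool is beyond a short example, but the underlying goal of giving every thread a well-separated stream can be illustrated in Python. The function name and the hashing scheme below are assumptions for this sketch, not part of the project:

```python
import hashlib
import random

def make_thread_rngs(global_seed, num_threads):
    # Derive one generator per "thread" by hashing (seed, thread_id) into a
    # 64-bit seed. This avoids the near-identical initial states that tend to
    # produce correlated sequences; the report's DCMT tool goes further and
    # varies the generator parameters themselves, not just the seed.
    rngs = []
    for tid in range(num_threads):
        digest = hashlib.sha256(f"{global_seed}:{tid}".encode()).digest()
        rngs.append(random.Random(int.from_bytes(digest[:8], "big")))
    return rngs
```

The same global seed always reproduces the same set of streams, which is useful when debugging a parallel run.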
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth
between host memory and device memory, and data transfer between host
and device can lead to long latency. As shown in Figure 11, we load the
entire target template set from the database without considering when the
templates will be processed. Therefore, there is no data transfer from host
to device during the matching procedure. In global memory, the
components of the descriptors y(φ1, φ2, φ3, x, y) and s(x, y, r, θ, φ, w)
were stored separately. This guarantees that consecutive kernels of
Algorithms 2 to 4 can access their data at successive addresses. Although
such coalesced access reduces latency, frequent global memory access is
still a slow way to get data. In our kernel, we loaded the test template into
shared memory to accelerate memory access. Because Algorithms 2 to 4
execute different numbers of iterations on the same data, bank conflicts do
not happen. To maximize our texture memory space, we set the system
cache to the lowest value and bound our target descriptors to texture
memory. Using this cacheable memory, our data access was accelerated
further.
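The structure-of-arrays layout described above (each descriptor component in its own contiguous array) can be sketched as follows. The field names are taken from the y-descriptor components mentioned in the text; the Python dictionary only models the layout, not GPU memory itself:

```python
def split_descriptors(descriptors):
    # Convert array-of-structures (a list of (phi1, phi2, phi3, x, y)
    # tuples) into structure-of-arrays: one contiguous list per component.
    # On a GPU, threads that all read the same component then touch
    # successive addresses, which is what makes the access coalesced.
    keys = ["phi1", "phi2", "phi3", "x", "y"]
    return {k: [d[i] for d in descriptors] for i, k in enumerate(keys)}
```

A kernel that only needs, say, the x coordinates then streams through one dense array instead of striding across interleaved records.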
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor. It was
primarily applied to the design of target detection; in this paper, it is
applied as the feature for human recognition. In the sclera region, the vein
patterns are the edges of the image, so HOG is used to determine the
gradient orientations and edge orientations of the vein pattern in the sclera
region of an eye image. To carry out this technique, first divide the image
into small connected regions called cells. For each cell, compute the
histogram of gradient directions or edge orientations of the pixels. The
combination of the histograms of the different cells then represents the
descriptor. To improve accuracy, the histograms can be contrast-
normalized by calculating the intensity over a block and then using this
value to normalize all cells within the block. This normalization makes the
result more invariant to geometric and photometric changes. The gradient
magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and
y-direction gradients dx(x, y) and dy(x, y):
m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2), θ(x, y) = arctan(dy(x, y) / dx(x, y))
Orientation binning is the second step of HOG. This method is used to
create the cell histograms. Each pixel within the cell gives a weighted vote
to the orientation bin found in the gradient computation, with the gradient
magnitude used as the weight. The cells are rectangular. The binning of
gradient orientations is spread over 0 to 180 degrees, with opposite
directions counting as the same. Fig. 8 depicts the edge orientations of the
picture elements. If the image has illumination and contrast changes, then
the gradient strengths must be locally normalized. For that, cells are
grouped together into larger blocks. These blocks overlap, so that each cell
contributes more than once to the final descriptor. Here, rectangular HOG
(R-HOG) blocks are applied, which are mainly square grids. The
performance of HOG is improved by applying a Gaussian window to each
block.
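The per-cell computation described above can be sketched in Python. This is an illustrative implementation of the standard HOG cell step (central-difference gradients, magnitude-weighted votes, orientations folded into 0 to 180 degrees), not the project's MATLAB code:

```python
import math

def hog_cell_histogram(cell, bins=9):
    # cell is a 2-D list of intensities. For each interior pixel:
    #   dx, dy  - central-difference gradients
    #   m       - gradient magnitude, sqrt(dx^2 + dy^2)
    #   theta   - orientation folded into [0, 180), since opposite
    #             directions count as the same
    # Each pixel votes into its orientation bin, weighted by m.
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = cell[y][x + 1] - cell[y][x - 1]
            dy = cell[y + 1][x] - cell[y - 1][x]
            m = math.hypot(dx, dy)
            theta = math.degrees(math.atan2(dy, dx)) % 180.0
            hist[int(theta / (180.0 / bins)) % bins] += m
    return hist
```

Concatenating (and block-normalizing) these per-cell histograms yields the final descriptor described in the text.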
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and
fourth-generation programming language. Developed by MathWorks,
MATLAB allows matrix manipulations, plotting of functions and data,
implementation of algorithms, creation of user interfaces, and interfacing
with programs written in other languages, including C, C++, Java, and
Fortran.
Although MATLAB is intended primarily for numerical computing, an
optional toolbox uses the MuPAD symbolic engine, allowing access to
symbolic computing capabilities. An additional package, Simulink, adds
graphical multi-domain simulation and Model-Based Design for dynamic
and embedded systems.
In 2004, MATLAB had around one million users across industry and
academia. MATLAB users come from various backgrounds in
engineering, science, and economics. MATLAB is widely used in
academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners in control
engineering, Little's specialty, but quickly spread to many other domains.
It is now also used in education, in particular for the teaching of linear
algebra and numerical analysis, and is popular amongst scientists involved
in image processing. The MATLAB application is built around the
MATLAB language. The simplest way to execute MATLAB code is to
type it in the Command Window, which is one of the elements of the
MATLAB Desktop. When code is entered in the Command Window,
MATLAB can be used as an interactive mathematical shell. Sequences of
commands can be saved in a text file, typically using the MATLAB
Editor, as a script, or encapsulated into a function, extending the
commands available.
MATLAB provides a number of features for documenting and sharing
your work. You can integrate your MATLAB code with other languages
and applications, and distribute your MATLAB algorithms and
applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing.
Development environment for managing code, files, and data.
Interactive tools for iterative exploration, design, and problem solving.
Mathematical functions for linear algebra, statistics, Fourier analysis,
filtering, optimization, and numerical integration.
2-D and 3-D graphics functions for visualizing data.
Tools for building custom graphical user interfaces.
Functions for integrating MATLAB-based algorithms with external
applications and languages, such as C, C++, FORTRAN, Java, COM, and
Microsoft Excel.
MATLAB is used in a vast range of areas, including signal and image
processing, communications, control design, test and measurement,
financial modeling and analysis, and computational biology. Add-on
toolboxes (collections of special-purpose MATLAB functions) extend the
MATLAB environment to solve particular classes of problems in these
application areas.
MATLAB can be used on personal computers and powerful server
systems, including the Cheaha compute cluster. With the addition of the
Parallel Computing Toolbox, the language can be extended with parallel
implementations for common computational functions, including for-loop
unrolling. Additionally, this toolbox supports offloading computationally
intensive workloads to Cheaha, the campus compute cluster. MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) that knows how big it is. Moreover, the fundamental operators
(e.g., addition, multiplication) are programmed to deal with matrices when
required. The MATLAB environment handles much of the bothersome
housekeeping that makes all this possible. Since so many of the
procedures required for Macro-Investment Analysis involve matrices,
MATLAB proves to be an extremely efficient language for both
communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or FORTRAN. A wrapper function is created,
allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed
MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from
MATLAB, and many MATLAB libraries (for example, XML or SQL
support) are implemented as wrappers around Java or ActiveX libraries.
Calling MATLAB from Java is more complicated, but can be done with a
MATLAB extension, which is sold separately by MathWorks, or using an
undocumented mechanism called JMI (Java-to-MATLAB Interface),
which should not be confused with the unrelated Java Metadata Interface
that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available
from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially
on Windows XP, and for network installations.
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data.
Readability and navigation improvements to warning and error messages
in the MATLAB command window.
Automatic variable and function renaming in the MATLAB Editor.
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let
you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that
are fundamental to engineering and scientific problems. It enables fast
development and execution. With the MATLAB language, you can
program and develop algorithms faster than with traditional languages,
because you do not need to perform low-level administrative tasks such as
declaring variables, specifying data types, and allocating memory. In
many cases, MATLAB eliminates the need for 'for' loops. As a result, one
line of MATLAB code can often replace several lines of C or C++ code.
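As an analogy (sketched in Python rather than MATLAB, since this section only describes MATLAB), the same elementwise operation written as an explicit loop versus a single expression; the function names are illustrative:

```python
def scale_loop(xs, k):
    # Explicit loop, as one might write in C or C++.
    out = []
    for x in xs:
        out.append(k * x)
    return out

def scale_oneliner(xs, k):
    # One-line equivalent, in the spirit of MATLAB's vectorized k * xs.
    return [k * x for x in xs]
```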
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and
debugging features.
MATLAB lets you execute commands or groups of commands one at a
time, without compiling and linking, enabling you to quickly iterate to the
optimal solution. For fast execution of heavy matrix and vector
computations, MATLAB uses processor-optimized libraries. For general-
purpose scalar computations, MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology. This
technology, which is available on most platforms, provides execution
speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your
algorithm efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting
breakpoints and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file
differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
GUIs can be laid out, designed, and edited using the interactive tool
GUIDE (Graphical User Interface Development Environment). GUIDE
lets you include list boxes, pull-down menus, push buttons, radio buttons,
and sliders, as well as MATLAB plots and Microsoft ActiveX controls.
Alternatively, you can create GUIs programmatically using MATLAB
functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data
from external devices and databases, through preprocessing, visualization,
and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations, including:
Interpolating and decimating.
Extracting sections of data, scaling, and averaging.
Thresholding and smoothing.
Correlation, Fourier analysis, and filtering.
1-D peak, valley, and zero finding.
Basic statistics and curve fitting.
Matrix analysis.
Data Access
MATLAB is an efficient platform for accessing data from files, other
applications, databases, and external devices. You can read data from
popular file formats, such as Microsoft Excel, ASCII text or binary files,
image, sound, and video files, and scientific files such as HDF and HDF5.
Low-level binary file I/O functions let you work with data files in any
format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and
scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes,
changing line colors and markers, adding annotations, LaTeX equations,
and legends, and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts.
Direction and velocity plots.
Histograms.
Polygons and surfaces.
Scatter/bubble plots.
Animations.
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar
data, and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting effects,
light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots.
Image plots.
Cone, slice, stream, and isosurface plots.
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to
support all common engineering and science operations. These functions,
developed by experts in mathematics, are the foundation of the MATLAB
language. The core math functions use the LAPACK and BLAS linear
algebra subroutine libraries and the FFTW discrete Fourier transform
library. Because these processor-dependent libraries are optimized for the
different platforms that MATLAB supports, they execute faster than
equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data:
Matrix manipulation and linear algebra.
Polynomials and interpolation.
Fourier analysis and filtering.
Data analysis and statistics.
Optimization and numerical integration.
Ordinary differential equations (ODEs).
Partial differential equations (PDEs).
Sparse matrix operations.
MATLAB can perform arithmetic on a wide range of data types, including
doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown
into something much bigger, and it is used to implement numerical
algorithms for a wide range of applications. The basic language used is
very similar to standard linear algebra notation, but there are a few
extensions that will likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit cards, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method,
which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel
sclera matching solution for the sequential line-descriptor method using
the CUDA GPU architecture, the parallel strategies developed in this
research can be applied to design parallel solutions for other sclera vein
recognition methods and for general pattern recognition methods. We
designed the Y-shape descriptor to narrow the search range and increase
the matching efficiency; it is a new feature extraction method that takes
advantage of the GPU structures. We developed the WPL descriptor to
incorporate mask information and make it more suitable for parallel
computing, which can dramatically reduce data transfer and computation.
We then carefully mapped our algorithms to GPU threads and blocks,
which is an important step in achieving parallel computation efficiency on
a GPU. A work flow with high arithmetic intensity, designed to hide the
memory access latency, partitions the computation task across the
heterogeneous system of CPU and GPU, and even across the threads in
the GPU. The proposed method dramatically improves the matching
efficiency without compromising recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," in Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
regions. Thus, one part of an image (region) might be processed to
suppress motion blur, while another part might be processed to improve
colour rendition.
Most usually, image processing systems require that the images be
available in digitized form, that is, as arrays of finite-length binary words.
For digitization, the given image is sampled on a discrete grid and each
sample, or pixel, is quantized using a finite number of bits. The digitized
image is then processed by a computer. To display a digital image, it is
first converted into an analog signal, which is scanned onto a display.
Closely related to image processing are computer graphics and computer
vision. In computer graphics, images are manually made from physical
models of objects, environments, and lighting, instead of being acquired
(via imaging devices such as cameras) from natural scenes, as in most
animated movies. Computer vision, on the other hand, is often considered
high-level image processing, out of which a machine/computer/software
intends to decipher the physical contents of an image or a sequence of
images (e.g., videos or 3-D full-body magnetic resonance scans).
In modern sciences and technologies, images also gain much broader
scope due to the ever-growing importance of scientific visualization (of
often large-scale, complex scientific/experimental data). Examples include
microarray data in genetic research and real-time multi-asset portfolio
trading in finance. Before processing, an image is converted into a digital
form. Digitization includes sampling of the image and quantization of the
sampled values. After converting the image into bit information,
processing is performed. The processing techniques include image
enhancement, image restoration, and image compression.
1.2.2 IMAGE ENHANCEMENT
Image enhancement refers to the accentuation or sharpening of image
features, such as boundaries or contrast, to make a graphic display more
useful for display and analysis. This process does not increase the inherent
information content of the data. It includes gray-level and contrast
manipulation, noise reduction, edge crispening and sharpening, filtering,
interpolation and magnification, pseudo-coloring, and so on.
1.2.3 IMAGE RESTORATION
Image restoration is concerned with filtering the observed image to
minimize the effect of degradations. The effectiveness of image
restoration depends on the extent and accuracy of the knowledge of the
degradation process, as well as on the filter design. Image restoration
differs from image enhancement in that the latter is concerned with more
extraction or accentuation of image features.
1.2.4 IMAGE COMPRESSION
Image compression is concerned with minimizing the number of bits
required to represent an image. Applications of compression are in
broadcast TV, remote sensing via satellite, military communication via
aircraft, radar, teleconferencing, facsimile transmission of educational and
business documents, medical images that arise in computer tomography,
magnetic resonance imaging and digital radiology, motion pictures,
satellite images, weather maps, geological surveys, and so on.
Text compression - CCITT Group 3 and Group 4
Still image compression - JPEG
Video image compression - MPEG
1.2.5 SEGMENTATION
In computer vision, image segmentation is the process of partitioning a
digital image into multiple segments (sets of pixels, also known as
superpixels). The goal of segmentation is to simplify and/or change the
representation of an image into something that is more meaningful and
easier to analyze. Image segmentation is typically used to locate objects
and boundaries (lines, curves, etc.) in images. More precisely, image
segmentation is the process of assigning a label to every pixel in an image
such that pixels with the same label share certain visual characteristics.
The result of image segmentation is a set of segments that collectively
cover the entire image, or a set of contours extracted from the image (see
edge detection). Each of the pixels in a region is similar with respect to
some characteristic or computed property, such as colour, intensity, or
texture. Adjacent regions are significantly different with respect to the
same characteristic(s). When applied to a stack of images, typical in
medical imaging, the resulting contours after image segmentation can be
used to create 3D reconstructions with the help of interpolation algorithms
such as marching cubes.
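A minimal illustration of the labeling idea, assuming a simple threshold as the grouping criterion (a generic sketch, not the segmentation used in this project): threshold the image, then flood-fill each 4-connected foreground region with its own label, so that pixels sharing a label share a region.

```python
def label_regions(image, threshold):
    # image is a 2-D list of intensities. Pixels at or above the threshold
    # are foreground; each 4-connected foreground component receives a
    # distinct positive label, and background pixels keep label 0.
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for sy in range(h):
        for sx in range(w):
            if image[sy][sx] >= threshold and labels[sy][sx] == 0:
                current += 1
                stack = [(sy, sx)]  # iterative flood fill
                while stack:
                    y, x = stack.pop()
                    if (0 <= y < h and 0 <= x < w
                            and image[y][x] >= threshold
                            and labels[y][x] == 0):
                        labels[y][x] = current
                        stack.extend([(y + 1, x), (y - 1, x),
                                      (y, x + 1), (y, x - 1)])
    return labels
```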
1.2.6 IMAGE RESTORATION
Image restoration, like enhancement, improves the qualities of an image,
but all the operations are mainly based on known, measured, or estimated
degradations of the original image. Image restoration is used to restore
images with problems such as geometric distortion, improper focus,
repetitive noise, and camera motion, and to correct images for known
degradations.
1.2.7 FUNDAMENTAL STEPS
Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the
chances for success of the other processes.
Image segmentation: to partition an input image into its constituent parts
or objects.
Image representation: to convert the input data into a form suitable for
computer processing.
Image description: to extract features that result in some quantitative
information of interest, or features that are basic for differentiating one
class of objects from another.
Image recognition: to assign a label to an object based on the information
provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized
objects.
Knowledge about a problem domain is coded into an image processing
system in the form of a knowledge database.
1.3 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x, y) must be digitized
both spatially and in amplitude. Digitization of the spatial coordinates
(x, y) is called image sampling, and amplitude digitization is called gray-
level quantization. The storage and processing requirements increase
rapidly with the spatial resolution and the number of gray levels.
Example: A 256 gray-level image of size 256x256 occupies 64K bytes of
memory.
Images of very low spatial resolution produce a checkerboard effect, and
the use of an insufficient number of gray levels in smooth areas of a
digital image results in false contouring.
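The storage example can be checked directly; the helper function below is illustrative only:

```python
import math

def image_storage_bytes(width, height, gray_levels):
    # Bits per pixel needed to represent the given number of gray levels,
    # times the number of pixels, converted to bytes.
    bits = math.ceil(math.log2(gray_levels))
    return width * height * bits // 8

print(image_storage_bytes(256, 256, 256))  # 65536 bytes = 64K
```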
1.4 IMAGE FILE FORMATS
There are two general groups of 'images': vector graphics (or line art) and
bitmaps (pixel-based 'images'). Some of the most common file formats
are:
GIF - Graphics Interchange Format: an 8-bit (256-colour), non-
destructively compressed bitmap format, mostly used for the web. It has
several sub-standards, one of which is the animated GIF.
JPEG - Joint Photographic Experts Group: a very efficient (i.e., much
information per byte), destructively compressed, 24-bit (16 million
colours) bitmap format, widely used, especially for the web and the
Internet (bandwidth-limited).
TIFF - Tagged Image File Format: the standard 24-bit publication bitmap
format, compressed non-destructively with, for instance, Lempel-Ziv-
Welch (LZW) compression.
PS - PostScript: a standard vector format with numerous sub-standards; it
can be difficult to transport across platforms and operating systems.
PSD - Adobe Photoshop Document: a dedicated Photoshop format that
keeps all the information in an image, including all the layers.
BMP - bitmap file format.
1.5 TYPES OF IMAGES
There are four types of images:
1. Binary image
2. Gray scale image
3. Color image
4. Indexed image
1.5.1 BINARY IMAGES
A binary image is a digital image that has only two possible values for
each pixel. Typically, the two colors used for a binary image are black and
white, though any two colors can be used. Binary images are also called
bi-level or two-level. This means that each pixel is stored as a single bit,
i.e., a 0 or 1. Such images are also called black-and-white (B&W) images.
1.5.2 GRAY SCALE IMAGE
In an (8-bit) grayscale image, each picture element has an assigned
intensity that ranges from 0 to 255. A grayscale image is what people
normally call a black-and-white image, but the name emphasizes that such
an image will also include many shades of grey.
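The relationship between binary and grayscale images can be illustrated with a simple fixed threshold (an illustrative sketch; in this project the threshold is chosen automatically by Otsu's method rather than fixed):

```python
def to_binary(gray, threshold=128):
    # Each 0-255 grayscale pixel collapses to a single bit:
    # 1 (white) if it meets the threshold, 0 (black) otherwise.
    return [[1 if p >= threshold else 0 for p in row] for row in gray]
```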
FIG
1.5.3 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive
colour, with the r, g, and b receptors in our retinas. RGB uses additive
colour mixing and is the basic colour model used in television or any
other medium that projects colour with light. It is the basic colour model
used in computers and for web graphics, but it cannot be used for print
production. The secondary colours of RGB, cyan, magenta, and yellow,
are formed by mixing two of the primary colours (red, green, or blue) and
excluding the third colour. Red and green combine to make yellow, green
and blue to make cyan, and blue and red form magenta. The combination
of red, green, and blue at full intensity makes white.
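These combinations can be sketched as channel-wise sums over 8-bit RGB triples (an illustrative helper, not from the report):

```python
def mix_additive(c1, c2):
    # Additive mixing: sum each channel, clipped to the 0-255 range.
    return tuple(min(a + b, 255) for a, b in zip(c1, c2))

RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
print(mix_additive(RED, GREEN))  # (255, 255, 0) -> yellow
```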
In Photoshop, using the "screen" mode for the different layers in an image
will make the intensities mix together according to the additive colour
mixing model. This is analogous to stacking slide images on top of each
other and shining light through them.
FIG
CMYK: The 4-colour CMYK model used in printing lays down
overlapping layers of varying percentages of transparent cyan (C),
magenta (M), and yellow (Y) inks. In addition, a layer of black (K) ink
can be added. The CMYK model uses the subtractive colour model.
1.5.4 INDEXED IMAGE
FIG
An indexed image consists of an array and a color map matrix. The pixel
values in the array are direct indices into the color map. By convention,
this documentation uses the variable name X to refer to the array and map
to refer to the color map. In computing, indexed color is a technique to
manage digital image colors in a limited fashion, in order to save
computer memory and file storage while speeding up display refresh and
file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, color information is not directly
carried by the image pixel data, but is stored in a separate piece of data
called a palette: an array of color elements, in which every element, a
color, is indexed by its position within the array. The image pixels do not
contain the full specification of their color, but only their index in the
palette. This technique is sometimes referred to as pseudocolor or indirect
color, as colors are addressed indirectly.
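The index-to-palette lookup can be sketched as follows (illustrative names only):

```python
def apply_palette(index_array, palette):
    # Indexed colour: each pixel stores only a palette index; the palette
    # maps that index to a full RGB triple.
    return [[palette[i] for i in row] for row in index_array]
```

With a 256-entry palette, each pixel needs one byte instead of three, at the cost of limiting the image to 256 distinct colors.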
Perhaps the first device that supported palette colors was a random-access
frame buffer, described in 1975 by Kajiya, Sutherland, and Cheadle,
which supported a palette of 256 36-bit RGB colors.
1.6 APPLICATIONS OF IMAGE PROCESSING
Interest in digital image processing methods stems from two principal
application areas:
1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception
In the second application area, interest focuses on procedures for
extracting, from an image, information in a form suitable for computer
processing. Examples include automatic character recognition, industrial
machine vision for product assembly and inspection, military
reconnaissance, automatic processing of fingerprints, etc.
1.7 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches, the Speeded-Up
Robust Features (SURF)-based method, minutiae detection, and direct
correlation matching, for feature registration and matching. Among these
three methods, the SURF method achieves the best accuracy; it takes an
average of 1.5 seconds to perform a one-to-one matching. Zhou et al.
proposed a line-descriptor-based method for sclera vein recognition. The
matching step (including registration) is the most time-consuming step in
this sclera vein recognition system, costing about 1.2 seconds to perform
a one-to-one matching. Both speeds were measured on a PC with an Intel
Core 2 Duo 2.4 GHz processor and 4 GB of DRAM. Currently, sclera
vein recognition algorithms are designed using central processing unit
(CPU)-based systems.
1.7.1 DISADVANTAGES OF THE EXISTING SYSTEM
1) Mask files are used to calculate the valid overlapping area of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large, will preoccupy the GPU memory, and slow down data
transfer. Also, some of the processing on the mask files involves
convolution, whose performance is difficult to improve on the scalar
processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance.
LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular
biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp.
1860–1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the
most challenging problems that has been actively researched in recent
years. It is well known that many state-of-the-art still-image face
recognition algorithms perform well when constrained (frontal, well-
illuminated, high-resolution, sharp, and full) face images are acquired.
However, their performance degrades significantly when the test images
contain variations that are not present in the training images. In this paper,
we highlight some of the key issues in remote face recognition. We define
remote face recognition as one where faces are several tens of meters
(10-250 m) from the cameras. We then describe a remote face database
which has been acquired in an unconstrained outdoor maritime
environment. The recognition performance of a subset of existing still-
image-based face recognition algorithms is evaluated on the remote face
data set. Further, we define the remote re-identification problem as
matching a subject at one location with candidate sets acquired at a
different location and over time, in remote conditions. We provide
preliminary experimental results on remote re-identification. It is
demonstrated that, in addition to applying a good classification algorithm,
finding features that are robust to the variations mentioned above and
developing statistical models that can account for these variations are very
important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N.
Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics
Security.
With the rapidly expanding biometric data collected by various sectors
of government and industry for identification and verification purposes,
how to manage and process such Big Data draws great concern. Even
though modern processors are equipped with more cores and memory
capacity, it still requires careful design to utilize the hardware
resources effectively and the power consumption efficiently. This research
addresses this issue by investigating the workload characteristics of a
biometric application. Taking Daugman's iris matching algorithm, which
has been proven to be the most reliable iris matching method, as a case
study, we conduct performance profiling and binary instrumentation on the
benchmark to capture its execution behavior. The results show that data
loading and memory access incur great performance overhead, which
motivates us to move the biometrics computation to a high-performance
architecture.
Modern iris recognition algorithms can be computationally intensive,
yet are designed for traditional sequential processing elements such as a
personal computer. However, a parallel processing alternative using field-
programmable gate arrays (FPGAs) offers an opportunity to speed up iris
recognition. Within the means of this project, iris template generation with
directional filtering, which is a computationally expensive yet parallel
portion of a modern iris recognition algorithm, is parallelized on an FPGA
system. We present a performance comparison of the parallelized
algorithm on the FPGA system to a traditional CPU-based version. The
parallelized template generation outperforms an optimized C++ code
version, determining the information content of an iris approximately 324
times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric
modality based on conjunctival vasculature," in Proc. Artif. Neural
Netw. Eng., 2006, pp. 1–8.
A new biometric indicator based on the patterns of conjunctival
vasculature is proposed. Conjunctival vessels can be observed on the
visible part of the sclera that is exposed to the outside world. These vessels
demonstrate rich and specific details in visible light and can be easily
photographed using a regular digital camera. In this paper, we discuss
methods for conjunctival imaging, preprocessing, and feature extraction in
order to derive a suitable conjunctival vascular template for biometric
authentication. Commensurate classification methods, along with the
observed accuracy, are discussed. Experimental results suggest the potential
of using conjunctival vasculature as a biometric measure. Identification of
a person based on some unique set of features is an important task. Human
identification is possible with several biometric systems, and sclera
recognition is one of the promising biometrics. The sclera is the white
portion of the human eye. The vein pattern seen in the sclera region is
unique to each person. Thus, the sclera vein pattern is a well-suited
biometric for human identification. The existing methods used
for sclera recognition have some drawbacks: only frontal-looking
images are preferred for matching, and rotation variance is another
problem. These problems are eliminated in the proposed system by using
two feature extraction techniques: the Histogram of Oriented
Gradients (HOG) and conversion of the image into polar form using
bilinear interpolation. These two features help the proposed
system become illumination invariant and rotation invariant. The
experimentation is done with the help of the UBIRIS database. The
experimental results show that the proposed sclera recognition method can
achieve better accuracy than the previous methods.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J.
C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899,
May 2008.
The graphics processing unit (GPU) has become an integral part of
today's mainstream computing systems. Over the past six years, there has
been a marked increase in the performance and capabilities of GPUs. The
modern GPU is not only a powerful graphics engine, but also a highly
parallel programmable processor featuring peak arithmetic and memory
bandwidth that substantially outpaces its CPU counterpart. The GPU's
rapid increase in both programmability and capability has spawned a
research community that has successfully mapped a broad range of
computationally demanding, complex problems to the GPU. This effort in
general-purpose computing on the GPU, also known as GPU computing,
has positioned the GPU as a compelling alternative to traditional
microprocessors in the high-performance computer systems of the future.
We describe the background, hardware, and programming model for GPU
computing, summarize the state of the art in tools and techniques, and
present four GPU computing successes in game physics and computational
biophysics that deliver order-of-magnitude performance gains over
optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image
database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp.
970–977.
This paper proposes algorithms for iris segmentation, quality
enhancement, match score fusion, and indexing to improve both the
accuracy and the speed of iris recognition. A curve evolution approach is
proposed to effectively segment a nonideal iris image using the modified
Mumford–Shah functional. Different enhancement algorithms are
concurrently applied to the segmented iris image to produce multiple
enhanced versions of the iris image. A support-vector-machine-based
learning algorithm selects locally enhanced regions from each globally
enhanced image and combines these good-quality regions to create a single
high-quality iris image. Two distinct features are extracted from the high-
quality iris image. The global textural feature is extracted using the 1-D
log-polar Gabor transform, and the local topological feature is extracted
using Euler numbers. An intelligent fusion algorithm combines the textural
and topological matching scores to further improve iris recognition
performance and reduce the false rejection rate, whereas an indexing
algorithm enables fast and accurate iris identification. The verification and
identification performance of the proposed algorithms is validated and
compared with other algorithms using the CASIA Version 3, ICE 2005,
and UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-
stage parallel approach for registration and matching: a parallel sclera
matching solution for sclera vein recognition, based on our sequential
line-descriptor method and the CUDA GPU architecture. CUDA is a
highly parallel, multithreaded, many-core processor architecture with
tremendous computational power.
It supports not only a traditional graphics pipeline but also computation
on non-graphical data. It is relatively straightforward to implement our C
program for CUDA on an AMD-based GPU using OpenCL. Our CUDA
kernels can be directly converted to OpenCL kernels by accounting for the
different syntax of various keywords and built-in functions. The mapping
strategy is also effective in OpenCL if we regard the thread and block in
CUDA as the work-item and work-group in OpenCL. Most of our
optimization techniques, such as coalesced memory access and prefix sum,
can work in OpenCL too. Moreover, since CUDA is a data-parallel
architecture, the implementation of our approach in OpenCL should be
programmed in the data-parallel model.
In this research, we first discuss why a naïve parallel approach would
not work. We then propose a new sclera descriptor, the Y-shape sclera
feature-based efficient registration method, to speed up the mapping
scheme; introduce the "weighted polar line (WPL) descriptor," which is
better suited for parallel computing and mitigates the mask size issue; and
develop our coarse-to-fine two-stage matching process to dramatically
improve the matching speed. These new approaches make the parallel
processing possible and efficient.
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve efficiency, in this research we propose a new descriptor,
the Y-shape descriptor, which can greatly improve the efficiency of
the coarse registration of two images and can be used to filter out some
non-matching pairs before refined matching.
2. We propose a coarse-to-fine two-stage matching process. In the first
stage, we match two images coarsely using the Y-shape descriptors,
which is very fast because no registration is needed. The matching
result of this stage helps filter out image pairs with low similarity.
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each
person, so it can be used for human identification. Several researchers have
designed different sclera vein recognition methods and have shown that it
is promising to use sclera vein recognition for human identification.
Crihalmeanu and Ross proposed three approaches: a Speeded-Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching, for feature registration and matching. Of these three methods,
the SURF method achieves the best accuracy. It takes an average of 1.5
seconds using the SURF method to perform a one-to-one matching. Zhou
et al. proposed a line descriptor-based method for sclera vein recognition.
The matching step (including registration) is the most time-consuming step
in this sclera vein recognition system, costing about 1.2 seconds to
perform a one-to-one matching. Both speeds were measured on a PC with
an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB of DRAM. Currently,
sclera vein recognition algorithms are designed using central processing
unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential
processing devices, which may not be efficient in data processing where the
data can be parallelized. Because of the large time consumption of the
matching step, sclera vein recognition using a sequential method would be
very challenging to implement in a real-time biometric system, especially
when there is a large number of templates in the database for matching.
GPGPUs (general-purpose graphics processing units, here simply GPUs)
are now popularly used for parallel computing to improve
computational processing speed and efficiency. The highly parallel
structure of GPUs makes them more effective than CPUs for data
processing where processing can be performed in parallel. GPUs have been
widely used in biometric recognition, such as speech recognition, text
detection, handwriting recognition, and face recognition. In iris
recognition, the GPU was used to extract the features, construct
descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al.
presented a performance evaluation of image processing algorithms, such
as linear feature extraction and multi-view stereo matching, on GPUs.
However, these approaches were designed for their specific biometric
recognition applications and feature searching methods. Therefore, they
may not be efficient for sclera vein recognition. Compute Unified Device
Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in
this research. CUDA is a highly parallel, multithreaded, many-core
processor architecture with tremendous computational power. It supports
not only a traditional graphics pipeline but also computation on
non-graphical data. More importantly, it offers an easier programming
platform which outperforms its CPU counterparts in terms of peak
arithmetic intensity and memory bandwidth. In this research, the goal is
not to develop a unified strategy to parallelize all sclera matching
methods, because each method is quite different from the others and would
need a customized design. To develop an efficient parallel computing
scheme, different strategies would be needed for different sclera vein
recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
sclera vein recognition, based on our sequential line-descriptor method,
using the CUDA GPU architecture. However, the parallelization strategies
developed in this research can be applied to design parallel approaches for
other sclera vein recognition methods and to help parallelize general
pattern recognition methods. Based on the matching approach, there are
three challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping area of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large, will preoccupy the GPU memory, and slow down data
transfer. Also, some of the processing on the mask files involves
convolution, whose performance is difficult to improve on the scalar
processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance. New designs are
necessary to help narrow down the search range. In summary, a naïve
implementation of the algorithms in parallel would not work efficiently.
Note that it is relatively straightforward to implement our C program for
CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be
directly converted to OpenCL kernels by accounting for the different
syntax of various keywords and built-in functions. The mapping strategy is
also effective in OpenCL if we regard the thread and block in CUDA as the
work-item and work-group in OpenCL. Most of our optimization
techniques, such as coalesced memory access and prefix sum, can work in
OpenCL too. Moreover, since CUDA is a data-parallel architecture, the
implementation of our approach in OpenCL should be programmed in the
data-parallel model.
In this research, we first discuss why the naïve parallel approach would
not work (Section 3). We then propose the new sclera descriptor, the
Y-shape sclera feature-based efficient registration method, to speed up the
mapping scheme (Section 4); introduce the "weighted polar line (WPL)
descriptor," which is better suited for parallel computing and mitigates the
mask size issue (Section 5); and develop our coarse-to-fine two-stage
matching process to dramatically improve the matching speed (Section 6).
These new approaches make the parallel processing possible and efficient.
However, it is non-trivial to implement these algorithms in CUDA, so we
then develop the implementation schemes to map our algorithms onto
CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein
recognition. In Section 8, we report experiments using the proposed
system. In Section 9, we draw some conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation, feature enhancement, feature extraction, and feature
matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition.
Several methods have been designed for sclera segmentation. Crihalmeanu
et al. presented a semi-automated system for sclera segmentation. They
used a clustering algorithm to classify color eye images into three
clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross
designed a segmentation approach based on a normalized sclera index
measure, which includes coarse sclera segmentation, pupil region
segmentation, and fine sclera segmentation. Zhou et al. developed a skin
tone plus "white color"-based voting method for sclera segmentation in
color images, and an Otsu's-thresholding-based method for grayscale
images. After sclera segmentation, it is necessary to enhance and extract
the sclera features, since the sclera vein patterns often lack contrast and are
hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for
vascular pattern enhancement. Derakhshani et al. used contrast-limited
adaptive histogram equalization (CLAHE) to enhance the green color plane
of the RGB image, and a multi-scale region growing approach to identify
the sclera veins from the image background. Crihalmeanu and Ross applied
a selective enhancement filter for blood vessels to extract features from the
green component of a color image. In the feature matching step,
Crihalmeanu and Ross proposed three registration and matching
approaches: Speeded-Up Robust Features (SURF), which is based on
interest-point detection; minutiae detection, which is based on minutiae
points on the vasculature structure; and direct correlation matching, which
relies on image registration. Zhou et al. designed a line descriptor-based
feature registration and matching method.
The proposed sclera recognition consists of five steps: sclera
segmentation, vein pattern enhancement, feature extraction, feature
matching, and the matching decision. Fig. 2 shows the block diagram of
sclera recognition. Two types of feature extraction are used in the proposed
method to achieve good identification accuracy. The characteristics
elicited from the blood vessel structure seen in the sclera region are the
Histogram of Oriented Gradients (HOG) and a bilinear-interpolation-based
Cartesian-to-polar conversion. HOG is used to determine the gradient
orientations and edge orientations of the vein pattern in the sclera region of
an eye image. To become more computationally efficient, the image data
are converted to polar form; this is mainly useful for circular or
quasi-circular objects. These two characteristics are extracted from all the
images in the database and compared with the features of the query image
to decide whether the person is correctly identified or not. This procedure
is done in the feature matching step, which ultimately makes the matching
decision. By using the proposed feature extraction methods and matching
techniques, human identification is more accurate than in the existing
studies. In the proposed method, two features of an image are extracted.
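The Cartesian-to-polar conversion with bilinear interpolation described above can be sketched as follows; this is a minimal Python illustration with grid sizes and function names of our choosing, not the report's implementation:

```python
import math

def bilinear(img, x, y):
    """Sample a grayscale image at fractional (x, y) via bilinear interpolation."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, len(img[0]) - 1), min(y0 + 1, len(img) - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def to_polar(img, cx, cy, n_r=8, n_theta=16):
    """Resample img around center (cx, cy) onto an (angle, radius) grid."""
    r_max = min(cx, cy, len(img[0]) - 1 - cx, len(img) - 1 - cy)
    polar = []
    for t in range(n_theta):
        theta = 2 * math.pi * t / n_theta
        row = []
        for r in range(n_r):
            radius = r_max * r / (n_r - 1)
            x = cx + radius * math.cos(theta)
            y = cy + radius * math.sin(theta)
            row.append(bilinear(img, x, y))
        polar.append(row)
    return polar

# A flat image stays flat after resampling, which is a quick sanity check.
img = [[5.0] * 9 for _ in range(9)]
polar = to_polar(img, 4, 4)
```

In the polar grid, a rotation of the eye about the center becomes a simple row shift, which is what makes this representation rotation tolerant.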
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It involves
three steps: glare area detection, sclera area estimation, and iris and eyelid
detection and refinement. The figure shows the steps of segmentation.
FIG
Glare area detection: The glare area is a small bright area near the
pupil or iris. This is an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. It runs only on
grayscale images; if the image is color, it must first be converted to
grayscale, after which the Sobel filter is applied to detect the glare area.
Fig. 4 shows the result of the glare area detection.
FIG
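A minimal sketch of the Sobel-based glare detection idea (the 3×3 kernels are the standard Sobel masks; the threshold value is an illustrative assumption, not from the report):

```python
# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(img):
    """Gradient magnitude of a grayscale image (borders left at zero)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(KX[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(KY[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

# A dark image with one bright spot: the spot's rim shows strong gradients,
# which is how a small glare region stands out.
img = [[10] * 7 for _ in range(7)]
img[3][3] = 250
mag = sobel_magnitude(img)
glare_edges = [(x, y) for y in range(7) for x in range(7) if mag[y][x] > 100]
```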
Sclera area estimation: For the estimation of the sclera area, Otsu's
thresholding method is applied. The stages of sclera area detection are:
selection of the region of interest (ROI), Otsu's thresholding, and sclera
area detection. The left and right sclera areas are selected based on the iris
boundaries. Once the region of interest is selected, Otsu's thresholding is
applied to obtain the potential sclera areas. The correct left sclera area
should be placed in the right and center positions, and the correct right
sclera area should be placed in the left and center. In this way, non-sclera
areas are wiped out.
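Otsu's thresholding, as used above to separate the potential sclera areas, can be sketched for 8-bit grayscale values as follows (an illustrative stdlib-only version, not the report's implementation):

```python
def otsu_threshold(pixels):
    """Return the threshold maximizing between-class variance (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * hist[i] for i in range(256))
    sum_b = 0.0      # cumulative intensity sum of the background class
    w_b = 0          # background pixel count
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b                      # background mean
        m_f = (total_sum - sum_b) / w_f        # foreground mean
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Two well-separated intensity clusters: the threshold falls between them,
# separating dark (iris/background) from bright (sclera) pixels.
pixels = [30] * 100 + [220] * 50
t = otsu_threshold(pixels)
```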
2.2.3 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the
sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are
refined; these are all unwanted portions for recognition. In order to
eliminate their effects, refinement follows the detection of the sclera area.
The figure shows the result after Otsu's thresholding and the iris and eyelid
refinement used to detect the right sclera area. In the same way, the left
sclera area is detected using this method.
FIG
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the impact of
segmentation faults. The vein patterns in the sclera area are not clearly
visible after segmentation, so vein pattern enhancement is performed to
make them more visible.
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of fingers (Miura et al., 2004), the palm (Lin
and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics,
a special optical device for imaging the back of the eyeball is needed (Hill,
1999). Due to its perceived invasiveness and the required degree of subject
cooperation, the use of retinal biometrics may not be acceptable to some
individuals. The conjunctiva is a thin, transparent, and moist tissue that
covers the outer surface of the eye. The part of the conjunctiva that covers
the inner lining of the eyelids is called the palpebral conjunctiva, and the
part that covers the outer surface of the eye is called the ocular (or bulbar)
conjunctiva, which is the focus of this study. The ocular conjunctiva is very
thin and clear; thus, the vasculature (including that of the episclera) is
easily visible through it. The visible microcirculation of the conjunctiva
offers a rich and complex network of veins and fine microcirculation
(Fig. 1). The apparent complexity and specificity of these vascular patterns
motivated us to utilize them for personal identification (Derakhshani and
Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have a vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as those of the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus, the conjunctival vascular
structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with
commercial off-the-shelf digital cameras under normal lighting conditions,
making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact. Besides being a
stand-alone biometric modality, we anticipate that the addition of
conjunctival biometrics will enhance the performance of current iris-based
biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portions of the eye, the sclera
vein patterns will be further exposed. This feature makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable
(e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
availability of simultaneous computation. The figure below demonstrates
the possibility of parallel directional filtering. Since the filter is computed
over different portions of the input image, the computation can be
performed in parallel (denoted by "Elements" below). In addition,
individual parallelization of each element of the filtering can also be
performed. A detailed discussion of our proposed parallelization is outside
the scope of this paper.
FIG
FIG
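The idea of filtering independent portions of an image in parallel can be illustrated with a thread pool; this is a generic Python sketch (not the FPGA/GPU implementation discussed above) showing that the parallel result matches the sequential one:

```python
from concurrent.futures import ThreadPoolExecutor

def smooth_row(row):
    """A toy 1-D averaging filter applied to a single image row."""
    out = row[:]
    for x in range(1, len(row) - 1):
        out[x] = (row[x - 1] + row[x] + row[x + 1]) / 3.0
    return out

def filter_parallel(img, workers=4):
    # Each row ("element") is independent, so rows are filtered concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(smooth_row, img))

img = [[float(x) for x in range(10)] for _ in range(8)]
filtered = filter_parallel(img)
```

Because no row depends on another row's output, the work partitions cleanly, which is the property that makes directional filtering a good candidate for parallel hardware.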
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA
VEIN RECOGNITION METHOD
The matching stage of the line-descriptor-based method is a
bottleneck with regard to matching speed. In this section, we briefly
describe the line-descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and to remove the branch
points. The line descriptor is used to describe the segments in the vein
structure. Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle to some
reference angle at the iris center, θ; the segment's distance to the iris
center, r; and the dominant angular orientation of the line segment, ɸ. Thus
the descriptor is S = (θ, r, ɸ)T. The individual components of the line
descriptor are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. In order to register the segments of the
vascular patterns, a RANSAC-based algorithm is used to estimate the best-
fit parameters for registration between the two sclera vascular patterns.
The registration algorithm randomly chooses two points, one from the test
template and one from the target template. It also randomly chooses a
scaling factor and a rotation value based on a priori knowledge of the
database. Using these values, it calculates a fitness value for the
registration under these parameters.
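The three descriptor components can be sketched as follows; since this excerpt does not reproduce the report's equations, the use of atan2 for θ and a least-squares line fit for ɸ are our assumptions:

```python
import math

def line_descriptor(seg_points, iris_center):
    """Return S = (theta, r, phi) for one vein segment.

    theta: angle of the segment center about the iris center
    r:     distance from the segment center to the iris center
    phi:   dominant orientation of the segment (least-squares line fit)
    """
    xi, yi = iris_center
    xl = sum(p[0] for p in seg_points) / len(seg_points)
    yl = sum(p[1] for p in seg_points) / len(seg_points)
    theta = math.atan2(yl - yi, xl - xi)
    r = math.hypot(xl - xi, yl - yi)
    # Slope of the best-fit line y = a*x + b through the segment points.
    n = len(seg_points)
    sx = sum(p[0] for p in seg_points)
    sy = sum(p[1] for p in seg_points)
    sxx = sum(p[0] ** 2 for p in seg_points)
    sxy = sum(p[0] * p[1] for p in seg_points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    phi = math.atan(slope)
    return theta, r, phi

# A horizontal segment directly to the right of the iris center:
S = line_descriptor([(10, 5), (11, 5), (12, 5)], iris_center=(5, 5))
```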
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. In order to reduce the effect of segmentation errors, we created a
weighting image (Figure 3) from the sclera mask by setting interior pixels
in the sclera mask to 1, pixels within some distance of the boundary of the
mask to 0.5, and pixels outside the mask to 0.
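The weighting image construction can be sketched directly from the description above (the boundary margin of one pixel is an illustrative choice):

```python
# Interior mask pixels get weight 1.0, pixels near the mask boundary get 0.5,
# and pixels outside the mask get 0.0, as described in the text.
def weighting_image(mask, margin=1):
    h, w = len(mask), len(mask[0])

    def near_boundary(x, y):
        for dy in range(-margin, margin + 1):
            for dx in range(-margin, margin + 1):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h):
                    return True  # image edge counts as boundary
                if mask[ny][nx] == 0:
                    return True  # adjacent to a non-mask pixel
        return False

    weights = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 1:
                weights[y][x] = 0.5 if near_boundary(x, y) else 1.0
    return weights

mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
w = weighting_image(mask)
```

Down-weighting boundary descriptors makes the score tolerant of the spur edges produced by imperfect segmentation near the iris and eyelids.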
The matching score for two segment descriptors is calculated as follows,
where Si and Sj are two segment descriptors, m(Si, Sj) is the matching
score between segments Si and Sj, d(Si, Sj) is the Euclidean distance
between the segment descriptors' center points (from Eqs. 6-8), Dmatch is
the matching distance threshold, and ɸmatch is the matching angle
threshold. The total matching score M is the sum of the individual
matching scores divided by the maximum matching score for the minimal
set between the test and target templates. That is, one of the test or target
templates has fewer points, and thus the sum of its descriptors' weights sets
the maximum score that can be attained.
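A sketch of the per-segment score m(Si, Sj) and the total score M described above; the exact functional form of m is not reproduced in this excerpt, so a thresholded weight product is assumed here, with illustrative threshold values:

```python
import math

# Each descriptor: (x, y, phi, weight), following the weighted descriptor idea.
D_MATCH = 5.0       # matching distance threshold (illustrative value)
PHI_MATCH = 0.2     # matching angle threshold (illustrative value)

def m(si, sj):
    """Score two descriptors: product of weights if both thresholds pass, else 0."""
    d = math.hypot(si[0] - sj[0], si[1] - sj[1])
    if d <= D_MATCH and abs(si[2] - sj[2]) <= PHI_MATCH:
        return si[3] * sj[3]
    return 0.0

def total_score(test, target):
    """M = sum of best per-segment scores over the maximum attainable score."""
    smaller, larger = (test, target) if len(test) <= len(target) else (target, test)
    achieved = sum(max(m(si, sj) for sj in larger) for si in smaller)
    max_score = sum(si[3] for si in smaller)  # the minimal set's weights cap M
    return achieved / max_score if max_score else 0.0

test = [(10, 10, 0.0, 1.0), (20, 10, 0.5, 1.0)]
target = [(11, 10, 0.05, 1.0), (20, 11, 0.5, 0.5), (40, 40, 1.0, 1.0)]
M = total_score(test, target)
```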
FIG
FIG
FIG
FIG
Even with movement of the eye, Y-shape branches are observed to be a
stable feature and can be used as a sclera feature descriptor. To detect the
Y-shape branches in the original template, we search for the set of nearest
neighbors of every line segment within a regular distance and classify the
angles among these neighbors. If there are two types of angle values in the
line segment set, the set may be inferred to be a Y-shape structure, and the
line segment angles are recorded as a new feature of the sclera.
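The detection steps above (gather nearby segments, classify their orientations, infer a Y shape when exactly two angle groups appear) can be sketched as follows; the neighborhood radius and angle tolerance are illustrative assumptions:

```python
import math

def find_y_shapes(segments, radius=10.0, angle_tol=0.15):
    """segments: list of (x, y, phi). Return centers of candidate Y shapes."""
    candidates = []
    for x, y, phi in segments:
        # Nearest-neighbor set: segments whose centers lie within `radius`.
        group = [(sx, sy, sphi) for sx, sy, sphi in segments
                 if math.hypot(sx - x, sy - y) <= radius]
        if len(group) < 3:
            continue
        # Classify the orientations of the neighborhood into distinct angles.
        distinct = []
        for _, _, a in group:
            if not any(abs(a - b) <= angle_tol for b in distinct):
                distinct.append(a)
        # Two types of angle values suggest a Y-shape branch structure.
        if len(distinct) == 2:
            candidates.append((x, y))
    return candidates

# Three nearby segments with two orientation values form a Y-shape candidate;
# the isolated far segment does not.
segments = [(0, 0, 0.0), (3, 0, 0.0), (1, 2, 1.0), (50, 50, 0.5)]
candidates = find_y_shapes(segments)
```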
There are two ways to measure both the orientation and the relationship of
every branch of a Y-shape vessel: one is to use the angle of every branch to
the x axis; the other is to use the angles between each branch and the iris
radial direction. The first method needs an additional rotation operation to
align the template. In our approach, we employed the second method. As
Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angle between each branch and
the radius from the pupil center. Even when the head tilts, the eye moves,
or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite
stable. To tolerate errors from the pupil center calculation in the
segmentation step, we also record the center position (x, y) of the Y-shape
branch as auxiliary parameters. So our rotation-, shift-, and scale-invariant
feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is
generated with reference to the iris center; therefore, it is automatically
aligned to the iris center. It is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line descriptor is extracted from the
skeleton of the vessel structure in binary images (Figure 7). The skeleton is
then broken into smaller segments. For each segment, a line descriptor is
created to record the center and orientation of the segment. This descriptor
is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is
its orientation. Because of the limited segmentation accuracy, descriptors
at the boundary of the sclera area might not be accurate and may contain
spur edges resulting from the iris, eyelid, and/or eyelashes. To tolerate
such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel
patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line
segments of the vessel patterns
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is challenging,
since the mask files are large, occupy GPU memory, and
slow down the data transfer. During matching, a RANSAC-type
registration algorithm was used to randomly select corresponding descriptors,
and the transform parameters between them were used to generate the
template-transform affine matrix. After every template transform, the mask
data must also be transformed and a new boundary calculated to
evaluate the weight of the transformed descriptors. This results in too many
convolutions in the processor unit.
To reduce the heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the mask
information and can be automatically aligned. We extracted the geometric
relationships of the descriptors and stored them as a new
descriptor. We use a weighted image created by setting various weight
values according to position: the weight of descriptors that lie
outside the sclera is set to 0, descriptors near the sclera
boundary are set to 0.5, and interior descriptors are set to 1. In our work,
descriptor weights were calculated on their own mask by the CPU, only
once.
The calculation result was saved as a component of the descriptor, which
becomes s(x, y, ɸ, w), where w denotes the weight
of the point and may take the value 0, 0.5 or 1. To align two templates, when a
template is shifted to another location along the line connecting their
centers, all the descriptors of that template are transformed. This is
faster if the two templates have a similar reference point. If we use the center of
the iris as the reference point, then when two templates are compared the
correspondences are automatically aligned to each other, since they share
the same reference point. Every feature vector of the template is a set of
line-segment descriptors composed of three variables (Figure 8): the
segment's angle to the reference line through the iris center,
denoted θ; the distance between the segment center and the pupil center,
denoted r; and the dominant angular orientation of the segment,
denoted ɸ. To minimize GPU computation, we also convert the
descriptor values from polar coordinates to rectangular coordinates during CPU
preprocessing.
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters; for
example, as an eyeball moves left, the left-part sclera patterns may be
compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads called warps. We
reorganized the descriptors from the same side and saved
FIG
them at contiguous addresses. This meets the requirement for coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask information
into the new descriptor, the computation on the mask file is no longer needed on
the GPU. Matching with this feature is very fast because the templates do not
need to be re-registered every time after shifting; thus, the cost of
data transfer and computation on the GPU is reduced. Matching on the
new descriptor, the shift parameter generator in Figure 4 is simplified
as shown in Figure 9.
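The WPL construction described above can be sketched in Python. This is our own illustration with hypothetical helper names, not the report's implementation: the CPU precomputes both the rectangular and the polar coordinates relative to the iris center and attaches the mask-derived weight, so the GPU receives self-contained s(x, y, r, θ, ɸ, w) tuples and never touches the mask file.

```python
import math

def mask_weight(dist_to_boundary, border=3.0):
    """Mask-derived weight: 0 outside the sclera, 0.5 near the
    boundary, 1 in the interior (the border width is our assumption)."""
    if dist_to_boundary < 0:
        return 0.0
    return 0.5 if dist_to_boundary < border else 1.0

def wpl_descriptor(seg_center, seg_angle, iris_center, weight):
    """Build s(x, y, r, theta, phi, w): polar coordinates of the segment
    center relative to the iris center are precomputed on the CPU
    alongside the rectangular form, so GPU kernels never convert."""
    x, y = seg_center
    cx, cy = iris_center
    r = math.hypot(x - cx, y - cy)       # distance to the iris center
    theta = math.atan2(y - cy, x - cx)   # angle to the reference line
    return (x, y, r, theta, seg_angle, weight)
```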
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
more complicated shading and lighting operations that are essential for
complex effects The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger
limits on their size and resource consumption, with more fully featured
instruction sets, and with more flexible control-flow operations. After many
years of separate instruction sets for vertex and fragment operations, current
GPUs support the unified Shader Model 4.0 on both vertex and fragment
shaders:
The hardware must support shader programs of at least 65k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit
floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units.
GENERAL-PURPOSE COMPUTING ON THE GPU
Mapping general-purpose computation onto the GPU uses the graphics hardware in much the
same way as any standard graphics application. Because of this similarity, it
is both easier and more difficult to explain the process: on one hand, the
actual operations are the same and easy to follow; on the other hand, the
terminology differs between graphics and general-purpose use. Harris
provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology,
then show how the same steps are used in a general-purpose way to author
GPGPU applications, and finally use the same steps to show the simpler
and more direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II
concentrating on the programmable aspects of this pipeline
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as texture on future passes through
the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation
involves the exact same steps but different terminology. A motivating
example is a fluid simulation computed over a grid: at each time step, we
compute the next state of the fluid for each grid point from the current state
at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest. The rasterizer generates a fragment at each
pixel location covered by that geometry. (In our example, our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation.)
Each fragment is shaded by an SPMD general-purpose fragment
program. (Each grid point runs the same program to update the state of its
fluid.)
The fragment program computes the value of the fragment by a
combination of math operations and "gather" accesses from global
memory. (Each grid point can access the state of its neighbors from the
previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes. (The current state of the fluid will be used on the next time
step.)
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications
has been that, despite their general-purpose tasks having nothing to do with
graphics, the applications still had to be programmed using graphics APIs.
In addition, the program had to be structured in terms of the graphics
pipeline, with the programmable units only accessible as an intermediate
step in that pipeline, when the programmer would almost certainly prefer to
access the programmable units directly. The programming environments we
describe in detail in Section IV solve this difficulty by providing a
more natural, direct, non-graphics interface to the hardware and,
specifically, the programmable units. Today, GPU computing applications
are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math
operations and both "gather" (read) accesses from and "scatter" (write)
accesses to global memory. Unlike in the previous two
methods, the same buffer can be used for both reading and writing,
allowing more flexible algorithms (for example, in-place algorithms that
use less memory).
The resulting buffer in global memory can then be used as an input in
future computation.
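The thread-grid structure above can be sketched with the fluid-grid example. This Python sketch is ours, not from the report: the doubly nested loop stands in for the parallel thread launch, each call of `thread` plays the role of one GPU thread gathering its neighbors and scattering one result, and the neighbor-averaging rule is only a toy stand-in for a real fluid update.

```python
def step(state, w, h):
    """One SPMD-style time step over a w-by-h grid stored flat: every
    logical thread (x, y) gathers its four neighbors from the input
    buffer and scatters one value to the output buffer."""
    out = state[:]                           # separate output buffer
    def thread(x, y):                        # body each thread would run
        i = y * w + x
        nb = state[i - 1] + state[i + 1] + state[i - w] + state[i + w]
        out[i] = 0.5 * state[i] + 0.5 * nb / 4.0
    for y in range(1, h - 1):                # sequential stand-in for
        for x in range(1, w - 1):            # the parallel launch grid
            thread(x, y)
    return out
```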
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because
no registration is needed. The matching result in this stage helps filter
out image pairs with low similarity, although some false positive matches
may remain after this step. In the second stage, we use the WPL descriptor
to register the two images for more detailed descriptor matching, including
scale and translation invariance. This stage includes shift transform, affine
matrix generation, and final WPL descriptor matching. Overall, we
partitioned the registration and matching processing into four kernels in
CUDA (Figure 10): matching on the Y-shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster
and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features,
registration is unnecessary before matching on the Y-shape descriptor. The
whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte
and the target template Tta, respectively; dϕ is the Euclidean distance of the angle
elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of
two descriptor centers, defined in (4); ni and di are the number of matched descriptor
pairs and the distance between their centers, respectively; tϕ is a distance
threshold; and txy is the threshold that restricts the search area. We set tϕ to
30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all
the Y-shape branches. The search area is limited to the corresponding left or
right half of the sclera in order to reduce the search range and time. The
distance between two branches is defined in (3), where ϕij is the angle between
the jth branch and the polar line from the pupil center in descriptor i.
The number of matched pairs ni and the distance between the Y-shape
branch centers di are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch
centers as in (2). Here α is a factor used to fuse the matching score, which was set
to 30 in our study, and Ni and Nj are the total numbers of feature vectors in
templates i and j, respectively. The decision is regulated by the threshold t: if
the sclera's matching score is lower than t, the sclera is discarded; a
sclera with a high matching score is passed to the next, more precise
matching process.
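The Stage-I matching just described can be roughly sketched in Python. This is our illustration, not the report's code: descriptors match when both the angle distance dϕ and the center distance dxy fall below the thresholds, and the score-fusion formula at the end (matched fraction damped by the mean center distance via α) is a hypothetical stand-in for Eq. (2).

```python
import math

def match_y(test, target, t_phi=30.0, t_xy=675.0, alpha=30.0):
    """Coarse Stage-I matching sketch over Y-shape descriptors
    (phi1, phi2, phi3, x, y); returns a similarity score in [0, 1]."""
    n, dists = 0, []
    for p1, p2, p3, x, y in test:
        best = None
        for q1, q2, q3, u, v in target:
            d_phi = math.sqrt((p1-q1)**2 + (p2-q2)**2 + (p3-q3)**2)
            d_xy = math.hypot(x - u, y - v)
            if d_phi < t_phi and d_xy < t_xy:        # thresholded match
                if best is None or d_xy < best:
                    best = d_xy
        if best is not None:
            n += 1
            dists.append(best)
    if n == 0:
        return 0.0
    mean_d = sum(dists) / n                          # average distance
    return n / min(len(test), len(target)) * alpha / (alpha + mean_d)
```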
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel-structure detail of the
sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is
nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure
will appear to shrink or stretch nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers (episclera, stroma, lamina fusca and
endothelium), and there are slight differences among the movements of these
layers. Considering these factors, our registration employed both a single
shift transform and a multi-parameter transform that combines shift,
rotation and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before,
segmentation may not be accurate; as a result, the detected iris center may
not be very accurate. The shift transform is designed to tolerate possible errors
in pupil-center detection in the segmentation step. If there is no deformation,
or only very minor deformation, registration with the shift transform alone
would be adequate to achieve an accurate result. We designed Algorithm 2
to obtain the optimized shift parameter, where Tte is the test template, stei is
the ith WPL descriptor of Tte, Tta is the target template, staj is the jth
WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek
and staj.
Δsk is the shift value of the two descriptors.
We first randomly select an equal number of segment descriptors
stek in the test template Tte from each quadrant and find the nearest neighbor staj
of each in the target template Tta. Their shift offset is recorded as a candidate
registration shift factor Δsk. The final registration offset is Δsoptim,
the candidate with the smallest standard deviation among these offsets.
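The voting idea of Algorithm 2 can be sketched as follows. This is an assumption-laden Python illustration, not the report's kernel: descriptors are reduced to bare (x, y) centers, and "smallest standard deviation" is interpreted here as the candidate offset closest to the centroid of all candidate offsets.

```python
import math

def shift_search(test, target):
    """Sketch of the shift-parameter search: each sampled test
    descriptor votes with the offset to its nearest target descriptor;
    the vote with the smallest deviation from the mean offset wins."""
    offsets = []
    for x, y in test:
        u, v = min(target, key=lambda t: math.hypot(t[0]-x, t[1]-y))
        offsets.append((u - x, v - y))       # candidate shift factor
    mx = sum(dx for dx, _ in offsets) / len(offsets)
    my = sum(dy for _, dy in offsets) / len(offsets)
    return min(offsets, key=lambda o: math.hypot(o[0]-mx, o[1]-my))
```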
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera
patterns in the matching step. The affine transform algorithm is shown in
Algorithm 3. The shift value in the parameter set is obtained by randomly
selecting a descriptor stei(it) and calculating the distance from its nearest
neighbor staj in Tta. We transform the test template by the matrix in (7).
At the end of each iteration, we count the number of matched descriptor pairs
between the transformed template and the target template. The factor β
determines whether a pair of descriptors is matched; we set it to
20 pixels in our experiment. After N iterations, the optimized transform
parameter set is determined by selecting the one with the maximum number of matches
m(it). Here stei, Tte, staj and Tta are defined the same as in Algorithm 2;
tr(it)shift, θ(it) and tr(it)scale are the shift, rotation and scale
parameters generated in the itth iteration; and R(θ(it)), T(tr(it)shift) and
S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized
transform parameters, we iterated N times to generate these parameters; in
our experiment, we set the iteration count to 512.
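A minimal Python sketch of this randomized search follows. It is our illustration: the sampling ranges for rotation, scale and shift are invented for the example, descriptors are reduced to points, and the matched-pair count plays the role of m(it).

```python
import math, random

def affine_search(test, target, n_iter=200, beta=20.0, seed=1):
    """Randomly generate (rotation, scale, shift) parameter sets, apply
    them to the test points, and keep the set that matches the most
    target points within beta pixels."""
    rng = random.Random(seed)
    best, best_m = None, -1
    for _ in range(n_iter):
        th = rng.uniform(-0.2, 0.2)                # rotation (radians)
        s = rng.uniform(0.9, 1.1)                  # scale
        dx, dy = rng.uniform(-10, 10), rng.uniform(-10, 10)  # shift
        c, sn = math.cos(th), math.sin(th)
        m = 0
        for x, y in test:
            tx = s * (c * x - sn * y) + dx         # R, S then T applied
            ty = s * (sn * x + c * y) + dy
            if any(math.hypot(tx - u, ty - v) < beta for u, v in target):
                m += 1
        if m > best_m:                             # keep best parameter set
            best_m, best = m, (th, s, dx, dy)
    return best, best_m
```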
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined by Algorithms 2
and 3, the test template is registered and matched simultaneously. The
registration and matching algorithm is listed in Algorithm 4. Here stei, Tte,
staj and Tta are defined the same as in Algorithms 2 and 3; θ(optm),
tr(optm)shift, tr(optm)scale and Δsoptim are the registration parameters obtained from
Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift) and S(tr(optm)scale)
form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle
between the segment descriptor and the radial direction, and w is the weight of the
descriptor, which indicates whether or not the descriptor is at the edge of the sclera.
To ensure that the nearest descriptors have similar orientations, we
used a constant factor α to check the absolute difference of the two ɸ values; in our
experiment we set α to 5. The total matching score is the minimum score of the two
transformed results divided by the minimal matching score for the test template
and the target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and
works as a coprocessor with a CPU. A CUDA device consists of many streaming
multiprocessors (SMs), and the parallel part of the program should be
partitioned into threads by the programmer and mapped onto them.
There are multiple memory spaces in the CUDA memory hierarchy:
registers, local memory, shared memory, global memory, constant memory
and texture memory. Registers, local memory and shared memory are on-
chip, and it takes little time to access these memories. Only
shared memory can be accessed by other threads within the same block;
however, shared memory is available only in limited amounts. Global
memory, constant memory and texture memory are off-chip memories
accessible by all threads, and accessing these memories is very time
consuming.
Constant memory and texture memory are read-only and cacheable
memories. Mapping algorithms to CUDA to achieve efficient processing is
not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be
executed serially. To improve performance, branch divergence within a
warp should be avoided.
Global memory is slower to access than on-chip memory. To
hide this latency, we should preferentially use on-chip memory rather than
global memory; when global memory access does occur, threads in the same
warp should access words in sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces,
but it is organized into banks of equal size. If two
memory requests from different threads within a warp fall in the
same memory bank, the accesses are serialized. To get maximum
performance, memory requests should be scheduled to minimize bank
conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four
independent modules, each module is converted to a different kernel
on the GPU. These kernels differ in computation density, so we
map them to the GPU with different strategies to fully utilize the
computing power of CUDA. Figure 11 shows our scheme of CPU-GPU
task distribution and the partition among blocks and threads. Algorithm 1 is
partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of
templates in the database. As the upper middle
column of Figure 11 shows, each target template is assigned to one
thread, and each thread compares one pair of templates. In our work we
use an NVIDIA C2070 as our GPU, and the numbers of threads and blocks are both set to
1024. That means we can match our test template with up to 1024 × 1024
target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which
one thread processes a section of descriptors. As the lower portion of the
middle column of Figure 11 shows, we assigned each target template to one
block; inside a block, each thread corresponds to a set of descriptors in that
template. This partition lets every block execute independently, with
no data-exchange requirements between different blocks. When all
threads complete their corresponding descriptor fractions, the sum of the
intermediate results needs to be computed or compared. A parallel prefix-
sum algorithm is used to calculate the sum of the intermediate results, as
shown on the right of Figure 11. First, all odd-numbered threads compute the sum
of consecutive pairs of results; then, recursively, every first of i (= 4, 8,
16, 32, 64, ...) threads
computes the prefix sum on the new results. The final result is saved at
the first address, which has the same variable name as the first intermediate
result.
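The reduction described above can be simulated sequentially. In this Python sketch (ours, not the report's kernel), each iteration of the while-loop plays the role of one synchronized stride-doubling step on the GPU, and the total ends up in the first slot:

```python
def block_reduce(vals):
    """Simulate the in-block parallel sum: at each step, threads at
    even multiples of the stride add in their right-hand partner;
    the stride doubles until the total sits at index 0."""
    a = list(vals)
    stride = 1
    while stride < len(a):
        # one "synchronized" GPU step: pairwise partial sums
        for i in range(0, len(a) - stride, 2 * stride):
            a[i] += a[i + stride]
        stride *= 2
    return a[0]
```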
252 MAPPING INSIDE BLOCK
In shift parameter searching, there are two schemes we can choose to
map the task:
Mapping one pair of templates to all the threads in a block, so that every
thread takes charge of a fraction of the descriptors and cooperates with the
other threads.
Assigning a single possible shift offset to a thread, so that all threads
compute independently, except that the final result must be compared with the
other possible offsets.
Due to the great number of sum and synchronization operations in every
nearest-neighbor searching step, we chose the second method to parallelize
the shift search. In the affine matrix generator, we mapped an entire parameter-
set search to a thread: every thread randomly generates a set of
parameters and tries them independently, and the iterations are
distributed across all threads. The challenge in this step is that the randomly generated
numbers might be correlated among threads. In the rotation and
scale registration step, we used the Mersenne Twister pseudorandom
number generator, because it can use bitwise arithmetic and has a long
period.
The Mersenne Twister, like most pseudorandom generators, is iterative;
therefore, it is hard to parallelize a single twister state-update step among
several execution threads. To make sure that the thousands of threads in the
launch grid generate uncorrelated random sequences, many simultaneous
Mersenne Twisters need to run with different initial states in parallel.
But even "very different" (by any definition) initial state values do not
prevent the emission of correlated sequences by generators sharing
identical parameters. To solve this problem, and to enable an efficient
implementation of the Mersenne Twister on parallel architectures, we used a
special offline tool for the dynamic creation of Mersenne Twister
parameters, modified from the algorithm developed by Makoto Matsumoto
and Takuji Nishimura. In the registration and matching step, when
searching for the nearest neighbor, a line segment that has already been matched
should not be used again. In our approach, a flag
FIG
variable denoting whether the line has been matched is stored in
shared memory. To share the flags, all the threads in a block would have to
wait for a synchronization operation at every query step; our solution is to use a single
thread in a block to process the matching.
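The per-thread seeding problem discussed above can be illustrated in Python, whose `random.Random` is itself a Mersenne Twister. This sketch is only an analogy for the offline parameter-creation tool: instead of giving each thread a distinct MT parameter set, it derives each thread's seed by hashing a root key with the thread id, which avoids the nearly-identical seeds that risk correlated streams.

```python
import hashlib
import random

def make_thread_rngs(n_threads, root_seed=b"sclera-match"):
    """Create one independent Mersenne Twister per logical thread,
    seeding each from a hash of (root key, thread id) rather than
    from consecutive integers."""
    rngs = []
    for tid in range(n_threads):
        digest = hashlib.sha256(root_seed + tid.to_bytes(4, "little")).digest()
        rngs.append(random.Random(int.from_bytes(digest, "little")))
    return rngs
```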
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the
bandwidth between host memory and device memory, and data transfer
between host and device can lead to long latency. As shown in Figure 11,
we load the entire target template set from the database without considering
when the templates will be processed; therefore, there is no data transfer from
host to device during the matching procedure. In global memory, the
components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored
separately. This guarantees that consecutive kernels of Algorithms 2 to 4
can access their data at successive addresses. Although such coalesced
access reduces the latency, frequent global memory access was still a
slow way to get data, so in our kernels we loaded the test template into shared
memory to accelerate memory access. Because Algorithms 2 to 4
execute different numbers of iterations on the same data, bank conflicts do
not occur. To maximize our texture memory space, we set the system
cache to the lowest value and bound our target descriptors to texture
memory; using this cacheable memory, our data access was accelerated
further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily
applied in target detection; in this paper, it is applied as the
feature for human recognition. In the sclera region, the vein patterns are the
edges of an image, so HOG is used to determine the gradient orientations
and edge orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small
connected regions called cells. For each cell, compute the histogram of
gradient directions or edge orientations of the pixels; the combination
of the histograms of the different cells then represents the descriptor. To improve
accuracy, the histograms can be contrast-normalized by calculating the intensity
over a block and using this value to normalize all cells within the
block. This normalization makes the descriptor invariant to geometric and
photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y)
are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y):
m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y) / dx(x, y)).
Orientation binning is the second step of HOG; this method is used
to create the cell histograms. Each pixel within a cell casts a weighted vote for
the orientation bin found in the gradient computation, with the gradient
magnitude used as the weight. The cells are rectangular, and the
gradient orientations are binned over 0 to 180 degrees, with
opposite directions counting as the same. Fig. 8 depicts the edge
orientations of the picture elements. If the images have illumination and
contrast changes, then the gradient strengths must be locally normalized; for
that, cells are grouped together into larger blocks. These blocks
overlap, so that each cell contributes more than once to the final
descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are
mainly square grids. The performance of HOG is improved by applying
a Gaussian window to each block.
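The gradient and orientation-binning steps above can be sketched in Python. This is our simplified illustration of HOG, not a full implementation: central-difference gradients, unsigned 0-180 degree orientations (opposite directions merged), and magnitude-weighted votes into per-cell histograms, with no block normalization or Gaussian window.

```python
import math

def hog_cell_histograms(img, cell=4, bins=9):
    """Per-cell magnitude-weighted orientation histograms over a 2-D
    list of intensities; returns {(cell_row, cell_col): [bin counts]}."""
    h, w = len(img), len(img[0])
    hists = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = img[y][x + 1] - img[y][x - 1]      # x-direction gradient
            dy = img[y + 1][x] - img[y - 1][x]      # y-direction gradient
            m = math.hypot(dx, dy)                  # gradient magnitude
            ang = math.degrees(math.atan2(dy, dx)) % 180.0  # unsigned
            hist = hists.setdefault((y // cell, x // cell), [0.0] * bins)
            hist[min(int(ang / (180.0 / bins)), bins - 1)] += m
    return hists
```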
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical
computing environment and fourth-generation programming language.
Developed by MathWorks, MATLAB allows matrix manipulations,
plotting of functions and data, implementation of algorithms, creation
of user interfaces, and interfacing with programs written in other languages,
including C, C++, Java and Fortran.
Although MATLAB is intended primarily for numerical computing, an
optional toolbox uses the MuPAD symbolic engine, allowing access
to symbolic computing capabilities. An additional package, Simulink, adds
graphical multi-domain simulation and Model-Based
Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry
and academia. MATLAB users come from various backgrounds
of engineering, science and economics, and MATLAB is widely used in
academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners
in control engineering, Little's specialty, but quickly spread to many other
domains. It is now also used in education, in particular the teaching
of linear algebra and numerical analysis, and is popular amongst scientists
involved in image processing. The MATLAB application is built around the
MATLAB language. The simplest way to execute MATLAB code is to type
it in the Command Window, which is one of the elements of the MATLAB
Desktop. When code is entered in the Command Window, MATLAB can
be used as an interactive mathematical shell. Sequences of commands can
be saved in a text file, typically using the MATLAB Editor, as a script, or
encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and
sharing your work. You can integrate your MATLAB code with other
languages and applications, and distribute your MATLAB algorithms and
applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files and data
Interactive tools for iterative exploration, design and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis,
filtering, optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external
applications and languages, such as C, C++, FORTRAN, Java, COM
and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image
processing, communications, control design, test and measurement,
financial modeling and analysis, and computational biology. Add-on toolboxes
(collections of special-purpose MATLAB functions) extend the MATLAB
environment to solve particular classes of problems in these application
areas.
MATLAB can be used on personal computers and powerful
server systems, including the Cheaha compute cluster. With the addition of
the Parallel Computing Toolbox, the language can be extended with parallel
implementations of common computational functions, including for-loop
unrolling. Additionally, this toolbox supports offloading computationally
intensive workloads to Cheaha, the campus compute cluster. MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) that knows how big it is. Moreover, the fundamental operators
(e.g. addition, multiplication) are programmed to deal with matrices when
required, and the MATLAB environment handles much of the bothersome
housekeeping that makes all this possible. Since so many of the procedures
required for Macro-Investment Analysis involve matrices, MATLAB
proves to be an extremely efficient language for both communication and
implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or FORTRAN. A wrapper function is created,
allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed MEX-
files (for MATLAB executable).
Libraries written in Java, ActiveX or .NET can be directly called
from MATLAB, and many MATLAB libraries (for
example, XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries. Calling MATLAB from Java is more complicated, but
can be done with a MATLAB extension, sold separately by MathWorks,
or using an undocumented mechanism called JMI (Java-to-MATLAB
Interface), which should not be confused with the unrelated Java Metadata
Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox
available from MathWorks, MATLAB can be connected
to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems It enables fast
development and execution With the MATLAB language you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks, such as
declaring variables, specifying data types, and allocating memory. In many
cases, MATLAB eliminates the need for 'for' loops; as a result, one line of
MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
MATLAB lets you execute commands or groups of commands one
at a time without compiling and linking enabling you to quickly iterate to
the optimal solution For fast execution of heavy matrix and vector
computations MATLAB uses processor-optimized libraries For general-
purpose scalar computations MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology
This technology which is available on most platforms provides
execution speeds that rival those of traditional programming languages
Development Tools
MATLAB includes development tools that help you implement
your algorithms efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency, file
differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
You can lay out, design, and edit user interfaces with the interactive tool
GUIDE (Graphical User Interface Development Environment).
GUIDE lets you include list boxes, pull-down menus, push buttons, radio
buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX® controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring
data from external devices and databases, through preprocessing,
visualization, and numerical analysis, to producing presentation-quality
output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
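Two of the listed operations, smoothing and thresholding, can be sketched in plain Python for illustration (MATLAB provides these as built-ins; the helper names below are hypothetical):

```python
def moving_average(data, window=3):
    """Smooth a 1-D signal with a centered moving average (edges truncated)."""
    half = window // 2
    out = []
    for i in range(len(data)):
        lo, hi = max(0, i - half), min(len(data), i + half + 1)
        out.append(sum(data[lo:hi]) / (hi - lo))
    return out

def threshold(data, level):
    """Zero out samples below `level`, as in simple noise gating."""
    return [x if x >= level else 0 for x in data]

smoothed = moving_average([0, 0, 9, 0, 0], window=3)
print(smoothed)                 # [0.0, 3.0, 3.0, 3.0, 0.0]
print(threshold(smoothed, 2))   # [0, 3.0, 3.0, 3.0, 0]
```

The smoothing spreads an isolated spike across its neighbours; the threshold then keeps only samples above the chosen level.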
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats such as Microsoft Excel; ASCII text or binary
files; image, sound, and video files; and scientific formats such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format. Additional functions let you read data from web pages and
XML.
Visualizing Data
All the graphics features required to visualize engineering
and scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes;
changing line colors and markers; and adding annotations, LaTeX equations,
legends, and drawn shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D
scalar data, and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting effects,
light source locations, and transparency.
3-D plotting functions include
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW discrete Fourier
transform library. Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports, they execute
faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
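As an illustration of the Fourier-analysis functionality listed above: MATLAB's `fft` relies on the optimized FFTW library, but the underlying transform can be sketched naively (O(n²)) in Python:

```python
import math
import cmath

def dft(x):
    """Naive discrete Fourier transform: X[k] = sum_t x[t] * exp(-2j*pi*k*t/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

# A cosine completing one full cycle over 8 samples concentrates its
# energy in bins 1 and N-1 = 7, each with magnitude N/2 = 4.
signal = [math.cos(2 * math.pi * t / 8) for t in range(8)]
spectrum = dft(signal)
print(round(abs(spectrum[1]), 6))  # 4.0
```

An FFT computes exactly the same values but in O(n log n) time, which is why libraries like FFTW matter for large data.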
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language is very similar to
standard linear algebra notation, but there are a few extensions that may
cause some initial difficulty.
4.2 SNAPSHOTS
Fig. 1: Original sclera image converted into a grayscale image
Fig. 2: Grayscale image converted into a binary image
Fig. 3: Edge detection by Otsu's thresholding
Fig. 4: Selecting the region of interest (sclera part)
Fig. 5: Selected ROI part
Fig. 6: Enhancement of the sclera image
Fig. 7: Feature extraction of the sclera image using Gabor filters
Fig. 8: Matching with images in the database
Fig. 9: Displaying the result (matched or not matched)
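The grayscale-to-binary step in the pipeline above uses Otsu's thresholding, which picks the gray level maximizing the between-class variance of the image histogram (MATLAB's `graythresh` computes the same quantity). A minimal pure-Python sketch:

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance;
    pixels with value > t are treated as foreground."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_b = sum_b = 0                       # background weight and sum so far
    for t in range(levels):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b                 # foreground weight
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mu_b = sum_b / w_b                # background mean
        mu_f = (total_sum - sum_b) / w_f  # foreground mean
        var_between = w_b * w_f * (mu_b - mu_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Two well-separated pixel populations: the threshold falls on the lower one.
image = [10] * 50 + [200] * 50
print(otsu_threshold(image))  # 10
```

On a real sclera image the pixel list would be the flattened grayscale array, and the binary image is simply `pixel > t` per pixel.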
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition
method which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions for other sclera vein recognition methods
and for general pattern recognition methods. We designed the Y-shape
descriptor, a new feature extraction method that takes advantage of the GPU
structure, to narrow the search range and increase matching efficiency.
We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing, which can
dramatically reduce data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, an important
step in achieving parallel computation efficiency on a GPU. A workflow
with high arithmetic intensity, designed to hide memory access latency,
partitions the computation task across the heterogeneous CPU-GPU system,
down to the individual GPU threads. The proposed method
dramatically improves matching efficiency without compromising
recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," in Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
crispening and sharpening, filtering, interpolation and magnification,
pseudocoloring, and so on.
1.2.3 IMAGE RESTORATION
Image restoration is concerned with filtering the observed image to minimize the
effect of degradations. Its effectiveness depends on the
extent and accuracy of the knowledge of the degradation process as well as on
the filter design. Image restoration differs from image enhancement in that the
latter is concerned with the extraction or accentuation of image features.
1.2.4 IMAGE COMPRESSION
Image compression is concerned with minimizing the number of bits required to represent
an image. Applications of compression include broadcast TV; remote sensing
via satellite; military communication via aircraft; radar; teleconferencing;
facsimile transmission of educational and business documents; medical
images that arise in computer tomography, magnetic resonance imaging,
and digital radiology; motion pictures; satellite images; weather maps;
geological surveys; and so on.
Text compression - CCITT Group 3 & Group 4
Still image compression - JPEG
Video compression - MPEG
1.2.5 SEGMENTATION
In computer vision, image segmentation is the process of
partitioning a digital image into multiple segments (sets of pixels, also
known as superpixels). The goal of segmentation is to simplify and/or
change the representation of an image into something that is more
meaningful and easier to analyze. Image segmentation is typically used to
locate objects and boundaries (lines, curves, etc.) in images. More precisely,
image segmentation is the process of assigning a label to every pixel in an
image such that pixels with the same label share certain visual
characteristics.
The result of image segmentation is a set of segments that
collectively cover the entire image, or a set of contours extracted from the
image (see edge detection). Each of the pixels in a region is similar with
respect to some characteristic or computed property, such as
colour, intensity, or texture, while adjacent regions differ significantly
with respect to the same characteristic(s). When applied to a stack of
images, typical in medical imaging, the contours resulting from image
segmentation can be used to create 3-D reconstructions with the help of
interpolation algorithms such as marching cubes.
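The labeling idea above, where pixels sharing a label form a region, can be sketched as 4-connected component labeling of a binary image via flood fill (a pure-Python illustration; toolboxes provide optimized equivalents such as MATLAB's `bwlabel`):

```python
from collections import deque

def label_regions(image):
    """4-connected component labeling of a binary image (list of 0/1 rows).
    Returns (labels, count); labels[y][x] is 0 for background pixels."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if image[y][x] and not labels[y][x]:
                count += 1                      # found a new region
                queue = deque([(y, x)])
                labels[y][x] = count
                while queue:                    # flood fill its 4-neighbours
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and image[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

binary = [[1, 1, 0, 0],
          [0, 0, 0, 1],
          [0, 0, 1, 1]]
_, n = label_regions(binary)
print(n)  # 2
```

Each connected group of foreground pixels receives one label, which is exactly the "pixels with the same label share characteristics" notion in the text.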
1.2.6 IMAGE RESTORATION
Image restoration, like enhancement, improves the qualities of an image,
but all the operations are based on known, measured, or estimated
degradations of the original image. Image restoration is used to restore
images with problems such as geometric distortion, improper focus,
repetitive noise, and camera motion; that is, to correct images for known
degradations.
1.2.7 FUNDAMENTAL STEPS
Image acquisition: to acquire a digital image.
Image preprocessing: to improve the image in ways that increase the
chances of success of the other processes.
Image segmentation: to partition an input image into its constituent parts or
objects.
Image representation: to convert the input data to a form suitable for
computer processing.
Image description: to extract features that result in some quantitative
information of interest, or features that are basic for differentiating one
class of objects from another.
Image recognition: to assign a label to an object based on the
information provided by its descriptors.
Image interpretation: to assign meaning to an ensemble of recognized
objects.
Knowledge about a problem domain is coded into an image processing
system in the form of a knowledge database.
1.3 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x,y) must be digitized
both spatially and in amplitude.
Digitization of the spatial coordinates (x,y) is called image sampling, and
amplitude digitization is called gray-level quantization.
The storage and processing requirements increase rapidly with the spatial
resolution and the number of gray levels.
Example: a 256 gray-level image of size 256×256 occupies 64K bytes of
memory.
Images of very low spatial resolution produce a checkerboard effect, and
the use of an insufficient number of gray levels in smooth areas of a digital
image results in false contouring.
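The storage figure in the example above follows directly from the spatial resolution and the number of gray levels; a quick check in Python:

```python
import math

def image_bytes(width, height, gray_levels):
    """Uncompressed storage: width*height pixels at ceil(log2(levels)) bits each."""
    bits_per_pixel = math.ceil(math.log2(gray_levels))
    return width * height * bits_per_pixel // 8

print(image_bytes(256, 256, 256))  # 65536 bytes = 64K
```

The same function shows why a binary image (2 levels) of the same size needs only 8K bytes.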
1.4 IMAGE FILE FORMATS
There are two general groups of 'images': vector graphics (or line art)
and bitmaps (pixel-based 'images'). Some of the most common file
formats are:
GIF - Graphics Interchange Format: an 8-bit (256-colour), non-
destructively compressed bitmap format, mostly used for the web. Has several
sub-standards, one of which is the animated GIF.
JPEG - Joint Photographic Experts Group: a very efficient (i.e., much
information per byte) destructively compressed 24-bit (16 million colours)
bitmap format. Widely used, especially for the web and Internet (bandwidth-
limited).
TIFF - Tagged Image File Format: the standard 24-bit publication bitmap
format. Compresses non-destructively with, for instance, Lempel-Ziv-
Welch (LZW) compression.
PS - PostScript: a standard vector format. Has numerous sub-standards
and can be difficult to transport across platforms and operating systems.
PSD - Adobe Photoshop Document: a dedicated Photoshop format that
keeps all the information in an image, including all the layers.
BMP - Bitmap file format.
1.5 TYPES OF IMAGES
Images are of 4 types:
1. Binary image
2. Gray scale image
3. Color image
4. Indexed image
1.5.1 BINARY IMAGES
A binary image is a digital image that has only two possible values for
each pixel. Typically the two colors used for a binary image are black and
white, though any two colors can be used. Binary images are also called bi-
level or two-level; each pixel is stored as a single bit, i.e.,
a 0 or 1. The names black-and-white and B&W are also used.
1.5.2 GRAY SCALE IMAGE
In an 8-bit grayscale image, each picture element has an assigned intensity
that ranges from 0 to 255. A grayscale image is what people normally call
a black-and-white image, but the name emphasizes that such an image will
also include many shades of grey.
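Mapping a colour pixel onto the 0-255 grayscale range described above is commonly done with a weighted sum of R, G, and B; the BT.601 luma weights below are one common choice for such a conversion, not something specified in this report:

```python
def rgb_to_gray(r, g, b):
    """Weighted luma conversion to an 8-bit gray value (BT.601 weights)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

print(rgb_to_gray(255, 255, 255))  # 255  (white stays white)
print(rgb_to_gray(0, 0, 0))        # 0    (black stays black)
```

Green is weighted most heavily because the eye is most sensitive to it.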
1.5.3 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive
colour with the R, G, and B receptors in our retinas. RGB uses additive colour
mixing and is the basic colour model used in television or any other
medium that projects colour with light. It is the basic colour model used in
computers and for web graphics, but it cannot be used for print production.
The secondary colours of RGB (cyan, magenta, and yellow) are formed
by mixing two of the primary colours (red, green, or blue) and excluding the
third colour: red and green combine to make yellow, green and blue to
make cyan, and blue and red form magenta. The combination of red, green,
and blue at full intensity makes white.
In Photoshop, using the "screen" mode for the different layers in an
image makes the intensities mix together according to the additive
colour mixing model. This is analogous to stacking slide images on top of
each other and shining light through them.
CMYK: the 4-colour CMYK model used in printing lays down
overlapping layers of varying percentages of transparent cyan (C), magenta
(M), and yellow (Y) inks; in addition, a layer of black (K) ink can be added.
The CMYK model uses subtractive colour mixing.
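The subtractive CMYK model can be sketched with the textbook naive conversion from normalized RGB (real prepress conversion uses ICC colour profiles; this is only the idealized formula):

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB (0-1) to CMYK (0-1) conversion: K absorbs the common darkness."""
    k = 1 - max(r, g, b)
    if k == 1:                      # pure black: only the K ink is needed
        return (0.0, 0.0, 0.0, 1.0)
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return (c, m, y, k)

print(rgb_to_cmyk(1.0, 0.0, 0.0))  # (0.0, 1.0, 1.0, 0.0): red = magenta + yellow
```

The result for pure red matches the secondary-colour relationships stated above: red is reproduced by magenta and yellow inks with no cyan.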
1.5.4 INDEXED IMAGE
An indexed image consists of an array and a colour map matrix. The
pixel values in the array are direct indices into the colour map. By convention,
this documentation uses the variable name X to refer to the array and map
to refer to the colour map. In computing, indexed colour is a technique to
manage digital image colours in a limited fashion, in order to save
computer memory and file storage while speeding up display refresh and
file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, colour information is not
directly carried by the image pixel data but is stored in a separate piece of
data called a palette: an array of colour elements, in which every element (a
colour) is indexed by its position within the array. The image pixels do not
contain the full specification of their colour, only its index in the palette.
This technique is sometimes referred to as pseudocolour or indirect colour, as
colours are addressed indirectly.
Perhaps the first device that supported palette colours was a random-
access frame buffer described in 1975 by Kajiya, Sutherland, and Cheadle,
which supported a palette of 256 36-bit RGB colours.
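The palette lookup described above takes only a few lines: the image array stores indices, and display expands them through the colour map (the array name X follows the convention quoted in the text; the 4-entry palette is illustrative):

```python
# Palette: each entry is an (R, G, B) triple; the image stores only indices.
palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]

X = [[0, 1],
     [2, 3]]   # a 2x2 indexed image

# Expanding to a true-colour image is a per-pixel palette lookup.
rgb = [[palette[i] for i in row] for row in X]
print(rgb[0][1])  # (255, 0, 0)
```

Each pixel costs only an index (2 bits would suffice here) instead of a full 24-bit colour, which is the memory saving the text describes.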
1.6 Applications of image processing
Interest in digital image processing methods stems from two principal
application areas:
1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception
In the second application area, interest focuses on procedures for
extracting information from an image in a form suitable for computer
processing.
Examples include automatic character recognition, industrial machine
vision for product assembly and inspection, military reconnaissance,
automatic processing of fingerprints, etc.
1.7 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Among these three methods,
the SURF method achieves the best accuracy; it takes an average of 1.5
seconds using the SURF method to perform a one-to-one matching.
Zhou et al. proposed a line-descriptor-based method for sclera vein
recognition. The matching step (including registration) is the most time-
consuming step in this sclera vein recognition system, costing about 1.2
seconds to perform a one-to-one matching. Both speeds were measured on
a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM.
Currently, sclera vein recognition algorithms are designed using central
processing unit (CPU)-based systems.
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1. Mask files are used to calculate valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. However,
the mask files are large and will occupy GPU memory and slow
down data transfer. In addition, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units in CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements;
there is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance.
LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns
for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no.
14, pp. 1860-1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the
most challenging problems that has been actively researched in recent
years. It is well known that many state-of-the-art still face recognition
algorithms perform well when constrained (frontal, well illuminated, high-
resolution, sharp, and full) face images are acquired; however, their
performance degrades significantly when the test images contain variations
that are not present in the training images. In this paper, we highlight some
of the key issues in remote face recognition. We define remote face
recognition as recognition where faces are several tens of meters (10-250 m) from
the cameras. We then describe a remote face database which has been
acquired in an unconstrained outdoor maritime environment. The recognition
performance of a subset of existing still-image-based face recognition
algorithms is evaluated on the remote face data set. Further, we define the
remote re-identification problem as matching a subject at one location with
candidate sets acquired at a different location and over time in remote
conditions, and we provide preliminary experimental results on remote re-
identification. It is demonstrated that, in addition to applying a good
classification algorithm, finding features that are robust to the variations
mentioned above and developing statistical models which can account for
these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N.
Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics
Security.
With the rapidly expanding biometric data collected by various sectors
of government and industry for identification and verification purposes,
how to manage and process such big data draws great concern. Even
though modern processors are equipped with more cores and memory
capacity, it still requires careful design to utilize the hardware
resources effectively and the power consumption efficiently. This research
addresses the issue by investigating the workload characteristics of
biometric applications. Taking Daugman's iris matching algorithm, which
has been proven to be the most reliable iris matching method, as a case
study, we conduct performance profiling and binary instrumentation on the
benchmark to capture its execution behavior. The results show that data
loading and memory access incur great performance overhead, which
motivates us to move the biometrics computation to a high-performance
architecture.
Modern iris recognition algorithms can be computationally intensive,
yet they are designed for traditional sequential processing elements such as a
personal computer. However, a parallel processing alternative using field-
programmable gate arrays (FPGAs) offers an opportunity to speed up iris
recognition. Within the scope of this project, iris template generation with
directional filtering, which is a computationally expensive yet parallel
portion of a modern iris recognition algorithm, is parallelized on an FPGA
system. We present a performance comparison of the parallelized
algorithm on the FPGA system to a traditional CPU-based version. The
parallelized template generation outperforms an optimized C++
version, determining the information content of an iris approximately 324
times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric
modality based on conjunctival vasculature," in Proc. Artif. Neural
Netw. Eng., 2006, pp. 1-8.
A new biometric indicator based on the patterns of conjunctival
vasculature is proposed. Conjunctival vessels can be observed on the visible
part of the sclera that is exposed to the outside world. These vessels
demonstrate rich and specific details in visible light and can be easily
photographed using a regular digital camera. In this paper, we discuss
methods for conjunctival imaging, preprocessing, and feature extraction in
order to derive a suitable conjunctival vascular template for biometric
authentication. Commensurate classification methods, along with the
observed accuracy, are discussed. Experimental results suggest the potential
of using conjunctival vasculature as a biometric measure. Identification of
a person based on some unique set of features is an important task.
Human identification is possible with several biometric systems, and sclera
recognition is one of the promising biometrics. The sclera is the white
portion of the human eye, and the vein pattern seen in the sclera region is
unique to each person; thus, the sclera vein pattern is a well-suited
biometric for human identification. The existing methods used
for sclera recognition have some drawbacks: only frontal-looking
images are preferred for matching, and rotation variance is another problem.
These problems are eliminated in the proposed system by using
two feature extraction techniques: Histogram of Oriented
Gradients (HOG) and conversion of the image into polar form using
bilinear interpolation. These two features help the proposed
system become illumination invariant and rotation invariant. The
experimentation is done with the help of the UBIRIS database, and the
experimental results show that the proposed sclera recognition method can
achieve better accuracy than previous methods.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J.
C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899,
May 2008.
The graphics processing unit (GPU) has become an integral part of
today's mainstream computing systems. Over the past six years, there has
been a marked increase in the performance and capabilities of GPUs. The
modern GPU is not only a powerful graphics engine but also a highly
parallel programmable processor featuring peak arithmetic and memory
bandwidth that substantially outpaces its CPU counterpart. The GPU's
rapid increase in both programmability and capability has spawned a
research community that has successfully mapped a broad range of
computationally demanding, complex problems to the GPU. This effort in
general-purpose computing on the GPU, also known as GPU computing,
has positioned the GPU as a compelling alternative to traditional
microprocessors in high-performance computer systems of the future. We
describe the background, hardware, and programming model for GPU
computing, summarize the state of the art in tools and techniques, and
present four GPU computing successes in game physics and computational
biophysics that deliver order-of-magnitude performance gains over
optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image
database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-
977.
This paper proposes algorithms for iris segmentation, quality
enhancement, match score fusion, and indexing to improve both the
accuracy and the speed of iris recognition. A curve evolution approach is
proposed to effectively segment a nonideal iris image using the modified
Mumford-Shah functional. Different enhancement algorithms are
concurrently applied on the segmented iris image to produce multiple
enhanced versions of the iris image. A support-vector-machine-based
learning algorithm selects locally enhanced regions from each globally
enhanced image and combines these good-quality regions to create a single
high-quality iris image. Two distinct features are extracted from the high-
quality iris image: the global textural feature, extracted using the 1-D log-
polar Gabor transform, and the local topological feature, extracted using
Euler numbers. An intelligent fusion algorithm combines the textural and
topological matching scores to further improve iris recognition
performance and reduce the false rejection rate, whereas an indexing
algorithm enables fast and accurate iris identification. The verification and
identification performance of the proposed algorithms is validated and
compared with other algorithms using the CASIA Version 3, ICE 2005, and
UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-
stage parallel approach for registration and matching: a parallel sclera
matching solution for our sequential line-descriptor method using the CUDA
GPU architecture. CUDA is a highly parallel, multithreaded, many-core
processor architecture with tremendous computational power.
It supports not only a traditional graphics pipeline but also computation
on non-graphical data. It is relatively straightforward to implement our C
program for CUDA on an AMD-based GPU using OpenCL: our CUDA
kernels can be directly converted to OpenCL kernels by accounting for the
different syntax of various keywords and built-in functions. The mapping
strategy is also effective in OpenCL if we regard a thread and a block in
CUDA as a work-item and a work-group in OpenCL. Most of our optimization
techniques, such as coalesced memory access and prefix sum, work in
OpenCL too. Moreover, since CUDA is a data-parallel architecture, the
implementation of our approach in OpenCL should be programmed in the
data-parallel model.
In this research, we first discuss why a naïve parallel approach would
not work. We then propose a new sclera descriptor, the Y-shape sclera
feature, as the basis of an efficient registration method to speed up the mapping
scheme; introduce the weighted polar line (WPL) descriptor, which is better
suited for parallel computing and mitigates the mask size issue; and develop
a coarse-to-fine two-stage matching process to dramatically improve the
matching speed. These new approaches make parallel processing
possible and efficient.
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve efficiency, we propose a new descriptor,
the Y-shape descriptor, which can greatly improve the efficiency of
the coarse registration of two images and can be used to filter out some
non-matching pairs before refined matching.
2. We propose a coarse-to-fine two-stage matching process. In the first
stage, we match two images coarsely using the Y-shape descriptors,
which is very fast because no registration is needed. The
matching result in this stage helps filter out image pairs with low
similarities.
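The coarse-to-fine idea in point 2 can be sketched generically: run a cheap stage-1 similarity over the whole gallery, and apply the expensive stage-2 matcher only to survivors. The descriptors, distance measures, and 0.5 cutoff below are illustrative placeholders, not the paper's actual Y-shape or WPL computations:

```python
def coarse_score(a, b):
    """Cheap stage-1 similarity: set overlap of small 'coarse' descriptors."""
    return len(set(a) & set(b)) / max(len(set(a) | set(b)), 1)

def fine_score(a, b):
    """Expensive stage-2 similarity (placeholder for registered template matching)."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b), 1)

def two_stage_match(probe, gallery, coarse_cutoff=0.5):
    """Run the fine matcher only on gallery entries passing the coarse filter."""
    survivors = [g for g in gallery if coarse_score(probe, g) >= coarse_cutoff]
    return max(survivors, key=lambda g: fine_score(probe, g), default=None)

gallery = ["abcd", "abce", "wxyz"]
print(two_stage_match("abcd", gallery))  # abcd
```

The benefit is that clearly dissimilar pairs (here "wxyz") never reach the expensive stage, which is exactly how the Y-shape stage filters candidates before WPL matching.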
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each person,
so it can be used for human identification. Several researchers have
designed different sclera vein recognition methods and have shown that
sclera vein recognition is a promising approach to human identification.
Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Among these three methods,
the SURF method achieves the best accuracy; it takes an average of 1.5
seconds using the SURF method to perform a one-to-one matching. Zhou
et al. proposed a line-descriptor-based method for sclera vein recognition.
The matching step (including registration) is the most time-consuming step
in this sclera vein recognition system, costing about 1.2 seconds to
perform a one-to-one matching. Both speeds were measured on a PC with an
Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently,
sclera vein recognition algorithms are designed using central processing
unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient for data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database to match against. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing that can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, GPUs have been used to extract features, construct descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al. presented a performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA exposes a highly parallel, multithreaded, many-core processor with tremendous computational power. It supports not only the traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform that outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. The goal of this research is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design; an efficient parallel computing scheme requires different strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for our sequential line-descriptor method using the CUDA GPU architecture. Nevertheless, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and can help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large, will preoccupy the GPU memory, and will slow down data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to port our C program for CUDA to an AMD-based GPU using OpenCL. Our CUDA kernels can be converted directly to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard threads and blocks in CUDA as work-items and work-groups in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, an OpenCL implementation of our approach should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor," which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we developed implementation schemes to map our algorithms onto CUDA (Section 7). Section 2 gives a brief introduction to sclera vein recognition, Section 8 presents experiments using the proposed system, and Section 9 draws conclusions.
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation: they used a clustering algorithm to classify color eye images into three clusters (sclera, iris, and background). Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images and an Otsu's thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region growing approach to identify the sclera veins against the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points of the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.
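To make the multi-directional Gabor enhancement idea concrete, the following is a minimal sketch of how a bank of real-valued Gabor kernels at several orientations could be generated. The kernel size, wavelength, sigma, and aspect ratio below are illustrative choices of ours, not the parameters used by Zhou et al.

```python
import math

def gabor_kernel(size, theta, sigma=2.0, lam=4.0, gamma=0.5):
    """Real part of a Gabor kernel oriented at angle `theta` (radians)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # rotate coordinates into the filter's orientation
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lam)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# a small bank at four orientations, standing in for the multi-directional bank
bank = [gabor_kernel(9, k * math.pi / 4) for k in range(4)]
```

Convolving the sclera region with each kernel and taking the maximum response per pixel is one common way such a bank is used for vessel enhancement.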
The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of features are extracted in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly useful for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified. This comparison is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are drawn out.
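As a sketch of the HOG feature mentioned above, one HOG cell can be computed from central-difference gradients as below. The bin count and the unsigned 0-180 degree convention are common defaults, not necessarily the settings used in this project.

```python
import math

def hog_histogram(patch, n_bins=9):
    """Unsigned-gradient orientation histogram (one HOG cell) for a 2-D patch."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]   # central differences
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(ang / (180.0 / n_bins)) % n_bins] += mag
    return hist
```

For a vertical step edge, for instance, all gradient energy falls into the 0-degree bin, which is how the histogram encodes the dominant edge orientation of a vein segment.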
222 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare area detection: The glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. The filter operates only on grayscale images, so a color image must first be converted to grayscale before the Sobel filter is applied. Fig. 4 shows the result of glare area detection.
FIG
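A minimal sketch of the Sobel step is shown below; pixels whose gradient magnitude exceeds a chosen threshold would be flagged as glare-boundary candidates (the threshold choice is ours, for illustration).

```python
def sobel_magnitude(img):
    """Gradient magnitude via 3x3 Sobel kernels (grayscale image as list of lists)."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```

Flat regions give zero response while the sharp bright-dark transitions around a glare spot give large responses, which is what makes the filter suitable for locating glare.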
Sclera area estimation: To estimate the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should lie in the right and center positions, and the correct right sclera area should lie in the left and center positions. In this way, non-sclera areas are eliminated.
223 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the sclera area. Next, the upper eyelid, lower eyelid, and iris boundaries are refined; all of these are unwanted portions for recognition. To eliminate their effects, refinement follows the detection of the sclera area. Fig. shows the result after Otsu's thresholding and after iris and eyelid refinement to detect the right sclera area; the left sclera area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the impact of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.
224 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus, the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portion of the eye, the sclera vein patterns will be further exposed. This makes sclera vasculature a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, each element of the filtering can itself be parallelized. A detailed discussion of our proposed parallelization is outside the scope of this paper.
FIG
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching segment of the line descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description. Each segment is described by three quantities: the segment's angle to a reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.
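The RANSAC-style search just described can be sketched as follows. This is a simplified illustration: the sampling ranges for scale and rotation, the tolerance, and the fitness measure (number of test points landing near some target point) are our assumptions, not the report's actual parameters.

```python
import math
import random

def register_ransac(test_pts, target_pts, iters=200, tol=1.0, seed=0):
    """RANSAC-style search for a similarity transform (scale, rotation, shift)
    aligning test_pts onto target_pts; fitness = number of test points that
    land within `tol` of some target point after the transform."""
    rng = random.Random(seed)
    best = (0, None)
    for _ in range(iters):
        p = rng.choice(test_pts)      # one point from the test template
        q = rng.choice(target_pts)    # one point from the target template
        s = rng.uniform(0.9, 1.1)     # scale drawn from an a-priori range
        a = rng.uniform(-0.2, 0.2)    # rotation in radians
        ca, sa = math.cos(a), math.sin(a)

        def apply(pt):
            # similarity transform mapping p onto q with this scale/rotation
            dx, dy = pt[0] - p[0], pt[1] - p[1]
            return (q[0] + s * (ca * dx - sa * dy),
                    q[1] + s * (sa * dx + ca * dy))

        fitness = sum(1 for t in test_pts
                      if any(math.dist(apply(t), g) < tol for g in target_pts))
        if fitness > best[0]:
            best = (fitness, (p, q, s, a))
    return best
```

The parameter set with the highest fitness over all iterations is kept as the registration estimate.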
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created a weighting image (Figure 3) from the sclera mask by setting interior pixels of the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as shown below, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
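The scoring rule above can be sketched as follows. The threshold values and the use of the smaller weight as the pair score are illustrative assumptions; the report's actual equations are in the figures that follow.

```python
import math

D_MATCH = 5.0      # center-distance threshold (illustrative value)
PHI_MATCH = 10.0   # orientation-difference threshold in degrees (illustrative)

def pair_score(si, sj):
    """Match score for two line-segment descriptors (x, y, phi, weight)."""
    d = math.dist((si[0], si[1]), (sj[0], sj[1]))
    if d <= D_MATCH and abs(si[2] - sj[2]) <= PHI_MATCH:
        return min(si[3], sj[3])   # contribution limited by the mask weights
    return 0.0

def total_score(test, target):
    """Sum of best per-segment scores, normalized by the smaller template's
    maximum attainable weight sum (the 'minimal set' in the text)."""
    raw = sum(max((pair_score(s, t) for t in target), default=0.0) for s in test)
    max_attainable = min(sum(s[3] for s in test), sum(t[3] for t in target))
    return raw / max_attainable if max_attainable else 0.0
```

Two identical templates thus score 1.0, and templates with no segments within the thresholds score 0.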
FIG
FIG
FIG
FIG
movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the set of nearest neighbors of every line segment within a regular distance and classify the angles among these neighbors. If there are exactly two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.
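The "two types of angle values" test can be sketched with a simple greedy clustering of segment orientations. The tolerance of 10 degrees is an illustrative choice of ours:

```python
def angle_types(angles, tol=10.0):
    """Greedily cluster segment orientations; angles within `tol` degrees
    of an existing type are treated as the same type."""
    types = []
    for a in angles:
        for t in types:
            if abs(a - t) <= tol:
                break
        else:
            types.append(a)
    return types

def is_y_branch(neighbor_angles):
    """Per the text: a neighbor set with exactly two distinct angle types
    is inferred to be a Y-shaped branch point."""
    return len(angle_types(neighbor_angles)) == 2
```

For example, neighbor orientations {5, 8, 50, 52} collapse to two types and are flagged as a Y branch, while {5, 7, 9} collapse to one type and are not.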
There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angle of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employ the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.
226 WPL SCLERA DESCRIPTOR
As we discussed in Section 22, the line
descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment; this descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large and will occupy GPU memory and slow down data transfer. During matching, a RANSAC-type registration algorithm randomly selects corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.
To reduce this heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weight of descriptors beyond the sclera is set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights are calculated on their own mask by the CPU, and only once.
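The one-time CPU-side weight assignment can be sketched as below; the boundary margin of 2 pixels is an illustrative choice, not the report's value.

```python
def descriptor_weight(mask, x, y, margin=2):
    """Weight for a descriptor at (x, y): 0 outside the sclera mask,
    0.5 within `margin` pixels of the mask boundary, 1 in the interior."""
    h, w = len(mask), len(mask[0])
    if not (0 <= y < h and 0 <= x < w) or mask[y][x] == 0:
        return 0.0
    # near-boundary test: any zero (or out-of-image) pixel within the margin
    for dy in range(-margin, margin + 1):
        for dx in range(-margin, margin + 1):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w) or mask[ny][nx] == 0:
                return 0.5
    return 1.0
```

Because the weight travels with the descriptor, the GPU kernels never have to touch the mask image itself.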
The calculated weight is saved as a component of the descriptor, so the sclera descriptor becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template must be transformed. This is faster if the two templates share a similar reference point: if we use the center of the iris as the reference point, then when two templates are compared, the correspondences are automatically aligned to each other. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.
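The CPU-side polar-to-rectangular preprocessing described above amounts to a single trigonometric conversion per descriptor, sketched here (function and argument names are ours):

```python
import math

def wpl_preprocess(theta_deg, r, phi, w, iris_center):
    """CPU-side preprocessing: augment the polar descriptor (theta, r, phi, w)
    with rectangular coordinates so the GPU kernels can avoid trigonometry."""
    theta = math.radians(theta_deg)
    x = iris_center[0] + r * math.cos(theta)
    y = iris_center[1] + r * math.sin(theta)
    return (x, y, r, theta_deg, phi, w)
```

Doing this once on the CPU trades a tiny amount of extra storage per descriptor for removing sin/cos evaluations from every GPU comparison.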
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved
FIG
FIG
them at contiguous addresses, which meets the requirement of coalesced memory access on the GPU.
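The reorganization is essentially a partition of the descriptor buffer by sclera side so that each warp reads a contiguous run. A host-side sketch (our simplification, using the descriptor's x coordinate against the iris center):

```python
def group_by_side(descriptors, iris_x):
    """Reorder descriptors so all left-half ones, then all right-half ones,
    sit in contiguous storage; threads of one warp then read neighboring
    elements, which coalesces their global-memory accesses."""
    left = [d for d in descriptors if d[0] < iris_x]
    right = [d for d in descriptors if d[0] >= iris_x]
    return left + right, len(left)   # concatenated buffer + split index
```

The split index tells the launcher where the right-half warps should start reading.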
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered after every shift. Thus the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65 k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units. For general-purpose computing on the GPU, mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, the process is both easier and more difficult to explain: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the more simple and direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline described in Section II, concentrating on its programmable aspects:
The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as a texture on future passes through the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of the fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units accessible only as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware, and specifically to the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities; after this step, some false positive matches may remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching procedure is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte and target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and their center distances, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the jth branch and the polar line from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2), where α is a factor to fuse the matching score, set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded, while a sclera with a high matching score is passed to the next, more precise matching stage.
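The stage-I search can be sketched as below. The thresholds mirror the values quoted in the text, but the exact fusion formula of Eq. (2) is not reproduced in this report, so the score combination here is an assumed form for illustration only.

```python
import math

T_PHI, T_XY, ALPHA = 30.0, 675.0, 30.0   # thresholds and fusion factor from the text

def y_match_score(test, target):
    """Stage-I coarse match on Y-shape descriptors y = (phi1, phi2, phi3, x, y).
    Counts matched branch pairs and fuses the count with the mean center
    distance; the fusion form is assumed, not the report's Eq. (2)."""
    n, dists = 0, []
    for yi in test:
        for yj in target:
            dxy = math.dist(yi[3:5], yj[3:5])
            if dxy > T_XY:
                continue                         # restrict the search area
            dphi = math.dist(yi[0:3], yj[0:3])   # Euclidean distance of branch angles
            if dphi < T_PHI:
                n += 1
                dists.append(dxy)
    if n == 0:
        return 0.0
    mean_d = sum(dists) / n
    # fuse matched-pair count with mean center distance (assumed form)
    return ALPHA * n / (len(test) + len(target)) / (1.0 + mean_d)
```

Template pairs scoring below the decision threshold t are discarded without ever entering the expensive WPL stage.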
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more detail of the sclera vessel structure than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure will appear nonlinearly shrunken or extended, because the eyeball is spherical in shape;
The sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences in the movement of these layers.
Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale.
1) SHIFT PARAMETER SEARCH: As discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template, ste_i is the i-th WPL descriptor of Tte, Tta is the target template, sta_i is the i-th WPL descriptor of Tta, and d(ste_k, sta_j) is the Euclidean distance of descriptors ste_k and sta_j.
Δs_k is the shift value of a pair of descriptors.
We first randomly select an equal number of segment descriptors ste_k in the test template Tte from each quad and find the nearest neighbor sta_j of each in the target template Tta. Their shift offset is recorded as a candidate registration shift factor Δs_k. The final offset registration factor Δs_optim is the candidate with the smallest standard deviation among these candidate offsets.
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor sta_j in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here ste_i, Tte, sta_j, and Tta are defined as in Algorithm 2; tr_shift(it), θ(it), and tr_scale(it) are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr_shift(it)), and S(tr_scale(it)) are the transform matrices defined in (7). To search for the optimal transform parameters we iterate N times to generate these parameter sets; in our experiment we set the number of iterations to 512.
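The randomized parameter search can be sketched like this. The search ranges for shift, rotation, and scale are assumptions of the sketch (the report does not state them); β = 20 pixels and the default of 512 iterations come from the text.

```python
import numpy as np

def affine_param_search(test_pts, target_pts, n_iter=512, beta=20.0, seed=1):
    """Randomized affine-parameter search in the spirit of Algorithm 3.

    Each iteration draws a (shift, rotation, scale) triple, transforms
    the test points, and counts transformed points landing within beta
    pixels of some target point; the best-scoring set survives."""
    rng = np.random.default_rng(seed)
    best_m, best_params = 0, (np.zeros(2), 0.0, 1.0)
    for _ in range(n_iter):
        shift = rng.uniform(-5, 5, size=2)     # assumed shift range (pixels)
        theta = rng.uniform(-0.05, 0.05)       # assumed rotation range (radians)
        scale = rng.uniform(0.95, 1.05)        # assumed scale range
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])        # rotation matrix
        moved = scale * (test_pts + shift) @ R.T
        m = sum(np.linalg.norm(target_pts - p, axis=1).min() < beta for p in moved)
        if m > best_m:
            best_m, best_params = m, (shift, theta, scale)
    return best_m, best_params
```

Each iteration is independent, which is exactly what makes this step easy to assign one-parameter-set-per-thread on the GPU later in the chapter.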
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here ste_i, Tte, sta_j, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr_shift(optm), tr_scale(optm), and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm))T(tr_shift(optm))S(tr_scale(optm)) is the descriptor transform matrix defined in Algorithm 3. ϕ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ϕ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single instruction, multiple data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of a program must be partitioned by the programmer into threads that are mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip and take only a little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming.
If the threads in a warp have different control paths, all the branches are executed serially; to improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this latency, we should preferentially use on-chip memory rather than global memory, and when global memory access does occur, threads in the same warp should access words in sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. For maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and each thread compares one pair of templates. In our work we use an NVIDIA C2070 as our GPU, with the numbers of threads and blocks both set to 1024; this means we can match our test template with up to 1024 × 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their respective descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
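The pairwise reduction described above can be simulated sequentially. This sketch (names are illustrative) shows how, stride by stride, the total accumulates at the first address:

```python
def tree_reduce(vals):
    """Sequential simulation of the pairwise reduction used to combine
    per-thread intermediate results.  At stride s, every element whose
    index is a multiple of 2*s accumulates its neighbour s positions
    away; after log2(n) rounds the total sits in vals[0]."""
    vals = list(vals)
    n, s = len(vals), 1
    while s < n:
        for i in range(0, n - s, 2 * s):
            vals[i] += vals[i + s]   # on the GPU these additions run in parallel
        s *= 2
    return vals[0]
```

On the GPU every addition within one stride is performed by a different thread in the same round, which is why only log2(n) synchronization steps are needed instead of n.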
252 MAPPING INSIDE BLOCK
In the shift-parameter search there are two schemes we can choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to a thread, so that all threads compute independently, except that the final result must be compared with the other possible offsets.
Because of the great number of sum and synchronization operations in every nearest neighbor searching step, we choose the second method to parallelize the shift search. In the affine matrix generator we map an entire parameter set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the iterations are distributed over all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. For generating the rotation and scale registration parameters we use the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto
and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory.
FIG
FIG
To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step. Our solution is instead to use a single thread in a block to process the matching.
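The per-thread seeding problem described above has a familiar CPU-side analogue. As an illustration only (NumPy's SeedSequence mechanism, not the CUDA dynamic-creation tool the report uses), statistically independent streams can be derived from one root seed:

```python
import numpy as np

def make_streams(n_threads, root_seed=1234):
    """One independent generator per simulated thread.

    The CUDA implementation relies on dynamically created Mersenne
    Twister parameter sets; NumPy's SeedSequence.spawn offers the
    analogous guarantee of uncorrelated per-stream state, rather than
    merely handing each thread a "very different" seed."""
    children = np.random.SeedSequence(root_seed).spawn(n_threads)
    return [np.random.default_rng(child) for child in children]

streams = make_streams(4)
draws = [g.random() for g in streams]   # four distinct, uncorrelated values
```

The point matches the text: naive per-thread seeding of identical generators risks correlated sequences, so the stream parameterization itself must differ.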
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as the feature for human recognition. In the sclera region the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, the image is first divided into small connected regions called cells. For each cell, the histogram of gradient directions or edge orientations of the pixels is computed; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG; it creates the cell histograms. Each pixel within a cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination or contrast changes, then the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
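The two steps just described (gradient computation and orientation binning) can be sketched for a single cell. The function name, the 9-bin default, and the use of `np.gradient` are assumptions of this sketch, not the report's exact implementation.

```python
import numpy as np

def hog_cell_histogram(cell, n_bins=9):
    """Gradient magnitude/orientation and orientation binning for one
    HOG cell: unsigned gradients folded onto 0-180 degrees (opposite
    directions count as the same), magnitude used as the vote weight."""
    cell = cell.astype(float)
    dy, dx = np.gradient(cell)                     # y- and x-direction gradients
    mag = np.hypot(dx, dy)                         # m(x, y)
    ang = np.rad2deg(np.arctan2(dy, dx)) % 180.0   # theta(x, y), folded to [0, 180)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 180), weights=mag)
    return hist
```

Concatenating (and block-normalizing) these per-cell histograms over the sclera region yields the final HOG descriptor.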
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004 MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, which is one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, FORTRAN, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, and it enables fast development and execution. With the MATLAB language you can program and develop algorithms faster than with traditional languages, because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
The interactive tool GUIDE (Graphical User Interface Development Environment) is used to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format, and additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter and bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effect, light source locations, and transparency.
3-D plotting functions include
Surface, contour, and mesh
Image plots
Cone, slice, stream, and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
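The snapshots above trace the pipeline from grey-scale conversion through Otsu binarisation to matching. A minimal NumPy version of the Otsu thresholding step (a sketch for illustration, not the report's MATLAB code) might look like:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method on an 8-bit grey image: choose the threshold that
    maximises the between-class variance, as used above to binarise the
    eye image before selecting the sclera region of interest."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                          # grey-level probabilities
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()          # class weights
        if w0 == 0 or w1 == 0:
            continue                               # one class empty: skip
        mu0 = (np.arange(t) * p[:t]).sum() / w0    # class means
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2           # between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# binary = gray >= otsu_threshold(gray)
```

In MATLAB the equivalent one-liner is graythresh followed by im2bw, which is presumably what the snapshots were produced with.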
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency, a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make the matching more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even among the threads in the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.
The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as colour, intensity, or texture, and adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the contours resulting after image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms like marching cubes.
126 IMAGE RESTORATION
Image restoration, like enhancement, improves the qualities of an image, but all the operations are mainly based on known or measured degradations of the original image. Image restoration is used to restore images with problems such as geometric distortion, improper focus, repetitive noise, and camera motion, and to correct images for known degradations.
127 FUNDAMENTAL STEPS
Image acquisition to acquire a digital image
Image preprocessing to improve the image in ways that increases the
chances for success of the other processes
Image segmentation to partition an input image into its constituent parts or objects
Image representation to convert the input data to a form suitable for
computer processing
Image description to extract features that result in some quantitative
information of interest or features that are basic for differentiating one
class of objects from another
Image recognition to assign a label to an object based on the
information provided by its descriptors
Image interpretation to assign meaning to an ensemble of recognized
objects
Knowledge about a problem domain is coded into an image processing
system in the form of a Knowledge database
13 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x, y) must be digitized both spatially and in amplitude.
Digitization of the spatial coordinates (x, y) is called image sampling.
Amplitude digitization is called gray-level quantization.
The storage and processing requirements increase rapidly with the spatial resolution and the number of gray levels.
Example: a 256 gray-level image of size 256x256 occupies 64K bytes of memory.
Images of very low spatial resolution produce a checkerboard effect.
The use of an insufficient number of gray levels in smooth areas of a digital image results in false contouring.
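The storage example can be verified with a short calculation (the function name is illustrative):

```python
def image_bytes(width, height, gray_levels):
    """Storage for an uncompressed image: bits per pixel is the number
    of bits needed to represent gray_levels distinct values."""
    bits_per_pixel = (gray_levels - 1).bit_length()   # 256 levels -> 8 bits
    return width * height * bits_per_pixel // 8

# a 256 gray-level, 256x256 image: 256*256*8 bits = 65536 bytes = 64K bytes
assert image_bytes(256, 256, 256) == 64 * 1024
```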
1.4 IMAGE FILE FORMATS
There are two general groups of images: vector graphics (or line art) and bitmaps (pixel-based images). Some of the most common file formats are:
GIF (Graphics Interchange Format): an 8-bit (256-colour), non-destructively compressed bitmap format, mostly used for the web. It has several sub-standards, one of which is the animated GIF.
JPEG (Joint Photographic Experts Group): a very efficient (i.e., much information per byte), destructively compressed, 24-bit (16 million colours) bitmap format, widely used for the web and Internet (bandwidth-limited applications).
TIFF (Tagged Image File Format): the standard 24-bit publication bitmap format, which compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.
PS (PostScript): a standard vector format with numerous sub-standards; it can be difficult to transport across platforms and operating systems.
PSD (Adobe Photoshop Document): a dedicated Photoshop format that keeps all the information in an image, including all the layers.
BMP: the bitmap file format.
1.5 TYPES OF IMAGES
There are four types of images:
1. Binary image
2. Grayscale image
3. Color image
4. Indexed image
1.5.1 BINARY IMAGES
A binary image is a digital image that has only two possible values for each pixel. Typically, the two colors used for a binary image are black and white, though any two colors can be used. Binary images are also called bi-level or two-level, since each pixel is stored as a single bit (0 or 1). Such images are often referred to as black-and-white (B&W) images.
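As a sketch, a binary image can be derived from a grayscale one by comparing each pixel against a threshold (the threshold value here is illustrative):

```python
def to_binary(gray, thresh=128):
    # Each output pixel is a single bit: 1 (white) if the gray value
    # reaches the threshold, else 0 (black).
    return [[1 if p >= thresh else 0 for p in row] for row in gray]

gray = [[  0,  64, 200],
        [130, 255,  30]]
print(to_binary(gray))  # -> [[0, 0, 1], [1, 1, 0]]
```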
1.5.2 GRAYSCALE IMAGE
In an 8-bit grayscale image, each picture element has an assigned intensity that ranges from 0 to 255. A grayscale image is what people normally call a black-and-white image, but the name emphasizes that such an image also includes many shades of grey.
FIG
1.5.3 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive colour, with the R, G, and B receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production.
The secondary colours of RGB (cyan, magenta, and yellow) are formed by mixing two of the primary colours (red, green, or blue) and excluding the third colour. Red and green combine to make yellow, green and blue make cyan, and blue and red form magenta. The combination of red, green, and blue at full intensity makes white.
In Photoshop, using the "screen" mode for the different layers in an image will make the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.
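These additive combinations can be checked with a toy sketch that sums per-channel intensities and clamps them to the 8-bit range:

```python
def add_light(c1, c2):
    # Additive colour mixing: channel-wise sum, clamped to the 8-bit range.
    return tuple(min(a + b, 255) for a, b in zip(c1, c2))

RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)

print(add_light(RED, GREEN))                   # -> (255, 255, 0)   yellow
print(add_light(GREEN, BLUE))                  # -> (0, 255, 255)   cyan
print(add_light(BLUE, RED))                    # -> (255, 0, 255)   magenta
print(add_light(add_light(RED, GREEN), BLUE))  # -> (255, 255, 255) white
```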
FIG
CMYK: the 4-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M), and yellow (Y) inks; in addition, a layer of black (K) ink can be added. The CMYK model uses subtractive colour mixing.
1.5.4 INDEXED IMAGE
FIG
An indexed image consists of an array and a color map matrix. The pixel values in the array are direct indices into the color map. By convention, this documentation uses the variable name X to refer to the array and map to refer to the color map. In computing, indexed color is a technique for managing digital image colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, color information is not directly carried by the image pixel data but is stored in a separate piece of data called a palette: an array of color elements in which every element, a color, is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only its index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.
Perhaps the first device that supported palette colors was a random-access frame buffer described in 1975 by Kajiya, Sutherland, and Cheadle, which supported a palette of 256 36-bit RGB colors.
1.6 APPLICATIONS OF IMAGE PROCESSING
Interest in digital image processing methods stems from two principal application areas:
1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception
In the second application area, interest focuses on procedures for extracting information from an image in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.
1.7 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches for feature registration and matching: a Speeded-Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB of DRAM. Currently, sclera vein recognition algorithms are designed for central processing unit (CPU)-based systems.
1.7.1 DISADVANTAGES OF THE EXISTING SYSTEM
1. Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large, preoccupy GPU memory, and slow down data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements; there is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.
LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-image face recognition algorithms perform well when constrained (frontal, well-illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. Recognition performance of a subset of existing still image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time, in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.
With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such big data draws great concern. Even though modern processors are equipped with more cores and memory capacity, careful design is still required in order to utilize the hardware resources effectively and consume power efficiently. This research addresses this issue by investigating the workload characteristics of biometric applications. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.
Modern iris recognition algorithms can be computationally intensive, yet they are designed for traditional sequential processing elements such as a personal computer. However, a parallel processing alternative using field-programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the scope of this project, iris template generation with directional filtering, which is a computationally expensive yet parallelizable portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system with a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ version, determining the information content of an iris approximately 324 times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.
A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye. The vein pattern seen in the sclera region is unique to each person; thus, the sclera vein pattern is a well-suited biometric for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: the Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using bilinear interpolation. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than previous methods.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.
The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.
This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image: the global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method, built on the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.
It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in a data-parallel model.
In this research, we first discuss why the naïve parallel approach would not work. We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor," which is better suited for parallel computing and mitigates the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make parallel processing possible and efficient.
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve efficiency, we propose a new descriptor, the Y-shape descriptor, which greatly improves the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.
2. We propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities.
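The coarse-to-fine idea can be sketched generically: a cheap coarse score prunes the candidate list, and the expensive fine match runs only on the survivors. The scoring functions below are hypothetical placeholders, not the report's actual Y-shape or WPL descriptors:

```python
def coarse_score(a, b):
    # Cheap, registration-free similarity (placeholder: Jaccard overlap).
    return len(set(a) & set(b)) / max(len(set(a) | set(b)), 1)

def fine_score(a, b):
    # Expensive refined matching (placeholder: positional agreement).
    return sum(x == y for x, y in zip(a, b)) / max(len(a), 1)

def two_stage_match(query, gallery, coarse_thresh=0.5):
    # Stage 1: filter out low-similarity pairs with the coarse score.
    survivors = [t for t in gallery if coarse_score(query, t) >= coarse_thresh]
    # Stage 2: refined matching only on the remaining candidates.
    return max(survivors, key=lambda t: fine_score(query, t), default=None)

gallery = [[1, 2, 3, 4], [9, 9, 9, 9], [1, 2, 4, 3]]
print(two_stage_match([1, 2, 3, 4], gallery))  # -> [1, 2, 3, 4]
```

The speedup comes from the fact that most gallery templates never reach the expensive second stage.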
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches for feature registration and matching: a Speeded-Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB of DRAM. Currently, sclera vein recognition algorithms are designed for central processing unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient for data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs, or GPGPUs (general-purpose graphics processing units), are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, GPUs have been used to extract features, construct descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al. evaluated the performance of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform which outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design; an efficient parallel computing scheme needs different strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition using our sequential line-descriptor method on the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and can help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large, preoccupy GPU memory, and slow down data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements; there is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range.
In summary, a naïve parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in a data-parallel model.
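Prefix sum, one of the optimization techniques mentioned above, computes all running totals of an array; on a GPU it is typically done in O(log n) parallel steps (e.g., a work-efficient Blelloch scan), but its result can be sketched with a sequential reference implementation:

```python
def exclusive_prefix_sum(a):
    # Exclusive scan: out[i] = sum of a[0..i-1]; out[0] = 0.
    # A GPU kernel would compute the same result in parallel tree passes;
    # this is only the sequential reference.
    out, running = [], 0
    for x in a:
        out.append(running)
        running += x
    return out

print(exclusive_prefix_sum([3, 1, 7, 0, 4]))  # -> [0, 3, 4, 11, 11]
```

Prefix sums are useful on GPUs for tasks like stream compaction, where each thread needs to know where to write its output.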
In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor," which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we developed implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we present experiments using the proposed system. In Section 9, we draw our conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation: they used a clustering algorithm to classify color eye images into three clusters (sclera, iris, and background). Later, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's-thresholding-based method for grayscale images. After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region-growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded-Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.
The proposed sclera recognition method consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form; this is mainly useful for circular or quasi-circular shapes. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are extracted.
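The Cartesian-to-polar conversion with bilinear interpolation mentioned above can be sketched as follows; this is a simplified illustration, and the sampling grid sizes (`n_r`, `n_theta`) are assumed parameters, not the report's exact configuration:

```python
import math

def bilinear(img, x, y):
    # Sample the image at a non-integer (x, y) by bilinear interpolation.
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, len(img[0]) - 1), min(y0 + 1, len(img) - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def to_polar(img, cx, cy, n_r=4, n_theta=8):
    # Resample the image onto an (r, theta) grid centered at (cx, cy).
    # Rotation of the eye becomes a shift along the theta axis.
    r_max = min(cx, cy, len(img[0]) - 1 - cx, len(img) - 1 - cy)
    polar = []
    for ri in range(n_r):
        r = r_max * (ri + 1) / n_r
        row = []
        for ti in range(n_theta):
            t = 2 * math.pi * ti / n_theta
            row.append(bilinear(img, cx + r * math.cos(t), cy + r * math.sin(t)))
        polar.append(row)
    return polar

img = [[float((x + y) % 256) for x in range(9)] for y in range(9)]
print(len(to_polar(img, 4, 4)), len(to_polar(img, 4, 4)[0]))  # -> 4 8
```

Because a rotation about the center maps to a circular shift in theta, the polar representation is what makes the features rotation invariant, as claimed in the text.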
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure shows the steps of segmentation.
FIG
Glare area detection: the glare area is a small bright area near the pupil or iris, an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images; if the image is in color, it must first be converted to grayscale before the Sobel filter is applied. Fig. 4 shows the result of the glare area detection.
FIG
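A minimal sketch of Sobel-based detection of a bright glare spot on a grayscale image; this is a simplification of the step described above, and the edge threshold (100) and border handling are illustrative choices:

```python
def sobel_magnitude(gray):
    # Apply the 3x3 Sobel operators and return the gradient magnitude
    # for interior pixels (borders are left at zero for simplicity).
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    h, w = len(gray), len(gray[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(gx_k[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(gy_k[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag

# A dark image with one bright "glare" pixel produces strong gradient
# responses on the pixels surrounding it.
gray = [[0] * 5 for _ in range(5)]
gray[2][2] = 255
mag = sobel_magnitude(gray)
glare_edges = [(y, x) for y in range(5) for x in range(5) if mag[y][x] > 100]
print(glare_edges)
```

The strong responses ring the bright spot, which is how a glare region can be localized.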
Sclera area estimation: to estimate the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should lie in the right and center positions, and the correct right sclera area should lie in the left and center. In this way, non-sclera areas are eliminated.
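Otsu's thresholding, used above to separate potential sclera areas, picks the gray level that maximizes the between-class variance of the histogram; a compact sketch:

```python
def otsu_threshold(gray_values, levels=256):
    # Build the histogram of gray values.
    hist = [0] * levels
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    sum_all = sum(i * hist[i] for i in range(levels))
    best_t, best_var, w_b, sum_b = 0, -1.0, 0, 0.0
    for t in range(levels):
        w_b += hist[t]          # background pixel count (values <= t)
        if w_b == 0:
            continue
        w_f = total - w_b       # foreground pixel count
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        # Between-class variance (up to a constant factor).
        var = w_b * w_f * (mean_b - mean_f) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Two well-separated clusters: dark pixels (~20) and bright sclera pixels (~200).
pixels = [18, 20, 22, 19, 21] * 4 + [198, 200, 202, 199, 201] * 4
t = otsu_threshold(pixels)
print(20 < t < 198)  # -> True
```

Because the bright sclera and the darker surrounding regions form two histogram modes, the chosen threshold falls between them, which is what isolates the candidate sclera areas.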
2.2.3 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the sclera area. The upper eyelid, lower eyelid, and iris boundaries are then refined, since all of these are unwanted portions for recognition. To eliminate their effects, refinement is performed after the detection of the sclera area. The figure shows the result after Otsu's thresholding and iris and eyelid refinement for the right sclera area; the left sclera area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has been studied in the context of the fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus, the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed; this makes sclera vasculature a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
opportunities for simultaneous computation. The figure below demonstrates
the possibility of parallel directional filtering. Since the filter is
computed over different portions of the input image, the computation can
be performed in parallel (denoted by Elements below). In addition,
individual parallelization of each element of the filtering can also be
performed. A detailed discussion of our proposed parallelization is
outside the scope of this paper.
FIG
FIG
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION
METHOD
The matching stage of the line-descriptor-based method is a
bottleneck with regard to matching speed. In this section, we briefly
describe the line-descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected
vein structure down to a single-pixel-wide skeleton and remove the
branch points. The line descriptor is used to describe the segments in
the vein structure. Figure 2 shows a visual description of the line
descriptor. Each segment is described by three quantities: the segment's
angle to some reference angle at the iris center, θ; the segment's
distance to the iris center, r; and the dominant angular orientation of
the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The
individual components of the line descriptor are calculated as
FIG
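As a concrete illustration of these equations, the three components θ, r, and ɸ can be computed as in the following Python sketch (illustrative only; the report's environment is MATLAB, and the degree-1 least-squares fit and mean-point segment center are assumptions consistent with the symbols above):

```python
import math

def line_descriptor(xs, ys, iris_center):
    """Compute the line descriptor S = (theta, r, phi) for one vein segment.
    xs, ys: coordinates of the skeleton points of the segment.
    iris_center: (xi, yi), center of the detected iris.
    """
    n = len(xs)
    xi, yi = iris_center
    # (xl, yl): center point of the line segment
    xl, yl = sum(xs) / n, sum(ys) / n
    # theta: angle of the segment center, measured at the iris center
    theta = math.atan2(yl - yi, xl - xi)
    # r: distance from the segment center to the iris center
    r = math.hypot(xl - xi, yl - yi)
    # phi: dominant orientation, from a least-squares degree-1 fit f_line(x)
    sxx = sum((x - xl) ** 2 for x in xs)
    sxy = sum((x - xl) * (y - yl) for x, y in zip(xs, ys))
    phi = math.atan2(sxy, sxx)  # equals atan(fitted slope) for sxx > 0
    return theta, r, phi
```

For a horizontal segment centered above the iris, θ comes out near π/2, r is the vertical offset, and ɸ is 0.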
Here fline(x) is the polynomial approximation of the line segment,
(xl, yl) is the center point of the line segment, (xi, yi) is the center
of the detected iris, and S is the line descriptor. In order to register
the segments of the vascular patterns, a RANSAC-based algorithm is used
to estimate the best-fit parameters for registration between the two
sclera vascular patterns. The registration algorithm randomly chooses
two points, one from the test template and one from the target template.
It also randomly chooses a scaling factor and a rotation value based on
a priori knowledge of the database. Using these values, it calculates a
fitness value for the registration under these parameters.
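The RANSAC-style parameter search described above can be sketched as follows (Python, illustrative; the trial count, tolerance, and parameter ranges are assumed values, not the report's):

```python
import math, random

def register_ransac(test_pts, target_pts, trials=200, tol=2.0,
                    scale_range=(0.9, 1.1), rot_range=(-0.2, 0.2)):
    """Randomly hypothesize (shift, scale, rotation) sets and keep the
    best-fitting one, in the spirit of the RANSAC registration described
    above. Fitness = number of transformed test points landing near a
    target point."""
    best, best_fit = None, -1
    for _ in range(trials):
        p = random.choice(test_pts)          # one point from the test template
        q = random.choice(target_pts)        # one point from the target template
        s = random.uniform(*scale_range)     # random scaling factor
        a = random.uniform(*rot_range)       # random rotation (radians)
        ca, sa = math.cos(a), math.sin(a)
        # translation that maps the rotated/scaled p onto q
        tx = q[0] - s * (ca * p[0] - sa * p[1])
        ty = q[1] - s * (sa * p[0] + ca * p[1])
        fit = 0
        for (x, y) in test_pts:
            u = s * (ca * x - sa * y) + tx
            v = s * (sa * x + ca * y) + ty
            if any(math.hypot(u - gx, v - gy) < tol for gx, gy in target_pts):
                fit += 1
        if fit > best_fit:
            best_fit, best = fit, (tx, ty, s, a)
    return best, best_fit
```

With enough trials, a correct point correspondence is eventually drawn, and the hypothesis fitting the most points wins.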
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. In order to reduce the effect of segmentation errors, we
created a weighting image (Figure 3) from the sclera mask by setting
interior pixels in the sclera mask to 1, pixels within some distance of
the boundary of the mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated by the
following, where Si and Sj are two segment descriptors, m(Si, Sj) is the
matching score between segments Si and Sj, d(Si, Sj) is the Euclidean
distance between the segment descriptors' center points (from Eqs. 6-8),
Dmatch is the matching distance threshold, and ɸmatch is the matching
angle threshold. The total matching score M is the sum of the individual
matching scores divided by the maximum matching score for the minimal
set between the test and target templates. That is, one of the test or
target templates has fewer points, and thus the sum of its descriptors'
weights sets the maximum score that can be attained.
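A hedged sketch of this scoring scheme (Python, illustrative; the threshold values and the exact form of m(Si, Sj) are stand-ins, since the equation itself appears only as a figure):

```python
import math

D_MATCH = 5.0    # matching distance threshold (assumed value)
PHI_MATCH = 0.2  # matching angle threshold in radians (assumed value)

def segment_match(si, sj):
    """Score one descriptor pair. Each descriptor is (x, y, phi, w):
    center point, orientation, and weight from the weighting image. A
    pair matches when centers are within D_MATCH and orientations within
    PHI_MATCH; the credited score is the pair's smaller weight (a hedged
    reading of the weighting scheme described above)."""
    d = math.hypot(si[0] - sj[0], si[1] - sj[1])
    if d <= D_MATCH and abs(si[2] - sj[2]) <= PHI_MATCH:
        return min(si[3], sj[3])
    return 0.0

def total_match_score(test, target):
    """M = sum of best per-segment scores, divided by the maximum
    attainable score (the weight sum of the smaller template)."""
    score = sum(max(segment_match(s, t) for t in target) for s in test)
    minimal = min((test, target), key=len)
    max_score = sum(w for (_, _, _, w) in minimal)
    return score / max_score if max_score else 0.0
```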
FIG
FIG
FIG
FIG
Y-shaped branches are observed to be a stable feature under movement of
the eye and can be used as a sclera feature descriptor. To detect the
Y-shaped branches in the original template, we search the set of nearest
neighbors of every line segment within a regular distance and classify
the angles among these neighbors. If there are two types of angle values
in the line segment set, the set may be inferred to be a Y-shaped
structure, and the line segment angles are recorded as a new feature of
the sclera.
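A minimal sketch of the Y-shape branch feature (Python, illustrative; function names and the input encoding are assumptions). Measuring each branch against the radial direction from the pupil center, as this report does, makes the feature rotation invariant by construction:

```python
import math

def y_shape_descriptor(branch_dirs, center, pupil_center):
    """Build the rotation/shift/scale-invariant Y-shape feature
    y(phi1, phi2, phi3, x, y): each phi_k is the angle between branch k
    and the radial direction from the pupil center through the branch
    point, so no extra alignment rotation is needed.
    branch_dirs: three (dx, dy) branch direction vectors.
    center: (x, y) of the branch point; pupil_center: (px, py)."""
    cx, cy = center
    px, py = pupil_center
    radial = math.atan2(cy - py, cx - px)
    phis = []
    for dx, dy in branch_dirs:
        a = math.atan2(dy, dx) - radial
        a = math.atan2(math.sin(a), math.cos(a))  # wrap into (-pi, pi]
        phis.append(a)
    return (*phis, cx, cy)
```

Rotating the whole eye about the pupil center leaves ϕ1, ϕ2, ϕ3 unchanged, which is the invariance the descriptor relies on.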
There are two ways to measure both the orientation and the relationship
of every branch of the Y-shaped vessels: one is to use the angle of each
branch to the x-axis; the other is to use the angle between each branch
and the iris radial direction. The first method needs an additional
rotation operation to align the template. In our approach, we employed
the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles
between each branch and the radius from the pupil center. Even when the
head tilts, the eye moves, or the camera zooms during image acquisition,
ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil
center calculation in the segmentation step, we also recorded the center
position (x, y) of the Y-shaped branch as auxiliary parameters. Thus our
rotation-, shift-, and scale-invariant feature vector is defined as
y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference
to the iris center; therefore, it is automatically aligned to the iris
center. It is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line descriptor is extracted from
the skeleton of the vessel structure in binary images (Figure 7). The
skeleton is then broken into smaller segments. For each segment, a line
descriptor is created to record the center and orientation of the
segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the
position of the center and ɸ is its orientation. Because of the limited
segmentation accuracy, descriptors at the boundary of the sclera area
might not be accurate and may contain spur edges resulting from the
iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask
file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image;
(b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns;
(d) centers of line segments of the vessel patterns
is designed to indicate whether a line segment belongs to the edge of
the sclera or not. However, in a GPU application, using the mask is
challenging, since the mask files are large and will occupy GPU memory
and slow down data transfer. During matching, a RANSAC-type registration
algorithm is used to randomly select corresponding descriptors, and the
transform parameters between them are used to generate the
template-transform affine matrix. After every template transform, the
mask data should also be transformed and a new boundary calculated to
evaluate the weight of the transformed descriptor. This results in too
many convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the mask
information and can be automatically aligned. We extracted the geometric
relationships of the descriptors and stored them as a new descriptor. We
use a weighted image created by setting various weight values according
to position: the weights of descriptors beyond the sclera are set to 0,
those near the sclera boundary to 0.5, and interior descriptors to 1. In
our work, descriptor weights were calculated on their own mask by the
CPU, and only once.
The result is saved as a component of the descriptor, which
becomes s(x, y, ɸ, w), where w denotes the weight of the point and may
be 0, 0.5, or 1. To align two templates, when a template is shifted to
another location along the line connecting their centers, all the
descriptors of that template are transformed. This is faster if the two
templates have similar reference points. If we use the center of the
iris as the reference point, then when two templates are compared, the
correspondence is automatically aligned, since they share a similar
reference point. Every feature vector of the template is a set of line
segment descriptors composed of three variables (Figure 8): the
segment's angle to the reference line through the iris center, denoted
θ; the distance between the segment's center and the pupil center,
denoted r; and the dominant angular orientation of the segment, denoted
ɸ. To minimize GPU computation, we also convert the descriptor values
from polar to rectangular coordinates in CPU preprocessing.
The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and
right parts of the sclera in an eye may have different registration
parameters. For example, as an eyeball moves left, the sclera patterns
in the left part of the eye may be compressed while those in the right
part are stretched.
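The WPL construction just described (weight taken from the mask, polar components precomputed from the iris center) can be sketched as follows (Python, illustrative; the set-based mask and the border width are toy assumptions):

```python
import math

def wpl_descriptor(x, y, phi, iris_center, sclera_mask, border=5):
    """Build the weighted polar line descriptor s(x, y, r, theta, phi, w).
    r and theta are precomputed from the iris center, so matching needs
    no later pass over the mask; w is 0 outside the sclera mask, 0.5
    within `border` pixels of its boundary, and 1.0 in the interior.
    sclera_mask: a set of (x, y) pixel coordinates (a toy stand-in for
    the report's mask image)."""
    xi, yi = iris_center
    r = math.hypot(x - xi, y - yi)
    theta = math.atan2(y - yi, x - xi)
    if (x, y) not in sclera_mask:
        w = 0.0                       # beyond the sclera
    elif any((x + dx, y + dy) not in sclera_mask
             for dx in range(-border, border + 1)
             for dy in range(-border, border + 1)):
        w = 0.5                       # near the sclera boundary
    else:
        w = 1.0                       # interior descriptor
    return (x, y, r, theta, phi, w)
```

Because w travels with the descriptor, the GPU kernels never need to touch the mask image itself.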
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads called warps. We
reorganized the descriptors from the same sides and saved
FIG
FIG
them at contiguous addresses. This meets the requirement of coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask
information to the new descriptor, computation on the mask file is no
longer needed on the GPU. Matching with this feature is very fast,
because the templates do not need to be re-registered every time after
shifting. Thus the cost of data transfer and computation on the GPU is
reduced. With matching on the new descriptor, the shift parameter
generator in Figure 4 is simplified as in Figure 9.
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
the more complicated shading and lighting operations that are essential
for complex effects. The key step was replacing the fixed-function
per-vertex and per-fragment operations with user-specified programs run
on each vertex and fragment. Over the past six years, these vertex
programs and fragment programs have become increasingly capable, with
larger limits on their size and resource consumption, more fully
featured instruction sets, and more flexible control-flow operations.
After many years of separate instruction sets for vertex and fragment
operations, current GPUs support the unified Shader Model 4.0 on both
vertex and fragment shaders:
The hardware must support shader programs of at least 65k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers
and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting
fixed-function units.
GENERAL-PURPOSE COMPUTING ON THE GPU
Mapping general-purpose computation onto the GPU uses the graphics
hardware in much the same way as any standard graphics application.
Because of this similarity, it is both easier and more difficult to
explain the process. On one hand, the actual operations are the same and
are easy to follow; on the other hand, the terminology differs between
graphics and general-purpose use. Harris provides an excellent
description of this mapping process.
We begin by describing GPU programming using graphics terminology, then
show how the same steps are used in a general-purpose way to author
GPGPU applications, and finally use the same steps to show the simpler
and more direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II,
concentrating on its programmable aspects:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by
that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination
of math operations and global memory reads from a global "texture"
memory.
The resulting image can then be used as a texture on future passes
through the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves
exactly the same steps but different terminology. A motivating example
is a fluid simulation computed over a grid: at each time step, we
compute the next state of the fluid for each grid point from the current
state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation
domain of interest. The rasterizer generates a fragment at each pixel
location covered by that geometry. (In our example, the primitive must
cover a grid of fragments equal to the domain size of our fluid
simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program.
(Each grid point runs the same program to update the state of its
fluid.)
The fragment program computes the value of the fragment by a combination
of math operations and "gather" accesses from global memory. (Each grid
point can access the state of its neighbors from the previous time step
when computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes. (The current state of the fluid will be used on the next
time step.)
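The fluid-grid example can be mimicked with a toy stencil pass (Python, illustrative; a simple five-point average stands in for a real fluid update):

```python
def step(grid):
    """One 'fragment program' pass over the grid: every cell's next
    state is computed from its own value and its four neighbors (with
    periodic wrap-around). Each cell is independent of the others, which
    is exactly what makes the pass trivially parallel on a GPU."""
    h, w = len(grid), len(grid[0])
    return [[(grid[y][x]
              + grid[(y - 1) % h][x] + grid[(y + 1) % h][x]
              + grid[y][(x - 1) % w] + grid[y][(x + 1) % w]) / 5.0
             for x in range(w)]
            for y in range(h)]
```

Note the output is written to a new buffer, matching the old GPGPU restriction that a pass reads one buffer and writes another.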
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has
been that, despite their general-purpose tasks having nothing to do with
graphics, the applications still had to be programmed using graphics
APIs. In addition, the program had to be structured in terms of the
graphics pipeline, with the programmable units only accessible as an
intermediate step in that pipeline, when the programmer would almost
certainly prefer to access the programmable units directly. The
programming environments we describe in detail in Section IV solve this
difficulty by providing a more natural, direct, non-graphics interface
to the hardware and, specifically, the programmable units. Today, GPU
computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math
operations and both "gather" (read) accesses from and "scatter" (write)
accesses to global memory. Unlike in the previous two methods, the same
buffer can be used for both reading and writing, allowing more flexible
algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in
future computation.
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because no
registration is needed. The matching result in this stage helps filter
out image pairs with low similarity; after this step, some false
positive matches may still remain. In the second stage, we use the WPL
descriptor to register the two images for more detailed descriptor
matching, including scale and translation invariance. This stage
includes the shift transform, affine matrix generation, and final WPL
descriptor matching. Overall, we partitioned the registration and
matching processing into four kernels in CUDA (Figure 10): matching on
the Y-shape descriptor, shift transformation, affine matrix generation,
and final WPL descriptor matching. Combining these two stages, the
matching program runs faster and achieves a more accurate score.
2.4.1 STAGE I: MATCHING WITH THE Y-SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features,
registration is unnecessary before matching on the Y-shape descriptor.
The whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte
and target template Tta, respectively; dϕ is the Euclidean distance of
the angle elements of the descriptor vectors, defined in (3); dxy is the
Euclidean distance of two descriptor centers, defined in (4); ni and di
are the number of matched descriptor pairs and the distance between
their centers, respectively; tϕ is a distance threshold; and txy is the
threshold that restricts the search area. We set tϕ to 30 and txy to 675
in our experiment.
To match two sclera templates, we search the areas near all the Y-shape
branches. The search area is limited to the corresponding left or right
half of the sclera in order to reduce the search range and time. The
distance between two branches is defined in (3), where ϕij is the angle
between the jth branch and the polar ray from the pupil center in
descriptor i.
The number of matched pairs ni and the distance between Y-shape branch
centers di are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch centers
as in (2). Here α is a factor to fuse the matching score, set to 30 in
our study, and Ni and Nj are the total numbers of feature vectors in
templates i and j, respectively. The decision is regulated by the
threshold t: if the sclera's matching score is lower than t, the sclera
is discarded. A sclera with a high matching score is passed to the next,
more precise matching process.
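An illustrative Stage-I matcher follows (Python). The thresholds tϕ = 30, txy = 675, and α = 30 are taken from the text, but the fusion formula itself is a stand-in for Eq. (2), which is not reproduced here; it simply rewards more matched pairs and smaller center distances:

```python
import math

T_PHI = 30.0   # angle-distance threshold, as in the text
T_XY = 675.0   # search-area threshold, as in the text
ALPHA = 30.0   # score-fusion factor, as in the text

def match_y_descriptors(test, target):
    """Coarse matching on Y-shape descriptors (phi1, phi2, phi3, x, y).
    No registration pass is needed because the descriptors are already
    rotation and scale invariant."""
    dists = []
    for yi in test:
        for yj in target:
            dxy = math.hypot(yi[3] - yj[3], yi[4] - yj[4])
            if dxy > T_XY:
                continue  # restrict the search area
            dphi = math.sqrt(sum((a - b) ** 2 for a, b in zip(yi[:3], yj[:3])))
            if dphi < T_PHI:
                dists.append(dxy)
    if not dists:
        return 0.0
    n, mean_d = len(dists), sum(dists) / len(dists)
    # fuse match count (normalized by template sizes) with mean distance
    return (2.0 * n / (len(test) + len(target))) * ALPHA / (ALPHA + mean_d)
```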
2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of
the sclera than the Y-shape descriptor. The variation of the sclera
vessel pattern is nonlinear because:
When an eye image is acquired at a different gaze angle, the vessel
structure appears to shrink or extend nonlinearly, because the eyeball
is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca,
and endothelium. There are slight differences among the movements of
these layers.
Considering these factors, our registration employs both a single shift
transform and a multi-parameter transform that combines shift, rotation,
and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not
be accurate; as a result, the detected iris center may not be very
accurate. The shift transform is designed to tolerate possible errors in
pupil center detection in the segmentation step. If there is no
deformation, or only very minor deformation, registration with the shift
transform alone is adequate to achieve an accurate result. We designed
Algorithm 2 to obtain the optimized shift parameter, where Tte is the
test template and stei is the ith WPL descriptor of Tte; Tta is the
target template and staj is the jth WPL descriptor of Tta; and
d(stek, staj) is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value of two descriptors, defined as
We first randomly select an equal number of segment descriptors stek of
the test template Tte from each quad and find their nearest neighbors
staj in the target template Tta. Their shift offset is recorded as a
candidate registration shift factor Δsk. The final offset registration
factor is Δsoptim, the candidate with the smallest standard deviation
among these candidate offsets.
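Algorithm 2 can be sketched roughly as follows (Python, illustrative; descriptors are reduced to their (x, y) centers, the sample size is an assumed value, and picking the vote closest to the vote mean is a hedged reading of the smallest-standard-deviation rule):

```python
import math, random

def shift_search(test, target, samples=16):
    """Shift-parameter search in the style of Algorithm 2. Sampled test
    descriptors each vote with the offset to their nearest target
    neighbor; the vote closest to the mean of all votes (i.e. with the
    smallest deviation) is kept as the registration shift."""
    votes = []
    for s in random.sample(test, min(samples, len(test))):
        nn = min(target, key=lambda t: math.hypot(t[0] - s[0], t[1] - s[1]))
        votes.append((nn[0] - s[0], nn[1] - s[1]))
    mx = sum(v[0] for v in votes) / len(votes)
    my = sum(v[1] for v in votes) / len(votes)
    return min(votes, key=lambda v: math.hypot(v[0] - mx, v[1] - my))
```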
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the
sclera patterns in the matching step. The affine transform algorithm is
shown in Algorithm 3. The shift value in the parameter set is obtained
by randomly selecting a descriptor stei(it) and calculating the distance
from its nearest neighbor staj in Tta. We transform the test template by
the matrix in (7). At the end of each iteration, we count the number of
matched descriptor pairs between the transformed template and the target
template. The factor β determines whether a pair of descriptors is
matched; we set it to 20 pixels in our experiment. After N iterations,
the optimized transform parameter set is determined by selecting the
maximum match count m(it). Here stei, Tte, staj, and Tta are defined as
in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift,
rotation, and scale parameters generated in the itth iteration; and
R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices
defined in (7). To search for the optimal transform parameters, we
iterate N times to generate these parameters. In our experiment, we set
the iteration count to 512.
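An illustrative rendering of Algorithm 3 (Python). β = 20 and N = 512 come from the text; the rotation and scale ranges, and the reduction of descriptors to (x, y) centers, are assumptions:

```python
import math, random

BETA = 20.0   # pixel threshold for a matched pair, as in the text
N_ITER = 512  # iteration count N, as in the text

def affine_search(test, target, n_iter=N_ITER):
    """Each iteration hypothesizes a (shift, rotation, scale) parameter
    set: the shift comes from a randomly chosen test descriptor and its
    nearest target neighbor, rotation and scale are drawn at random. The
    test template is transformed and matched pairs within BETA pixels
    are counted; the parameter set with the most matches wins."""
    best, best_m = None, -1
    for _ in range(n_iter):
        p = random.choice(test)
        q = min(target, key=lambda t: math.hypot(t[0] - p[0], t[1] - p[1]))
        shift = (q[0] - p[0], q[1] - p[1])
        theta = random.uniform(-0.1, 0.1)    # assumed rotation range
        scale = random.uniform(0.95, 1.05)   # assumed scale range
        c, s = math.cos(theta), math.sin(theta)
        m = 0
        for (x, y) in test:
            u = scale * (c * x - s * y) + shift[0]
            v = scale * (s * x + c * y) + shift[1]
            if any(math.hypot(u - gx, v - gy) < BETA for gx, gy in target):
                m += 1
        if m > best_m:
            best_m, best = m, (shift, theta, scale)
    return best, best_m
```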
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined from Algorithms 2 and 3,
the test template is registered and matched simultaneously. The
registration and matching algorithm is listed in Algorithm 4. Here stei,
Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm),
tr(optm)shift, tr(optm)scale, and Δsoptim are the registration
parameters obtained from Algorithms 2 and 3; and R(θ(optm)),
T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform
matrix defined in Algorithm 3. ɸ is the angle between the segment
descriptor and the radial direction, and w is the weight of the
descriptor, which indicates whether the descriptor is at the edge of the
sclera or not. To ensure that the nearest descriptors have a similar
orientation, we use a constant factor α to check the absolute difference
of the two ɸ values; in our experiment we set α to 5. The total matching
score is the minimal score of the two transformed results divided by the
minimal matching score for the test and target templates.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and works as a
coprocessor with a CPU. A CUDA device consists of many streaming
multiprocessors (SMs); the parallel part of the program should be
partitioned into threads by the programmer and mapped onto them. There
are multiple memory spaces in the CUDA memory hierarchy: registers,
local memory, shared memory, global memory, constant memory, and texture
memory. Registers, local memory, and shared memory are on-chip, and
accessing them takes little time. Only shared memory can be accessed by
other threads within the same block; however, shared memory is of
limited size. Global memory, constant memory, and texture memory are
off-chip and accessible by all threads, but accessing them is very time
consuming.
Constant memory and texture memory are read-only and cacheable. Mapping
algorithms to CUDA to achieve efficient processing is not a trivial
task. There are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches are
executed serially. To improve performance, branch divergence within a
warp should be avoided.
Global memory is slower to access than on-chip memory. To completely
hide the latency of the small instruction set, on-chip memory should be
used preferentially rather than global memory. When global memory access
does occur, threads in the same warp should access consecutive words to
achieve coalescing.
Shared memory is much faster than the local and global memory spaces,
but it is organized into banks of equal size. If two memory requests
from different threads within a warp fall in the same bank, the accesses
are serialized. For maximum performance, memory requests should be
scheduled to minimize bank conflicts.
2.5.1 MAPPING THE ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four
independent modules, each module is converted to a different kernel on
the GPU. These kernels differ in computation density, so we map them to
the GPU with different strategies to fully utilize the computing power
of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the
partition among blocks and threads. Algorithm 1 is partitioned into
coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of
templates in the database. As the upper middle column of Figure 11
shows, each target template is assigned to one thread; one thread
performs one pair-of-templates comparison. In our work, we use an NVIDIA
C2070 as our GPU, with the thread and block numbers set to 1024. That
means we can match our test template with up to 1024 × 1024 target
templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, where one
thread processes a section of descriptors. As the lower portion of the
middle column in Figure 11 shows, we assign a target template to one
block. Inside a block, each thread corresponds to a set of descriptors
in this template. This partition lets every block execute independently,
with no data exchange required between different blocks. When all
threads complete their corresponding descriptor fractions, the sum of
the intermediate results needs to be computed or compared. A parallel
prefix sum algorithm is used to calculate the sum of the intermediate
results, as shown at the right of Figure 11. First, all odd-numbered
threads compute the sum of consecutive pairs of results. Then,
recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes
the prefix sum on the new results. The final result is saved at the
first address, which has the same variable name as the first
intermediate result.
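The in-block reduction described above can be sketched sequentially as follows (Python, illustrative; on the GPU, each round's additions run in parallel across the threads of the block):

```python
def tree_reduce_sum(vals):
    """Tree-style reduction mirroring the in-block parallel sum: at step
    d, every 2*d-th slot accumulates the slot d positions away, halving
    the number of active 'threads' each round. The total ends up in slot
    0, the address of the first intermediate result."""
    a = list(vals)
    d = 1
    while d < len(a):
        for i in range(0, len(a) - d, 2 * d):  # these adds are parallel on GPU
            a[i] += a[i + d]
        d *= 2
    return a[0]
```

For n partial results the reduction takes about log2(n) rounds instead of n - 1 sequential additions.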
2.5.2 MAPPING INSIDE A BLOCK
In the shift-argument search, there are two schemes we can use to map
the task:
Map one pair of templates to all the threads in a block, so that every
thread takes charge of a fraction of the descriptors and cooperates with
the other threads.
Assign a single candidate shift offset to each thread, so that all
threads compute independently, except that the final result must be
compared with the other candidate offsets.
Due to the great number of sum and synchronization operations in every
nearest-neighbor search step, we chose the second method to parallelize
the shift search. In the affine matrix generator, we mapped an entire
parameter-set search to a thread, and every thread randomly generated a
set of parameters and tried them independently. The generated iterations
were assigned to all threads. The challenge of this step is that the
randomly generated numbers might be correlated among threads. In the
rotation and scale registration generation step, we used the Mersenne
Twister pseudorandom number generator, because it can use bitwise
arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative;
therefore, it is hard to parallelize a single twister state-update step
among several execution threads. To make sure that the thousands of
threads in the launch grid generate uncorrelated random sequences, many
simultaneous Mersenne Twisters need to process different initial states
in parallel. But even "very different" (by any definition) initial state
values do not prevent the emission of correlated sequences by generators
sharing identical parameters. To solve this problem, and to enable
efficient implementation of the Mersenne Twister on parallel
architectures, we used a special offline tool for the dynamic creation
of Mersenne Twister parameters, modified from the algorithm developed by
Makoto Matsumoto and Takuji Nishimura. In the registration and matching
step, when searching for the nearest neighbor, a line segment that has
already been matched should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in shared
memory. To share the flags, all the threads in a block would have to
synchronize at every query step; our solution is to use a single thread
in a block to process the matching.
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth
between host memory and device memory, and data transfer between host
and device can lead to long latency. As shown in Figure 11, we load the
entire target template set from the database without considering when
each template will be processed; therefore, there is no host-to-device
data transfer during the matching procedure. In global memory, the
components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ,
w) are stored separately. This guarantees that consecutive kernels of
Algorithms 2 to 4 can access their data at successive addresses.
Although such coalesced access reduces latency, frequent global memory
access is still a slow way to get data, so in our kernels we load the
test template into shared memory to accelerate memory access. Because
Algorithms 2 to 4 execute different numbers of iterations on the same
data, bank conflicts do not occur. To maximize our texture memory space,
we set the system cache to the lowest value and bound our target
descriptors to texture memory. Using this cacheable memory, our data
access was accelerated further.
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor
primarily applied to the design of target detection; in this paper it is
applied as a feature for human recognition. In the sclera region, the
vein patterns are the edges of an image, so HOG is used to determine the
gradient and edge orientations of the vein pattern in the sclera region
of an eye image. To carry out this technique, first divide the image
into small connected regions called cells. For each cell, compute the
histogram of gradient directions or edge orientations of the pixels. The
combination of the histograms of the different cells then represents the
descriptor. To improve accuracy, the histograms can be
contrast-normalized by calculating the intensity over a larger block and
using this value to normalize all cells within the block. This
normalization makes the result invariant to geometric and photometric
changes. The gradient magnitude m(x, y) and orientation θ(x, y) are
calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used to
create the cell histograms. Each pixel within the cell contributes a
weighted vote to an orientation bin, based on the orientation found in
the gradient computation; the gradient magnitude is used as the weight.
The cells are rectangular, and the gradient orientation bins are spread
over 0 to 180 degrees, with opposite directions counting as the same.
Fig. 8 depicts the edge orientations of the picture elements. If the
images have any illumination and contrast changes, then the gradient
strength must be locally normalized. For that, cells are grouped
together into larger blocks. These blocks overlap, so that each cell
contributes more than once to the final descriptor. Here rectangular HOG
(R-HOG) blocks are applied, which are mainly square grids. The
performance of HOG is improved by applying a Gaussian window to each
block.
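A minimal HOG sketch along the lines described above (Python, illustrative; the cell size, bin count, and central-difference gradients are assumptions, and block normalization and the Gaussian window are omitted for brevity):

```python
import math

def hog_cell_histograms(img, cell=4, bins=9):
    """Compute magnitude-weighted orientation histograms per cell.
    img: list of lists of grayscale values. Orientation is unsigned
    (0-180 degrees: opposite directions count as the same bin)."""
    h, w = len(img), len(img[0])
    cells = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # central-difference gradients dx(x, y), dy(x, y)
            dx = img[y][x + 1] - img[y][x - 1]
            dy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(dx, dy)                     # m(x, y)
            ang = math.degrees(math.atan2(dy, dx)) % 180.0
            hist = cells.setdefault((y // cell, x // cell), [0.0] * bins)
            hist[min(int(ang / (180.0 / bins)), bins - 1)] += mag
    return cells
```

On a vertical step edge, all the gradient energy falls into the 0-degree bin, which is the kind of edge-orientation evidence the vein patterns provide.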
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and
fourth-generation programming language. Developed by MathWorks, MATLAB
allows matrix manipulations, plotting of functions and data,
implementation of algorithms, creation of user interfaces, and
interfacing with programs written in other languages, including C, C++,
Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an
optional toolbox uses the MuPAD symbolic engine, allowing access to
symbolic computing capabilities. An additional package, Simulink, adds
graphical multi-domain simulation and Model-Based Design for dynamic and
embedded systems.
In 2004, MATLAB had around one million users across industry
and academia. MATLAB users come from various backgrounds
of engineering, science, and economics. MATLAB is widely used in
academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners
in control engineering, Little's specialty, but quickly spread to many other
domains. It is now also used in education, in particular the teaching
of linear algebra and numerical analysis, and is popular amongst scientists
involved in image processing. The MATLAB application is built around the
MATLAB language. The simplest way to execute MATLAB code is to type
it in the Command Window, which is one of the elements of the MATLAB
Desktop. When code is entered in the Command Window, MATLAB can
be used as an interactive mathematical shell. Sequences of commands can
be saved in a text file, typically using the MATLAB Editor, as a script or
encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and
sharing your work. You can integrate your MATLAB code with other
languages and applications, and distribute your MATLAB algorithms and
applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis,
filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external
applications and languages, such as C, C++, FORTRAN, Java™, COM,
and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image
processing, communications, control design, test and measurement,
financial modeling and analysis, and computational biology. Add-on toolboxes
(collections of special-purpose MATLAB functions) extend the MATLAB
environment to solve particular classes of problems in these application
areas.
MATLAB can be used on personal computers and powerful
server systems, including the Cheaha compute cluster. With the addition of
the Parallel Computing Toolbox, the language can be extended with parallel
implementations for common computational functions, including for-loop
unrolling. Additionally, this toolbox supports offloading computationally
intensive workloads to Cheaha, the campus compute cluster. MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) that knows how big it is. Moreover, the fundamental operators
(e.g., addition, multiplication) are programmed to deal with matrices when
required. The MATLAB environment also handles much of the bothersome
housekeeping that makes all this possible. Since so many of the procedures
required for Macro-Investment Analysis involve matrices, MATLAB
proves to be an extremely efficient language for both communication and
implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or FORTRAN. A wrapper function is created,
allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed MEX-files
(for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called
from MATLAB, and many MATLAB libraries (for
example, XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries. Calling MATLAB from Java is more complicated, but
can be done with a MATLAB extension, which is sold separately by MathWorks,
or by using an undocumented mechanism called JMI (Java-to-MATLAB
Interface), which should not be confused with the unrelated Java Metadata
Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox
available from MathWorks, MATLAB can be connected
to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on
Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems. It enables fast
development and execution. With the MATLAB language, you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks, such as
declaring variables, specifying data types, and allocating memory. In many
cases, MATLAB eliminates the need for 'for' loops. As a result, one line of
MATLAB code can often replace several lines of C or C++ code.
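A generic illustration of this point (not code from this project): one vectorized line does the work of an explicit per-element loop in C.

```matlab
% One vectorized line replaces an element-by-element C loop.
x = 0:0.01:2*pi;            % row vector of samples, no preallocation needed
y = sin(x) .* exp(-x/2);    % element-wise operations on the whole vector
total = sum(y) * 0.01;      % simple numerical integration, again loop-free
```

No variable declarations, data-type specifications, or memory allocation were needed; the interpreter infers all of that from the expressions.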
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
MATLAB lets you execute commands or groups of commands one
at a time, without compiling and linking, enabling you to quickly iterate to
the optimal solution. For fast execution of heavy matrix and vector
computations, MATLAB uses processor-optimized libraries. For general-purpose
scalar computations, MATLAB generates machine-code
instructions using its JIT (just-in-time) compilation technology.
This technology, which is available on most platforms, provides
execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting
breakpoints and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file
differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
GUIs can be laid out, designed, and edited using the interactive tool GUIDE
(Graphical User Interface Development Environment).
GUIDE lets you include list boxes, pull-down menus, push buttons, radio
buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX® controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring
data from external devices and databases, through preprocessing,
visualization, and numerical analysis, to producing presentation-quality
output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
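A few of the listed operations can be sketched on a synthetic noisy signal; this is an illustrative example (findpeaks assumes the Signal Processing Toolbox), not code from this project.

```matlab
% Sketch of several listed data-analysis operations on a synthetic signal.
% findpeaks assumes the Signal Processing Toolbox.
t  = linspace(0, 1, 500);
s  = sin(2*pi*5*t) + 0.2*randn(1, 500);   % signal plus noise
sm = conv(s, ones(1, 11)/11, 'same');     % smoothing (moving average)
bw = sm > 0.5;                            % thresholding
p  = polyfit(t, sm, 5);                   % basic curve fitting
[pk, locs] = findpeaks(sm);               % 1-D peak finding
```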
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats, such as Microsoft Excel; ASCII text or binary
files; image, sound, and video files; and scientific files, such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format. Additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes;
changing line colors and markers; adding annotations, LaTeX equations, and
legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D
scalar data, and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting effects,
light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW discrete Fourier
transform library. Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports, they execute
faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
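The LAPACK/BLAS and FFTW back ends mentioned above sit behind ordinary-looking calls; a minimal illustrative sketch (not project code):

```matlab
% Core math functions are thin wrappers over LAPACK/BLAS and FFTW.
A = pascal(4);        % a small nonsingular test matrix
b = (1:4)';
x = A \ b;            % linear system solve, dispatched to LAPACK routines
e = eig(A);           % eigenvalues, also via LAPACK
F = fft(b);           % discrete Fourier transform via FFTW
```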
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language used is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
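The snapshot sequence above can be sketched in MATLAB roughly as follows, assuming the Image Processing Toolbox. The file name and ROI rectangle are placeholders, and the Canny detector stands in for the report's Otsu-based edge step; this is an outline of the pipeline, not the project's actual code.

```matlab
% Hedged sketch of the snapshot pipeline: grey scale -> binary -> edges ->
% ROI -> enhancement -> Gabor features. Placeholders are marked below.
I    = imread('sclera.jpg');            % original sclera image (placeholder file)
G    = rgb2gray(I);                     % grey-scale conversion
t    = graythresh(G);                   % Otsu's global threshold
BW   = imbinarize(G, t);                % binary image
E    = edge(G, 'canny');                % edge detection (Canny as a stand-in)
rect = [30 30 120 80];                  % placeholder ROI: the sclera part
roi  = imcrop(G, rect);                 % selected ROI
roiE = adapthisteq(roi);                % enhancement of the sclera region
gmag = imgaborfilt(roiE, 4, 90);        % Gabor filter response for features
```

Matching against the database would then compare `gmag`-derived feature vectors; that stage is shown only as snapshots in the report.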
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Privium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition
method which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions to other sclera vein recognition methods
and to general pattern recognition methods. We designed the Y-shape
descriptor to narrow the search range and increase the matching efficiency;
it is a new feature extraction method that takes advantage of the GPU
structures. We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing, which can
dramatically reduce data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, which is an important
step in achieving parallel computation efficiency on a GPU. A workflow
with high arithmetic intensity to hide the memory access latency was
designed to partition the computation task across the heterogeneous system of
CPU and GPU, and even across the threads in the GPU. The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
1.3 A SIMPLE IMAGE MODEL
To be suitable for computer processing, an image f(x, y) must be digitized
both spatially and in amplitude.
Digitization of the spatial coordinates (x, y) is called image sampling.
Amplitude digitization is called gray-level quantization.
The storage and processing requirements increase rapidly with the spatial
resolution and the number of gray levels.
Example: a 256 gray-level image of size 256x256 occupies 64K bytes of
memory.
Images of very low spatial resolution produce a checkerboard effect.
The use of an insufficient number of gray levels in smooth areas of a digital
image results in false contouring.
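Both effects in the example above can be reproduced with a small sketch; the image and level count here are illustrative, not from the report.

```matlab
% Sketch: storage cost and false contouring from too few gray levels.
I = uint8(255 * mat2gray(peaks(256)));       % smooth synthetic 256x256 image
% 256*256 pixels at 1 byte each = 65536 bytes = 64K, matching the example.
levels = 8;                                  % deliberately too few gray levels
step   = 256 / levels;
Iq = uint8(floor(double(I) / step) * step);  % re-quantize to 8 levels
% imshow(Iq) shows banding (false contouring) across the smooth regions.
```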
1.4 IMAGE FILE FORMATS
There are two general groups of 'images': vector graphics (or line art)
and bitmaps (pixel-based 'images'). Some of the most common file
formats are:
GIF - Graphics Interchange Format. An 8-bit (256 colour), non-destructively
compressed bitmap format. Mostly used for the web. Has several
sub-standards, one of which is the animated GIF.
JPEG - Joint Photographic Experts Group. A very efficient (i.e., much
information per byte), destructively compressed, 24-bit (16 million colours)
bitmap format. Widely used, especially for the web and Internet (bandwidth-limited
applications).
TIFF - Tagged Image File Format. The standard 24-bit publication bitmap
format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch
(LZW) compression.
PS - PostScript. A standard vector format. Has numerous sub-standards
and can be difficult to transport across platforms and operating systems.
PSD - Adobe Photoshop Document. A dedicated Photoshop format that
keeps all the information in an image, including all the layers.
BMP - Bitmap file format.
1.5 TYPES OF IMAGES
There are four types of images:
1. Binary image
2. Gray scale image
3. Color image
4. Indexed image
1.5.1 BINARY IMAGES
A binary image is a digital image that has only two possible values for
each pixel. Typically, the two colors used for a binary image are black and
white, though any two colors can be used. Binary images are also called bi-level
or two-level. This means that each pixel is stored as a single bit, i.e.,
a 0 or 1. The names black-and-white and B&W are also used for such images.
1.5.2 GRAY SCALE IMAGE
In an (8-bit) grayscale image, each picture element has an assigned intensity
that ranges from 0 to 255. A grey scale image is what people normally call
a black and white image, but the name emphasizes that such an image will
also include many shades of grey.
FIG
1.5.3 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive
colour, with the r, g, and b receptors in our retinas. RGB uses additive colour
mixing and is the basic colour model used in television or any other
medium that projects colour with light. It is the basic colour model used in
computers and for web graphics, but it cannot be used for print production.
The secondary colours of RGB (cyan, magenta, and yellow) are formed
by mixing two of the primary colours (red, green, or blue) and excluding the
third colour. Red and green combine to make yellow, green and blue to
make cyan, and blue and red form magenta. The combination of red, green,
and blue in full intensity makes white.
In Photoshop, using the "screen" mode for the different layers in an
image will make the intensities mix together according to the additive
colour mixing model. This is analogous to stacking slide images on top of
each other and shining light through them.
FIG
CMYK: The 4-colour CMYK model used in printing lays down
overlapping layers of varying percentages of transparent cyan (C), magenta
(M), and yellow (Y) inks. In addition, a layer of black (K) ink can be added.
The CMYK model uses subtractive colour mixing.
1.5.4 INDEXED IMAGE
FIG
An indexed image consists of an array and a color map matrix. The
pixel values in the array are direct indices into the color map. By convention,
this documentation uses the variable name X to refer to the array and map
to refer to the color map. In computing, indexed color is a technique to
manage digital images' colors in a limited fashion, in order to save
computer memory and file storage while speeding up display refresh and
file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, color information is not
directly carried by the image pixel data, but is stored in a separate piece of
data called a palette: an array of color elements, in which every element, a
color, is indexed by its position within the array. The image pixels do not
contain the full specification of their color, but only its index in the palette.
This technique is sometimes referred to as pseudocolor or indirect color, as
colors are addressed indirectly.
Perhaps the first device that supported palette colors was a random-access
frame buffer, described in 1975 by Kajiya, Sutherland, and Cheadle.
This supported a palette of 256 36-bit RGB colors.
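The X/map convention can be sketched directly in MATLAB (Image Processing Toolbox assumed; the sample image and palette size are illustrative):

```matlab
% Sketch of the X/map convention: the array holds palette indices and the
% colormap holds the actual RGB values. Image Processing Toolbox assumed.
RGB = im2double(imread('peppers.png'));
[X, map] = rgb2ind(RGB, 16);     % quantize to a 16-colour palette
% map is 16x3: one RGB triple per palette entry; X stores only indices.
out = ind2rgb(X, map);           % each pixel fetches its colour via its index
```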
1.6 APPLICATIONS OF IMAGE PROCESSING
Interest in digital image processing methods stems from two principal
application areas:
1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception
In the second application area, interest focuses on procedures for
extracting, from an image, information in a form suitable for computer
processing. Examples include automatic character recognition, industrial machine
vision for product assembly and inspection, military reconnaissance,
automatic processing of fingerprints, etc.
1.7 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches: Speeded Up Robust
Features (SURF)-based matching, minutiae detection, and direct correlation
matching for feature registration and matching. Of these three methods,
the SURF method achieves the best accuracy. It takes an average of 1.5
seconds using the SURF method to perform a one-to-one matching.
Zhou et al. proposed a line-descriptor-based method for sclera vein
recognition. The matching step (including registration) is the most time-consuming
step in this sclera vein recognition system; it costs about 1.2
seconds to perform a one-to-one matching. Both speeds were measured using
a PC with an Intel® Core™2 Duo 2.4 GHz processor and 4 GB DRAM.
Currently, sclera vein recognition algorithms are designed using central
processing unit (CPU)-based systems.
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1. Mask files are used to calculate the valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large in size, so they preoccupy the GPU memory and slow
down data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance.
LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular
biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14,
pp. 1860-1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the
most challenging problems that has been actively researched in recent
years. It is well known that many state-of-the-art still-face recognition
algorithms perform well when constrained (frontal, well-illuminated, high-resolution,
sharp, and full) face images are acquired. However, their
performance degrades significantly when the test images contain variations
that are not present in the training images. In this paper, we highlight some
of the key issues in remote face recognition. We define remote face
recognition as one where faces are several tens of meters (10-250 m) from
the cameras. We then describe a remote face database which has been
acquired in an unconstrained outdoor maritime environment. Recognition
performance of a subset of existing still-image-based face recognition
algorithms is evaluated on the remote face data set. Further, we define the
remote re-identification problem as matching a subject at one location with
candidate sets acquired at a different location and over time, in remote
conditions. We provide preliminary experimental results on remote re-identification.
It is demonstrated that, in addition to applying a good
classification algorithm, finding features that are robust to the variations
mentioned above and developing statistical models which can account for
these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N.
Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics
Security.
With the rapidly expanding biometric data collected by various sectors
of government and industry for identification and verification purposes,
how to manage and process such Big Data draws great concern. Even
though modern processors are equipped with more cores and memory
capacity, it still requires careful design in order to utilize the hardware
resources effectively and the power consumption efficiently. This research
addresses this issue by investigating the workload characteristics of a
biometric application. Taking Daugman's iris matching algorithm, which
has been proven to be the most reliable iris matching method, as a case
study, we conduct performance profiling and binary instrumentation on the
benchmark to capture its execution behavior. The results show that data
loading and memory access incur great performance overhead, which
motivates us to move the biometrics computation to a high-performance
architecture.
Modern iris recognition algorithms can be computationally intensive,
yet are designed for traditional sequential processing elements, such as a
personal computer. However, a parallel processing alternative using field-programmable
gate arrays (FPGAs) offers an opportunity to speed up iris
recognition. Within the means of this project, iris template generation with
directional filtering, which is a computationally expensive yet parallel
portion of a modern iris recognition algorithm, is parallelized on an FPGA
system. We present a performance comparison of the parallelized
algorithm on the FPGA system to a traditional CPU-based version. The
parallelized template generation outperforms an optimized C++ code
version, determining the information content of an iris approximately 324
times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric
modality based on conjunctival vasculature," in Proc. Artif. Neural
Netw. Eng., 2006, pp. 1-8.
A new biometric indicator based on the patterns of conjunctival
vasculature is proposed. Conjunctival vessels can be observed on the visible
part of the sclera that is exposed to the outside world. These vessels
demonstrate rich and specific details in visible light and can be easily
photographed using a regular digital camera. In this paper, we discuss
methods for conjunctival imaging, preprocessing, and feature extraction in
order to derive a suitable conjunctival vascular template for biometric
authentication. Commensurate classification methods, along with the
observed accuracy, are discussed. Experimental results suggest the potential
of using conjunctival vasculature as a biometric measure. Identification of
a person based on some unique set of features is an important task. Human
identification is possible with several biometric systems, and sclera
recognition is one of the promising biometrics. The sclera is the white
portion of the human eye. The vein pattern seen in the sclera region is
unique to each person. Thus, the sclera vein pattern is a well-suited
biometric technology for human identification. The existing methods used
for sclera recognition have some drawbacks: only frontal-looking
images are preferred for matching, and rotation variance is another problem.
These problems are completely eliminated in the proposed system by using
two feature extraction techniques: Histogram of Oriented
Gradients (HOG) and conversion of the image into polar form using the
bilinear interpolation technique. These two features help the proposed
system to become illumination invariant and rotation invariant. The
experimentation is done with the help of the UBIRIS database. The
experimental results show that the proposed sclera recognition method can
achieve better accuracy than the previous methods.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J.
C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899,
May 2008.
The graphics processing unit (GPU) has become an integral part of
today's mainstream computing systems. Over the past six years, there has
been a marked increase in the performance and capabilities of GPUs. The
modern GPU is not only a powerful graphics engine but also a highly
parallel programmable processor featuring peak arithmetic and memory
bandwidth that substantially outpaces its CPU counterpart. The GPU's
rapid increase in both programmability and capability has spawned a
research community that has successfully mapped a broad range of
computationally demanding, complex problems to the GPU. This effort in
general-purpose computing on the GPU, also known as GPU computing,
has positioned the GPU as a compelling alternative to traditional
microprocessors in high-performance computer systems of the future. We
describe the background, hardware, and programming model for GPU
computing, summarize the state of the art in tools and techniques, and
present four GPU computing successes in game physics and computational
biophysics that deliver order-of-magnitude performance gains over
optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image
database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.
This paper proposes algorithms for iris segmentation, quality
enhancement, match score fusion, and indexing to improve both the
accuracy and the speed of iris recognition. A curve evolution approach is
proposed to effectively segment a non-ideal iris image using the modified
Mumford-Shah functional. Different enhancement algorithms are
concurrently applied on the segmented iris image to produce multiple
enhanced versions of the iris image. A support-vector-machine-based
learning algorithm selects locally enhanced regions from each globally
enhanced image and combines these good-quality regions to create a single
high-quality iris image. Two distinct features are extracted from the high-quality
iris image. The global textural feature is extracted using the 1-D log-polar
Gabor transform, and the local topological feature is extracted using
Euler numbers. An intelligent fusion algorithm combines the textural and
topological matching scores to further improve the iris recognition
performance and reduce the false rejection rate, whereas an indexing
algorithm enables fast and accurate iris identification. The verification and
identification performance of the proposed algorithms is validated and
compared with other algorithms using the CASIA Version 3, ICE 2005, and
UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-stage
parallel approach for registration and matching: a parallel sclera
matching solution for sclera vein recognition based on our sequential line-descriptor
method, using the CUDA GPU architecture. CUDA is a highly
parallel, multithreaded, many-core processor architecture with tremendous
computational power.
It supports not only a traditional graphics pipeline but also computation
on non-graphical data. It is relatively straightforward to implement our C
program for CUDA on an AMD-based GPU using OpenCL. Our CUDA
kernels can be directly converted to OpenCL kernels by accounting for the
different syntax of various keywords and built-in functions. The mapping
strategy is also effective in OpenCL if we regard thread and block in
CUDA as work-item and work-group in OpenCL. Most of our optimization
techniques, such as coalesced memory access and prefix sum, work in
OpenCL too. Moreover, since CUDA is a data-parallel architecture, the
implementation of our approach in OpenCL should be programmed in the
data-parallel model.
In this research, we first discuss why the naïve parallel approach would
not work. We then propose the new sclera descriptor (the Y-shape sclera
feature-based efficient registration method) to speed up the mapping scheme,
introduce the "weighted polar line (WPL) descriptor," which is better
suited for parallel computing, to mitigate the mask size issue, and develop
our coarse-to-fine two-stage matching process to dramatically improve the
matching speed. These new approaches make parallel processing
possible and efficient.
191 PROPOSED SYSTEM ADVANTAGES
1 To improve efficiency, we propose a new descriptor, the Y-shape
descriptor, which greatly helps improve the efficiency of the coarse
registration of two images and can be used to filter out non-matching
pairs before refined matching.
2 We propose the coarse-to-fine two-stage matching process. In the first
stage, we match two images coarsely using the Y-shape descriptors,
which is very fast because no registration is needed. The matching
result in this stage helps filter out image pairs with low similarities.
CHAPTER 2
PROJECT DESCRIPTION
21 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each person,
so it can be used for human identification. Several researchers have
designed different sclera vein recognition methods and have shown that
sclera vein recognition is promising for human identification.
Crihalmeanu and Ross proposed three approaches: a Speed Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Of these three methods,
the SURF method achieves the best accuracy; it takes an average of 1.5
seconds to perform a one-to-one matching. Zhou
et al proposed a line descriptor-based method for sclera vein recognition.
The matching step (including registration) is the most time-consuming step
in this sclera vein recognition system, costing about 1.2 seconds to
perform a one-to-one matching. Both speeds were measured on a PC with
an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently,
sclera vein recognition algorithms are designed using central processing
unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential
processing devices, which may not be efficient for data processing where the
data can be parallelized. Because of the large time consumption of the matching
step, sclera vein recognition using a sequential method would be very
challenging to implement in a real-time biometric system, especially
when there is a large number of templates in the database for matching. GPUs
(general-purpose graphics processing units, GPGPUs)
are now popularly used in parallel computing to improve
computational speed and efficiency. The highly parallel
structure of GPUs makes them more effective than CPUs for data
processing where the processing can be performed in parallel. GPUs have been
widely used in biometric recognition, such as speech recognition, text
detection, handwriting recognition, and face recognition. In iris
recognition, GPUs have been used to extract features, construct descriptors,
and match templates.
GPUs are also used for object retrieval and image search. Park et al
presented a performance evaluation of image processing algorithms, such
as linear feature extraction and multi-view stereo matching, on GPUs.
However, these approaches were designed for their specific biometric
recognition applications and feature searching methods; therefore, they may
not be efficient for sclera vein recognition. Compute Unified Device
Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in
this research. CUDA is a highly parallel, multithreaded, many-core
processor architecture with tremendous computational power. It supports not
only a traditional graphics pipeline but also computation on non-graphical
data. More importantly, it offers an easier programming platform that
outperforms its CPU counterparts in terms of peak arithmetic intensity and
memory bandwidth. In this research, the goal is not to develop a unified
strategy to parallelize all sclera matching methods, because each method is
quite different from the others and would need a customized design; an
efficient parallel computing scheme would need different strategies for
different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
sclera vein recognition based on our sequential line-descriptor method using
the CUDA GPU architecture. However, the parallelization strategies
developed in this research can be applied to design parallel approaches for
other sclera vein recognition methods and can help parallelize general pattern
recognition methods. Based on the matching approach, there are three
challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large and will preoccupy the GPU memory and slow
down the data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance. New designs are
necessary to help narrow down the search range. In summary, a naïve
parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to port our C program for
CUDA to an AMD-based GPU using OpenCL: our CUDA kernels can be
directly converted to OpenCL kernels by accounting for the different syntax
of various keywords and built-in functions. The mapping strategy is also
effective in OpenCL if we regard a thread and a block in CUDA as a work-item
and a work-group in OpenCL. Most of our optimization techniques, such as
coalesced memory access and prefix sum, work in OpenCL too.
Moreover, since CUDA is a data-parallel architecture, the OpenCL
implementation of our approach should also be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not
work (Section 3). We then propose a new sclera descriptor, the Y-shape
sclera feature-based efficient registration method, to speed up the mapping
scheme (Section 4); introduce the "weighted polar line (WPL) descriptor",
which is better suited for parallel computing and mitigates the mask-size
issue (Section 5); and develop our coarse-to-fine two-stage matching
process to dramatically improve the matching speed (Section 6). These new
approaches make parallel processing possible and efficient. However, it
is non-trivial to implement these algorithms in CUDA, so we then develop
implementation schemes to map our algorithms onto CUDA (Section 7).
In Section 2, we give a brief introduction to sclera vein recognition. In
Section 8, we present experiments using the proposed system, and in
Section 9 we draw conclusions.
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation, feature enhancement, feature extraction, and feature
matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition,
and several methods have been designed for it. Crihalmeanu
et al presented a semi-automated system for sclera segmentation. They
used a clustering algorithm to classify color eye images into three
clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross
designed a segmentation approach based on a normalized sclera index
measure, which includes coarse sclera segmentation, pupil region
segmentation, and fine sclera segmentation. Zhou et al developed a skin
tone plus "white color"-based voting method for sclera segmentation in
color images and an Otsu's thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera
features, since the sclera vein patterns often lack contrast and are hard to
detect. Zhou et al used a bank of multi-directional Gabor filters for
vascular pattern enhancement. Derakhshani et al used contrast-limited
adaptive histogram equalization (CLAHE) to enhance the green color plane
of the RGB image, and a multi-scale region-growing approach to identify
the sclera veins against the image background. Crihalmeanu and Ross applied
a selective enhancement filter for blood vessels to extract features from the
green component of a color image. In the feature matching step,
Crihalmeanu and Ross proposed
three registration and matching approaches: Speed Up Robust
Features (SURF), which is based on interest-point detection; minutiae
detection, which is based on minutiae points on the vasculature structure;
and direct correlation matching, which relies on image registration. Zhou et
al designed a line descriptor-based feature registration and matching
method.
The proposed sclera recognition method consists of five steps: sclera
segmentation, vein pattern enhancement, feature extraction, feature
matching, and the matching decision. Fig 2 shows the block diagram of sclera
recognition. Two types of feature extraction are used in the proposed
method to achieve good identification accuracy. The characteristics
elicited from the blood vessel structure seen in the sclera region are the
Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar
conversion. HOG is used to determine the gradient orientations and
edge orientations of the vein pattern in the sclera region of an eye image. To
become more computationally efficient, the image data are converted
to polar form, which is mainly useful for circular or quasi-circular
objects. These two characteristics are extracted from all the images in the
database and compared with the features of the query image to decide whether
the person is correctly identified or not. This comparison is done in the feature
matching step, which ultimately makes the matching decision. By using the
proposed feature extraction methods and matching techniques, human
identification is more accurate than in existing studies. In the proposed
method, two features of an image are extracted.
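As an illustration of the two feature types, the following minimal Python sketch (function and parameter names are our own, not from the report) bins gradient orientations HOG-style and converts a pixel coordinate to polar form about a reference center:

```python
import math

def orientation_histogram(gx, gy, bins=9):
    """Bin per-pixel gradient orientations into a coarse histogram over
    [0, 180) degrees, weighted by gradient magnitude (HOG-style)."""
    hist = [0.0] * bins
    for gx_row, gy_row in zip(gx, gy):
        for dx, dy in zip(gx_row, gy_row):
            mag = math.hypot(dx, dy)
            if mag == 0:
                continue
            ang = math.degrees(math.atan2(dy, dx)) % 180.0
            hist[int(ang // (180.0 / bins)) % bins] += mag
    return hist

def to_polar(x, y, cx, cy):
    """Map a Cartesian pixel coordinate to polar form (r, theta)
    about a reference center (cx, cy), e.g. the pupil center."""
    dx, dy = x - cx, y - cy
    return math.hypot(dx, dy), math.atan2(dy, dx)
```

Both are per-pixel operations, which is also what makes them natural candidates for the parallel processing discussed later.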
222 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists
of three steps: glare area detection; sclera area estimation; and iris and eyelid
detection and refinement. Fig shows the steps of segmentation.
FIG
Glare Area Detection: The glare area is a small, bright area near the
pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. The filter
operates only on grayscale images, so a color image must first be
converted to grayscale before the Sobel filter is applied to
detect the glare area. Fig 4 shows the result of glare area detection.
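A minimal sketch of the Sobel step, assuming a plain list-of-lists grayscale image (the helper name is illustrative; border pixels are simply left at zero):

```python
def sobel_magnitude(img):
    """Approximate the gradient magnitude of a grayscale image using
    the 3x3 Sobel operator; bright glare regions produce strong edges."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # horizontal and vertical Sobel responses
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
            out[y][x] = (gx*gx + gy*gy) ** 0.5
    return out
```

Thresholding the resulting magnitude map would then flag the sharp boundaries of glare spots.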
FIG
Sclera area estimation: To estimate the sclera area, Otsu's
thresholding method is applied. The steps of sclera area detection are:
selection of the region of interest (ROI), Otsu's thresholding, and sclera area
detection. The left and right sclera areas are selected based on the iris boundaries.
Once the region of interest is selected, Otsu's thresholding is applied to
obtain the potential sclera areas. The correct left sclera area should be
located in the right and center positions, and the correct right sclera area should
be located in the left and center. In this way, non-sclera areas are eliminated.
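Otsu's method picks the threshold that maximizes between-class variance. A self-contained sketch (not the report's implementation) on raw 8-bit gray values:

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a sequence of 8-bit gray values by
    maximizing the between-class variance over all candidate thresholds."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]          # pixels at or below t (class 0)
        if w0 == 0:
            continue
        w1 = total - w0        # pixels above t (class 1)
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

For a bimodal sclera-versus-background ROI this lands the threshold between the two intensity clusters.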
223 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the
sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are
refined; all of these are unwanted portions for recognition. To
eliminate their effects, refinement is done after the detection
of the sclera area. Fig shows the result after Otsu's thresholding and iris and
eyelid refinement for detecting the right sclera area; the left sclera
area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the effect of
segmentation faults. The vein patterns in the sclera area are not clearly
visible after segmentation, so vein pattern enhancement is performed to
make them more visible.
224 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of fingers (Miura et al 2004), the palm (Lin and
Fan 2004), and the retina (Hill 1999). In the case of retinal biometrics, a
special optical device for imaging the back of the eyeball is needed (Hill
1999). Due to its perceived invasiveness and the required degree of subject
cooperation, the use of retinal biometrics may not be acceptable to some
individuals. The conjunctiva is a thin, transparent, and moist tissue that
covers the outer surface of the eye. The part of the conjunctiva that covers
the inner lining of the eyelids is called the palpebral conjunctiva, and the part
that covers the outer surface of the eye is called the ocular (or bulbar)
conjunctiva, which is the focus of this study. The ocular conjunctiva is very
thin and clear; thus the vasculature (including that of the episclera) is
easily visible through it. The visible microcirculation of the conjunctiva offers a
rich and complex network of veins and fine microcirculation (Fig 1). The
apparent complexity and specificity of these vascular patterns motivated us
to utilize them for personal identification (Derakhshani and Ross 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al 2004):
UNIVERSALITY All normal living tissue, including that of the
conjunctiva and episclera, has a vascular structure.
UNIQUENESS Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein 1935; Tower 1955).
PERMANENCE Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus, the conjunctival vascular
structure is expected to have reasonable permanence (Joussen 2001).
PRACTICALITY Conjunctival vasculature can be captured with commercial
off-the-shelf digital cameras under normal lighting conditions, making this
modality highly practical.
ACCEPTABILITY Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the
addition of conjunctival biometrics will enhance the performance of current
iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portion of the eye, the sclera
vein patterns will be further exposed. This feature makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (eg
due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
availability of simultaneous computation. The figure below demonstrates the
possibility of parallel directional filtering. Since the filter is computed over
different portions of the input image, the computation can be performed in
parallel (denoted by Elements below). In addition, individual parallelization
of each element of the filtering can also be performed. A detailed discussion of
our proposed parallelization is outside the scope of this paper.
FIG
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN
RECOGNITION METHOD
The matching stage of the line descriptor-based method is a
bottleneck with regard to matching speed. In this section, we briefly
describe the line descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and to remove the branch
points. The line descriptor is used to describe the segments in the vein
structure; Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle θ relative to a
reference angle at the iris center, the segment's distance r to the iris center,
and the dominant angular orientation ɸ of the line segment. Thus the
descriptor is S = (θ, r, ɸ)ᵀ. The individual components of the line descriptor
are calculated as
FIG
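The equations themselves are not reproduced here, but the descriptor components can be sketched as follows, assuming the segment center and dominant orientation are already known (names are illustrative):

```python
import math

def line_descriptor(seg_cx, seg_cy, iris_x, iris_y, seg_phi):
    """Build a line-segment descriptor S = (theta, r, phi):
    theta -- angle of the segment center about the iris center,
    r     -- distance from the segment center to the iris center,
    phi   -- dominant angular orientation of the segment itself."""
    dx, dy = seg_cx - iris_x, seg_cy - iris_y
    theta = math.atan2(dy, dx)
    r = math.hypot(dx, dy)
    return theta, r, seg_phi
```

Anchoring theta and r at the detected iris center is what lets two templates share a common reference frame.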
Here, f_line(x) is the polynomial approximation of the line segment, (x_l, y_l)
is the center point of the line segment, (x_i, y_i) is the center of the detected
iris, and S is the line descriptor. To register the segments of the
vascular patterns, a RANSAC-based algorithm is used to estimate the best-
fit parameters for registration between the two sclera vascular patterns. The
registration algorithm randomly chooses two points, one from the
test template and one from the target template, along with a
scaling factor and a rotation value based on a priori knowledge of the
database. Using these values, it calculates a fitness value for the registration
under these parameters.
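The random-trial registration described above can be sketched as follows; the fitness function (summed nearest-neighbor distance, lower is better) and the parameter ranges are our assumptions, since the report does not specify them:

```python
import math, random

def ransac_register(test_pts, target_pts, trials=200, seed=0,
                    scales=(0.9, 1.0, 1.1), rotations=(-0.1, 0.0, 0.1)):
    """Randomly pair one test point with one target point, draw a scale
    and rotation from a-priori ranges, and keep the transform with the
    best fitness over all trials."""
    rng = random.Random(seed)
    best = (float("inf"), None)
    for _ in range(trials):
        tx, ty = rng.choice(test_pts)       # anchor point in test template
        gx, gy = rng.choice(target_pts)     # anchor point in target template
        s, a = rng.choice(scales), rng.choice(rotations)
        cos_a, sin_a = math.cos(a), math.sin(a)
        fit = 0.0
        for x, y in test_pts:
            # rotate and scale about the test anchor, then shift onto the target anchor
            rx = s * ((x - tx) * cos_a - (y - ty) * sin_a) + gx
            ry = s * ((x - tx) * sin_a + (y - ty) * cos_a) + gy
            fit += min(math.hypot(rx - u, ry - v) for u, v in target_pts)
        if fit < best[0]:
            best = (fit, (tx, ty, gx, gy, s, a))
    return best
```

The best-fitting (shift, scale, rotation) tuple is then used to bring the two templates into one coordinate system before segment-level matching.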
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. To reduce the effect of segmentation errors, we created a
weighting image (Figure 3) from the sclera mask by setting interior pixels
of the sclera mask to 1, pixels within some distance of the mask boundary
to 0.5, and pixels outside the mask to 0.
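A sketch of building that weighting image from a binary mask; the border width is a placeholder parameter, not a value from the report:

```python
def weight_image(mask, border=2):
    """Assign per-pixel weights from a binary sclera mask:
    1.0 well inside the mask, 0.5 within `border` pixels of the mask
    boundary, 0.0 outside the mask."""
    h, w = len(mask), len(mask[0])

    def near_boundary(y, x):
        # a mask pixel is near the boundary if any 0-pixel lies in its window
        for dy in range(-border, border + 1):
            for dx in range(-border, border + 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 0:
                    return True
        return False

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 1:
                out[y][x] = 0.5 if near_boundary(y, x) else 1.0
    return out
```

Down-weighting boundary pixels makes the score less sensitive to spur edges left over from imperfect segmentation.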
The matching score for two segment descriptors is calculated as follows,
where Si and Sj are two segment descriptors, m(Si, Sj) is the matching
score between segments Si and Sj, d(Si, Sj) is the Euclidean distance
between the segment descriptors' center points (from Eqs 6-8), D_match is
the matching distance threshold, and ɸ_match is the matching angle threshold.
The total matching score M is the sum of the individual matching scores
divided by the maximum matching score of the minimal set between the
test and target templates. That is, one of the test or target templates has fewer
points, and the sum of its descriptors' weights sets the maximum score
that can be attained.
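A hedged sketch of this scoring scheme; the per-pair score of min(wi, wj) and the threshold values are assumptions, since the equations are not reproduced here:

```python
import math

def segment_match_score(si, sj, wi, wj, d_match=5.0, phi_match=0.5):
    """Score one pair of segment descriptors si, sj = (x, y, phi).
    The pair matches when both the center distance and the orientation
    difference fall under their thresholds; the score combines the
    weights (min of the two is an assumption)."""
    d = math.hypot(si[0] - sj[0], si[1] - sj[1])
    if d <= d_match and abs(si[2] - sj[2]) <= phi_match:
        return min(wi, wj)
    return 0.0

def total_match_score(test, target):
    """M = (sum of best per-segment scores) / (max attainable score),
    where the maximum is set by the template with the smaller total weight."""
    matched = sum(max(segment_match_score(s, t, w, v) for t, v in target)
                  for s, w in test)
    max_score = min(sum(w for _, w in test), sum(v for _, v in target))
    return matched / max_score if max_score else 0.0
```

Normalizing by the smaller template's total weight keeps M in [0, 1] regardless of how many segments each template contains.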
FIG
FIG
FIG
FIG
Even with movement of the eye, Y-shape branches are observed to be a stable
feature and can be used as a sclera feature descriptor. To detect the Y-shape
branches in the original template, we search for the set of nearest neighbors
of every line segment within a regular distance and classify the angles
among these neighbors. If there are two types of angle values in the line
segment set, the set may be inferred to be a Y-shape structure, and the line
segment angles are recorded as a new feature of the sclera.
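The neighbor-angle test described above might be sketched as follows; the radius and angle tolerance are illustrative values, not taken from the report:

```python
def find_y_branches(segments, radius=10.0, angle_tol=0.2):
    """segments: list of (x, y, angle). For each segment, gather its
    neighbors within `radius` and cluster their angles; per the text, a
    neighborhood with two distinct angle values suggests a Y-shape
    branch point."""
    y_points = []
    for i, (x, y, _) in enumerate(segments):
        neighbors = [s for j, s in enumerate(segments) if j != i
                     and (s[0] - x) ** 2 + (s[1] - y) ** 2 <= radius ** 2]
        clusters = []
        for _, _, a in neighbors:
            if all(abs(a - b) > angle_tol for b in clusters):
                clusters.append(a)  # a new distinct angle value
        if len(clusters) == 2:
            y_points.append((x, y, tuple(clusters)))
    return y_points
```

Each detected branch point then contributes its branch angles and center to the Y-shape feature set.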
There are two ways to measure both the orientation and the relationship of
every branch of the Y-shape vessels: one is to use the angle of every branch
to the x-axis; the other is to use the angles between each branch and the iris
radial direction. The first method needs an additional rotation operation to
align the template, so in our approach we employed the second method. As
Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and
the radius from the pupil center. Even when the head tilts, the eye moves, or
the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite stable.
To tolerate errors from the pupil center calculation in the segmentation step,
we also recorded the center position (x, y) of the Y-shape branches as
auxiliary parameters. So our rotation-, shift-, and scale-invariant feature
vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated
with reference to the iris center; therefore, it is automatically aligned to the
iris center. It is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 22, the line descriptor is extracted from the
skeleton of the vessel structure in binary images (Figure 7). The skeleton is
then broken into smaller segments. For each segment, a line descriptor is
created to record the center and orientation of the segment. This descriptor
is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is
its orientation. Because of limited segmentation accuracy, descriptors near
the boundary of the sclera area might not be accurate and may contain spur
edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of
such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel
patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line
segments of vessel patterns
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is challenging,
since the mask files are large and will occupy the GPU memory and
slow down the data transfer. During matching and registration, a RANSAC-
type algorithm was used to randomly select corresponding descriptors,
and the transform parameters between them were used to generate the
template-transform affine matrix. After every template transform, the mask
data must also be transformed and a new boundary calculated to
evaluate the weight of the transformed descriptor. This results in too many
convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the mask
information and can be automatically aligned. We extracted the geometric
relationships of the descriptors and stored them as a new descriptor. We use
a weighted image created by setting various weight values according to
position: the weights of descriptors outside the sclera are set to 0, those
near the sclera boundary to 0.5, and interior descriptors to 1. In our work,
descriptor weights are calculated on their own mask by the CPU, only
once.
The calculated result is saved as a component of the descriptor, which
becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes
the value 0, 0.5, or 1. To align two templates, when a template is shifted to
another location along the line connecting their centers, all the descriptors
of that template are transformed. This is faster if the two templates have
similar reference points: if we use the center of the iris as the reference
point, then when two templates are compared, the correspondences are
automatically aligned to each other, since they share a similar reference
point. Every feature vector of the template is a set of line segment
descriptors composed of three variables (Figure 8): the segment's angle to
the reference line through the iris center, denoted θ; the distance between
the segment's center and the pupil center, denoted r; and the dominant
angular orientation of the segment, denoted ɸ. To minimize GPU
computation, we also convert the descriptor values from polar coordinates
to rectangular coordinates in a CPU preprocess.
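That CPU-side preprocessing step can be sketched as follows; the tuple layout is an assumption based on the descriptor fields named in the text:

```python
import math

def precompute_rect(descriptors):
    """CPU-side preprocessing: augment each polar descriptor
    (r, theta, phi, w) with rectangular coordinates so the GPU kernels
    can avoid per-thread trigonometry."""
    out = []
    for r, theta, phi, w in descriptors:
        x = r * math.cos(theta)
        y = r * math.sin(theta)
        out.append((x, y, r, theta, phi, w))
    return out
```

Paying the trigonometry cost once on the CPU keeps the per-thread GPU work down to additions and comparisons.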
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters. For
example, as an eyeball moves left, the sclera patterns in the left part of the
eye may be compressed while those in the right part are stretched.
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads called warps. We
reorganized the descriptors from the same side and saved
FIG
FIG
them in contiguous addresses. This meets the requirement of coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no
longer needed on the GPU. Matching with this feature is very fast because
the templates need not be re-registered after every shift. Thus the cost of
data transfer and computation on the GPU is reduced. Matching on the
new descriptor, the shift parameter generator of Figure 4 is then simplified
as in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger
limits on their size and resource consumption, more fully featured
instruction sets, and more flexible control-flow operations. After many
years of separate instruction sets for vertex and fragment operations, current
GPUs support the unified Shader Model 4.0 on both vertex and fragment
shaders:
The hardware must support shader programs of at least 65k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-
bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units. General-purpose computing on the GPU maps general-
purpose computation onto the GPU using the graphics hardware in much the
same way as any standard graphics application. Because of this similarity, it
is both easier and more difficult to explain the process: on one hand, the
actual operations are the same and are easy to follow; on the other hand, the
terminology differs between graphics and general-purpose use. Harris
provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology,
then show how the same steps are used in a general-purpose way to author
GPGPU applications, and finally use the same steps to show the more
simple and direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II,
concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as a texture on future passes through
the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation
involves exactly the same steps but different terminology. A motivating
example is a fluid simulation computed over a grid: at each time step, we
compute the next state of the fluid for each grid point from the current state
at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest. The rasterizer generates a fragment at each
pixel location covered by that geometry. (In our example, our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation.)
Each fragment is shaded by an SPMD general-purpose fragment
program. (Each grid point runs the same program to update the state of its
fluid.)
The fragment program computes the value of the fragment by a
combination of math operations and "gather" accesses from global
memory. (Each grid point can access the state of its neighbors from the
previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes. (The current state of the fluid will be used on the next time
step.)
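The grid example can be sketched sequentially in Python; on a GPU, each interior cell would be one thread's (or fragment's) work, and the averaging rule below is only a toy stand-in for a real fluid update:

```python
def step(grid):
    """One time step of a toy grid computation: each interior cell
    becomes the average of itself and its four neighbors (a 'gather'
    per grid point); border cells are left unchanged."""
    h, w = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # write to a separate output buffer
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            new[y][x] = (grid[y][x] + grid[y-1][x] + grid[y+1][x]
                         + grid[y][x-1] + grid[y][x+1]) / 5.0
    return new
```

Writing to a separate output buffer mirrors the classic GPGPU constraint that a pass reads one buffer and writes another.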
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications
has been that, despite their general-purpose tasks having nothing to do with
graphics, the applications still had to be programmed using graphics APIs.
In addition, the program had to be structured in terms of the graphics
pipeline, with the programmable units only accessible as an intermediate
step in that pipeline, when the programmer would almost certainly prefer to
access the programmable units directly. The programming environments we
describe in detail in Section IV solve this difficulty by providing a
more natural, direct, non-graphics interface to the hardware and,
specifically, the programmable units. Today, GPU computing applications
are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math
operations and both "gather" (read) accesses from and "scatter" (write)
accesses to global memory. Unlike in the previous two
methods, the same buffer can be used for both reading and writing,
allowing more flexible algorithms (for example, in-place algorithms that
use less memory).
The resulting buffer in global memory can then be used as an input in
future computation.
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose the coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because no
registration is needed. The matching result in this stage helps filter out
image pairs with low similarities. After this step, some false positive
matches may remain. In the second stage, we use the WPL descriptor to
register the two images for more detailed descriptor matching, including
scale and translation invariance. This stage includes the shift transform,
affine matrix generation, and final WPL descriptor matching. Overall, we
partitioned the registration and matching processing into four kernels in
CUDA (Figure 10): matching on the Y-shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster
and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features,
registration is unnecessary before matching on the Y-shape descriptor. The
whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte
and the target template Tta, respectively. dϕ is the Euclidean distance of the
angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean
distance of two descriptor centers, defined in (4); ni and di are the number
of matched descriptor pairs and the distance of their centers, respectively;
tϕ is a distance threshold and txy is the threshold that restricts the search
area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all
the Y-shape branches. The search area is limited to the corresponding left or
right half of the sclera in order to reduce the search range and time. The
distance between two branches is defined in (3), where ϕij is the angle between
the jth branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y-shape
branch centers di are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch
centers as in (2). Here α is a factor to fuse the matching score, which was set
to 30 in our study; Ni and Nj are the total numbers of feature vectors in
templates i and j, respectively. The decision is regulated by the threshold t:
if the sclera's matching score is lower than t, the sclera is discarded, while a
sclera with a high matching score is passed to the next, more precise
matching process.
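The coarse matching just described can be sketched as follows. This is a hypothetical Python illustration: the descriptor layout (three angles plus a center) and the exact fusion formula (2) are assumptions, since only the thresholds tϕ = 30, txy = 675 and the factor α = 30 are quoted in the text.

```python
import math

# Each Y-shape descriptor is modeled as (phi1, phi2, phi3, x, y).
def d_phi(a, b):
    return math.dist(a[:3], b[:3])   # Euclidean distance of angle elements, as in (3)

def d_xy(a, b):
    return math.dist(a[3:], b[3:])   # Euclidean distance of descriptor centers, as in (4)

def coarse_match(test, target, t_phi=30.0, t_xy=675.0, alpha=30.0):
    n, dist_sum = 0, 0.0
    for yt in test:
        for ya in target:
            # txy restricts the search area; tphi is the angle-distance threshold.
            if d_xy(yt, ya) < t_xy and d_phi(yt, ya) < t_phi:
                n += 1
                dist_sum += d_xy(yt, ya)
    if n == 0:
        return 0.0
    avg_d = dist_sum / n
    # Assumed fusion: more matched branches raise the score, larger average
    # center distance lowers it; alpha balances the two terms.
    return n / min(len(test), len(target)) * alpha / (alpha + avg_d)
```

A score of 1.0 then corresponds to every branch matching with zero center distance, and the threshold t decides whether the pair proceeds to Stage II.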
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of
the sclera than the Y-shape descriptor. The variation of the sclera vessel
pattern is nonlinear because: when acquiring an eye image at a different
gaze angle, the vessel structure appears to shrink or extend nonlinearly,
since the eyeball is spherical in shape; and the sclera is made up of four
layers (episclera, stroma, lamina fusca, and endothelium), whose
movements differ slightly. Considering these factors, our registration
employs both a single shift transform and a multi-parameter transform that
combines shift, rotation, and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before,
segmentation may not be accurate; as a result, the detected iris center may
not be very accurate either. The shift transform is designed to tolerate
possible errors in pupil center detection in the segmentation step. If there is
no deformation, or only very minor deformation, registration with the shift
transform alone would be adequate to achieve an accurate result. We
designed Algorithm 2 to obtain the optimized shift parameter, where Tte is
the test template and stei is the ith WPL descriptor of Tte; Tta is the target
template and staj is the jth WPL descriptor of Tta; d(stek, staj) is the
Euclidean distance of descriptors stek and staj; and Δsk is the shift value of
the two descriptors.
We first randomly select an equal number of segment descriptors
stek in the test template Tte from each quadrant and find each one's nearest
neighbor staj in the target template Tta. Their shift offset is recorded as a
possible registration shift factor Δsk. The final offset registration factor is
Δsoptim, which has the smallest standard deviation among these candidate offsets.
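Algorithm 2's idea can be sketched as below. This is a simplified, hypothetical Python sketch: descriptors are reduced to their (x, y) centers, and the sampling count is an illustrative choice rather than the paper's value.

```python
import random
import statistics

def nearest(pt, candidates):
    # Nearest neighbour by squared Euclidean distance of descriptor centres.
    return min(candidates, key=lambda c: (c[0] - pt[0])**2 + (c[1] - pt[1])**2)

def search_shift(test_pts, target_pts, samples=8, seed=0):
    rng = random.Random(seed)
    offsets = []
    # Sample test descriptors, record each one's offset to its nearest
    # neighbour in the target template as a candidate shift factor.
    for p in rng.sample(test_pts, min(samples, len(test_pts))):
        q = nearest(p, target_pts)
        offsets.append((q[0] - p[0], q[1] - p[1]))
    # Keep the candidate offset with the smallest deviation from the mean,
    # standing in for the smallest-standard-deviation criterion.
    mx = statistics.mean(o[0] for o in offsets)
    my = statistics.mean(o[1] for o in offsets)
    return min(offsets, key=lambda o: (o[0] - mx)**2 + (o[1] - my)**2)
```

With a target template that is simply a shifted copy of the test template, the recovered offset equals the true shift even when a few nearest-neighbour matches are outliers.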
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera
patterns in the matching step. The affine transform algorithm is shown in
Algorithm 3. The shift value in the parameter set is obtained by randomly
selecting a descriptor ste(it) and calculating the distance from its nearest
neighbor staj in Tta. We transform the test template by the matrix in (7).
At the end of each iteration, we count the number of matched descriptor
pairs between the transformed template and the target template. The factor
β determines whether a pair of descriptors is matched; we set it to 20 pixels
in our experiment. After N iterations, the optimized transform parameter set
is determined by selecting the maximum matching number m(it). Here stei,
Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and
tr(it)scale are the shift, rotation, and scale parameters generated in the itth
iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform
matrices defined in (7). To search for the optimized transform parameters,
we iterate N times to generate these parameters; in our experiment we set
the iteration count to 512.
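A minimal sketch of this randomized parameter search follows. The parameter ranges are illustrative assumptions; the text specifies only β = 20 pixels and N = 512, and descriptors are again reduced to centre points.

```python
import math
import random

def transform(pts, shift, theta, scale):
    # Compose scale, rotation and shift, standing in for S, R and T in (7).
    c, s = math.cos(theta), math.sin(theta)
    return [(scale * (c * x - s * y) + shift[0],
             scale * (s * x + c * y) + shift[1]) for x, y in pts]

def count_matches(pts, target, beta=20.0):
    # A pair is "matched" when it lies within beta pixels of a target point.
    return sum(1 for p in pts
               if any((p[0] - q[0])**2 + (p[1] - q[1])**2 <= beta * beta
                      for q in target))

def affine_search(test_pts, target_pts, iters=512, beta=20.0, seed=0):
    rng = random.Random(seed)
    best = ((0.0, 0.0), 0.0, 1.0)                 # start from the identity
    best_m = count_matches(transform(test_pts, *best), target_pts, beta)
    for _ in range(iters):
        cand = ((rng.uniform(-30, 30), rng.uniform(-30, 30)),  # shift
                rng.uniform(-0.2, 0.2),                        # rotation
                rng.uniform(0.9, 1.1))                         # scale
        m = count_matches(transform(test_pts, *cand), target_pts, beta)
        if m > best_m:
            best, best_m = cand, m
    return best, best_m
```

Each iteration is independent, which is what makes the per-thread mapping described in Section 25 natural for this step.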
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined by Algorithms 2
and 3, the test template is registered and matched simultaneously. The
registration and matching algorithm is listed in Algorithm 4. Here stei, Tte,
staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift,
tr(optm)scale, and Δsoptim are the registration parameters attained from
Algorithms 2 and 3; R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale)
form the descriptor transform matrix defined in Algorithm 3; ϕ is the angle
between the segment descriptor and the radius direction; and w is the weight
of the descriptor, which indicates whether or not the descriptor is at the edge
of the sclera. To ensure that the nearest descriptors have a similar
orientation, we use a constant factor α to check the absolute difference of
the two ϕ values; in our experiment we set α to 5. The total matching score
is the minimal score of the two transformed results divided by the minimal
matching score for the test template and the target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and
works as a coprocessor with a CPU. A CUDA device consists of many
streaming multiprocessors (SMs); the parallel part of the program should be
partitioned into threads by the programmer and mapped onto them.
There are multiple memory spaces in the CUDA memory hierarchy:
registers, local memory, shared memory, global memory, constant memory,
and texture memory. Registers and shared memory are on-chip and take
little time to access; only shared memory can be accessed by other threads
within the same block, but its capacity is limited. Global memory, constant
memory, and texture memory are off-chip memories accessible by all
threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only, cacheable
memories. Mapping algorithms to CUDA for efficient processing is
not a trivial task, and there are several challenges in CUDA programming.
If threads in a warp take different control paths, all the branches are
executed serially; to improve performance, branch divergence within a
warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this
latency, we should use on-chip memory preferentially rather than global
memory, and when global memory accesses do occur, threads in the same
warp should access words in sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces,
but it is organized into equal-sized banks. If two memory requests from
different threads within a warp fall in the same bank, the accesses are
serialized. For maximum performance, memory requests should be
scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four
independent modules, each module is converted to a different kernel
on the GPU. These kernels differ in computational density, so we
map them to the GPU with different strategies to fully utilize the
computing power of CUDA. Figure 11 shows our scheme of CPU-GPU
task distribution and the partition among blocks and threads. Algorithm 1 is
partitioned into coarse-grained parallel subtasks: we create as many threads
in this kernel as there are templates in the database. As the upper middle
column of Figure 11 shows, each target template is assigned to one thread,
and each thread compares one pair of templates. In our work we use an
NVIDIA C2070 as our GPU, with the thread and block counts set to 1024,
which means we can match our test template with up to 1024 × 1024 target
templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which each
thread processes a section of descriptors. As the lower portion of the
middle column of Figure 11 shows, we assign a target template to one
block; inside a block, one thread corresponds to a set of descriptors in that
template. This partition lets every block execute independently, with no
data exchange required between different blocks. When all threads
complete their corresponding descriptor fractions, the sum of the
intermediate results needs to be computed or compared. A parallel
prefix-sum algorithm is used to combine the intermediate results, as shown
on the right of Figure 11: first, all odd-numbered threads compute the sum
of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16,
32, 64, ...) threads computes the sum over the new results. The final result
is saved at the first address, which has the same variable name as the first
intermediate result.
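The combining scheme described above can be sketched sequentially as a tree reduction that leaves the block total in the first slot (each round here stands in for one synchronized step of the threads in a block):

```python
# Tree-style combination of per-thread partial results: at stride 1 each
# pair of consecutive slots is summed into the left one; doubling the
# stride each round accumulates the total into the first slot.
def block_reduce(partials):
    vals = list(partials)
    stride = 1
    while stride < len(vals):
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]   # pairwise combine
        stride *= 2
    return vals[0]                         # total lands in the first address
```

For n partial results this takes ceil(log2(n)) rounds instead of n - 1 sequential additions, which is why it suits the per-block combination step.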
252 MAPPING INSIDE BLOCK
In the shift-argument search there are two schemes we could choose to
map the task:
Mapping one pair of templates to all the threads in a block, so that every
thread takes charge of a fraction of the descriptors and cooperates with the
other threads.
Assigning a single possible shift offset to each thread, so that all threads
compute independently except that the final result must be compared with
the other possible offsets.
Due to the great number of sum and synchronization operations in every
nearest-neighbor search step, we chose the second method to parallelize
the shift search. In the affine matrix generator, we mapped an entire
parameter-set search to a thread: every thread randomly generates a set of
parameters and tries them independently, and the iterations are distributed
across all threads. The challenge of this step is that the randomly generated
numbers might be correlated among threads. In the rotation and scale
registration step we used the Mersenne Twister pseudorandom number
generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative;
it is therefore hard to parallelize a single twister state update across
several execution threads. To make sure that the thousands of threads in the
launch grid generate uncorrelated random sequences, many simultaneous
Mersenne Twisters need to run in parallel with different initial states.
But even "very different" (by any definition) initial state values do not
prevent the emission of correlated sequences by generators sharing
identical parameters. To solve this problem, and to enable an efficient
implementation of the Mersenne Twister on parallel architectures, we used
a special offline tool for the dynamic creation of Mersenne Twister
parameters, modified from the algorithm developed by Makoto Matsumoto
and Takuji Nishimura. In the registration and matching step, when
searching for the nearest neighbor, a line segment that has already been
matched should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in
shared memory. To share the flags, all the threads in a block would have to
synchronize at every query step; our solution is to use a single
thread in each block to process the matching.
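The per-thread generator idea above can be illustrated in Python. This is a simplified stand-in: it derives a distinct seed per stream from one master stream, whereas the offline dynamic-creation tool mentioned in the text derives distinct twister parameters per generator.

```python
import random

# Give every "thread" its own generator instance with an independently
# derived seed, so state updates never interleave between streams.
def make_thread_rngs(master_seed, n_threads):
    seeder = random.Random(master_seed)
    # Draw one large seed per thread from the master stream.
    return [random.Random(seeder.getrandbits(128)) for _ in range(n_threads)]

rngs = make_thread_rngs(42, 4)
samples = [r.random() for r in rngs]   # one independent draw per stream
```

Note that distinct seeds alone do not guarantee statistical independence between Mersenne Twister streams, which is exactly why the text resorts to dynamically created parameters.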
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the
bandwidth between host memory and device memory, so data transfer
between host and device can incur long latency. As shown in Figure 11,
we load the entire target template set from the database without considering
when the templates will be processed; therefore there is no data transfer
from host to device during the matching procedure. In global memory, the
components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are
stored separately. This guarantees that consecutive kernels of Algorithms 2
to 4 can access their data at successive addresses. Although such coalesced
access reduces latency, global memory access is still a slow way to get data,
so in our kernels we load the test template into shared memory to accelerate
memory access. Because Algorithms 2 to 4 execute different numbers of
iterations on the same data, bank conflicts do not happen. To maximize our
texture memory space, we set the system cache to the lowest value and
bound our target descriptors to texture memory; using this cacheable
memory accelerated our data access further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily
applied in target detection; in this paper it is applied as a feature for human
recognition. In the sclera region, the vein patterns are the edges of the
image, so HOG is used to determine the gradient orientations and edge
orientations of the vein pattern in the sclera region of an eye image.
To implement this technique, first divide the image into small
connected regions called cells. For each cell, compute the histogram of
gradient directions or edge orientations of its pixels; the combination of the
histograms of the different cells then represents the descriptor. To improve
accuracy, the histograms can be contrast-normalized by calculating the
intensity over a block and then using this value to normalize all cells within
the block. This normalization makes the descriptor largely invariant to
geometric and photometric changes. The gradient magnitude m(x, y) and
orientation θ(x, y) are calculated from the x- and y-direction gradients
dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used
to create the cell histograms: each pixel within the cell contributes a weight
to the orientation bin found in the gradient computation, with the gradient
magnitude used as the weight. The cells are rectangular, and the binning of
gradient orientation is spread over 0 to 180 degrees, with opposite
directions counted as the same. Fig. 8 depicts the edge orientations of the
picture elements. If the image has illumination or contrast changes, the
gradient strength must be locally normalized; for that, cells are grouped
together into larger blocks. These blocks overlap, so that each cell
contributes more than once to the final descriptor. Here rectangular HOG
(R-HOG) blocks are applied, which are mainly square grids. The
performance of HOG is improved by applying a Gaussian window to each
block.
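A toy version of these two steps (central-difference gradients, then magnitude-weighted orientation binning per cell over 0-180 degrees) might look like the sketch below; the cell size and bin count are illustrative choices, and block normalization is omitted.

```python
import math

def hog_cells(img, cell=4, bins=9):
    # img: 2-D list of grey levels. Returns per-cell orientation histograms.
    h, w = len(img), len(img[0])
    hists = [[[0.0] * bins for _ in range(w // cell)] for _ in range(h // cell)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = img[y][x + 1] - img[y][x - 1]        # x-direction gradient
            dy = img[y + 1][x] - img[y - 1][x]        # y-direction gradient
            mag = math.hypot(dx, dy)                  # m(x, y)
            ang = math.degrees(math.atan2(dy, dx)) % 180.0   # unsigned theta(x, y)
            cy, cx = y // cell, x // cell
            if cy < len(hists) and cx < len(hists[0]):
                b = min(int(ang / (180.0 / bins)), bins - 1)
                hists[cy][cx][b] += mag               # magnitude-weighted vote
    return hists
```

A vertical edge, for instance, produces purely horizontal gradients, so all of its votes land in the 0-degree bin.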
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical
computing environment and fourth-generation programming language.
Developed by MathWorks, MATLAB allows matrix manipulations,
plotting of functions and data, implementation of algorithms, creation
of user interfaces, and interfacing with programs written in other languages,
including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an
optional toolbox uses the MuPAD symbolic engine, allowing access
to symbolic computing capabilities. An additional package, Simulink, adds
graphical multi-domain simulation and Model-Based Design for dynamic
and embedded systems.
In 2004, MATLAB had around one million users across industry
and academia. MATLAB users come from various backgrounds
in engineering, science, and economics, and MATLAB is widely used in
academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners
in control engineering, Little's specialty, but quickly spread to many other
domains. It is now also used in education, in particular for the teaching
of linear algebra and numerical analysis, and is popular amongst scientists
involved in image processing. The MATLAB application is built around the
MATLAB language. The simplest way to execute MATLAB code is to type
it in the Command Window, which is one of the elements of the MATLAB
Desktop; when code is entered in the Command Window, MATLAB can
be used as an interactive mathematical shell. Sequences of commands can
be saved in a text file, typically using the MATLAB Editor, as a script, or
encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and
sharing your work. You can integrate your MATLAB code with other
languages and applications, and distribute your MATLAB algorithms and
applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external
applications and languages, such as C, C++, Fortran, Java, COM,
and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image
processing, communications, control design, test and measurement,
financial modeling and analysis, and computational biology. Add-on
toolboxes (collections of special-purpose MATLAB functions) extend the
MATLAB environment to solve particular classes of problems in these
application areas.
MATLAB can be used on personal computers and powerful
server systems, including the Cheaha compute cluster. With the addition of
the Parallel Computing Toolbox, the language can be extended with parallel
implementations for common computational functions, including for-loop
unrolling; additionally, this toolbox supports offloading computationally
intensive workloads to Cheaha, the campus compute cluster. MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) that knows how big it is. Moreover, the fundamental operators
(e.g., addition, multiplication) are programmed to deal with matrices when
required, and the MATLAB environment handles much of the bothersome
housekeeping that makes all this possible. Since so many of the procedures
required for macro-investment analysis involve matrices, MATLAB
proves to be an extremely efficient language for both communication and
implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or Fortran. A wrapper function is created,
allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed
MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called
from MATLAB, and many MATLAB libraries (for
example, XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries. Calling MATLAB from Java is more complicated, but
can be done with a MATLAB extension, which is sold separately by
MathWorks, or using an undocumented mechanism called JMI
(Java-to-MATLAB Interface), which should not be confused with the
unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox
available from MathWorks, MATLAB can be connected
to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems, enabling fast
development and execution. With the MATLAB language, you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables, specifying data types, and allocating memory. In many
cases MATLAB eliminates the need for 'for' loops; as a result, one line of
MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
MATLAB lets you execute commands or groups of commands one
at a time, without compiling and linking, enabling you to quickly iterate to
the optimal solution. For fast execution of heavy matrix and vector
computations, MATLAB uses processor-optimized libraries; for
general-purpose scalar computations, MATLAB generates machine-code
instructions using its JIT (just-in-time) compilation technology.
This technology, which is available on most platforms, provides
execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
The interactive tool GUIDE (Graphical User Interface
Development Environment) is used to lay out, design, and edit user
interfaces. GUIDE lets you include list boxes, pull-down menus, push
buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process from acquiring
data from external devices and databases through preprocessing
visualization and numerical analysis to producing presentation-quality
output
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats such as Microsoft Excel, ASCII text or binary
files, image, sound, and video files, and scientific formats such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format, and additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB These include 2-D and 3-D
plotting functions 3-D volume visualization functions tools for
interactively creating plots and the ability to export results to all popular
graphics formats You can customize plots by adding multiple axes
changing line colors and markers adding annotation Latex equations and
legends and drawing shapes
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D
scalar data, and 3-D vector data. You can use these functions to visualize
and understand large, often complex, multidimensional data, specifying
plot characteristics such as camera viewing angle, perspective, lighting
effects, light source locations, and transparency.
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW discrete Fourier
transform library. Because these processor-dependent libraries are
optimized for the different platforms that MATLAB supports, they execute
faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types
including doubles singles and integers
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language used is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition
method, which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the
CUDA GPU architecture, the parallel strategies developed in this research
can be applied to design parallel solutions to other sclera vein recognition
methods and to general pattern recognition methods. We designed the
Y-shape descriptor to narrow the search range and increase the matching
efficiency, a new feature extraction method that takes advantage of GPU
structures. We developed the WPL descriptor to incorporate mask
information and make the method more suitable for parallel computing,
which can dramatically reduce data transfer and computation. We then
carefully mapped our algorithms to GPU threads and blocks, an important
step in achieving parallel computation efficiency on a GPU. A work flow
with high arithmetic intensity, designed to hide the memory access latency,
partitions the computation task across the heterogeneous system of CPU
and GPU, down to the individual threads in the GPU. The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
BMP - bitmap file format
15 TYPES OF IMAGES
Images are of four types:
1. Binary image
2. Gray scale image
3. Color image
4. Indexed image
151 BINARY IMAGES
A binary image is a digital image that has only two possible values for
each pixel. Typically the two colors used for a binary image are black and
white, though any two colors can be used. Binary images are also called
bi-level or two-level; this means that each pixel is stored as a single bit,
i.e., a 0 or 1. The names black-and-white and B&W are also used.
152 GRAY SCALE IMAGE
In an 8-bit grayscale image, each picture element has an assigned intensity
that ranges from 0 to 255. A grey scale image is what people normally call
a black and white image, but the name emphasizes that such an image will
also include many shades of grey.
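For example, a colour pixel can be reduced to such a grey level with a standard weighted sum; the BT.601 luma weights used here are one common choice, not the only one:

```python
# Convert an RGB pixel to an 8-bit grey level using the common
# ITU-R BT.601 luma weights (green contributes most, blue least).
def to_grey(r, g, b):
    return round(0.299 * r + 0.587 * g + 0.114 * b)
```

Since the three weights sum to 1.0, pure white maps to 255 and pure black to 0, with every other colour landing on one of the 256 shades of grey in between.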
FIG
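The binary and grayscale definitions above can be sketched with a small array; the threshold of 128 below is a hypothetical choice for illustration:

```python
import numpy as np

# A tiny 8-bit grayscale image: intensities in [0, 255].
gray = np.array([[  0,  64, 128],
                 [192, 255,  32]], dtype=np.uint8)

# Binarize: every pixel becomes a single bit (0 or 1),
# using a fixed threshold of 128 (an assumed value).
binary = (gray >= 128).astype(np.uint8)

print(binary.tolist())   # [[0, 0, 1], [1, 1, 0]]
```

Each binary pixel needs only one bit of storage, whereas each grayscale pixel needs eight.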
1.5.3 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive colour, with the R, G, and B receptors in our retinas. RGB uses additive colour mixing and is the basic colour model used in television or any other medium that projects colour with light. It is the basic colour model used in computers and for web graphics, but it cannot be used for print production.
The secondary colours of RGB (cyan, magenta, and yellow) are formed by mixing two of the primary colours (red, green, or blue) and excluding the third colour. Red and green combine to make yellow, green and blue to make cyan, and blue and red form magenta. The combination of red, green, and blue at full intensity makes white.
In Photoshop, using the "screen" mode for the different layers in an image makes the intensities mix together according to the additive colour mixing model. This is analogous to stacking slide images on top of each other and shining light through them.
FIG
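The additive mixing rules above, and the "screen" blend built on them, can be checked numerically (a minimal sketch; the 0-255 and 0-1 value ranges are the usual conventions, not values from this report):

```python
import numpy as np

# Additive RGB mixing: full-intensity primaries combine as described above.
red   = np.array([255,   0,   0])
green = np.array([  0, 255,   0])
blue  = np.array([  0,   0, 255])

yellow  = np.clip(red + green, 0, 255)          # [255, 255, 0]
cyan    = np.clip(green + blue, 0, 255)         # [0, 255, 255]
magenta = np.clip(blue + red, 0, 255)           # [255, 0, 255]
white   = np.clip(red + green + blue, 0, 255)   # [255, 255, 255]

# "Screen" blend of two layers a and b (values in [0, 1]):
# result = 1 - (1 - a) * (1 - b), which mimics shining light through
# stacked slides -- the result is never darker than either input.
def screen(a, b):
    return 1.0 - (1.0 - a) * (1.0 - b)

print(screen(0.5, 0.5))  # 0.75
```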
CMYK: The four-colour CMYK model used in printing lays down overlapping layers of varying percentages of transparent cyan (C), magenta (M), and yellow (Y) inks. In addition, a layer of black (K) ink can be added. The CMYK model uses subtractive colour mixing.
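The subtractive relationship between RGB and CMYK can be sketched with a common textbook conversion (an illustrative formula, not from this report; real print workflows use ICC colour profiles instead):

```python
# A common RGB -> CMYK conversion: normalize RGB to [0, 1], take K as
# 1 - max(R, G, B), then remove the black component from each
# subtractive primary.
def rgb_to_cmyk(r, g, b):
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    k = 1.0 - max(r, g, b)
    if k == 1.0:                       # pure black: avoid division by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return c, m, y, k

print(rgb_to_cmyk(255, 0, 0))   # (0.0, 1.0, 1.0, 0.0) -- red = cyan absent
```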
1.5.4 INDEXED IMAGE
FIG
An indexed image consists of an array and a color map matrix. The pixel values in the array are direct indices into the color map. By convention, this documentation uses the variable name X to refer to the array and map to refer to the color map. In computing, indexed color is a technique to manage a digital image's colors in a limited fashion in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, color information is not directly carried by the image pixel data but is stored in a separate piece of data called a palette: an array of color elements in which every element, a color, is indexed by its position within the array. The image pixels do not contain the full specification of their color but only its index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.
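The palette lookup described above can be sketched as follows, mirroring the X/map naming used above (the four-colour palette itself is an illustrative choice):

```python
import numpy as np

# A palette ("color map") of four RGB colors; each image pixel stores
# only an index into this palette rather than a full RGB triple.
palette = np.array([[  0,   0,   0],    # 0: black
                    [255, 255, 255],    # 1: white
                    [255,   0,   0],    # 2: red
                    [  0,   0, 255]],   # 3: blue
                   dtype=np.uint8)

# The indexed image X: a 2-D array of palette indices.
X = np.array([[0, 1],
              [2, 3]], dtype=np.uint8)

# Expanding to a true-color image is a single palette lookup.
rgb = palette[X]                # shape (2, 2, 3)

print(rgb[1, 0].tolist())       # [255, 0, 0] -- pixel (1, 0) is red
```

With a 256-entry palette, each pixel costs one byte instead of three, which is the memory saving the text describes.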
Perhaps the first device that supported palette colors was a random-access frame buffer described in 1975 by Kajiya, Sutherland, and Cheadle. This supported a palette of 256 36-bit RGB colors.
1.6 APPLICATIONS OF IMAGE PROCESSING
Interest in digital image processing methods stems from two principal application areas:
1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception
In the second application area, interest focuses on procedures for extracting, from an image, information in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.
1.7 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches for feature registration and matching: a Speeded-Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system; it costs about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1. Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size and will preoccupy GPU memory and slow down data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements; there is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.
LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-image face recognition algorithms perform well when constrained (frontal, well-illuminated, high-resolution, sharp, and full) face images are acquired; however, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as one where faces are several tens of meters (10–250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. Recognition performance of a subset of existing still-image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time, in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such Big Data draws great concern. Even though modern processors are equipped with more cores and memory capacity, careful design is still required to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of a biometric application. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.
Modern iris recognition algorithms can be computationally intensive, yet they are designed for traditional sequential processing elements such as a personal computer. However, a parallel processing alternative using field programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the scope of this project, iris template generation with directional filtering, which is a computationally expensive yet parallelizable portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system against a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ code version, determining the information content of an iris approximately 324 times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.
A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye, and the vein pattern seen in the sclera region is unique to each person; thus, the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using the bilinear interpolation technique. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than the previous methods.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.
The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.
This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford–Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image. The global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method, using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.
It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard thread and block in CUDA as work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not work. We then propose a new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor," which is better suited for parallel computing and mitigates the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve the efficiency, in this research we propose a new descriptor, the Y-shape descriptor, which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.
2. We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage can help filter out image pairs with low similarities.
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches for feature registration and matching: a Speeded-Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system; it costs about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient in data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPGPUs (general-purpose graphics processing units) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU was used to extract the features, construct descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al. evaluated the performance of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform that outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design; an efficient parallel computing scheme would need different strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method, using the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and can help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size and will preoccupy GPU memory and slow down data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements; there is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard thread and block in CUDA as work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, can work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor," which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then develop implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2 we give a brief introduction to sclera vein recognition, in Section 8 we present experiments using the proposed system, and in Section 9 we draw conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation, using a clustering algorithm to classify color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's-thresholding-based method for grayscale images. After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region-growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded-Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.
The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and a Cartesian-to-polar conversion using bilinear interpolation. HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form; this is mainly useful for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in the existing studies. In the proposed method, two features of an image are drawn out.
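The Cartesian-to-polar conversion with bilinear interpolation described above can be sketched as follows; the grid sizes `n_r` and `n_theta` and the test image below are hypothetical choices for illustration, not values from this report:

```python
import numpy as np

def bilinear(img, x, y):
    """Sample img at real-valued (x, y) using bilinear interpolation."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    return ((1 - dx) * (1 - dy) * img[y0, x0] + dx * (1 - dy) * img[y0, x1]
            + (1 - dx) * dy * img[y1, x0] + dx * dy * img[y1, x1])

def to_polar(img, cx, cy, n_r=8, n_theta=16):
    """Resample img onto an (r, theta) grid centred on (cx, cy)."""
    r_max = min(cx, cy, img.shape[1] - 1 - cx, img.shape[0] - 1 - cy)
    polar = np.zeros((n_r, n_theta))
    for i, r in enumerate(np.linspace(0, r_max, n_r)):
        for j, t in enumerate(np.linspace(0, 2 * np.pi, n_theta, endpoint=False)):
            polar[i, j] = bilinear(img, cx + r * np.cos(t), cy + r * np.sin(t))
    return polar

img = np.arange(81, dtype=float).reshape(9, 9)
p = to_polar(img, 4, 4)     # centre on the hypothetical iris centre (4, 4)
print(p.shape)              # (8, 16)
```

In the polar grid, a rotation of the eye about the centre becomes a simple column shift, which is what makes the representation rotation-tolerant.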
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. The figure shows the steps of segmentation.
FIG
Glare area detection: A glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. The filter operates only on grayscale images, so a color image must first be converted to grayscale before the Sobel filter is applied. Fig. 4 shows the result of glare area detection.
FIG
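The Sobel-based glare detection step can be sketched as below; the 50%-of-maximum threshold used to flag glare pixels is an assumption for illustration, not a value from this report:

```python
import numpy as np

# Sobel kernels for horizontal and vertical gradients (standard definition).
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KY = KX.T

def sobel_magnitude(gray):
    """Gradient magnitude of a 2-D grayscale array via the Sobel operator."""
    h, w = gray.shape
    out = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = gray[y - 1:y + 2, x - 1:x + 2]
            out[y, x] = np.hypot(np.sum(KX * patch), np.sum(KY * patch))
    return out

# A dark image with one bright "glare" pixel: the Sobel response is
# strongest around that spot, so thresholding the magnitude flags it.
img = np.zeros((5, 5))
img[2, 2] = 255.0
mag = sobel_magnitude(img)
glare_mask = mag > 0.5 * mag.max()   # assumed threshold: half the peak
print(int(glare_mask.sum()))
```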
Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be located in the right and center positions, and the correct right sclera area in the left and center positions. In this way, non-sclera areas are eliminated.
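Otsu's method, as used above for sclera area estimation, picks the gray level that maximizes the between-class variance of the two-class split. A minimal sketch (the tiny test image is an illustrative stand-in for an eye ROI):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the threshold that maximizes the
    between-class variance of the resulting foreground/background split."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0 = 0.0    # weight (pixel count) of the low class
    sum0 = 0.0  # intensity sum of the low class
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Two well-separated intensity clusters: the threshold lands between them.
img = np.array([[10, 12, 11], [200, 205, 198]], dtype=np.uint8)
t = otsu_threshold(img)
print(t)
```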
2.2.3 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; all of these are unwanted portions for recognition. In order to eliminate their effects, refinement is done following the detection of the sclera area. The figure shows the result of Otsu's thresholding and the iris and eyelid refinement used to detect the right sclera area; the left sclera area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has been studied in the context of the fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by "Elements" below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.
FIG
FIG
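The element-wise decomposition described above (filtering different portions of the image independently) can be sketched with a simple row filter; the 3-tap box filter and the strip size below are illustrative stand-ins for the directional filters discussed, not the report's actual kernels:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def box_filter_rows(args):
    """Filter one horizontal strip of the image (3-tap box filter per row)."""
    img, r0, r1 = args
    out = np.copy(img[r0:r1]).astype(float)
    for y in range(r1 - r0):
        row = img[r0 + y].astype(float)
        for x in range(1, img.shape[1] - 1):
            out[y, x] = (row[x - 1] + row[x] + row[x + 1]) / 3.0
    return out

img = np.arange(64, dtype=float).reshape(8, 8)

# Split the image into independent strips ("Elements") and filter them
# concurrently; the per-strip results are then stitched back together.
strips = [(img, r, r + 2) for r in range(0, 8, 2)]
with ThreadPoolExecutor(max_workers=4) as pool:
    parts = list(pool.map(box_filter_rows, strips))
filtered = np.vstack(parts)

# A sequential pass over the whole image gives the identical result.
sequential = box_filter_rows((img, 0, 8))
print(np.allclose(filtered, sequential))  # True
```

Because each strip depends only on its own rows, the strips map naturally onto independent processing elements (FPGA blocks, GPU thread blocks, or CPU threads as here).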
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching segment of the line-descriptor based method is a
bottleneck with regard to matching speed In this section we briefly
describe the Line Descriptor-based sclera vein recognition method After
segmentation vein patterns were enhanced by a bank of directional Gabor
filters Binary morphological operations are used to thin the detected vein
structure down to a single pixel wide skeleton and remove the branch
points The line descriptor is used to describe the segments in the vein
structure Figure 2 shows a visual description of the line descriptor Each
segment is described by three quantities the segments angle to some
reference angle at the iris center θ the segments distance to the iris center r
and the dominant angular orientation of the line segment ɸ Thus the
descriptor is S = ( θ r ɸ )T The individual components of the line descriptor
are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.
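As a concrete illustration, the three descriptor components can be computed from a segment's endpoints and the iris center roughly as follows (a minimal Python sketch; the polynomial approximation fline is replaced here by the segment's straight chord):

```python
import math

def line_descriptor(p0, p1, iris_center):
    """Compute S = (theta, r, phi) for one vein segment.

    p0, p1: endpoints of the (approximated) straight segment.
    iris_center: detected iris center (x_i, y_i).
    """
    xl = (p0[0] + p1[0]) / 2.0                      # segment center (xl, yl)
    yl = (p0[1] + p1[1]) / 2.0
    dx, dy = xl - iris_center[0], yl - iris_center[1]
    r = math.hypot(dx, dy)                          # distance to iris center
    theta = math.atan2(dy, dx)                      # angle at the iris center
    phi = math.atan2(p1[1] - p0[1], p1[0] - p0[0])  # dominant orientation
    return theta, r, phi

# a vertical segment centered at (4, 1), iris center at the origin
theta, r, phi = line_descriptor((4, 0), (4, 2), (0, 0))
```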
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
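The weighting scheme can be sketched as follows (a simplified Python illustration; the border width is a hypothetical parameter, and a real implementation would typically use a distance transform on the actual mask image):

```python
def weight_image(mask, border=2):
    """Weight map from a binary sclera mask: interior pixels -> 1.0,
    pixels within `border` pixels of the mask boundary -> 0.5,
    pixels outside the mask -> 0.0."""
    h, w = len(mask), len(mask[0])
    def inside(y, x):
        return 0 <= y < h and 0 <= x < w and mask[y][x]
    weights = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # near-boundary if any pixel within Chebyshev distance
            # `border` falls outside the mask
            near_edge = any(
                not inside(y + dy, x + dx)
                for dy in range(-border, border + 1)
                for dx in range(-border, border + 1)
            )
            weights[y][x] = 0.5 if near_edge else 1.0
    return weights

# 10x10 toy mask: a filled square with a 1-pixel empty frame
mask = [[1 if 1 <= y <= 8 and 1 <= x <= 8 else 0 for x in range(10)]
        for y in range(10)]
w = weight_image(mask)
```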
The matching score for two segment descriptors is calculated by the equation below, where Si and Sj are two segment descriptors; m(Si, Sj) is the matching score between segments Si and Sj; d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8); Dmatch is the matching distance threshold; and ϕmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and thus the sum of its descriptors' weights sets the maximum score that can be attained.
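A minimal sketch of this scoring rule (illustrative Python with hypothetical threshold values; descriptors are reduced here to center coordinates, orientation, and weight):

```python
import math

def match_score(si, sj, d_match=5.0, phi_match=math.radians(10)):
    """Score for a pair of segment descriptors si = (x, y, phi, w).

    The pair matches (score = min weight) when the center distance is
    below d_match and the orientation difference is below phi_match;
    otherwise the score is 0.  Threshold values are illustrative.
    """
    d = math.hypot(si[0] - sj[0], si[1] - sj[1])
    dphi = abs(si[2] - sj[2])
    if d <= d_match and dphi <= phi_match:
        return min(si[3], sj[3])
    return 0.0

def total_score(test, target):
    """Sum of best per-segment scores, normalized by the maximum
    attainable score of the smaller (minimal) template."""
    smaller = test if len(test) <= len(target) else target
    max_score = sum(s[3] for s in smaller) or 1.0
    total = sum(max(match_score(si, sj) for sj in target) for si in test)
    return total / max_score

test = [(10.0, 10.0, 0.1, 1.0), (20.0, 5.0, 1.0, 0.5)]
target = [(11.0, 10.5, 0.12, 1.0), (40.0, 40.0, 2.0, 1.0)]
score = total_score(test, target)
```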
FIG
FIG
FIG
FIG
Y-shape branches are observed to be a stable feature under movement of the eye and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, this set may be inferred as a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.
There are two ways to measure both the orientation and the relationship of every branch of Y-shape vessels: one is to use the angles of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 22, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ϕ), where (x, y) is the position of the center and ϕ is its orientation. Because of the limitation of segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such error, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large and will occupy GPU memory and slow down the data transfer. When matching, a registration RANSAC-type algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary should be calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processor unit.
To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors beyond the sclera are set to 0, those near the sclera boundary are set to 0.5, and interior descriptors are set to 1. In our work, descriptor weights were calculated on their own mask by the CPU, only once.
The calculated result is saved as a component of the descriptor. The descriptor of the sclera then becomes s(x, y, ϕ, w), where w denotes the weight of the point and may be 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template must be transformed. It would be faster if the two templates had similar reference points. If we use the center of the iris as the reference point, then when two templates are compared, the correspondences will automatically be aligned to each other, since they have the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted as θ; the distance between the segment's center and the pupil center, denoted as r; and the dominant angular orientation of the segment, denoted as ϕ. To minimize the GPU computing, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.
The descriptor vector becomes s(x, y, r, θ, ϕ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same sides and saved
FIG
FIG
them in continuous addresses. This meets the requirement of coalesced memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast, because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator of Figure 4 is simplified as in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and as GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
GENERAL-PURPOSE COMPUTING ON THE GPU
Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology is different between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline.
The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as a texture on future passes through the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves the exact same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at its grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
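The four steps above can be sketched in ordinary code as follows (a toy averaging stencil standing in for the fluid update; each cell "gathers" from the previous buffer and never writes into it, exactly as in the pass-based model):

```python
def step(grid):
    """One SPMD-style time step: every cell runs the same 'program',
    gathering its own value and its 4 neighbors' values from the
    previous state (a toy diffusion/averaging stencil, not a real
    fluid solver)."""
    h, w = len(grid), len(grid[0])
    def gather(y, x):
        # clamp indices at the boundary, mimicking clamped texture reads
        return grid[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]
    return [
        [
            (gather(y, x) + gather(y - 1, x) + gather(y + 1, x)
             + gather(y, x - 1) + gather(y, x + 1)) / 5.0
            for x in range(w)
        ]
        for y in range(h)
    ]

state = [[0.0] * 5 for _ in range(5)]
state[2][2] = 1.0
state = step(state)   # the output buffer becomes the input of the next pass
```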
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV are solving this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity. After this step, some false positive matches may still remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transformation, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here y_te,i and y_ta,j are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where ϕij is the angle between the jth branch and the polar ray from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study. Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded. A sclera with a high matching score is passed to the next, more precise matching process.
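Stage I can be sketched as follows (an illustrative Python version of the coarse Y-shape matching; the thresholds and the score fusion are simplified stand-ins for Eqs. (2)-(4), not the report's exact formulas):

```python
import math

def y_match(test, target, t_phi=0.5, t_xy=50.0, alpha=30.0):
    """Coarse matching on Y-shape descriptors y = (phi1, phi2, phi3, x, y).

    For each test branch we look for a target branch whose three radial
    angles are close (d_phi < t_phi) within a limited search area
    (d_xy < t_xy).  The score fuses the number of matched pairs with
    their mean center distance.  Threshold values are illustrative.
    """
    n, dist_sum = 0, 0.0
    for yt in test:
        for ya in target:
            d_xy = math.hypot(yt[3] - ya[3], yt[4] - ya[4])
            if d_xy >= t_xy:
                continue                      # outside the search area
            d_phi = math.sqrt(sum((yt[k] - ya[k]) ** 2 for k in range(3)))
            if d_phi < t_phi:
                n += 1
                dist_sum += d_xy
                break                         # one match per test branch
    if n == 0:
        return 0.0
    avg_d = dist_sum / n
    # more matched branches and smaller distances -> higher score
    return n / (min(len(test), len(target)) * (1.0 + avg_d / alpha))

a = [(0.1, 1.2, 2.5, 100.0, 80.0), (0.4, 1.0, 2.0, 150.0, 60.0)]
b = [(0.12, 1.18, 2.52, 102.0, 81.0), (1.5, 0.2, 2.9, 300.0, 300.0)]
score = y_match(a, b)
```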
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at different gaze angles, the vessel structure will appear to nonlinearly shrink or extend, because the eyeball is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.
Considering these factors, our registration employs both a single shift transform and a multi-parameter transform which combines shift, rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and s_te,i is the ith WPL descriptor of Tte; Tta is the target template and s_ta,i is the ith WPL descriptor of Tta; d(s_te,k, s_ta,j) is the Euclidean distance of descriptors s_te,k and s_ta,j; and Δs_k is the shift value of two descriptors, defined as
We first randomly select an equal number of segment descriptors s_te,k in the test template Tte from each quad and find each one's nearest neighbor s_ta,j in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δs_k. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.
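The shift search can be sketched as follows (illustrative Python; descriptors are reduced to their centers, and "smallest standard deviation" is interpreted here as the candidate offset most consistent with the others):

```python
import math, random

def shift_search(test, target, samples=8, seed=0):
    """Sketch of the shift-parameter search: randomly sample test
    descriptors (x, y), pair each with its nearest target neighbor,
    record the candidate offsets, and return the most consistent one
    (the candidate closest to all others; an interpretation of the
    smallest-deviation rule)."""
    rng = random.Random(seed)
    picks = [rng.choice(test) for _ in range(samples)]
    offsets = []
    for p in picks:
        q = min(target, key=lambda t: math.hypot(t[0] - p[0], t[1] - p[1]))
        offsets.append((q[0] - p[0], q[1] - p[1]))
    def spread(o):
        return sum(math.hypot(o[0] - u[0], o[1] - u[1]) for u in offsets)
    return min(offsets, key=spread)

test = [(10.0, 10.0), (30.0, 12.0), (50.0, 40.0)]
target = [(13.0, 11.0), (33.0, 13.0), (53.0, 41.0)]   # test shifted by (3, 1)
dx, dy = shift_search(test, target)
```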
2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te,(it) and calculating the distance from its nearest neighbor s_ta,j in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum number of matches m(it). Here s_te,i, Tte, s_ta,j, and Tta are defined as in Algorithm 2; tr(it)_shift, θ(it), and tr(it)_scale are the shift, rotation, and scale parameters generated in the itth iteration; and R(θ(it)), T(tr(it)_shift), and S(tr(it)_scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters. In our experiment, we set the iteration count to 512.
3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te,i, Tte, s_ta,j, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)_shift, tr(optm)_scale, and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; R(θ(optm)), T(tr(optm)_shift), and S(tr(optm)_scale) form the descriptor transform matrix defined in Algorithm 3; ϕ is the angle between the segment descriptor and the radial direction; and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ϕ values. In our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and accessing these memories takes little time. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task. There are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower than on-chip memory in terms of access. To completely hide the latency of the small instruction set, we should preferentially use on-chip memory rather than global memory. When global memory access occurs, threads in the same warp should access words in sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but shared memory is organized into banks that are equal in size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses will be serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density; thus we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread; one thread performs one pair-of-templates comparison. In our work, we use an NVIDIA C2070 as our GPU, and the thread and block numbers are set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which a section of descriptors is processed by one thread. As the lower portion of the middle column in Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, and there are no data-exchange requirements between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown at the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
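The reduction described above can be simulated sequentially as follows (a Python sketch of the pairwise tree summation; on the GPU, each round's additions run in parallel across threads):

```python
def parallel_sum(values):
    """Simulate the tree-style reduction described above: in round r,
    every 'thread' whose index is a multiple of 2**r adds in the
    partial result 2**(r-1) positions away; the final sum ends up at
    index 0 (the first intermediate result's address)."""
    partial = list(values)
    stride = 1
    while stride < len(partial):
        # one parallel round: pairwise combine at the current stride
        for i in range(0, len(partial), 2 * stride):
            if i + stride < len(partial):
                partial[i] += partial[i + stride]
        stride *= 2
    return partial[0]

total = parallel_sum([3, 1, 4, 1, 5, 9, 2, 6])
```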
252 MAPPING INSIDE BLOCK
In shift-argument searching, there are two schemes we can choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to a thread, so that all threads compute independently, except that the final result must be compared with the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize shift searching. In the affine matrix generator, we mapped an entire parameter-set search to a thread; every thread randomly generates a set of parameters and tries them independently. The generation iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to process different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that contiguous kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data. In our kernels, we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in the design of target detection. In this paper, it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the different histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity across a block and then using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used to create the cell histograms. Each pixel within the cell gives a weighted vote to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular in form. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination and contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks, which are mainly square grids, are applied. The performance of HOG is improved by applying a Gaussian window to each block.
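The gradient and binning steps can be sketched as follows (an illustrative Python version for a single cell; the bin count and the central-difference gradient follow the common HOG convention, not necessarily the exact implementation used in this work):

```python
import math

def hog_cell_histogram(cell, bins=9):
    """Gradient histogram for one cell: central-difference gradients
    dx, dy give magnitude m and orientation theta; each interior pixel
    votes into an unsigned 0-180 degree bin, weighted by m."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = cell[y][x + 1] - cell[y][x - 1]
            dy = cell[y + 1][x] - cell[y - 1][x]
            m = math.hypot(dx, dy)
            # unsigned orientation: opposite directions share a bin
            theta = math.degrees(math.atan2(dy, dx)) % 180.0
            hist[int(theta / (180.0 / bins)) % bins] += m
    return hist

# a vertical edge: all gradient energy falls in the 0-degree bin
cell = [[0, 0, 0, 10, 10, 10] for _ in range(6)]
hist = hog_cell_histogram(cell)
```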
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, which is one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing.
Development environment for managing code, files, and data.
Interactive tools for iterative exploration, design, and problem solving.
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration.
2-D and 3-D graphics functions for visualizing data.
Tools for building custom graphical user interfaces.
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, FORTRAN, Java™, COM, and Microsoft Excel.
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, and financial modeling and analysis. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or FORTRAN. A wrapper function is created,
allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed MEX-
files (for MATLAB executable).
Libraries written in Java, ActiveX or .NET can be directly called
from MATLAB, and many MATLAB libraries (for
example, XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries. Calling MATLAB from Java is more complicated, but
can be done with a MATLAB extension, which is sold separately by
MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB
Interface), which should not be confused with the unrelated Java Metadata
Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox
available from MathWorks, MATLAB can be connected
to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on
Windows XP, and for network installations.
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data.
Readability and navigation improvements to warning and error messages in
the MATLAB command window.
Automatic variable and function renaming in the MATLAB Editor.
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems. It enables fast
development and execution. With the MATLAB language, you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks, such as
declaring variables, specifying data types, and allocating memory. In many
cases, MATLAB eliminates the need for 'for' loops. As a result, one line of
MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
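This vectorized style is not unique to MATLAB. As an illustration (in Python with NumPy, since the report itself contains no MATLAB listings), the same computation can be written as an explicit element-by-element loop or as a single vectorized expression:

```python
import numpy as np

# Loop version: what a C-style program would write out element by element.
values = [1.0, 2.0, 3.0, 4.0]
total = 0.0
for v in values:
    total += 2.0 * v

# Vectorized version: one expression replaces the whole loop.
vec_total = float(np.sum(2.0 * np.array(values)))
```

Both compute the same sum; the vectorized form dispatches the work to optimized array routines instead of an interpreted loop.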
MATLAB lets you execute commands or groups of commands one
at a time, without compiling and linking, enabling you to quickly iterate to
the optimal solution. For fast execution of heavy matrix and vector
computations, MATLAB uses processor-optimized libraries. For general-
purpose scalar computations, MATLAB generates machine-code
instructions using its JIT (just-in-time) compilation technology.
This technology, which is available on most platforms, provides
execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently. These include the following:
MATLAB Editor - provides standard editing and debugging features, such
as setting breakpoints and single stepping.
Code Analyzer - checks your code for problems and recommends
modifications to maximize performance and maintainability.
MATLAB Profiler - records the time spent executing each line of code.
Directory Reports - scan all the files in a directory and report on code
efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
GUIDE (Graphical User Interface Development Environment) is an
interactive tool for laying out, designing, and editing user interfaces.
GUIDE lets you include list boxes, pull-down menus, push buttons, radio
buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring
data from external devices and databases, through preprocessing,
visualization, and numerical analysis, to producing presentation-quality
output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
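A few of the listed operations, sketched in NumPy as an analogy to the corresponding MATLAB one-liners (the signal here is synthetic):

```python
import numpy as np

# A small noisy 1-D signal: a sine wave plus Gaussian noise.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0.0, 2.0 * np.pi, 50)) + 0.1 * rng.standard_normal(50)

# Thresholding: boolean mask of values above a cutoff.
mask = signal > 0.5

# Smoothing: 5-point moving average via convolution.
smooth = np.convolve(signal, np.ones(5) / 5.0, mode="same")

# Correlation: Pearson correlation of the raw and smoothed signals.
corr = np.corrcoef(signal, smooth)[0, 1]
```

Each operation is a single call on the whole array, mirroring the interactive style of analysis the section describes.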
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats, such as Microsoft Excel; ASCII text or binary
files; image, sound, and video files; and scientific files, such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format. Additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes;
changing line colors and markers; adding annotations, LaTeX equations, and
legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D
scalar data, and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting effects,
light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW discrete Fourier
transform library. Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports, they execute
faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
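The distinction between double, single, and integer arithmetic can be illustrated with NumPy dtypes (an analogy only; MATLAB's default numeric class is double):

```python
import numpy as np

# Double precision: 0.1 + 0.2 is extremely close to 0.3 but not exact.
d = np.float64(0.1) + np.float64(0.2)

# Single precision: the same sum, with a visibly larger rounding error.
s = np.float32(0.1) + np.float32(0.2)

# Integer arithmetic: division truncates, the result stays integral.
i = np.int32(7) // np.int32(2)
```

The single-precision result carries roughly seven decimal digits of accuracy versus about sixteen for double, which is why MATLAB defaults to doubles.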
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language used is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
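The Gabor feature extraction step shown in the figure above can be sketched as follows. This is a generic real-valued Gabor kernel bank in NumPy; the kernel size and parameters are illustrative placeholders, since the report does not list the project's actual filter settings:

```python
import numpy as np

def gabor_kernel(ksize=21, sigma=4.0, theta=0.0, lambd=10.0, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter: Gaussian envelope times a cosine carrier."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotate coordinates by theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + gamma**2 * yr**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * xr / lambd + psi)
    return envelope * carrier

# A bank of four orientations, as is typical for vein/texture extraction.
bank = [gabor_kernel(theta=t) for t in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
```

Convolving the enhanced sclera image with each kernel in the bank responds strongly to vessels aligned with that kernel's orientation.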
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
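The first steps of the pipeline in the snapshots (grayscale image, then binarization via Otsu's thresholding) can be sketched in NumPy. The image here is synthetic and the code is an illustration of Otsu's method, not the project's MATLAB implementation:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of class 0 per threshold
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean
    mu_t = mu[-1]                              # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)           # endpoints give 0/0; treat as zero
    return int(np.argmax(sigma_b))

# Toy "eye" image: a dark patch (vessels/iris) on a bright sclera-like background.
rng = np.random.default_rng(1)
gray = np.full((64, 64), 200, dtype=np.uint8)
gray[20:40, 20:40] = 60
gray = np.clip(gray + rng.integers(-10, 10, gray.shape), 0, 255).astype(np.uint8)

t = otsu_threshold(gray)
binary = gray > t          # the binarized image shown in the second snapshot
```

Otsu's method picks the threshold automatically from the histogram, which is why no hand-tuned cutoff appears in the pipeline.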
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition
method, which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions to other sclera vein recognition methods
and general pattern recognition methods. We designed the Y shape
descriptor, a new feature extraction method that takes advantage of the GPU
structures, to narrow the search range and increase the matching efficiency.
We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing, which can
dramatically reduce data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, which is an important
step in achieving parallel computation efficiency on a GPU. A work flow
with high arithmetic intensity, designed to hide the memory access latency,
was used to partition the computation task across the heterogeneous system of
CPU and GPU, and even across the threads in the GPU. The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland,
MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object
detection for real-time augmented reality applications in a GPGPU," IEEE
Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep,
big, simple neural nets for handwritten digit recognition," Neural Comput.,
vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris
recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput.
Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-
Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE
Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks,"
Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognition,"
Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of
the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner,
"Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4,
no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular
biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-
1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G.
Xiaoguang, "Robust spatial matching for object retrieval and its parallel
implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp.
1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for
real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control,
vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4,
pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2,
pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human
identification method: Sclera recognition," IEEE Trans. Syst., Man,
Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
1.5.3 COLOR IMAGE
The RGB colour model relates very closely to the way we perceive
colour, with the r, g and b receptors in our retinas. RGB uses additive colour
mixing and is the basic colour model used in television or any other
medium that projects colour with light. It is the basic colour model used in
computers and for web graphics, but it cannot be used for print production.
The secondary colours of RGB - cyan, magenta, and yellow - are formed
by mixing two of the primary colours (red, green or blue) and excluding the
third colour. Red and green combine to make yellow, green and blue to
make cyan, and blue and red form magenta. The combination of red, green
and blue in full intensity makes white.
In Photoshop, using the "screen" mode for the different layers in an
image will make the intensities mix together according to the additive
colour mixing model. This is analogous to stacking slide images on top of
each other and shining light through them.
FIG
CMYK: The 4-colour CMYK model used in printing lays down
overlapping layers of varying percentages of transparent cyan (C), magenta
(M) and yellow (Y) inks. In addition, a layer of black (K) ink can be added.
The CMYK model uses the subtractive colour model.
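A common approximate RGB-to-CMYK conversion illustrates the subtractive model: the black layer K absorbs the darkness shared by all channels, and the remaining inks supply whatever each channel must still subtract. This is the simple textbook formula, not a colour-managed conversion:

```python
def rgb_to_cmyk(r, g, b):
    """Approximate conversion; all channels are in the range 0-1."""
    k = 1.0 - max(r, g, b)          # black absorbs the common darkness
    if k == 1.0:                    # pure black: avoid division by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return c, m, y, k

# Pure red needs no cyan, but full magenta and yellow.
print(rgb_to_cmyk(1.0, 0.0, 0.0))
```

Note the symmetry with the additive model above: the inks are exactly the RGB secondaries.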
1.5.4 INDEXED IMAGE
FIG
An indexed image consists of an array and a color map matrix. The
pixel values in the array are direct indices into a color map. By convention,
this documentation uses the variable name X to refer to the array and map
to refer to the color map. In computing, indexed color is a technique to
manage digital images' colors in a limited fashion, in order to save
computer memory and file storage while speeding up display refresh and
file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, color information is not
directly carried by the image pixel data, but is stored in a separate piece of
data called a palette: an array of color elements, in which every element, a
color, is indexed by its position within the array. The image pixels do not
contain the full specification of their color, but only its index in the palette.
This technique is sometimes referred to as pseudocolor or indirect color, as
colors are addressed indirectly.
Perhaps the first device that supported palette colors was a random-
access frame buffer described in 1975 by Kajiya, Sutherland and Cheadle.
This supported a palette of 256 36-bit RGB colors.
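The palette lookup described above amounts to a single array-indexing step. A minimal NumPy sketch with a toy palette and image:

```python
import numpy as np

# A tiny 4-entry palette (rows are RGB colors, values 0-255).
palette = np.array([
    [0, 0, 0],        # index 0: black
    [255, 0, 0],      # index 1: red
    [0, 255, 0],      # index 2: green
    [255, 255, 255],  # index 3: white
], dtype=np.uint8)

# The indexed image X stores only small indices, not full RGB triples.
X = np.array([[0, 1],
              [2, 3]], dtype=np.uint8)

# Expanding to true color is one palette lookup (fancy indexing).
rgb = palette[X]      # shape (2, 2, 3)
```

The memory saving is visible even here: X stores one byte per pixel, while the expanded RGB image stores three.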
1.6 Applications of image processing
Interest in digital image processing methods stems from two principal
application areas:
1) Improvement of pictorial information for human interpretation.
2) Processing of scene data for autonomous machine perception.
In the second application area, interest focuses on procedures for
extracting, from an image, information in a form suitable for computer
processing. Examples include automatic character recognition, industrial
machine vision for product assembly and inspection, military
reconnaissance, automatic processing of fingerprints, etc.
1.7 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Within these three methods,
the SURF method achieves the best accuracy. It takes an average of 15
seconds using the SURF method to perform a one-to-one matching.
Zhou et al. proposed a line descriptor-based method for sclera vein
recognition. The matching step (including registration) is the most time-
consuming step in this sclera vein recognition system, costing about 12
seconds to perform a one-to-one matching. Both speeds were measured on
a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM.
Currently, sclera vein recognition algorithms are designed using central
processing unit (CPU)-based systems.
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1. Mask files are used to calculate valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large in size, and will preoccupy the GPU memory and slow
down the data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance.
LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns
for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no.
14, pp. 1860-1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the
most challenging problems that has been actively researched in recent
years. It is well known that many state-of-the-art face recognition
algorithms perform well when constrained (frontal, well-illuminated, high-
resolution, sharp, and full) face images are acquired. However, their
performance degrades significantly when the test images contain variations
that are not present in the training images. In this paper, we highlight some
of the key issues in remote face recognition. We define remote face
recognition as one where faces are several tens of meters (10-250 m) from
the cameras. We then describe a remote face database which has been
acquired in an unconstrained outdoor maritime environment. Recognition
performance of a subset of existing still image-based face recognition
algorithms is evaluated on the remote face data set. Further, we define the
remote re-identification problem as matching a subject at one location with
candidate sets acquired at a different location and over time, in remote
conditions. We provide preliminary experimental results on remote re-
identification. It is demonstrated that, in addition to applying a good
classification algorithm, finding features that are robust to the variations
mentioned above and developing statistical models which can account for
these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N.
Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics
Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
With the rapidly expanding biometric data collected by various sectors
of government and industry for identification and verification purposes,
how to manage and process such Big Data draws great concern. Even
though modern processors are equipped with more cores and memory
capacity, it still requires careful design to utilize the hardware
resources effectively and the power consumption efficiently. This research
addresses this issue by investigating the workload characteristics of a
biometric application. Taking Daugman's iris matching algorithm, which
has been proven to be the most reliable iris matching method, as a case
study, we conduct performance profiling and binary instrumentation on the
benchmark to capture its execution behavior. The results show that data
loading and memory access incur great performance overhead, and
motivate us to move the biometrics computation to a high-performance
architecture.
Modern iris recognition algorithms can be computationally intensive,
yet are designed for traditional sequential processing elements, such as a
personal computer. However, a parallel processing alternative using field-
programmable gate arrays (FPGAs) offers an opportunity to speed up iris
recognition. Within the means of this project, iris template generation with
directional filtering, which is a computationally expensive yet parallel
portion of a modern iris recognition algorithm, is parallelized on an FPGA
system. We will present a performance comparison of the parallelized
algorithm on the FPGA system to a traditional CPU-based version. The
parallelized template generation outperforms an optimized C++ code
version, determining the information content of an iris approximately 324
times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric
modality based on conjunctival vasculature," in Proc. Artif. Neural
Netw. Eng., 2006, pp. 1-8.
A new biometric indicator based on the patterns of conjunctival
vasculature is proposed. Conjunctival vessels can be observed on the visible
part of the sclera that is exposed to the outside world. These vessels
demonstrate rich and specific details in visible light, and can be easily
photographed using a regular digital camera. In this paper, we discuss
methods for conjunctival imaging, preprocessing, and feature extraction in
order to derive a suitable conjunctival vascular template for biometric
authentication. Commensurate classification methods, along with the
observed accuracy, are discussed. Experimental results suggest the potential
of using conjunctival vasculature as a biometric measure. Identification of
a person based on some unique set of features is an important task. Human
identification is possible with several biometric systems, and sclera
recognition is one of the promising biometrics. The sclera is the white
portion of the human eye. The vein pattern seen in the sclera region is
unique to each person. Thus, the sclera vein pattern is a well-suited
biometric technology for human identification. The existing methods used
for sclera recognition have some drawbacks: only frontal-looking
images are preferred for matching, and rotation variance is another problem.
These problems are completely eliminated in the proposed system by using
two feature extraction techniques: Histogram of Oriented
Gradients (HOG) and conversion of the image into polar form using the
bilinear interpolation technique. These two features help the proposed
system become illumination invariant and rotation invariant. The
experimentation is done with the help of the UBIRIS database. The
experimental results show that the proposed sclera recognition method can
achieve better accuracy than the previous methods.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J.
C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899,
May 2008.
The graphics processing unit (GPU) has become an integral part of
today's mainstream computing systems. Over the past six years, there has
been a marked increase in the performance and capabilities of GPUs. The
modern GPU is not only a powerful graphics engine, but also a highly
parallel programmable processor featuring peak arithmetic and memory
bandwidth that substantially outpaces its CPU counterpart. The GPU's
rapid increase in both programmability and capability has spawned a
research community that has successfully mapped a broad range of
computationally demanding, complex problems to the GPU. This effort in
general-purpose computing on the GPU, also known as GPU computing,
has positioned the GPU as a compelling alternative to traditional
microprocessors in high-performance computer systems of the future. We
describe the background, hardware, and programming model for GPU
computing, summarize the state of the art in tools and techniques, and
present four GPU computing successes in game physics and computational
biophysics that deliver order-of-magnitude performance gains over
optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image
database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-
977.
This paper proposes algorithms for iris segmentation, quality
enhancement, match score fusion, and indexing to improve both the
accuracy and the speed of iris recognition. A curve evolution approach is
proposed to effectively segment a nonideal iris image using the modified
Mumford-Shah functional. Different enhancement algorithms are
concurrently applied on the segmented iris image to produce multiple
enhanced versions of the iris image. A support-vector-machine-based
learning algorithm selects locally enhanced regions from each globally
enhanced image and combines these good-quality regions to create a single
high-quality iris image. Two distinct features are extracted from the high-
quality iris image. The global textural feature is extracted using the 1-D log
polar Gabor transform, and the local topological feature is extracted using
Euler numbers. An intelligent fusion algorithm combines the textural and
topological matching scores to further improve the iris recognition
performance and reduce the false rejection rate, whereas an indexing
algorithm enables fast and accurate iris identification. The verification and
identification performance of the proposed algorithms is validated and
compared with other algorithms using the CASIA Version 3, ICE 2005, and
UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-
stage parallel approach for registration and matching: a parallel sclera
matching solution for sclera vein recognition, based on our sequential line-
descriptor method, using the CUDA GPU architecture. CUDA is a highly
parallel, multithreaded, many-core processor architecture with tremendous
computational power.
It supports not only a traditional graphics pipeline, but also computation
on non-graphical data. It is relatively straightforward to implement our C
program for CUDA on an AMD-based GPU using OpenCL. Our CUDA
kernels can be directly converted to OpenCL kernels by accounting for the
different syntax of various keywords and built-in functions. The mapping
strategy is also effective in OpenCL if we regard the thread and block in
CUDA as the work-item and work-group in OpenCL. Most of our optimization
techniques, such as coalesced memory access and prefix sum, can work in
OpenCL too. Moreover, since CUDA is a data-parallel architecture, the
implementation of our approach in OpenCL should be programmed in the
data-parallel model.
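Prefix sum, mentioned above among the optimization techniques, is the standard GPU idiom for stream compaction: discarding masked-out entries without data-dependent branching, so that every thread does uniform work. A NumPy sketch of the idea (an illustration, not the report's CUDA kernel):

```python
import numpy as np

# Descriptor entries and a validity mask (1 = keep, 0 = masked out).
values = np.array([7, 3, 9, 1, 5, 8], dtype=np.int32)
valid = np.array([1, 0, 1, 0, 1, 1], dtype=np.int32)

# Exclusive prefix sum of the mask assigns each kept element its output slot.
slots = np.cumsum(valid) - valid          # [0, 1, 1, 2, 2, 3]

# Scatter the kept values into a dense output array.
out = np.empty(int(valid.sum()), dtype=np.int32)
out[slots[valid == 1]] = values[valid == 1]
```

On a GPU the cumulative sum is computed by a parallel scan, and each thread writes its own element to the slot the scan assigned, which keeps global memory writes dense and coalesced.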
In this research, we first discuss why the naive parallel approach would
not work. We then propose the new sclera descriptor - the Y shape sclera
feature-based efficient registration method - to speed up the mapping scheme;
introduce the "weighted polar line (WPL) descriptor," which is better
suited for parallel computing, to mitigate the mask size issue; and develop
our coarse-to-fine two-stage matching process to dramatically improve the
matching speed. These new approaches make parallel processing
possible and efficient.
1.8.1 PROPOSED SYSTEM ADVANTAGES
1. To improve the efficiency, in this research we propose a new descriptor,
the Y shape descriptor, which can greatly help improve the efficiency of
the coarse registration of two images and can be used to filter out some
non-matching pairs before refined matching.
2. We propose the coarse-to-fine two-stage matching process. In the first
stage, we match two images coarsely using the Y shape descriptors,
which is very fast because no registration is needed. The
matching result in this stage helps filter out image pairs with low
similarities.
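The two-stage idea can be sketched as follows. The descriptors, similarity functions, and threshold here are simplified placeholders, not the actual Y shape or WPL implementations from the report:

```python
import numpy as np

def coarse_score(desc_a, desc_b):
    """Cheap cosine similarity on short global descriptors (stand-in for Y shape)."""
    return float(np.dot(desc_a, desc_b) /
                 (np.linalg.norm(desc_a) * np.linalg.norm(desc_b)))

def fine_score(template_a, template_b):
    """Expensive similarity on full templates (stand-in for WPL line matching)."""
    return float(np.mean(template_a == template_b))

def two_stage_match(probe, gallery, coarse_thresh=0.9):
    """Stage 1 filters with the cheap score; stage 2 ranks survivors."""
    survivors = [g for g in gallery
                 if coarse_score(probe["coarse"], g["coarse"]) >= coarse_thresh]
    if not survivors:
        return None
    return max(survivors, key=lambda g: fine_score(probe["fine"], g["fine"]))

probe = {"coarse": np.array([1.0, 0.0]), "fine": np.array([1, 1, 0, 1])}
gallery = [
    {"name": "g1", "coarse": np.array([1.0, 0.1]), "fine": np.array([1, 1, 0, 1])},
    {"name": "g2", "coarse": np.array([0.0, 1.0]), "fine": np.array([0, 0, 1, 0])},
]
best = two_stage_match(probe, gallery)
```

Because the coarse stage runs on tiny descriptors and needs no registration, most non-matching pairs never reach the expensive fine stage, which is where the speedup comes from.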
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque and white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each person,
so it can be used for human identification. Several researchers have
designed different sclera vein recognition methods, and have shown that it
is promising to use sclera vein recognition for human identification.
Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Within these three methods,
the SURF method achieves the best accuracy. It takes an average of 15
seconds using the SURF method to perform a one-to-one matching. Zhou
et al. proposed a line descriptor-based method for sclera vein recognition.
The matching step (including registration) is the most time-consuming step
in this sclera vein recognition system, costing about 12 seconds to
perform a one-to-one matching. Both speeds were measured on a PC with an
Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently,
sclera vein recognition algorithms are designed using central processing
unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential
processing devices, which may not be efficient in data processing where the
data can be parallelized. Because of the large time consumption of the matching
step, sclera vein recognition using a sequential method would be very
challenging to implement in a real-time biometric system, especially
when there is a large number of templates in the database for matching.
GPGPUs (general-purpose graphics processing units)
are now popularly used for parallel computing to improve
computational processing speed and efficiency. The highly parallel
structure of GPUs makes them more effective than CPUs for data
processing where processing can be performed in parallel. GPUs have been
widely used in biometric recognition, such as speech recognition, text
detection, handwriting recognition, and face recognition. In iris
recognition, the GPU was used to extract the features, construct descriptors,
and match templates.
GPUs are also used for object retrieval and image search. Park et al.
presented a performance evaluation of image processing algorithms, such
as linear feature extraction and multi-view stereo matching, on GPUs.
However, these approaches were designed for their specific biometric
recognition applications and feature searching methods. Therefore, they may
not be efficient for sclera vein recognition. Compute Unified Device
Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in
this research. CUDA is a highly parallel, multithreaded, many-core
processor architecture with tremendous computational power. It supports not
only a traditional graphics pipeline, but also computation on non-graphical data.
More importantly, it offers an easier programming platform, which
outperforms its CPU counterparts in terms of peak arithmetic intensity and
memory bandwidth. In this research, the goal is not to develop a unified
strategy to parallelize all sclera matching methods, because each method is
quite different from the others and would need a customized design. To
develop an efficient parallel computing scheme, different strategies are
needed for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
sclera vein recognition based on our sequential line-descriptor method,
using the CUDA GPU architecture. However, the parallelization strategies
developed in this research can be applied to design parallel approaches
for other sclera vein recognition methods and can help parallelize general
pattern recognition methods. Based on the matching approach, there are
three challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two
sclera templates and to align the templates to the same coordinate system.
But the mask files are large in size; they preoccupy the GPU memory and
slow down data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of
several computational stages with different memory and processing
requirements. There is no uniform mapping scheme applicable to all these
stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable
to satisfy the requirement of real-time performance. New designs are
necessary to help narrow down the search range. In summary, a naïve
parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to port our C program for
CUDA to an AMD-based GPU using OpenCL. Our CUDA kernels can be
directly converted to OpenCL kernels by accounting for the different
syntax of various keywords and built-in functions. The mapping strategy
is also effective in OpenCL if we regard a thread and a block in CUDA as
a work-item and a work-group in OpenCL. Most of our optimization
techniques, such as coalesced memory access and prefix sum, work in
OpenCL too. Moreover, since CUDA is a data-parallel architecture, the
implementation of our approach in OpenCL should be programmed in the
data-parallel model.
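The prefix-sum optimization mentioned above follows the classic work-efficient scan pattern. As a language-neutral sketch (Python standing in for the per-pass CUDA kernels; the array contents are illustrative), the up-sweep and down-sweep phases look like this:

```python
def exclusive_scan(data):
    """Work-efficient (Blelloch-style) exclusive prefix sum.
    Each `while` level corresponds to one parallel pass (one kernel
    launch on a GPU); the inner loop is the work the threads share."""
    n = 1
    while n < len(data):
        n *= 2
    a = list(data) + [0] * (n - len(data))  # pad to a power of two
    # up-sweep (reduce): build partial sums in a binary tree
    d = 1
    while d < n:
        for i in range(0, n, 2 * d):
            a[i + 2 * d - 1] += a[i + d - 1]
        d *= 2
    # down-sweep: push prefixes back down the tree
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(0, n, 2 * d):
            t = a[i + d - 1]
            a[i + d - 1] = a[i + 2 * d - 1]
            a[i + 2 * d - 1] += t
        d //= 2
    return a[:len(data)]
```

On the GPU, the same index arithmetic lets consecutive threads touch consecutive array elements, which is what makes the accesses coalesced.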
In this research, we first discuss why the naïve parallel approach would
not work (Section 3). We then propose a new sclera descriptor, the
Y-shape sclera feature, and a feature-based efficient registration method
to speed up the mapping scheme (Section 4); introduce the "weighted
polar line (WPL) descriptor", which is better suited for parallel
computing and mitigates the mask size issue (Section 5); and develop our
coarse-to-fine two-stage matching process to dramatically improve the
matching speed (Section 6). These new approaches make parallel
processing possible and efficient. However, it is non-trivial to implement
these algorithms in CUDA, so we also developed implementation schemes
to map our algorithms onto CUDA (Section 7). In Section 2 we give a
brief introduction to sclera vein recognition; in Section 8 we report
experiments using the proposed system; and in Section 9 we draw
conclusions.
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation, feature enhancement, feature extraction, and feature
matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition,
and several methods have been designed for it. Crihalmeanu et al.
presented a semi-automated system for sclera segmentation: they used a
clustering algorithm to classify color eye images into three clusters
(sclera, iris, and background). Later on, Crihalmeanu and Ross designed a
segmentation approach based on a normalized sclera index measure,
which includes coarse sclera segmentation, pupil region segmentation,
and fine sclera segmentation. Zhou et al. developed a skin tone plus
"white color"-based voting method for sclera segmentation in color
images, and an Otsu's thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the
sclera features, since the sclera vein patterns often lack contrast and are
hard to detect. Zhou et al. used a bank of multi-directional Gabor filters
for vascular pattern enhancement. Derakhshani et al. used contrast-limited
adaptive histogram equalization (CLAHE) to enhance the green color
plane of the RGB image, and a multi-scale region growing approach to
identify the sclera veins from the image background. Crihalmeanu and
Ross applied a selective enhancement filter for blood vessels to extract
features from the green component of a color image. In the feature
matching step, Crihalmeanu and Ross proposed three registration and
matching approaches: Speeded-Up Robust Features (SURF), which is
based on interest-point detection; minutiae detection, which is based on
minutiae points on the vasculature structure; and direct correlation
matching, which relies on image registration. Zhou et al. designed a
line-descriptor-based feature registration and matching method.
The proposed sclera recognition consists of five steps: sclera
segmentation, vein pattern enhancement, feature extraction, feature
matching, and the matching decision. Fig. 2 shows the block diagram of
sclera recognition. Two types of feature extraction are used in the
proposed method to achieve good identification accuracy. The
characteristics elicited from the blood vessel structure seen in the sclera
region are the Histogram of Oriented Gradients (HOG) and an
interpolated Cartesian-to-polar conversion. HOG is used to determine the
gradient and edge orientations of the vein pattern in the sclera region of
an eye image. To become more computationally efficient, the image data
are converted to polar form, which is mainly suited to circular or
quasi-circular objects. These two characteristics are extracted from all the
images in the database and compared with the features of the query image
to decide whether the person is correctly identified. This comparison is
done in the feature matching step, which ultimately makes the matching
decision. By using the proposed feature extraction methods and matching
techniques, human identification is more accurate than in existing studies.
In the proposed method, two features of an image are extracted.
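A minimal sketch of the Cartesian-to-polar conversion step (the function name, grid sizes, and the choice of bilinear interpolation are our assumptions, not from the report):

```python
import numpy as np

def to_polar(img, center, n_r=32, n_theta=64, r_max=None):
    """Resample an image onto an (r, theta) grid around `center`
    using bilinear interpolation (hypothetical helper; parameter
    names and defaults are illustrative)."""
    cy, cx = center
    if r_max is None:
        r_max = min(img.shape) // 2
    rs = np.linspace(0, r_max, n_r)
    ts = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    r, t = np.meshgrid(rs, ts, indexing="ij")
    # sample positions in the original Cartesian image
    y = cy + r * np.sin(t)
    x = cx + r * np.cos(t)
    y0 = np.clip(np.floor(y).astype(int), 0, img.shape[0] - 2)
    x0 = np.clip(np.floor(x).astype(int), 0, img.shape[1] - 2)
    dy, dx = y - y0, x - x0
    # bilinear blend of the four surrounding pixels
    return (img[y0, x0] * (1 - dy) * (1 - dx)
            + img[y0 + 1, x0] * dy * (1 - dx)
            + img[y0, x0 + 1] * (1 - dy) * dx
            + img[y0 + 1, x0 + 1] * dy * dx)
```

Each row of the output corresponds to a fixed radius from the pupil center, so a rotation of the eye becomes a simple column shift.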
222 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It involves
three steps: glare area detection, sclera area estimation, and iris and
eyelid detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare area detection: the glare area is a small bright area near the
pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. It runs only on
grayscale images: if the image is color, it must first be converted to
grayscale, and then the Sobel filter is applied to detect the glare area.
Fig. 4 shows the result of the glare area detection.
FIG
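The Sobel-based glare detection can be sketched as follows (the gradient-magnitude threshold is an assumption; the report does not specify one):

```python
import numpy as np

def detect_glare(gray, thresh=200.0):
    """Flag candidate glare pixels via Sobel gradient magnitude.
    `thresh` is illustrative; a bright glare spot produces strong
    edges against the darker iris/pupil."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T  # vertical Sobel kernel
    pad = np.pad(gray.astype(float), 1, mode="edge")
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # correlate the 3x3 kernels without needing scipy
    for i in range(3):
        for j in range(3):
            win = pad[i:i + h, j:j + w]
            gx += kx[i, j] * win
            gy += ky[i, j] * win
    mag = np.hypot(gx, gy)
    return mag > thresh
```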
Sclera area estimation: for the estimation of the sclera area, Otsu's
thresholding method is applied. The stages of sclera area detection are
selection of the region of interest (ROI), Otsu's thresholding, and sclera
area detection. The left and right sclera areas are selected based on the iris
boundaries. Once the region of interest is selected, Otsu's thresholding is
applied to obtain the potential sclera areas. The correct left sclera area
should be placed in the right and center positions, and the correct right
sclera area should be placed in the left and center. In this way, non-sclera
areas are wiped out.
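Otsu's thresholding, used here for sclera area estimation, picks the gray level that maximizes the between-class variance of the grayscale histogram. A compact sketch:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level (0-255) maximizing between-class variance.
    `gray` is an integer-valued image array."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                  # class-0 probability at each level
    mu = np.cumsum(p * np.arange(256))    # class-0 cumulative mean
    mu_t = mu[-1]                         # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0    # empty classes contribute nothing
    return int(np.argmax(sigma_b))
```

Pixels above the returned threshold form the candidate (bright) sclera region within the ROI.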
223 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the sclera
area. Then the upper eyelid, lower eyelid, and iris boundaries are refined;
all of these are unwanted portions for recognition. To eliminate their
effects, refinement is performed after the detection of the sclera area. Fig.
shows the result after Otsu's thresholding and the iris and eyelid
refinement used to detect the right sclera area; the left sclera area is
detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the effect of
segmentation faults. The vein patterns in the sclera area are not clearly
visible after segmentation, so vein pattern enhancement must be
performed to make them more visible.
224 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of fingers (Miura et al., 2004), the palm (Lin
and Fan, 2004), and the retina (Hill, 1999). In the case of retinal
biometrics, a special optical device for imaging the back of the eyeball is
needed (Hill, 1999). Due to its perceived invasiveness and the required
degree of subject cooperation, the use of retinal biometrics may not be
acceptable to some individuals. The conjunctiva is a thin, transparent, and
moist tissue that covers the outer surface of the eye. The part of the
conjunctiva that covers the inner lining of the eyelids is called the
palpebral conjunctiva, and the part that covers the outer surface of the eye
is called the ocular (or bulbar) conjunctiva, which is the focus of this
study. The ocular conjunctiva is very thin and clear; thus the vasculature
(including that of the episclera) is easily visible through it. The visible
microcirculation of the conjunctiva offers a rich and complex network of
veins and fine microcirculation (Fig. 1). The apparent complexity and
specificity of these vascular patterns motivated us to utilize them for
personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as those of the
eye fundus, confirm the uniqueness of such vascular patterns even
between identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus the conjunctival vascular structure
is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with
commercial off-the-shelf digital cameras under normal lighting
conditions, making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and
thus more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the
addition of conjunctival biometrics will enhance the performance of
current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the
iris information is relegated to the left or right portions of the eye, the
sclera vein patterns will be further exposed. This feature makes sclera
vasculature a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable
(e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye
surfaces.
The first step in parallelizing an algorithm is to determine the availability
of simultaneous computation. The figure below demonstrates the
possibility of parallel directional filtering. Since the filter is computed
over different portions of the input image, the computation can be
performed in parallel (denoted by Elements below). In addition,
individual parallelization of each element of the filtering can also be
performed. A detailed discussion of our proposed parallelization is
outside the scope of this paper.
FIG
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA
VEIN RECOGNITION METHOD
The matching stage of the line-descriptor-based method is a
bottleneck with regard to matching speed. In this section, we briefly
describe the line-descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and to remove the branch
points. The line descriptor is used to describe the segments in the vein
structure; Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle to some
reference angle at the iris center, θ; the segment's distance to the iris
center, r; and the dominant angular orientation of the line segment, ɸ.
Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the
line descriptor are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. To register the segments of the vascular
patterns, a RANSAC-based algorithm is used to estimate the best-fit
parameters for registration between the two sclera vascular patterns. The
registration algorithm randomly chooses two points, one from the test
template and one from the target template, together with a scaling factor
and a rotation value based on a priori knowledge of the database. Using
these values, it calculates a fitness value for the registration under these
parameters.
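The RANSAC-style registration loop described above might be sketched as follows (the fitness function, the sampling ranges for scale and rotation, and the tolerance are illustrative assumptions; the report does not list them):

```python
import math
import random

def register_ransac(test_pts, target_pts, n_iter=200, tol=5.0, seed=0):
    """RANSAC-style search for a similarity transform between two point
    sets. Fitness here simply counts test points that land within `tol`
    of some target point after the transform (a sketch, not the
    report's exact fitness)."""
    rng = random.Random(seed)
    best = (0, None)
    for _ in range(n_iter):
        p = rng.choice(test_pts)            # random test point
        q = rng.choice(target_pts)          # random target point
        s = rng.uniform(0.9, 1.1)           # assumed a-priori scale range
        a = rng.uniform(-0.2, 0.2)          # assumed a-priori rotation range
        ca, sa = math.cos(a), math.sin(a)
        fit = 0
        # rotate/scale about p, then shift so p lands on q
        for (x, y) in test_pts:
            tx = q[0] + s * (ca * (x - p[0]) - sa * (y - p[1]))
            ty = q[1] + s * (sa * (x - p[0]) + ca * (y - p[1]))
            if any(math.hypot(tx - u, ty - v) < tol for (u, v) in target_pts):
                fit += 1
        if fit > best[0]:
            best = (fit, (p, q, s, a))
    return best
```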
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. To reduce the effect of segmentation errors, we created a
weighting image (Figure 3) from the sclera mask by setting interior pixels
in the sclera mask to 1, pixels within some distance of the boundary of the
mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as shown
below, where Si and Sj are two segment descriptors; m(Si, Sj) is the
matching score between segments Si and Sj; d(Si, Sj) is the Euclidean
distance between the segment descriptors' center points (from Eqs. 6-8);
Dmatch is the matching distance threshold; and ɸmatch is the matching
angle threshold. The total matching score M is the sum of the individual
matching scores divided by the maximum matching score for the minimal
set between the test and target templates. That is, one of the test or target
templates has fewer points, and the sum of its descriptors' weights sets the
maximum score that can be attained.
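A sketch of the per-segment and total matching scores (the descriptor layout (x, y, ɸ, w), the threshold values, and the use of the smaller weight for a matched pair are assumptions for illustration):

```python
import math

def segment_score(si, sj, d_match=8.0, phi_match=0.5):
    """Two descriptors match when their centers are within Dmatch and
    their orientations within the angle threshold; the matched pair
    contributes the smaller of the two edge weights (assumed)."""
    (xi, yi, phi_i, wi) = si
    (xj, yj, phi_j, wj) = sj
    d = math.hypot(xi - xj, yi - yj)
    if d <= d_match and abs(phi_i - phi_j) <= phi_match:
        return min(wi, wj)
    return 0.0

def template_score(test, target):
    """Total score M: best pairwise score per test segment, summed and
    normalized by the maximum attainable score of the smaller template."""
    matched = sum(max(segment_score(s, t) for t in target) for s in test)
    max_score = min(sum(s[3] for s in test), sum(t[3] for t in target))
    return matched / max_score if max_score else 0.0
```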
FIG
FIG
FIG
FIG
movement of the eye, Y-shape branches are observed to be a stable
feature and can be used as a sclera feature descriptor. To detect the
Y-shape branches in the original template, we search for the
nearest-neighbor set of every line segment within a regular distance and
classify the angles among these neighbors. If there are two types of angle
values in the line segment set, the set may be inferred to be a Y-shape
structure, and the line segment angles are recorded as a new feature of the
sclera.
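The branch-angle computation can be sketched as below; the invariance holds because each ϕ is measured relative to the radial direction from the pupil center (the input format for the branch directions is our assumption):

```python
import math

def y_branch_feature(center, branch_dirs, pupil):
    """Build the rotation/scale-invariant Y-shape feature
    y(phi1, phi2, phi3, x, y): each phi is the angle between a branch
    direction and the radial direction from the pupil center (sketch)."""
    rx, ry = center[0] - pupil[0], center[1] - pupil[1]
    radial = math.atan2(ry, rx)
    phis = []
    for (dx, dy) in branch_dirs:
        a = math.atan2(dy, dx) - radial
        a = (a + math.pi) % (2 * math.pi) - math.pi  # wrap to (-pi, pi]
        phis.append(a)
    return (*phis, *center)
```

Rotating the whole eye about the pupil center changes the branch directions and the radial direction by the same amount, so the ϕ values are unchanged.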
There are two ways to measure both the orientation and the
relationship of every branch of a Y-shape vessel: one is to use the angle of
every branch to the x-axis; the other is to use the angles between each
branch and the iris radial direction. The first method needs an additional
rotation operation to align the template, so in our approach we employed
the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles
between each branch and the radius from the pupil center. Even when the
head tilts, the eye moves, or the camera zooms during image acquisition,
ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors from the pupil center
calculation in the segmentation step, we also record the center position
(x, y) of the Y-shape branches as auxiliary parameters. Our rotation-,
shift- and scale-invariant feature vector is thus defined as
y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to
the iris center; therefore it is automatically aligned to the iris center, and it
is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary
images (Figure 7). The skeleton is then broken into smaller segments. For
each segment, a line descriptor is created to record the center and
orientation of the segment. This descriptor is expressed as s(x, y, ɸ),
where (x, y) is the position of the center and ɸ is its orientation. Because
of the limited segmentation accuracy, descriptors at the boundary of the
sclera area might not be accurate and may contain spur edges resulting
from the iris, eyelid, and/or eyelashes. To tolerate such errors, the mask
file
FIG
The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel
patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line
segments of vessel patterns
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is
challenging, since the mask files are large in size; they occupy GPU
memory and slow down data transfer. During matching, a RANSAC-type
registration algorithm is used to randomly select corresponding
descriptors, and the transform parameters between them are used to
generate the template-transform affine matrix. After every template
transform, the mask data must also be transformed and a new boundary
calculated to evaluate the weight of the transformed descriptor. This
results in too many convolutions on the processing unit.
To reduce the heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the mask
information and can be automatically aligned. We extracted the geometric
relationships of the descriptors and stored them as a new descriptor. We
use a weighted image created by setting various weight values according
to position: the weights of descriptors outside the sclera are set to 0, those
near the sclera boundary to 0.5, and interior descriptors to 1. In our work,
descriptor weights are calculated on their own mask by the CPU, and only
once.
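The one-time CPU weight assignment might look like the following sketch (the boundary distance is an assumed parameter; the report only specifies the three weight levels 0, 0.5, and 1):

```python
import numpy as np

def descriptor_weight(mask, x, y, border=5):
    """Weight of a descriptor at (x, y): 0 outside the sclera mask,
    0.5 within `border` pixels of the mask boundary, 1 in the interior.
    `border` is an assumed distance; computed once on the CPU."""
    h, w = mask.shape
    xi, yi = int(round(x)), int(round(y))
    if not (0 <= yi < h and 0 <= xi < w) or mask[yi, xi] == 0:
        return 0.0
    # near-boundary test: any non-sclera pixel in the border window,
    # or window truncated by the image edge?
    y0, y1 = max(0, yi - border), min(h, yi + border + 1)
    x0, x1 = max(0, xi - border), min(w, xi + border + 1)
    if ((mask[y0:y1, x0:x1] == 0).any()
            or yi < border or h - 1 - yi < border
            or xi < border or w - 1 - xi < border):
        return 0.5
    return 1.0
```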
The calculated result is saved as a component of the descriptor,
which becomes s(x, y, ɸ, w), where w denotes the weight of the point and
takes the value 0, 0.5, or 1. To align two templates, when a template is
shifted to another location along the line connecting their centers, all the
descriptors of that template must be transformed. This is faster if the two
templates share a similar reference point: if we use the center of the iris as
the reference point, then when two templates are compared, the
correspondences are automatically aligned to each other. Every feature
vector of the template is a set of line segment descriptors composed of
three variables (Figure 8): the segment's angle to the reference line
through the iris center, denoted θ; the distance between the segment's
center and the pupil center, denoted r; and the dominant angular
orientation of the segment, denoted ɸ. To minimize GPU computation, we
also convert the descriptor values from polar coordinates to rectangular
coordinates in a CPU preprocess.
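The CPU-side preprocessing step is then a one-line coordinate conversion (function name assumed):

```python
import math

def wpl_descriptor(r, theta, phi, w):
    """CPU preprocess: extend the polar descriptor with rectangular
    coordinates so the GPU kernels avoid trigonometry, giving
    s(x, y, r, theta, phi, w)."""
    x = r * math.cos(theta)
    y = r * math.sin(theta)
    return (x, y, r, theta, phi, w)
```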
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters.
For example, as an eyeball moves left, the left-part sclera patterns of the
eye may be compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads called warps. We
reorganized the descriptors from the same side and saved
FIG
FIG
them at contiguous addresses. This meets the requirement of coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no
longer needed on the GPU. Matching with this feature is very fast because
the templates do not need to be re-registered every time after shifting;
thus the cost of data transfer and computation on the GPU is reduced.
With matching on the new descriptor, the shift parameter generator in
Figure 4 is simplified as shown in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger limits
on their size and resource consumption, more fully featured instruction
sets, and more flexible control-flow operations. After many years of
separate instruction sets for vertex and fragment operations, current GPUs
support the unified Shader Model 4.0 on both vertex and fragment
shaders:
The hardware must support shader programs of at least 65 k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and
32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting
fixed-function units.
General-purpose computing on the GPU maps general-purpose
computation onto the GPU using the graphics hardware in much the same
way as any standard graphics application. Because of this similarity, it is
both easier and more difficult to explain the process: on one hand, the
actual operations are the same and are easy to follow; on the other hand,
the terminology differs between graphics and general-purpose use. Harris
provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology,
then show how the same steps are used in a general-purpose way to
author GPGPU applications, and finally use the same steps to show the
simpler and more direct way that today's GPU computing applications are
written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II,
concentrating on its programmable aspects:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as a texture on future passes
through the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Coopting this pipeline to perform general-purpose computation
involves exactly the same steps, but different terminology. A motivating
example is a fluid simulation computed over a grid: at each time step, we
compute the next state of the fluid for each grid point from the current
state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest. The rasterizer generates a fragment at
each pixel location covered by that geometry. (In our example, the
primitive must cover a grid of fragments equal to the domain size of our
fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment
program. (Each grid point runs the same program to update the state of its
fluid.)
The fragment program computes the value of the fragment by a
combination of math operations and "gather" accesses from global
memory. (Each grid point can access the state of its neighbors from the
previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes. (The current state of the fluid will be used on the next time
step.)
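The fluid-grid example can be expressed directly (a toy 4-neighbor averaging update stands in for a real fluid solver; gather-only access to the previous buffer mirrors the fragment-program restriction):

```python
def step(state, n):
    """One SPMD-style time step on an n x n grid stored row-major:
    every 'fragment' (grid point) runs the same program, gathering its
    previous-step neighbors. A simple 4-neighbor average is used here
    as a stand-in for a real fluid update."""
    def one_thread(i, j):
        s = state[i * n + j]
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = (i + di) % n, (j + dj) % n  # periodic boundary
            s += state[ni * n + nj]
        return s / 5.0
    # all "threads" read the old buffer and write a fresh one (gather only)
    return [one_thread(i, j) for i in range(n) for j in range(n)]
```

Note the output buffer is distinct from the input buffer: in the old GPGPU model a pass could gather from global memory but not scatter into the buffer it was reading.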
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications
has been that, despite their general-purpose tasks having nothing to do
with graphics, the applications still had to be programmed using graphics
APIs. In addition, the program had to be structured in terms of the
graphics pipeline, with the programmable units only accessible as an
intermediate step in that pipeline, when the programmer would almost
certainly prefer to access the programmable units directly. The
programming environments we describe in detail in Section IV solve this
difficulty by providing a more natural, direct, non-graphics interface to
the hardware, and specifically to the programmable units. Today, GPU
computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math
operations and both "gather" (read) accesses from and "scatter" (write)
accesses to global memory. Unlike in the previous two methods, the same
buffer can be used for both reading and writing, allowing more flexible
algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in
future computation.
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because no
registration is needed. The matching result in this stage helps filter out
image pairs with low similarity; after this step, some false positive
matches may still remain. In the second stage, we use the WPL descriptor
to register the two images for more detailed descriptor matching,
including scale and translation invariance. This stage includes the shift
transform, affine matrix generation, and final WPL descriptor matching.
Overall, we partitioned the registration and matching processing into four
kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster
and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features,
registration is unnecessary before matching on the Y-shape descriptor.
The whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte
and the target template Tta, respectively; dϕ is the Euclidean distance of
the angle elements of the descriptor vectors, defined in (3); dxy is the
Euclidean distance of two descriptor centers, defined in (4); ni and di are
the number of matched descriptor pairs and the distance between their
centers, respectively; tϕ is a distance threshold; and txy is the threshold
that restricts the search area. We set tϕ to 30 and txy to 675 in our
experiment.
To match two sclera templates, we search the areas near all the
Y-shape branches. The search area is limited to the corresponding left or
right half of the sclera, to reduce the search range and time. The distance
between two branches is defined in (3), where ϕij is the angle between the
jth branch and the polar ray from the pupil center in descriptor i.
The number of matched pairs ni and the distance between the Y-shape
branch centers di are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch
centers as in (2), where α is a factor to fuse the matching score (set to 30
in our study) and Ni and Nj are the total numbers of feature vectors in
templates i and j, respectively. The decision is regulated by the threshold
t: if a sclera's matching score is lower than t, the sclera is discarded.
Scleras with high matching scores are passed to the next, more precise
matching process.
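Algorithm 1 might be sketched as follows; since Eq. (2) is not reproduced in the text, the score fusion below is only one plausible reading (higher for more matched branches, lower for a larger mean center distance):

```python
import math

T_PHI, T_XY, ALPHA = 30.0, 675.0, 30.0  # thresholds/factor from the report

def match_y(test, target):
    """Stage-I coarse matching on Y-shape descriptors y(p1, p2, p3, x, y).
    A pair matches when the branch-angle distance is below t_phi and
    the centers are within t_xy; the final fusion formula is assumed."""
    n, dists = 0, []
    for (p1, p2, p3, x1, y1) in test:
        for (q1, q2, q3, x2, y2) in target:
            d_xy = math.hypot(x1 - x2, y1 - y2)
            d_phi = math.sqrt((p1 - q1) ** 2 + (p2 - q2) ** 2
                              + (p3 - q3) ** 2)
            if d_phi < T_PHI and d_xy < T_XY:
                n += 1
                dists.append(d_xy)
    if n == 0:
        return 0.0
    mean_d = sum(dists) / n
    # fuse matched count and mean distance, normalized by template size
    return ALPHA * n / ((1.0 + mean_d) * min(len(test), len(target)))
```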
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of
the sclera than the Y-shape descriptor. The variation of the sclera vessel
pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel
structure will appear nonlinearly shrunk or extended, because the eyeball
is spherical in shape.
The sclera is made up of four layers (episclera, stroma, lamina fusca, and
endothelium), and there are slight differences in the movement of these
layers.
Considering these factors, our registration employs both a single shift
transform and a multi-parameter transform that combines shift, rotation,
and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before,
segmentation may not be accurate; as a result, the detected iris center may
not be accurate either. The shift transform is designed to tolerate possible
errors in pupil center detection in the segmentation step. If there is no
deformation, or only very minor deformation, registration with the shift
transform alone is adequate to achieve an accurate result. We designed
Algorithm 2 to obtain the optimized shift parameter, where Tte is the test
template and stei is the ith WPL descriptor of Tte, Tta is the target
template and staj is the jth WPL descriptor of Tta, and d(stek, staj) is the
Euclidean distance of descriptors stek and staj.
Δsk, the shift value of two descriptors, is defined as the offset between
their centers.
We first randomly select an equal number of segment descriptors stek in
the test template Tte from each quadrant and find each one's nearest
neighbor staj in the target template Tta. The shift offset between them is
recorded as a candidate registration shift factor Δsk. The final registration
offset is Δsoptim, the candidate with the smallest standard deviation
among these offsets.
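A sketch of Algorithm 2 (the per-quadrant bookkeeping is collapsed into simple random sampling, and "smallest standard deviation" is read as the candidate offset whose distances to the other candidates vary least):

```python
import math
import random
import statistics

def shift_search(test, target, n_samples=8, seed=1):
    """Sample test descriptors (centers only), pair each with its
    nearest target descriptor, record the center offsets as candidate
    shifts, and keep the candidate with the smallest spread."""
    rng = random.Random(seed)
    pts = rng.sample(test, min(n_samples, len(test)))
    offsets = []
    for (x, y) in pts:
        nx, ny = min(target, key=lambda t: math.hypot(t[0] - x, t[1] - y))
        offsets.append((nx - x, ny - y))
    best, best_sd = None, float("inf")
    for o in offsets:
        sd = statistics.pstdev(
            math.hypot(o[0] - p[0], o[1] - p[1]) for p in offsets)
        if sd < best_sd:
            best, best_sd = o, sd
    return best
```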
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te(it) and calculating the distance to its nearest neighbor s_ta,j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the iteration with the maximum matching number m(it). Here s_te,i, T_te, s_ta,j and T_ta are defined as in Algorithm 2; tr(it)_shift, θ(it) and tr(it)_scale are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)_shift) and S(tr(it)_scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment we set the iteration count to 512.
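The random parameter search of Algorithm 3 can be sketched as below. The parameter ranges and the point-based matching test are illustrative assumptions; the actual method applies the matrix in (7) to WPL descriptors, not bare points.

```python
import numpy as np

def affine_search(test_pts, target_pts, n_iter=512, beta=20.0, rng=None):
    """Sketch of the random affine-parameter search (Algorithm 3,
    paraphrased). Each iteration draws a random (shift, rotation, scale)
    triple, transforms the test points, and counts points that land
    within beta pixels of some target point; the draw ranges below are
    assumptions, not values from the report."""
    rng = np.random.default_rng(rng)
    best_params, best_matches = None, -1
    for _ in range(n_iter):
        tr = rng.uniform(-10, 10, size=2)      # random shift (assumed range)
        theta = rng.uniform(-0.1, 0.1)         # random rotation (radians)
        scale = rng.uniform(0.9, 1.1)          # random isotropic scale
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        moved = scale * (test_pts @ R.T) + tr
        # Count transformed test points with a target point within beta.
        d = np.linalg.norm(moved[:, None, :] - target_pts[None, :, :], axis=2)
        matches = int((d.min(axis=1) < beta).sum())
        if matches > best_matches:             # keep the max-match parameters
            best_params, best_matches = (tr, theta, scale), matches
    return best_params, best_matches
```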
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te,i, T_te, s_ta,j and T_ta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)_shift, tr(optm)_scale and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; R(θ(optm)), T(tr(optm)_shift) and S(tr(optm)_scale) form the descriptor transform matrix defined in Algorithm 3; ϕ is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ϕ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.
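A simplified sketch of the scoring idea follows, assuming each descriptor is reduced to a centre point, an orientation ϕ, and an edge weight w. The distance threshold, the one-use flag on matched segments, and the weighted, normalised score are illustrative assumptions, not the report's exact formula.

```python
import numpy as np

def match_score(test, target, alpha=5.0, dist_thresh=20.0):
    """Sketch of the registration-and-matching scoring step (Algorithm 4,
    paraphrased). Each row of test/target is (x, y, phi, w). A test
    descriptor matches its nearest unused target descriptor only if the
    orientations differ by less than alpha; matched segments are flagged
    so they are not reused."""
    used = np.zeros(len(target), dtype=bool)
    score = 0.0
    for x, y, phi, w in test:
        d = np.linalg.norm(target[:, :2] - (x, y), axis=1)
        d[used] = np.inf                  # a matched segment is not reused
        j = int(np.argmin(d))
        if d[j] < dist_thresh and abs(phi - target[j, 2]) < alpha:
            used[j] = True
            score += w * target[j, 3]     # weight both descriptors
    # Normalise by the smaller self-match score of the two templates.
    norm = min((test[:, 3] ** 2).sum(), (target[:, 3] ** 2).sum())
    return score / norm
```

With two identical templates the score is 1.0; rotating every orientation by 90 degrees drives it to 0, which is the behaviour the α check is meant to enforce.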
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program must be partitioned by the programmer into threads, which are mapped onto those multiprocessors.
There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and take very little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Local memory, global memory, constant memory, and texture memory reside off-chip; global memory is accessible by all threads, but accessing these off-chip memories is very time consuming.
Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming.
If threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower than on-chip memory in terms of access latency. To hide this latency, we should preferentially use on-chip memory rather than global memory. When global memory accesses do occur, threads in the same warp should access consecutive words so that the accesses coalesce.
Shared memory is much faster than the local and global memory spaces, but it is organized into equally sized banks. If two memory requests from different threads within a warp fall in the same bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
2.5.1 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computational density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partitioning among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and each thread compares one pair of templates. In our work we use an NVIDIA C2070 as our GPU, and the thread and block counts are both set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partitioning lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm, shown on the right of Figure 11, is used to calculate this sum. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
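The pairwise reduction just described can be sketched sequentially as follows. On the GPU each addition within a round is performed by a separate thread; the zero-based even-index convention here mirrors the text's odd-numbered-thread description.

```python
import numpy as np

def block_sum(partial):
    """Sketch of the in-place pairwise reduction: in each round, values
    one stride apart are added and the stride doubles, so after
    log2(n) rounds the total sits at the first address (index 0).
    On the GPU each addition in a round is done by a different thread;
    here the rounds are plain loops."""
    a = np.array(partial, dtype=float)
    stride = 1
    while stride < len(a):
        for i in range(0, len(a) - stride, 2 * stride):
            a[i] += a[i + stride]      # this slot accumulates its pair
        stride *= 2
    return a[0]
```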
2.5.2 MAPPING INSIDE BLOCK
In the shift-argument search there are two schemes we can choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to a thread, so that all threads compute independently, except that the final result must be compared across the possible offsets.
Because of the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are distributed over all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration step, we used the Mersenne Twister pseudorandom number generator because it uses only bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
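A modern analogue of giving each thread its own independently parameterized generator is to spawn child seed sequences from one root seed. The numpy sketch below is a stand-in for the offline dynamic-creation tool described above, not the tool itself.

```python
import numpy as np

# Each simulated "thread" gets its own statistically independent stream,
# analogous to giving every CUDA thread a Mersenne Twister created with
# its own parameters. SeedSequence.spawn derives child seeds that are
# designed not to produce correlated streams, even though they share
# one root seed.
root = np.random.SeedSequence(1234)
streams = [np.random.default_rng(s) for s in root.spawn(4)]
draws = [rng.random(5) for rng in streams]   # one 5-sample draw per "thread"
```

The draws are reproducible from the root seed, yet pairwise uncorrelated across streams, which is exactly the property the per-thread twister parameters are meant to provide.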
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can incur long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to its lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
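The structure-of-arrays layout described here (descriptor components stored separately so consecutive threads touch consecutive addresses) can be sketched as below; the container and field order are illustrative assumptions.

```python
import numpy as np

def to_soa(descriptors):
    """Structure-of-arrays layout for WPL descriptors s(x, y, r, theta,
    phi, w): each component lives in its own contiguous array, so
    threads reading descriptor i and i+1 touch consecutive addresses,
    which is the coalescing pattern described in the text.

    descriptors: (N, 6) array with columns x, y, r, theta, phi, w.
    """
    d = np.asarray(descriptors, dtype=float)
    return {k: np.ascontiguousarray(d[:, i])   # one contiguous array per field
            for i, k in enumerate(("x", "y", "r", "theta", "phi", "w"))}
```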
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient and edge orientations of the vein pattern in the sclera region of an eye image. To implement this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a larger block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell gives a weighted vote to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counting as the same. Fig 8 depicts the edge orientations of the picture elements. If the images have illumination or contrast changes, the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
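The gradient and cell-histogram steps can be sketched as follows. This is a numpy-only simplification: the cell size and bin count are assumed values, and block normalisation and the Gaussian window are omitted for brevity.

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    """Minimal HOG sketch following the steps above: x/y gradients,
    magnitude and orientation, then per-cell orientation histograms
    over 0-180 degrees with gradient magnitude as the vote weight."""
    img = img.astype(float)
    dx = np.zeros_like(img)
    dy = np.zeros_like(img)
    dx[:, 1:-1] = img[:, 2:] - img[:, :-2]      # central differences
    dy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(dx, dy)                      # gradient magnitude m(x, y)
    ang = np.rad2deg(np.arctan2(dy, dx)) % 180  # fold opposite directions
    h, w = img.shape
    out = np.zeros((h // cell, w // cell, bins))
    for cy in range(h // cell):
        for cx in range(w // cell):
            m = mag[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell].ravel()
            a = ang[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell].ravel()
            idx = np.minimum((a / (180 / bins)).astype(int), bins - 1)
            out[cy, cx] = np.bincount(idx, weights=m, minlength=bins)
    return out
```

For a pure vertical edge, all the magnitude lands in the 0-degree bin of the containing cell, matching the behaviour described above.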
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, FORTRAN, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, enabling fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages, because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithms efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
Using the interactive tool GUIDE (Graphical User Interface Development Environment), you can lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files, such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW Discrete Fourier Transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types
including doubles singles and integers
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
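The Otsu thresholding used in the edge-detection snapshot above can be sketched with numpy alone. This is a generic implementation of Otsu's method (maximising between-class variance of the grey-level histogram), not the project's MATLAB code.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the threshold that maximises the
    between-class variance of the grey-level histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                      # grey-level probabilities
    omega = np.cumsum(p)                       # class-0 probability
    mu = np.cumsum(p * np.arange(256))         # cumulative mean
    mu_t = mu[-1]                              # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0         # empty classes score zero
    return int(np.argmax(sigma_b))

def to_binary(gray):
    """Grey-scale image -> binary image via the Otsu threshold."""
    return (gray > otsu_threshold(gray)).astype(np.uint8)
```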
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase matching efficiency; it is a new feature extraction method that takes advantage of the GPU structure. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU-GPU system, and even across the threads within the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
1.5.4 INDEXED IMAGE
FIG
An indexed image consists of an array and a color map matrix. The pixel values in the array are direct indices into the color map. By convention, this documentation uses the variable name X to refer to the array and map to refer to the color map. In computing, indexed color is a technique to manage digital image colors in a limited fashion, in order to save computer memory and file storage while speeding up display refresh and file transfers. It is a form of vector quantization compression.
When an image is encoded in this way, color information is not directly carried by the image pixel data, but is stored in a separate piece of data called a palette: an array of color elements, in which every element (a color) is indexed by its position within the array. The image pixels do not contain the full specification of their color, but only its index in the palette. This technique is sometimes referred to as pseudocolor or indirect color, as colors are addressed indirectly.
Perhaps the first device that supported palette colors was a random-access frame buffer, described in 1975 by Kajiya, Sutherland, and Cheadle, which supported a palette of 256 36-bit RGB colors.
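The index-plus-palette scheme can be shown in a few lines; the 2×2 image and three-colour palette below are made-up example data.

```python
import numpy as np

# An indexed image as described above: the array X holds palette
# indices, and the colour map holds one RGB triple per index.
palette = np.array([[255, 0, 0],    # index 0: red
                    [0, 255, 0],    # index 1: green
                    [0, 0, 255]])   # index 2: blue
X = np.array([[0, 1],
              [2, 1]])              # 2x2 image of palette indices
rgb = palette[X]                    # expand indices to a full-colour image
```

Only X and the small palette need to be stored; `palette[X]` reconstructs the full-colour image on demand, which is the memory saving the text describes.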
1.6 Applications of image processing
Interest in digital image processing methods stems from two principal application areas:
1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception
In the second application area, interest focuses on procedures for extracting information from an image in a form suitable for computer processing. Examples include automatic character recognition, industrial machine vision for product assembly and inspection, military reconnaissance, automatic processing of fingerprints, etc.
1.7 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches: Speeded Up Robust Features (SURF)-based matching, minutiae detection, and direct correlation matching for feature registration and matching. Among these three methods, the SURF-based method achieves the best accuracy; it takes an average of 15 seconds to perform a one-to-one matching. Zhou et al. proposed the line-descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 12 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
1.7.1 DISADVANTAGES OF EXISTING SYSTEM
1. Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large, preoccupy GPU memory, and slow down data transfer. Also, some of the processing of the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements; there is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance.
LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-image face recognition algorithms perform well when constrained (frontal, well illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper, we highlight some of the key issues in remote face recognition. We define remote face recognition as recognition in which the faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database acquired in an unconstrained outdoor maritime environment. The recognition performance of a subset of existing still-image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time, in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models that can account for these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.
With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such Big Data draws great concern. Even though modern processors are equipped with more cores and memory capacity, careful design is still required to utilize the hardware resources effectively and the power consumption efficiently. This research addresses the issue by investigating the workload characteristics of biometric applications. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, which motivates us to move the biometrics computation to a high-performance architecture.
Modern iris recognition algorithms can be computationally intensive, yet are designed for traditional sequential processing elements such as a personal computer. However, a parallel processing alternative using field-programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the scope of this project, iris template generation with directional filtering, which is a computationally expensive yet parallelizable portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system against a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ version, determining the information content of an iris approximately 324 times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.
A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye. The vein pattern seen in the sclera region is unique to each person; thus, the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: the Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using the bilinear interpolation technique. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than the previous methods.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.
The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.
This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford–Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image. The global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method, using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.
It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not work. We then propose a new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the "weighted polar line (WPL) descriptor," which is better suited for parallel computing and mitigates the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make parallel processing possible and efficient.
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve the efficiency, in this research we propose a new descriptor, the Y-shape descriptor, which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.
2. We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities.
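The two-stage process above can be sketched as follows. This is a minimal illustration: the descriptor format (angle triples per Y-shape branch point), the angular tolerance, and the 0.5 pass threshold are assumptions for demonstration, not the report's actual values.

```python
def coarse_score(y_desc_a, y_desc_b, tol=0.1):
    """Stage 1: compare Y-shape branch-angle triples directly.
    No registration is needed, so this comparison is very fast."""
    if not y_desc_a or not y_desc_b:
        return 0.0
    matched = sum(
        1 for a in y_desc_a
        if any(all(abs(p - q) < tol for p, q in zip(a, b)) for b in y_desc_b))
    return matched / max(len(y_desc_a), len(y_desc_b))

def two_stage_match(test, targets, fine_match, threshold=0.5):
    """Stage 2 (refined matching) runs only on templates that pass stage 1,
    so pairs with low similarity are filtered out before the expensive step."""
    candidates = [t for t in targets
                  if coarse_score(test["Y"], t["Y"]) >= threshold]
    return [(t["id"], fine_match(test, t)) for t in candidates]
```

Only the candidates surviving the coarse stage pay the cost of full registration and matching, which is where the speed-up over one-stage matching comes from.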
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque and white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches: a Speeded-Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds using the SURF method to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured using a PC with Intel® Core™2 Duo 2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient for data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU was used to extract the features, construct descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al. presented a performance evaluation of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform that outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design; an efficient parallel computing scheme needs different strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method, using the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and to help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they will preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve implementation of the algorithms in parallel would not work efficiently.
Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose a new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor," which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then develop the implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we present experiments using the proposed system. In Section 9, we draw conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition. Several methods have been designed for sclera segmentation. Crihalmeanu et al. presented a semi-automated system for sclera segmentation; they used a clustering algorithm to classify the color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's thresholding-based method for grayscale images. After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded-Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.
The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good accuracy for identification. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and the interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly useful for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are extracted.
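The two feature types above can be sketched in pure Python. This is a minimal illustration, assuming a single HOG cell with 8 unsigned-orientation bins and a small polar sampling grid; the actual cell sizes, bin counts, and grid resolutions used in the report are not specified here.

```python
import math

def hog_cell(img):
    """Unsigned-orientation histogram (8 bins) of gradients over one cell.
    `img` is a 2-D list of grey values; a minimal sketch of HOG, without
    the block normalisation used in full implementations."""
    bins = [0.0] * 8
    h, w = len(img), len(img[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]     # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.atan2(gy, gx) % math.pi     # fold into [0, pi)
            bins[min(int(ang / math.pi * 8), 7)] += mag
    return bins

def to_polar(img, cx, cy, n_r, n_theta):
    """Resample the image on a polar grid centred at (cx, cy), using
    bilinear interpolation, so that rotation becomes a shift along theta."""
    h, w = len(img), len(img[0])
    r_max = min(cx, cy, w - 1 - cx, h - 1 - cy)
    out = []
    for ri in range(n_r):
        r = r_max * ri / (n_r - 1)
        row = []
        for ti in range(n_theta):
            t = 2 * math.pi * ti / n_theta
            x, y = cx + r * math.cos(t), cy + r * math.sin(t)
            x0, y0 = min(int(x), w - 2), min(int(y), h - 2)
            fx, fy = x - x0, y - y0
            row.append(img[y0][x0] * (1 - fx) * (1 - fy)
                       + img[y0][x0 + 1] * fx * (1 - fy)
                       + img[y0 + 1][x0] * (1 - fx) * fy
                       + img[y0 + 1][x0 + 1] * fx * fy)
        out.append(row)
    return out
```

In the polar image, a rotation of the eye becomes a circular shift along the theta axis, which is what makes the combined feature set rotation invariant.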
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare Area Detection: A glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It runs only on grayscale images: if the image is in color, it is first converted to grayscale, and the Sobel filter is then applied to detect the glare area. Fig. 4 shows the result of the glare area detection.
FIG
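A sketch of this step: convert to grayscale, run the Sobel operator, and flag bright pixels sitting on a strong edge response as glare candidates. The luma weights are the usual ITU-R 601 values; both thresholds and the bright-plus-edge rule are illustrative assumptions, not the report's exact procedure.

```python
def to_gray(rgb):
    """Convert an RGB image (2-D list of (r, g, b) tuples) to grayscale."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]

def sobel_magnitude(gray):
    """Gradient magnitude via the 3x3 Sobel kernels (borders left at 0)."""
    kx = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
    ky = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * gray[y - 1 + j][x - 1 + i]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * gray[y - 1 + j][x - 1 + i]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

def glare_candidates(gray, edge_thresh=500.0, bright_thresh=200.0):
    """Glare shows up as small bright blobs with sharp boundaries; return
    bright pixels with a strong Sobel response (thresholds assumed)."""
    mag = sobel_magnitude(gray)
    h, w = len(gray), len(gray[0])
    return [(x, y) for y in range(h) for x in range(w)
            if gray[y][x] >= bright_thresh and mag[y][x] >= edge_thresh]
```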
Sclera Area Estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are wiped out.
2.2.3 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; all of these are unwanted portions for recognition. In order to eliminate their effects, refinement is done following the detection of the sclera area. Fig. shows the result after Otsu's thresholding and after iris and eyelid refinement to detect the right sclera area. In the same way, the left sclera area is detected using this method.
FIG
In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the segmentation fault. The vein patterns in the sclera area are not clearly visible after segmentation; to make the vein patterns more visible, vein pattern enhancement is performed.
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), palms (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus, the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have a vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact. Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by "Elements" below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.
FIG
FIG
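The element-wise decomposition above can be sketched with a thread pool standing in for the hardware processing elements. This is only an illustration: the 1-D horizontal kernel is a stand-in for the directional filters discussed, and each row plays the role of one independent "Element".

```python
from concurrent.futures import ThreadPoolExecutor

def filter_row(gray, y, kernel):
    """Apply a 1-D horizontal kernel to one row (zero padding at borders)."""
    w, k = len(gray[y]), len(kernel)
    off = k // 2
    return [sum(kernel[i] * gray[y][x - off + i]
                for i in range(k) if 0 <= x - off + i < w)
            for x in range(w)]

def parallel_filter(gray, kernel, workers=4):
    """Filter every row independently; each worker mirrors one processing
    element of the parallel directional filtering described above."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda y: filter_row(gray, y, kernel),
                             range(len(gray))))
```

Because no row depends on another, the rows can be distributed over any number of processing elements, which is exactly the property the parallelization step looks for.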
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching segment of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)ᵀ. The individual components of the line descriptor are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.
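The registration loop just described can be sketched as follows; the iteration count, inlier tolerance, and the a priori scale/rotation ranges are illustrative assumptions, and the fitness value here is simply the number of test points landing near some target point.

```python
import math
import random

def apply_transform(pt, shift, scale, angle):
    """2-D similarity transform: scale, rotate, then translate."""
    x, y = pt
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * x - s * y) + shift[0],
            scale * (s * x + c * y) + shift[1])

def ransac_register(test_pts, target_pts, iters=500, tol=1.5, seed=0):
    """RANSAC-style registration: randomly pair one test point with one
    target point, draw scale/rotation from a priori ranges, derive the
    shift, and keep the parameters with the best fitness value."""
    rng = random.Random(seed)
    best = (0, (0.0, 0.0), 1.0, 0.0)     # (fitness, shift, scale, angle)
    for _ in range(iters):
        a = rng.choice(test_pts)
        b = rng.choice(target_pts)
        scale = rng.uniform(0.9, 1.1)    # assumed a priori ranges
        angle = rng.uniform(-0.2, 0.2)
        c, s = math.cos(angle), math.sin(angle)
        shift = (b[0] - scale * (c * a[0] - s * a[1]),
                 b[1] - scale * (s * a[0] + c * a[1]))
        fit = sum(
            1 for p in test_pts
            if min(math.dist(apply_transform(p, shift, scale, angle), q)
                   for q in target_pts) < tol)
        if fit > best[0]:
            best = (fit, shift, scale, angle)
    return best
```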
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as shown below, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
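The pairwise and total scores can be sketched directly from this description. Descriptors here are (x, y, ɸ, w) tuples, and the numeric values of Dmatch and the angle threshold are illustrative assumptions, not the report's.

```python
import math

def segment_match(si, sj, d_match=5.0, phi_match=0.2):
    """m(Si, Sj): nonzero when centres are within d_match and orientations
    within phi_match; weighted by both descriptors' mask weights."""
    (xi, yi, phi_i, wi), (xj, yj, phi_j, wj) = si, sj
    dphi = abs(phi_i - phi_j) % (2 * math.pi)
    dphi = min(dphi, 2 * math.pi - dphi)          # wrap orientation difference
    if math.hypot(xi - xj, yi - yj) <= d_match and dphi <= phi_match:
        return wi * wj
    return 0.0

def template_score(test, target):
    """Total score M: each test descriptor contributes its best pairwise
    score, normalised by the weight sum of the smaller template."""
    total = sum(max((segment_match(s, t) for t in target), default=0.0)
                for s in test)
    max_score = min(sum(s[3] for s in test), sum(t[3] for t in target))
    return total / max_score if max_score else 0.0
```

Normalising by the smaller template's weight sum keeps M in [0, 1] regardless of how many segments each template contains.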
FIG
FIG
FIG
FIG
movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.
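A sketch of both ideas follows: clustering the neighbour segments' angles to flag a Y-shape branch point, and measuring each branch against the radial direction from the pupil centre so the angles survive rotation and zoom. The cluster tolerance and the greedy two-cluster rule are simplified assumptions.

```python
import math

def is_y_shape(segment_angles, tol=0.15):
    """Greedily cluster the neighbour segments' angles; exactly two distinct
    angle values (within tol) is read as a Y-shape branch point."""
    clusters = []
    for a in segment_angles:
        for c in clusters:
            if abs(a - c[0]) < tol:
                c.append(a)
                break
        else:
            clusters.append([a])
    return len(clusters) == 2

def branch_angles(center, pupil, branch_dirs):
    """phi_i: each branch direction measured against the radial direction
    from the pupil centre through the branch point, so a global rotation
    of the eye leaves the values unchanged."""
    radial = math.atan2(center[1] - pupil[1], center[0] - pupil[0])
    return [(d - radial) % (2 * math.pi) for d in branch_dirs]
```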
There are two ways to measure both the orientation and the relationship of every branch of the Y-shape vessels: one is to use the angles of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.
2.2.6 WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, the descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size and will occupy the GPU memory and slow down the data transfer. During matching registration, a RANSAC-type algorithm was used to randomly select corresponding descriptors, and the transform parameters between them were used to generate the template transform affine matrix. After every template transform, the mask data should also be transformed, and a new boundary should be calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU, only once.
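The one-time CPU-side weight assignment can be sketched as a pass over the mask. The 5-pixel border distance is an assumed value, and pixels outside the image are treated here as outside the mask.

```python
def descriptor_weight(x, y, mask, border=5):
    """Weight from the sclera mask: 0 outside, 0.5 within `border` pixels
    of the mask boundary (or image edge), 1 in the interior. Computed once
    per template on the CPU and stored inside the descriptor."""
    h, w = len(mask), len(mask[0])
    if not (0 <= x < w and 0 <= y < h) or not mask[y][x]:
        return 0.0
    for dy in range(-border, border + 1):
        for dx in range(-border, border + 1):
            nx, ny = x + dx, y + dy
            if not (0 <= nx < w and 0 <= ny < h) or not mask[ny][nx]:
                return 0.5                    # near the boundary
    return 1.0                                # interior descriptor
```

Because the weight travels with the descriptor, the GPU kernel never has to touch the mask file at all.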
The calculated result was saved as a component of the descriptor. The sclera descriptor then becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the values 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template must be transformed. Alignment is faster if the two templates have similar reference points. If we use the center of the iris as the reference point, then when two templates are compared they are automatically aligned to each other, since they share the same reference point. Every feature vector of the template is a set of line-segment descriptors composed of three variables (Figure 8): the angle of the segment to the reference line through the iris center, denoted θ; the distance between the segment center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocessing step.
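The polar-to-rectangular preprocessing step can be sketched as follows (a minimal Python illustration; the function name and the use of radians are assumptions, not taken from the report):

```python
import math

def to_rectangular(r, theta):
    """Convert a descriptor center from polar (r, theta) to
    rectangular (x, y), taking the pupil/iris center as the origin.
    theta is assumed to be in radians."""
    x = r * math.cos(theta)
    y = r * math.sin(theta)
    return x, y
```

Doing this once on the CPU means the GPU kernels can compare descriptor centers with plain Euclidean distances, with no trigonometry per comparison.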
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as the eyeball moves left, the sclera patterns in the left part of the eye may be compressed while those in the right part are stretched.
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps.
We reorganized the descriptors from the same side and saved them at contiguous addresses, which meets the coalesced memory access requirement of the GPU.
FIG
FIG
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as shown in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and as GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
General-Purpose Computing on the GPU: mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, the process is both easier and more difficult to explain: on one hand the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as a texture on future passes through the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves the exact same steps, but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
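As a minimal illustration of this gather pattern (a Python sketch, not code from the report), each grid point computes its next value from its own state and its four neighbors, reading only from the previous time step's buffer:

```python
def step(grid):
    """One time step of a toy grid update: every interior cell
    gathers from itself and its 4 neighbors (simple averaging).
    Boundary cells are left unchanged for simplicity. Reads come
    from `grid` (previous step) and writes go to `new`, matching
    the separate input/output buffers of the GPGPU pipeline."""
    n = len(grid)
    m = len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 5.0
    return new
```

On the GPU, each (i, j) iteration would run as an independent fragment; the sequential loops here only simulate that parallelism.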
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
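The thread-grid model above can be sketched in Python (a hedged illustration; real GPU code launches one thread per element, while the loop here only simulates that). The distinguishing point is that a single buffer is both gathered from and scattered to:

```python
def scale_in_place(buf, factor):
    """SPMD-style per-element program: each 'thread' index i reads
    buf[i] (gather) and writes the result back to buf[i] (scatter),
    so one buffer serves as both input and output, in place."""
    for i in range(len(buf)):   # on a GPU, each i runs as its own thread
        buf[i] = buf[i] * factor
    return buf
```

This in-place read/write pattern is exactly what the older graphics-pipeline model could not express, since its output buffer was distinct from its input textures.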
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity; after this step, some false positive matches are still possible. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively. dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3), and dxy is the Euclidean distance of two descriptor centers, defined in (4). ni and di are the number of matched descriptor pairs and the distance between their centers, respectively. tϕ is a distance threshold and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar line from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor that fuses the matching score, set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded. A sclera with a high matching score is passed to the next, more precise matching process.
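The coarse Stage I test can be sketched as follows. This is a hedged Python illustration: each Y-shape descriptor is simplified to a single (angle, x, y) triple, and since equations (2)-(4) are not reproduced in the text, the score-fusion formula below is an assumed stand-in, not the report's exact equation:

```python
import math

def coarse_match(test_desc, target_desc, t_phi=30.0, t_xy=675.0, alpha=30.0):
    """Coarse Y-shape matching sketch: count descriptor pairs whose
    angle distance (cf. Eq. (3)) and center distance (cf. Eq. (4))
    fall under the thresholds, then fuse the match count and the mean
    center distance into a single score (illustrative fusion)."""
    matches = 0
    dist_sum = 0.0
    for (phi_i, x_i, y_i) in test_desc:
        for (phi_j, x_j, y_j) in target_desc:
            d_phi = abs(phi_i - phi_j)               # angle-element distance
            d_xy = math.hypot(x_i - x_j, y_i - y_j)  # center distance
            if d_phi < t_phi and d_xy < t_xy:
                matches += 1
                dist_sum += d_xy
    if matches == 0:
        return 0.0
    mean_dist = dist_sum / matches
    n_total = max(len(test_desc), len(target_desc))
    return matches / (n_total * (1.0 + mean_dist / alpha))  # assumed fusion
```

A template pair is kept for Stage II only if this score exceeds the decision threshold t; everything else is discarded without any registration work.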
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or stretch nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.
Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale.
1) SHIFT PARAMETER SEARCH: As discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection made in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the i-th WPL descriptor of Tte; Tta is the target template and staj is the j-th WPL descriptor of Tta; and d(stek, staj) is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value between the two descriptors.
We first randomly select an equal number of segment descriptors stek of the test template Tte from each quadrant and find the nearest neighbor staj of each in the target template Tta. Their shift offset is recorded as a candidate registration shift factor Δsk. The final registration offset is Δs_optim, the candidate with the smallest standard deviation among these offsets.
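A simplified reading of this search can be sketched in Python. Descriptors are reduced to (x, y) centers, and the "smallest standard deviation" rule is approximated by picking the candidate offset closest to the mean offset; both simplifications are assumptions for illustration:

```python
import math
import random

def shift_search(test_desc, target_desc, samples=8, seed=0):
    """Sketch of the shift-parameter search: sample descriptors from
    the test template, pair each with its nearest neighbor in the
    target template, record the (dx, dy) offsets, and keep the offset
    that agrees best with the others (here: closest to the mean)."""
    rng = random.Random(seed)
    picks = rng.sample(test_desc, min(samples, len(test_desc)))
    offsets = []
    for (x, y) in picks:
        nx, ny = min(target_desc, key=lambda p: math.hypot(p[0] - x, p[1] - y))
        offsets.append((nx - x, ny - y))
    mx = sum(o[0] for o in offsets) / len(offsets)
    my = sum(o[1] for o in offsets) / len(offsets)
    return min(offsets, key=lambda o: math.hypot(o[0] - mx, o[1] - my))
```

Because nearest-neighbor pairings can be wrong for individual descriptors, it is the consensus across the sampled offsets that makes the estimate robust.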
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)_shift, θ(it), and tr(it)_scale are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)_shift), and S(tr(it)_scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters; in our experiment we set the iteration count to 512.
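The random parameter search can be sketched as follows (a Python illustration; the candidate parameter ranges are assumptions, since the report does not state them, and descriptors are simplified to (x, y) centers):

```python
import math
import random

def affine_search(test_desc, target_desc, iters=64, beta=20.0, seed=0):
    """Random-search sketch of the affine parameter search: each
    iteration draws a (shift, rotation, scale) candidate, transforms
    the test descriptors, and counts those landing within beta pixels
    of some target descriptor. The best-scoring candidate wins."""
    rng = random.Random(seed)
    best, best_count = None, -1
    for _ in range(iters):
        dx = rng.uniform(-10, 10)          # assumed shift range
        dy = rng.uniform(-10, 10)
        th = rng.uniform(-0.1, 0.1)        # assumed rotation range, radians
        sc = rng.uniform(0.9, 1.1)         # assumed scale range
        c, s = math.cos(th), math.sin(th)
        count = 0
        for (x, y) in test_desc:
            tx = sc * (c * x - s * y) + dx  # scale, rotate, then shift
            ty = sc * (s * x + c * y) + dy
            if any(math.hypot(tx - u, ty - v) < beta for (u, v) in target_desc):
                count += 1
        if count > best_count:
            best, best_count = (dx, dy, th, sc), count
    return best, best_count
```

Since every iteration is independent, this search maps naturally to one CUDA thread per candidate parameter set, which is exactly how Section 2.5 distributes it.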
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)_shift, tr(optm)_scale, and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)_shift), and S(tr(optm)_scale) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimum score of the two transformed results divided by the minimal matching score for the test template and the target template.
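The role of the orientation check α and the weights w in the final score can be sketched as follows (a Python illustration; the exact per-pair contribution in Algorithm 4 is not reproduced in the text, so the weight-product scoring here is an assumed reading):

```python
def weighted_match_score(pairs, alpha=5.0):
    """Sketch of weighted WPL match scoring: a candidate pair counts
    only if its orientation difference is below alpha, and each match
    contributes the product of the two descriptor weights (0, 0.5, 1),
    so boundary descriptors count half and descriptors outside the
    sclera not at all. `pairs` is a list of nearest-neighbor pairs
    ((phi_a, w_a), (phi_b, w_b))."""
    score = 0.0
    for (phi_a, w_a), (phi_b, w_b) in pairs:
        if abs(phi_a - phi_b) < alpha:
            score += w_a * w_b
    return score
```

Because the weights were baked into the descriptors by the CPU preprocessing step, this scoring needs no access to the mask images at match time.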
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned into threads by the programmer and mapped onto those threads. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip and take little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is of limited size. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this latency, on-chip memory should be used preferentially over global memory, and when global memory access does occur, threads in the same warp should access consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the access is serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a separate kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
In this kernel we create a number of threads equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one template-pair comparison. In our work we use an NVIDIA C2070 as our GPU, with the thread and block counts set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of intermediate results, as shown at the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum of the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
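The pairwise reduction described above can be sketched in Python (the rounds that would run concurrently on the GPU are simulated sequentially here):

```python
def parallel_sum(values):
    """Sketch of the tree reduction: in each round, 'thread' i
    combines element i with element i + stride, halving the number
    of active elements, until the total remains at index 0 - the
    same first address the report describes."""
    vals = list(values)
    n = len(vals)
    stride = 1
    while stride < n:
        for i in range(0, n - stride, 2 * stride):
            vals[i] += vals[i + stride]   # each pair summed "in parallel"
        stride *= 2
    return vals[0]
```

For n intermediate results this takes only ⌈log2 n⌉ rounds of synchronization, instead of the n-1 sequential additions a single thread would need.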
252 MAPPING INSIDE BLOCK
In the shift-argument search, there are two schemes we can choose for mapping the task:
Map one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assign a single possible shift offset to a thread, so that all threads compute independently except that the final result must be compared across the possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to one thread; every thread randomly generated a set of parameters and tried them independently, and the generation iterations were distributed over all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
FIG
FIG
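The need for independent per-thread random streams can be illustrated in Python, whose stdlib `random.Random` is itself a Mersenne Twister (MT19937). Note the important caveat in the comment: this sketch only varies the seed per stream, whereas the dynamic-creation approach the report uses generates distinct twister parameters per generator, which gives stronger independence guarantees:

```python
import random

def make_thread_streams(n_threads, root_seed=1234):
    """Give each simulated 'thread' its own Mersenne Twister instance
    (random.Random is MT19937). CAVEAT: only the seeds differ here;
    Matsumoto and Nishimura's dynamic creation, used in the report,
    instead derives distinct twister *parameters* per generator to
    avoid correlated sequences between identically-parameterized
    twisters."""
    root = random.Random(root_seed)
    # derive one distinct seed per thread from the root generator
    seeds = [root.getrandbits(64) for _ in range(n_threads)]
    return [random.Random(s) for s in seeds]
```

Each stream is reproducible from the root seed, so a GPU run can be replayed deterministically for debugging.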
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that the contiguous kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access was still a slow way to get data, so in our kernel we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied to target detection; in this paper it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block; this normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination or contrast changes, the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
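The gradient computation and orientation binning for a single cell can be sketched as follows (a Python illustration; the bin count of 9 is the common HOG choice and an assumption here, as the report does not state it):

```python
import math

def hog_cell_histogram(cell, n_bins=9):
    """HOG sketch for one cell: central-difference gradients give
    dx and dy; the magnitude m weights a vote into one of n_bins
    orientation bins spread over 0-180 degrees (opposite directions
    fold together, as described above). `cell` is a 2D list of
    pixel intensities."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = cell[y][x + 1] - cell[y][x - 1]
            dy = cell[y + 1][x] - cell[y - 1][x]
            m = math.hypot(dx, dy)                            # gradient magnitude
            theta = math.degrees(math.atan2(dy, dx)) % 180.0  # fold to [0, 180)
            hist[int(theta / 180.0 * n_bins) % n_bins] += m   # magnitude-weighted vote
    return hist
```

Concatenating these per-cell histograms over all cells, after block-wise contrast normalization, yields the final descriptor.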
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop; when code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) and knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, enabling fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithms efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
Using the interactive tool GUIDE (Graphical User Interface Development Environment), you can lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process: from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format, and additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter and bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh
Image plots
Cone, slice, stream, and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW discrete Fourier
transform library. Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports, they execute
faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
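The first two snapshot steps, grayscale conversion and binarization, can be sketched as follows. MATLAB offers rgb2gray and im2bw for these steps; the Python version below is an illustrative stand-in using the standard luminosity coefficients:

```python
def to_grayscale(rgb_img):
    """Luminosity grayscale conversion using the standard
    0.299/0.587/0.114 weights for the R, G, B channels."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_img]

def to_binary(gray_img, threshold):
    """Global thresholding: 1 where the gray value exceeds `threshold`,
    0 elsewhere, giving the binary image of the second snapshot."""
    return [[1 if v > threshold else 0 for v in row] for row in gray_img]
```

A white pixel (255, 255, 255) maps to gray value 255 and to binary 1 for any threshold below 255, while a black pixel maps to 0.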
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition
method, which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions for other sclera vein recognition methods
and for general pattern recognition methods. We designed the Y-shape
descriptor to narrow the search range and increase the matching efficiency;
it is a new feature extraction method that takes advantage of GPU
structures. We developed the WPL descriptor to incorporate mask
information and make the data more suitable for parallel computing, which
can dramatically reduce data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, an important step in
achieving parallel computation efficiency on a GPU. A work flow with
high arithmetic intensity to hide memory access latency was designed to
partition the computation task across the heterogeneous system of CPU and
GPU, and even across the threads in the GPU. The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland:
Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object
detection for real-time augmented reality applications in a GPGPU," IEEE
Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep,
big, simple neural nets for handwritten digit recognition," Neural Comput.,
vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris
recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput.
Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-
Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE
Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks,"
Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognition,"
Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," in Adler's Physiology of
the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner,
"Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4,
no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular
biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–
1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G.
Xiaoguang, "Robust spatial matching for object retrieval and its parallel
implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp.
1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for
real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control,
vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4,
pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2,
pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human
identification method: Sclera recognition," IEEE Trans. Syst., Man,
Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
1) Improvement of pictorial information for human interpretation
2) Processing of scene data for autonomous machine perception
In the second application area, interest focuses on procedures for
extracting, from an image, information in a form suitable for computer
processing. Examples include automatic character recognition, industrial
machine vision for product assembly and inspection, military
reconnaissance, automatic processing of fingerprints, etc.
17 EXISTING SYSTEM
Crihalmeanu and Ross proposed three approaches: a Speed Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Among these three methods,
the SURF method achieves the best accuracy; it takes an average of 1.5
seconds using the SURF method to perform a one-to-one matching.
Zhou et al. proposed a line-descriptor-based method for sclera vein
recognition. The matching step (including registration) is the most time-
consuming step in this sclera vein recognition system, costing about 1.2
seconds to perform a one-to-one matching. Both speeds were measured using
a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM.
Currently, sclera vein recognition algorithms are designed using central
processing unit (CPU)-based systems.
171 DISADVANTAGES OF EXISTING SYSTEM
1. Mask files are used to calculate valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large, preoccupy GPU memory, and slow down data transfer.
Also, some of the processing on the mask files involves convolution, whose
performance is difficult to improve on the scalar processing units of CUDA.
2. The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements;
there is no uniform mapping scheme applicable to all these stages.
3. When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance.
LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns
for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no.
14, pp. 1860–1869, Oct. 2012
Face recognition in unconstrained acquisition conditions is one of the
most challenging problems that has been actively researched in recent
years. It is well known that many state-of-the-art still-image face recognition
algorithms perform well when constrained (frontal, well-illuminated, high-
resolution, sharp, and full) face images are acquired. However, their
performance degrades significantly when the test images contain variations
that are not present in the training images. In this paper we highlight some
of the key issues in remote face recognition We define the remote face
recognition as one where faces are several tens of meters (10-250m) from
the cameras We then describe a remote face database which has been
acquired in an unconstrained outdoor maritime environment Recognition
performance of a subset of existing still image-based face recognition
algorithms is evaluated on the remote face data set Further we define the
remote re-identification problem as matching a subject at one location with
candidate sets acquired at a different location and over time in remote
conditions We provide preliminary experimental results on remote re-
identification It is demonstrated that in addition to applying a good
classification algorithm finding features that are robust to variations
mentioned above and developing statistical models which can account for
these variations are very important for remote face recognition
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N.
Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics
Security
With the rapidly expanded biometric data collected by various sectors
of government and industry for identification and verification purposes
how to manage and process such Big Data draws great concern Even
though modern processors are equipped with more cores and memory
capacity it still requires careful design in order to utilize the hardware
resource effectively and the power consumption efficiently This research
addresses this issue by investigating the workload characteristics of
biometric applications. Taking Daugman's iris matching algorithm, which
has been proven to be the most reliable iris matching method as a case
study we conduct performance profiling and binary instrumentation on the
benchmark to capture its execution behavior The results show that data
loading and memory access incurs great performance overhead and
motivates us to move the biometrics computation to high-performance
architecture
Modern iris recognition algorithms can be computationally intensive
yet are designed for traditional sequential processing elements such as a
personal computer However a parallel processing alternative using field
programmable gate arrays (FPGAs) offers an opportunity to speed up iris
recognition. Within the scope of this project, iris template generation with
directional filtering, which is a computationally expensive yet parallelizable
portion of a modern iris recognition algorithm, is parallelized on an FPGA
system We will present a performance comparison of the parallelized
algorithm on the FPGA system to a traditional CPU-based version The
parallelized template generation outperforms an optimized C++ code
version determining the information content of an iris approximately 324
times faster
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric
modality based on conjunctival vasculature," in Proc. Artif. Neural
Netw. Eng., 2006, pp. 1–8
A new biometric indicator based on the patterns of conjunctival
vasculature is proposed Conjunctival vessels can be observed on the visible
part of the sclera that is exposed to the outside world These vessels
demonstrate rich and specific details in visible light and can be easily
photographed using a regular digital camera In this paper we discuss
methods for conjunctival imaging preprocessing and feature extraction in
order to derive a suitable conjunctival vascular template for biometric
authentication Commensurate classification methods along with the
observed accuracy are discussed Experimental results suggest the potential
of using conjunctival vasculature as a biometric measure Identification of
a person based on some unique set of features is an important task The
human identification is possible with several biometric systems and sclera
recognition is one of the promising biometrics The sclera is the white
portion of the human eye The vein pattern seen in the sclera region is
unique to each person. Thus, the sclera vein pattern is a well-suited
biometric technology for human identification. The existing methods used
for sclera recognition have some drawbacks: only frontal-looking
images are suitable for matching, and rotation variance is another problem.
These problems are completely eliminated in the proposed system by using
two feature extraction techniques They are Histogram of Oriented
Gradients (HOG) and converting the image into polar form using the
bilinear interpolation technique These two features help the proposed
system to become illumination invariant and rotation invariant The
experimentation is done with the help of UBIRIS database The
experimental result shows that the proposed sclera recognition method can
achieve better accuracy than the previous methods
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J.
C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899,
May 2008
The graphics processing unit (GPU) has become an integral part of
today's mainstream computing systems. Over the past six years, there has
been a marked increase in the performance and capabilities of GPUs. The
modern GPU is not only a powerful graphics engine but also a highly
parallel programmable processor featuring peak arithmetic and memory
bandwidth that substantially outpaces its CPU counterpart. The GPU's
rapid increase in both programmability and capability has spawned a
research community that has successfully mapped a broad range of
computationally demanding complex problems to the GPU This effort in
general purpose computing on the GPU also known as GPU computing
has positioned the GPU as a compelling alternative to traditional
microprocessors in high-performance computer systems of the future We
describe the background hardware and programming model for GPU
computing summarize the state of the art in tools and techniques and
present four GPU computing successes in game physics and computational
biophysics that deliver order-of-magnitude performance gains over
optimized CPU applications
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image
database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–
977
This paper proposes algorithms for iris segmentation quality
enhancement match score fusion and indexing to improve both the
accuracy and the speed of iris recognition A curve evolution approach is
proposed to effectively segment a nonideal iris image using the modified
Mumford–Shah functional. Different enhancement algorithms are
concurrently applied on the segmented iris image to produce multiple
enhanced versions of the iris image A support-vector-machine-based
learning algorithm selects locally enhanced regions from each globally
enhanced image and combines these good-quality regions to create a single
high-quality iris image Two distinct features are extracted from the high-
quality iris image The global textural feature is extracted using the 1-D log
polar Gabor transform and the local topological feature is extracted using
Euler numbers An intelligent fusion algorithm combines the textural and
topological matching scores to further improve the iris recognition
performance and reduce the false rejection rate whereas an indexing
algorithm enables fast and accurate iris identification. The verification and
identification performance of the proposed algorithms is validated and
compared with other algorithms using the CASIA Version 3, ICE 2005, and
UBIRIS iris databases.
18 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-
stage parallel approach for registration and matching: a parallel sclera
matching solution for sclera vein recognition using our sequential line-
descriptor method on the CUDA GPU architecture. CUDA is a highly
parallel, multithreaded, many-core processor architecture with tremendous
computational power.
It supports not only a traditional graphics pipeline but also computation
on non-graphical data. It is relatively straightforward to implement our C
program for CUDA on an AMD-based GPU using OpenCL. Our CUDA
kernels can be directly converted to OpenCL kernels by accounting for the
different syntax of various keywords and built-in functions. The mapping
strategy is also effective in OpenCL if we regard the thread and block in
CUDA as the work-item and work-group in OpenCL. Most of our optimization
techniques, such as coalesced memory access and prefix sum, work in
OpenCL too. Moreover, since CUDA is a data-parallel architecture, the
implementation of our approach in OpenCL should be programmed in the
data-parallel model.
In this research, we first discuss why the naïve parallel approach would
not work. We then propose a new sclera descriptor, the Y-shape sclera
feature-based efficient registration method, to speed up the mapping scheme;
introduce the "weighted polar line (WPL) descriptor," which is better
suited for parallel computing and mitigates the mask size issue; and develop
our coarse-to-fine two-stage matching process to dramatically improve the
matching speed. These new approaches make parallel processing
possible and efficient.
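Prefix sum (scan), mentioned above among the optimizations that port from CUDA to OpenCL, has a simple sequential reference form. The sketch below is an illustrative Python version (the actual kernels are parallel GPU code); it also shows a typical GPU use of the primitive, stream compaction of a 0/1 validity mask:

```python
from itertools import accumulate

def prefix_sum(values):
    """Inclusive prefix sum: out[i] = values[0] + ... + values[i].
    This is the sequential reference for the parallel scan primitive."""
    return list(accumulate(values))

# Stream compaction: the prefix sum of a 0/1 mask gives, for each valid
# element, its (1-based) destination index in the compacted array.
mask = [1, 0, 1, 1, 0, 1]
positions = prefix_sum(mask)  # valid elements go to slots 1, 2, 3, 4
```

On a GPU, the scan is computed cooperatively by many threads in logarithmic steps; the sequential version above defines the result it must produce.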
191 PROPOSED SYSTEM ADVANTAGES
1. To improve the efficiency, in this research we propose a new descriptor,
the Y-shape descriptor, which can greatly improve the efficiency of
the coarse registration of two images and can be used to filter out some
non-matching pairs before refined matching.
2. We propose a coarse-to-fine two-stage matching process. In the first
stage, we match two images coarsely using the Y-shape descriptors,
which is very fast because no registration is needed. The matching
result in this stage helps filter out image pairs with low
similarities.
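The coarse-to-fine scheme described above can be outlined as follows. This is a hypothetical Python sketch: coarse_score and fine_score are caller-supplied stand-ins for the Y-shape and refined (WPL) matchers, which the report does not give in code form:

```python
def two_stage_match(probe, gallery, coarse_score, fine_score, tau):
    """Coarse-to-fine matching: cheaply score every gallery template
    with `coarse_score`, keep only candidates above threshold `tau`,
    then run the expensive `fine_score` matcher on the survivors.
    Returns (best_template, fine_similarity), or (None, 0.0) when the
    coarse stage filters out everything."""
    candidates = [t for t in gallery if coarse_score(probe, t) >= tau]
    if not candidates:
        return None, 0.0
    best = max(candidates, key=lambda t: fine_score(probe, t))
    return best, fine_score(probe, best)
```

The speed-up comes from the first stage needing no registration: most low-similarity pairs never reach the costly fine matcher.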
CHAPTER 2
PROJECT DESCRIPTION
21 INTRODUCTION
The sclera is the opaque and white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each person,
so it can be used for human identification. Several researchers have
designed different sclera vein recognition methods and have shown that it
is promising to use sclera vein recognition for human identification.
Crihalmeanu and Ross proposed three approaches: a Speed Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Among these three methods,
the SURF method achieves the best accuracy; it takes an average of 1.5
seconds using the SURF method to perform a one-to-one matching. Zhou
et al. proposed a line-descriptor-based method for sclera vein recognition.
The matching step (including registration) is the most time-consuming step
in this sclera vein recognition system, costing about 1.2 seconds to
perform a one-to-one matching. Both speeds were measured using a PC with
an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently,
sclera vein recognition algorithms are designed using central processing
unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential
processing devices, which may not be efficient in data processing where the
data can be parallelized. Because of the large time consumption of the
matching step, sclera vein recognition using a sequential method would be
very challenging to implement in a real-time biometric system, especially
when there is a large number of templates in the database for matching.
GPGPUs (general-purpose graphics processing units) are now popularly
used for parallel computing to improve computational processing speed and
efficiency. The highly parallel structure of GPUs makes them more
effective than CPUs for data processing where processing can be performed
in parallel. GPUs have been widely used in biometric recognition, such as
speech recognition, text detection, handwriting recognition, and face
recognition. In iris recognition, GPUs have been used to extract features,
construct descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al.
evaluated the performance of image processing algorithms such
as linear feature extraction and multi-view stereo matching on GPUs.
However, these approaches were designed for their specific biometric
recognition applications and feature searching methods; therefore, they may
not be efficient for sclera vein recognition. Compute Unified Device
Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in
this research. CUDA is a highly parallel, multithreaded, many-core
processor architecture with tremendous computational power. It supports not
only a traditional graphics pipeline but also computation on non-graphical
data. More importantly, it offers an easier programming platform that
outperforms its CPU counterparts in terms of peak arithmetic intensity and
memory bandwidth. In this research, the goal is not to develop a unified
strategy to parallelize all sclera matching methods, because each method is
quite different from the others and would need a customized design.
Developing an efficient parallel computing scheme requires different
strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
sclera vein recognition using our sequential line-descriptor method on
the CUDA GPU architecture. However, the parallelization strategies
developed in this research can be applied to design parallel approaches for
other sclera vein recognition methods and help parallelize general pattern
recognition methods. Based on the matching approach, there are three
challenges in mapping the task of sclera feature matching to a GPU:
1) Mask files are used to calculate valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large, preoccupy GPU memory, and slow down data transfer.
Also, some of the processing on the mask files involves convolution, whose
performance is difficult to improve on the scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements;
there is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance. New designs are
necessary to help narrow down the search range. In summary, a naïve
parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to implement our C program for
CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be
directly converted to OpenCL kernels by accounting for the different syntax
of various keywords and built-in functions. The mapping strategy is also
effective in OpenCL if we regard the thread and block in CUDA as the
work-item and work-group in OpenCL. Most of our optimization techniques,
such as coalesced memory access and prefix sum, work in OpenCL too.
Moreover, since CUDA is a data-parallel architecture, the implementation
of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not
work (Section 3). We then propose a new sclera descriptor, the Y-shape
sclera feature-based efficient registration method, to speed up the mapping
scheme (Section 4); introduce the "weighted polar line (WPL) descriptor,"
which is better suited for parallel computing and mitigates the mask size
issue (Section 5); and develop our coarse-to-fine two-stage matching
process to dramatically improve the matching speed (Section 6). These new
approaches make parallel processing possible and efficient. However, it
is non-trivial to implement these algorithms in CUDA, so we developed
implementation schemes to map our algorithms onto CUDA (Section 7).
In Section 2 we give a brief introduction to sclera vein recognition, in
Section 8 we present experiments using the proposed system, and in
Section 9 we draw conclusions.
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation, feature enhancement, feature extraction, and feature
matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition,
and several methods have been designed for it. Crihalmeanu
et al. presented a semi-automated system for sclera segmentation: they
used a clustering algorithm to classify color eye images into three
clusters (sclera, iris, and background). Later on, Crihalmeanu and Ross
designed a segmentation approach based on a normalized sclera index
measure, which includes coarse sclera segmentation, pupil region
segmentation, and fine sclera segmentation. Zhou et al. developed a skin
tone plus "white color"-based voting method for sclera segmentation in
color images, and an Otsu's thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera
features, since the sclera vein patterns often lack contrast and are hard to
detect. Zhou et al. used a bank of multi-directional Gabor filters for
vascular pattern enhancement. Derakhshani et al. used contrast-limited
adaptive histogram equalization (CLAHE) to enhance the green color plane
of the RGB image, and a multi-scale region-growing approach to identify
the sclera veins from the image background. Crihalmeanu and Ross applied
a selective enhancement filter for blood vessels to extract features from the
green component of a color image. In the feature matching step,
Crihalmeanu and Ross proposed
three registration and matching approaches: Speed Up Robust
Features (SURF), which is based on interest-point detection; minutiae
detection, which is based on minutiae points on the vasculature structure;
and direct correlation matching, which relies on image registration. Zhou et
al. designed a line-descriptor-based feature registration and matching
method.
The proposed sclera recognition consists of five steps: sclera
segmentation, vein pattern enhancement, feature extraction, feature
matching, and matching decision. Fig. 2 shows the block diagram of sclera
recognition. Two types of feature extraction are used in the proposed
method to achieve good identification accuracy. The characteristics
elicited from the blood vessel structure seen in the sclera region are the
Histogram of Oriented Gradients (HOG) and a Cartesian-to-polar
conversion using bilinear interpolation. HOG is used to determine the
gradient orientations and edge orientations of the vein pattern in the sclera
region of an eye image. To become more computationally efficient, the
image data are converted to polar form, which is mainly useful for circular
or quasi-circular objects. These two characteristics are extracted from all
the images in the database and compared with the features of the query
image to decide whether the person is correctly identified or not. This
procedure is done in the feature matching step, which ultimately makes the
matching decision. By using the proposed feature extraction methods and
matching techniques, human identification is more accurate than in existing
studies. In the proposed method, two features of an image are extracted.
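The Cartesian-to-polar conversion with bilinear interpolation described above can be sketched as follows. This is an illustrative Python version; the center (cx, cy) and the sizes of the (r, theta) sampling grid are hypothetical parameters, not values from the report:

```python
import math

def bilinear(img, x, y):
    """Sample a 2-D image at real-valued (x, y) by bilinear interpolation
    of the four surrounding pixels."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    dx, dy = x - x0, y - y0
    x1 = min(x0 + 1, len(img[0]) - 1)
    y1 = min(y0 + 1, len(img) - 1)
    return (img[y0][x0] * (1 - dx) * (1 - dy) + img[y0][x1] * dx * (1 - dy)
            + img[y1][x0] * (1 - dx) * dy + img[y1][x1] * dx * dy)

def to_polar(img, cx, cy, n_r, n_theta):
    """Resample a Cartesian image onto an (r, theta) grid centred on
    (cx, cy); each polar pixel is bilinearly interpolated."""
    r_max = min(cx, cy, len(img[0]) - 1 - cx, len(img) - 1 - cy)
    polar = []
    for ri in range(n_r):
        r = r_max * ri / max(n_r - 1, 1)
        row = []
        for ti in range(n_theta):
            theta = 2 * math.pi * ti / n_theta
            row.append(bilinear(img, cx + r * math.cos(theta),
                                cy + r * math.sin(theta)))
        polar.append(row)
    return polar
```

Because the polar samples rarely fall on integer pixel positions, the bilinear step is what keeps the resampled image smooth.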
222 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists
of three steps: glare area detection, sclera area estimation, and iris and eyelid
detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare Area Detection: The glare area is a small bright area near the
pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. It operates
only on grayscale images, so a color image must first be converted to
grayscale before the Sobel filter is applied to detect the glare area.
Fig. 4 shows the result of the glare area detection.
FIG
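The Sobel filtering used above can be sketched as follows. This is an illustrative pure-Python version (MATLAB's edge function with the 'sobel' option performs the equivalent operation); it convolves the two standard 3x3 Sobel kernels and combines them into a gradient magnitude:

```python
def sobel_magnitude(img):
    """Approximate gradient magnitude with the 3x3 Sobel kernels.
    `img` is a 2-D list of gray values; border pixels are left at 0."""
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(gx_k[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(gy_k[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```

High responses mark strong intensity transitions, such as the border of a bright glare spot against the darker iris or pupil.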
Sclera area estimation: For the estimation of the sclera area, Otsu's
thresholding method is applied. The steps of sclera area detection are:
selection of the region of interest (ROI), Otsu's thresholding, and sclera area
detection. The left and right sclera areas are selected based on the iris
boundaries. Once the region of interest is selected, Otsu's thresholding is
applied to obtain the potential sclera areas. The correct left sclera area should
be placed in the right and center positions, and the correct right sclera area
should be placed in the left and center. In this way, non-sclera areas are
eliminated.
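Otsu's thresholding, used above for sclera area estimation, selects the gray level that maximizes the between-class variance of the image histogram. A minimal sketch (Python for illustration; MATLAB's graythresh implements the same method):

```python
def otsu_threshold(histogram):
    """Return the gray level that maximizes the between-class variance
    (Otsu's method), given a histogram over gray levels 0..L-1."""
    total = sum(histogram)
    total_sum = sum(g * h for g, h in enumerate(histogram))
    best_t, best_var = 0, -1.0
    w0 = 0       # background pixel count so far
    sum0 = 0.0   # background gray-level sum so far
    for t, h in enumerate(histogram):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0                    # background mean
        mu1 = (total_sum - sum0) / w1      # foreground mean
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

For a bimodal histogram, the returned level separates the two modes; pixels above it form one class (here, the candidate sclera region) and pixels below it the other.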
223 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the
sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are
refined; altogether these are unwanted portions for recognition. To
eliminate their effects, refinement is done after the detection of the sclera
area. Fig. shows the right sclera area detected after Otsu's thresholding and
iris and eyelid refinement; the left sclera area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented;
hence, feature extraction and matching are needed to reduce the
segmentation fault. The vein patterns in the sclera area are not clearly
visible after segmentation, so vein pattern enhancement is performed to
make them more visible.
224 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of fingers (Miura et al., 2004), palm (Lin and
Fan, 2004), and retina (Hill, 1999). In the case of retinal biometrics, a
special optical device for imaging the back of the eyeball is needed (Hill,
1999). Due to its perceived invasiveness and the required degree of subject
cooperation, the use of retinal biometrics may not be acceptable to some
individuals. The conjunctiva is a thin, transparent, and moist tissue that
covers the outer surface of the eye. The part of the conjunctiva that covers
the inner lining of the eyelids is called the palpebral conjunctiva, and the part
that covers the outer surface of the eye is called the ocular (or bulbar)
conjunctiva, which is the focus of this study. The ocular conjunctiva is very
thin and clear; thus the vasculature (including that of the episclera) is
easily visible through it. The visible microcirculation of the conjunctiva offers a
rich and complex network of veins and fine microcirculation (Fig. 1). The
apparent complexity and specificity of these vascular patterns motivated us
to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al 2004):
UNIVERSALITY All normal living tissues, including those of the
conjunctiva and episclera, have vascular structure.
UNIQUENESS Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as those of the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein 1935; Tower 1955).
PERMANENCE Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus the conjunctival vascular
structure is expected to have reasonable permanence (Joussen 2001).
PRACTICALITY Conjunctival vasculature can be captured with commercial
off-the-shelf digital cameras under normal lighting conditions, making this
modality highly practical.
ACCEPTABILITY Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the
addition of conjunctival biometrics will enhance the performance of current
iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portions of the eye, the sclera
vein patterns will be further exposed. This feature makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (eg
due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
opportunities for simultaneous computation. The figure below demonstrates
the possibility of parallel directional filtering: since the filter is computed
over different portions of the input image, the computation can be performed
in parallel (denoted by Elements below). In addition, each element of the
filtering can also be parallelized individually. A detailed discussion of
our proposed parallelization is outside the scope of this paper.
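The tile-parallel filtering idea above can be sketched as follows. This is an illustrative thread-parallel version in Python (the report targets CUDA; the oriented line kernel here is a simplified stand-in for the directional Gabor filters, and the strip split ignores the halo rows a production version would exchange at strip boundaries):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def directional_filter(tile, angle_deg, size=5):
    """Correlate a tile with a simple oriented line kernel (a stand-in
    for the directional vein-enhancement filters)."""
    kernel = np.zeros((size, size))
    c = size // 2
    theta = np.deg2rad(angle_deg)
    for i in range(-c, c + 1):            # draw a line through the center
        r = int(round(c - i * np.sin(theta)))
        col = int(round(c + i * np.cos(theta)))
        if 0 <= r < size and 0 <= col < size:
            kernel[r, col] = 1.0
    kernel /= kernel.sum()
    out = np.zeros_like(tile, dtype=float)
    pad = np.pad(tile.astype(float), c, mode="edge")
    for r in range(tile.shape[0]):
        for cc in range(tile.shape[1]):
            out[r, cc] = (pad[r:r + size, cc:cc + size] * kernel).sum()
    return out

def filter_in_parallel(image, angle_deg, n_tiles=4):
    """Split the image into horizontal strips ("Elements") and filter
    each strip concurrently, then stitch the results back together."""
    strips = np.array_split(image, n_tiles, axis=0)
    with ThreadPoolExecutor(max_workers=n_tiles) as pool:
        results = list(pool.map(lambda s: directional_filter(s, angle_deg),
                                strips))
    return np.vstack(results)
```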
FIG
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN
RECOGNITION METHOD
The matching stage of the line-descriptor based method is a
bottleneck with regard to matching speed. In this section we briefly
describe the line descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and remove the branch
points. The line descriptor is used to describe the segments in the vein
structure. Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle to some
reference angle at the iris center, θ; the segment's distance to the iris
center, r; and the dominant angular orientation of the line segment, ɸ. Thus
the descriptor is S = (θ, r, ɸ)T. The individual components of the line
descriptor are calculated as
FIG
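The component equations appear only as a figure in the source. A hedged reconstruction, consistent with the symbol definitions that follow (the center (xl, yl) of the segment relative to the iris center (xi, yi), and the slope of the fitted polynomial fline at the segment center), is:

```latex
\theta = \tan^{-1}\!\left(\frac{y_l - y_i}{x_l - x_i}\right), \qquad
r = \sqrt{(x_l - x_i)^2 + (y_l - y_i)^2}, \qquad
\phi = \tan^{-1}\!\left(\left.\frac{d f_{line}(x)}{dx}\right|_{x_l}\right)
```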
Here fline(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. In order to register the segments of the
vascular patterns, a RANSAC-based algorithm is used to estimate the best-
fit parameters for registration between the two sclera vascular patterns. The
registration algorithm randomly chooses two points, one from the test
template and one from the target template. It also randomly chooses a
scaling factor and a rotation value, based on a priori knowledge of the
database. Using these values, it calculates a fitness value for the
registration under these parameters.
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. In order to reduce the effect of segmentation errors, we created the
weighting image (Figure 3) from the sclera mask by setting interior pixels
in the sclera mask to 1, pixels within some distance of the boundary of the
mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as
follows, where Si and Sj are two segment descriptors, m(Si, Sj) is the
matching score between segments Si and Sj, d(Si, Sj) is the Euclidean
distance between the segment descriptors' center points (from Eqs 6-8),
Dmatch is the matching distance threshold, and ɸmatch is the matching
angle threshold. The total matching score M is the sum of the individual
matching scores divided by the maximum matching score for the minimal
set between the test and target templates. That is, one of the test or target
templates has fewer points, and the sum of its descriptors' weights sets the
maximum score that can be attained.
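The score equation itself survives only as a figure. A plausible reconstruction, consistent with the thresholds and the weight-normalization described above (the exact form of the weight product is an assumption), is:

```latex
m(S_i, S_j) =
\begin{cases}
  w(S_i)\,w(S_j) & \text{if } d(S_i,S_j) \le D_{match} \ \text{and}\ |\phi_i - \phi_j| \le \Phi_{match},\\[2pt]
  0 & \text{otherwise,}
\end{cases}
\qquad
M = \frac{\sum_{i,j} m(S_i,S_j)}{\min\!\big(\textstyle\sum_i w(S_i),\ \sum_j w(S_j)\big)}
```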
FIG
FIG
FIG
FIG
Even with movement of the eye, Y-shape branches are observed to be a
stable feature and can be used as a sclera feature descriptor. To detect the
Y-shape branches in the original template, we search the set of nearest
neighbors of every line segment within a regular distance and classify the
angles among these neighbors. If there are two types of angle values in the
line-segment set, the set may be inferred to be a Y-shape structure, and the
line-segment angles are recorded as a new feature of the sclera.
There are two ways to measure both the orientation and the relationship of
every branch of the Y-shape vessels: one is to use the angles of every
branch to the x-axis; the other is to use the angles between each branch and
the iris radial direction. The first method needs an additional rotation
operation to align the template, so in our approach we employed the second
method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles between each
branch and the radius from the pupil center. Even when the head tilts, the
eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2 and ϕ3
are quite stable. To tolerate errors from the pupil-center calculation in the
segmentation step, we also recorded the center position (x, y) of the
Y-shape branches as auxiliary parameters. Our rotation-, shift- and scale-
invariant feature vector is thus defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape
descriptor is generated with reference to the iris center; therefore it is
automatically aligned to the iris center, and it is a rotation- and scale-
invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 22, the line descriptor is extracted from the
skeleton of the vessel structure in binary images
(Figure 7). The skeleton is then broken into smaller segments. For each
segment, a line descriptor is created to record the center and orientation of
the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the
position of the center and ɸ is its orientation. Because of the limitation of
segmentation accuracy, descriptors near the boundary of the sclera area
might not be accurate and may contain spur edges resulting from the iris,
eyelid, and/or eyelashes. To tolerate such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel
patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of
line segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is
challenging, since the mask files are large and will occupy GPU memory
and slow down data transfer. During matching registration, a RANSAC-
type algorithm was used to randomly select corresponding descriptors,
and the transform parameters between them were used to generate the
template-transform affine matrix. After every template transform, the mask
data must also be transformed and a new boundary calculated to
evaluate the weight of the transformed descriptor. This results in too many
convolutions in the processor unit.
To reduce the heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the mask
information and can be automatically aligned. We extracted the geometric
relationships of the descriptors and stored them as a new descriptor. We
use a weighted image created by setting various weight values according to
position: the weights of descriptors that lie outside the sclera are set to 0,
those near the sclera boundary to 0.5, and interior descriptors to 1. In our
work, descriptor weights were calculated on their own mask by the CPU
only once.
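The weight-assignment rule above can be sketched as follows, assuming a binary sclera mask and a list of descriptor centers (the 3×3-cross erosion here is an illustrative way to find "near the boundary"; np.roll wraps at the array edge, which is harmless for masks that do not touch the border):

```python
import numpy as np

def erode(mask, iterations=1):
    """Binary erosion with a 3x3 cross, implemented with array shifts."""
    m = mask.astype(bool)
    for _ in range(iterations):
        shifted = [np.roll(m, s, axis=a) for a in (0, 1) for s in (1, -1)]
        m = m & shifted[0] & shifted[1] & shifted[2] & shifted[3]
    return m

def descriptor_weights(mask, centers, border=3):
    """Weight each descriptor center by its position on the sclera mask:
    1.0 well inside, 0.5 within `border` pixels of the boundary, 0 outside."""
    interior = erode(mask, iterations=border)
    weights = []
    for (x, y) in centers:
        if not mask[y, x]:
            weights.append(0.0)       # outside the sclera
        elif interior[y, x]:
            weights.append(1.0)       # well inside
        else:
            weights.append(0.5)       # near the boundary
    return weights
```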
The result was saved as a component of the descriptor, so the sclera
descriptor becomes s(x, y, ɸ, w), where w denotes the weight of the point
and takes the values 0, 0.5, or 1. To align two templates, when a template
is shifted to another location along the line connecting their centers, all the
descriptors of that template are transformed. This is faster if the two
templates share a similar reference point: if we use the center of the iris as
the reference point, then when two templates are compared their
descriptors are automatically aligned to each other. Every feature vector of
the template is a set of line-segment descriptors composed of three
variables (Figure 8): the segment's angle to the reference line through the
iris center, denoted θ; the distance between the segment's center and the
pupil center, denoted r; and the dominant angular orientation of the
segment, denoted ɸ. To minimize the GPU computation, we also convert
the descriptor values from polar to rectangular coordinates during CPU
preprocessing.
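That CPU-side conversion can be sketched in a few lines. The field layout below is illustrative, not the report's actual memory layout:

```python
import math

def to_rectangular(descriptor):
    """Extend a polar WPL descriptor (theta, r, phi, w) with rectangular
    center coordinates, so the GPU kernels can skip the trigonometry."""
    theta, r, phi, w = descriptor
    x = r * math.cos(theta)   # center, relative to the iris center
    y = r * math.sin(theta)
    return (x, y, r, theta, phi, w)
```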
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters. For
example, as the eyeball moves left, the left-part sclera patterns of the eye
may be compressed while the right-part patterns are stretched.
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads called warps. We
reorganized the descriptors from the same sides and saved
FIG
FIG
them at contiguous addresses. This meets the requirement of coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no
longer needed on the GPU. Matching with this feature is very fast because
the templates do not need to be re-registered every time after shifting; thus
the cost of data transfer and computation on the GPU is reduced. Matching
on the new descriptor, the shift parameter generator in Figure 4 is then
simplified as in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
the more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger
limits on their size and resource consumption, more fully featured
instruction sets, and more flexible control-flow operations. After many
years of separate instruction sets for vertex and fragment operations, current
GPUs support the unified Shader Model 4.0 on both vertex and fragment
shaders:
The hardware must support shader programs of at least 65k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and
32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units. General-purpose computing on the GPU maps general-
purpose computation onto the GPU using the graphics hardware in much
the same way as any standard graphics application. Because of this
similarity, it is both easier and more difficult to explain the process: on one
hand, the actual operations are the same and are easy to follow; on the
other hand, the terminology differs between graphics and general-purpose
use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology,
then show how the same steps are used in a general-purpose way to author
GPGPU applications, and finally use the same steps to show the simpler
and more direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II,
concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as a texture on future passes through
the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation
involves the exact same steps but different terminology. A motivating
example is a fluid simulation computed over a grid: at each time step, we
compute the next state of the fluid for each grid point from the current state
at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest. The rasterizer generates a fragment at each
pixel location covered by that geometry. (In our example, our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation.)
Each fragment is shaded by an SPMD general-purpose fragment
program. (Each grid point runs the same program to update the state of its
fluid.)
The fragment program computes the value of the fragment by a
combination of math operations and "gather" accesses from global
memory. (Each grid point can access the state of its neighbors from the
previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes. (The current state of the fluid will be used on the next time
step.)
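The grid update the steps above describe can be sketched in array form. This toy diffusion step (an illustrative stand-in for the fluid update, not the report's simulation) shows the "gather" pattern: each interior point reads only its neighbors' previous-step values:

```python
import numpy as np

def step(state, alpha=0.25):
    """One time step of a toy diffusion update: each interior grid point
    gathers its four neighbors from the previous state, exactly the
    gather pattern the fragment program performs over the texture."""
    new = state.copy()                      # boundary cells stay fixed
    new[1:-1, 1:-1] = state[1:-1, 1:-1] + alpha * (
        state[:-2, 1:-1] + state[2:, 1:-1] +
        state[1:-1, :-2] + state[1:-1, 2:] -
        4 * state[1:-1, 1:-1]
    )
    return new                              # becomes next pass's input
```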
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications
has been that, despite their general-purpose tasks having nothing to do with
graphics, the applications still had to be programmed using graphics APIs.
In addition, the program had to be structured in terms of the graphics
pipeline, with the programmable units only accessible as an intermediate
step in that pipeline, when the programmer would almost certainly prefer to
access the programmable units directly. The programming environments we
describe in detail in Section IV solve this difficulty by providing a
more natural, direct, non-graphics interface to the hardware and,
specifically, the programmable units. Today, GPU computing applications
are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math
operations and both "gather" (read) accesses from and "scatter" (write)
accesses to global memory. Unlike in the previous two
methods, the same buffer can be used for both reading and writing,
allowing more flexible algorithms (for example, in-place algorithms that
use less memory).
The resulting buffer in global memory can then be used as an input in
future computation.
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because no
registration is needed. The matching result in this stage helps filter out
image pairs with low similarity; after this step, some false positive matches
are still possible. In the second stage, we use the WPL descriptor to
register the two images for more detailed descriptor matching, including
scale- and translation-invariance. This stage includes shift transform, affine
matrix generation, and final WPL descriptor matching. Overall, we
partitioned the registration and matching processing into four kernels in
CUDA (Figure 10): matching on the Y-shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster
and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features,
registration is unnecessary before matching on the Y-shape descriptor. The
whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of test template Tte
and target template Tta, respectively; dϕ is the Euclidean distance of the
angle elements of the descriptor vectors, defined in (3); dxy is the
Euclidean distance of two descriptor centers, defined in (4); ni and di are
the number of matched descriptor pairs and their center distances,
respectively; tϕ is a distance threshold; and txy is the threshold restricting
the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the
Y-shape branches. The search area is limited to the corresponding left or
right half of the sclera in order to reduce the search range and time. The
distance between two branches is defined in (3), where ϕij is the angle
between the jth branch and the polar axis from the pupil center in
descriptor i.
The number of matched pairs ni and the distances di between the
Y-shape branch centers are stored as the matching result. We fuse the
number of matched branches and the average distance between matched
branch centers as in (2). Here α is a factor to fuse the matching score,
which was set to 30 in our study; Ni and Nj are the total numbers of feature
vectors in templates i and j, respectively. The decision is regulated by the
threshold t: if the sclera's matching score is lower than t, the sclera is
discarded. Scleras with high matching scores are passed on to the next,
more precise matching process.
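Algorithm 1 can be sketched as follows, using the thresholds stated above. The fusion formula combining matched-pair count and mean center distance is an assumption (the source gives it only as Eq. (2) in a figure), so treat the exact score expression as illustrative:

```python
import math

def d_phi(yi, yj):
    """Euclidean distance between the three branch angles (Eq. 3)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(yi[:3], yj[:3])))

def d_xy(yi, yj):
    """Distance between the two branch-point centers (Eq. 4)."""
    return math.hypot(yi[3] - yj[3], yi[4] - yj[4])

def y_match_score(test, target, t_phi=30.0, t_xy=675.0, alpha=30.0):
    """Coarse matching score between two sets of Y-shape descriptors
    y = (phi1, phi2, phi3, x, y), searching only within t_xy of each
    branch center. The fusion of count and mean distance is assumed."""
    n, dist = 0, 0.0
    for yi in test:
        for yj in target:
            if d_xy(yi, yj) <= t_xy and d_phi(yi, yj) <= t_phi:
                n += 1
                dist += d_xy(yi, yj)
                break                      # each test branch matches once
    if n == 0:
        return 0.0
    avg = dist / n
    return alpha * n / ((avg + 1.0) * math.sqrt(len(test) * len(target)))
```

A template pair whose score falls below the decision threshold t would be discarded before the fine-matching stage.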
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel-structure detail of
the sclera than the Y-shape descriptor. The variation of the sclera vessel
pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure
will appear to shrink or extend nonlinearly, because the eyeball is spherical
in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and
endothelium. There are slight differences among the movements of these
layers.
Considering these factors, our registration employs both a single
shift transform and a multi-parameter transform that combines shift,
rotation, and scale.
1) SHIFT PARAMETER SEARCH As we discussed before,
segmentation may not be accurate; as a result, the detected iris center may
not be very accurate either. The shift transform is designed to tolerate
possible errors in pupil-center detection in the segmentation step. If there is
no deformation, or only very minor deformation, registration with the shift
transform alone is adequate to achieve an accurate result. We designed
Algorithm 2 to obtain the optimized shift parameter, where Tte is the test
template, stei is the ith WPL descriptor of Tte, Tta is the target template,
staj is the jth WPL descriptor of Tta, and d(stek, staj) is the Euclidean
distance of descriptors stek and staj.
Δsk is the shift value of the two descriptors. We first randomly
select an equal number of segment descriptors stek in the test template Tte
from each quad and find each one's nearest neighbor staj in the target
template Tta. The shift offset between them is recorded as a possible
registration shift factor Δsk. The final offset registration factor is Δsoptim,
the candidate with the smallest standard deviation among these candidate
offsets.
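Algorithm 2 can be sketched as follows. This simplified version samples descriptor centers without the report's per-quad stratification, and selects the candidate offset that deviates least from the mean of the candidates (a stand-in for the smallest-standard-deviation criterion):

```python
import math
import random

def shift_search(test_centers, target_centers, samples=8, seed=0):
    """Sketch of the shift parameter search: sample test descriptor
    centers, pair each with its nearest target center, collect the
    candidate shift offsets, and keep the most consistent one."""
    rng = random.Random(seed)
    picks = rng.sample(test_centers, min(samples, len(test_centers)))
    offsets = []
    for (x, y) in picks:
        nx, ny = min(target_centers,
                     key=lambda c: math.hypot(c[0] - x, c[1] - y))
        offsets.append((nx - x, ny - y))       # candidate shift factor
    mx = sum(o[0] for o in offsets) / len(offsets)
    my = sum(o[1] for o in offsets) / len(offsets)
    # Candidate with the smallest deviation from the mean offset.
    return min(offsets, key=lambda o: math.hypot(o[0] - mx, o[1] - my))
```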
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera
patterns in the matching step. The affine transform algorithm is shown in
Algorithm 3. The shift value in the parameter set is obtained by randomly
selecting a descriptor stei and calculating the distance from its nearest
neighbor staj in Tta. We transform the test template by the matrix in (7).
At the end of each iteration, we count the number of matched descriptor
pairs between the transformed template and the target template. The factor
β determines whether a pair of descriptors is matched; we set it to 20 pixels
in our experiment. After N iterations, the optimized transform parameter
set is determined by selecting the maximum number of matches m(it).
Here stei, Tte, staj and Tta are defined as in Algorithm 2; tr(it)shift, θ(it)
and tr(it)scale are the shift, rotation and scale parameters generated in the
it-th iteration; and R(θ(it)), T(tr(it)shift) and S(tr(it)scale) are the transform
matrices defined in (7). To search for the optimal transform parameters, we
iterate N times to generate these parameters; in our experiment we set the
iteration count to 512.
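The transform matrix of Eq. (7) can be sketched as a composition of the three homogeneous matrices named above. The composition order R·T·S is an assumption, since the source gives the matrix only as a figure:

```python
import numpy as np

def affine_matrix(theta, shift, scale):
    """Compose rotation R, translation T, and scaling S into the single
    homogeneous matrix applied to the test template (order assumed)."""
    R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
    T = np.array([[1.0, 0.0, shift[0]],
                  [0.0, 1.0, shift[1]],
                  [0.0, 0.0, 1.0]])
    S = np.array([[scale, 0.0, 0.0],
                  [0.0, scale, 0.0],
                  [0.0, 0.0, 1.0]])
    return R @ T @ S

def transform_point(M, p):
    """Apply the homogeneous affine matrix to a 2-D descriptor center."""
    x, y, _ = M @ np.array([p[0], p[1], 1.0])
    return (x, y)
```

Each of the N = 512 iterations would draw (θ, shift, scale), build this matrix, transform the test descriptors, and count pairs within β = 20 pixels.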
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined by Algorithms 2
and 3, the test template is registered and matched simultaneously. The
registration and matching algorithm is listed in Algorithm 4. Here stei, Tte,
staj and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift,
tr(optm)scale and Δsoptim are the registration parameters obtained from
Algorithms 2 and 3; R(θ(optm)), T(tr(optm)shift) and S(tr(optm)scale)
form the descriptor transform matrix defined in Algorithm 3; ɸ is the angle
between the segment descriptor and the radial direction; and w is the
weight of the descriptor, which indicates whether the descriptor is at the
edge of the sclera or not. To ensure that the nearest descriptors have a
similar orientation, we use a constant factor α to check the absolute
difference of the two ɸ values; in our experiment we set α to 5. The total
matching score is the minimal score of the two transformed results divided
by the minimal matching score for the test template and target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and
works as a coprocessor with a CPU. A CUDA device consists of many
streaming multiprocessors (SMs); the parallel part of the program should
be partitioned into threads by the programmer and mapped onto those
processors. There are multiple memory spaces in the CUDA memory
hierarchy: registers, local memory, shared memory, global memory,
constant memory, and texture memory. Registers, local memory, and
shared memory are on-chip, and accessing these memories takes little
time. Only shared memory can be accessed by other threads within the
same block; however, the amount of shared memory available is limited.
Global memory, constant memory, and texture memory are off-chip
memories accessible by all threads, and accessing them is very time
consuming.
Constant memory and texture memory are read-only and cacheable.
Mapping algorithms to CUDA to achieve efficient processing is not a
trivial task; there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches are
executed serially. To improve performance, branch divergence within a
warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this
latency, on-chip memory should be used preferentially over global
memory. When global memory access does occur, threads in the same
warp should access consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces,
but it is organized into equally sized banks. If two memory requests from
different threads within a warp fall in the same bank, the accesses are
serialized. For maximum performance, memory requests should be
scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four
independent modules, each module is converted to a different kernel
on the GPU. These kernels differ in computational density; thus we
map them to the GPU with different mapping strategies to fully utilize the
computing power of CUDA. Figure 11 shows our scheme of CPU-GPU
task distribution and the partition among blocks and threads. Algorithm 1 is
partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number
of templates in the database. As the upper middle column of Figure 11
shows, each target template is assigned to one thread, and one thread
compares one pair of templates. In our work we use an NVIDIA C2070 as
our GPU; the thread and block counts are set to 1024. That means we can
match our test template with up to 1024×1024 target templates at the same
time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which
each thread processes a section of descriptors. As the lower portion of the
middle column of Figure 11 shows, we assign a target template to one
block; inside a block, each thread corresponds to a set of descriptors in this
template. This partition makes every block execute independently, and no
data exchange is required between different blocks. When all threads
complete their corresponding descriptor fractions, the sum of the
intermediate results needs to be computed or compared. A parallel prefix-
sum algorithm is used to calculate the sum of the intermediate results,
which is shown at the right of Figure 11. First, all odd-numbered threads
compute the sums of consecutive pairs of results; then, recursively, every
first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new
results. The final result is saved at the first address, which has the same
variable name as the first intermediate result.
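The reduction scheme above can be simulated sequentially to make the stride pattern concrete. Each index `i` in the inner loop plays the role of one thread at that stride; the total ends up in element 0, "the first address":

```python
def tree_reduce_sum(values):
    """Sequential simulation of the block-level reduction described
    above: at stride 1 pairs are added, then the stride doubles
    (2, 4, 8, ...) until the total sits in element 0."""
    data = list(values)
    n = len(data)
    stride = 1
    while stride < n:
        for i in range(0, n, 2 * stride):   # each i acts as one thread
            if i + stride < n:
                data[i] += data[i + stride]
        stride *= 2
    return data[0]
```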
252 MAPPING INSIDE BLOCK
In the shift parameter search, there are two schemes we can choose to
map the task:
Mapping one pair of templates to all the threads in a block, so that every
thread takes charge of a fraction of the descriptors and cooperates with the
other threads.
Assigning a single possible shift offset to a thread, so that all threads
compute independently except that the final result must be compared with
the other possible offsets.
Due to the great number of sum and synchronization operations in every
nearest-neighbor search step, we chose the second method to parallelize
the shift search. In the affine matrix generator, we mapped an entire
parameter-set search to a thread: every thread randomly generates a set of
parameters and tries them independently, and the generated iterations are
distributed across all threads. The challenge of this step is that the
randomly generated numbers might be correlated among threads. In the
step generating the rotation and scale registration parameters, we used the
Mersenne Twister pseudorandom number generator, because it can use
bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative;
therefore it is hard to parallelize a single twister state-update step among
several execution threads. To make sure that the thousands of threads in the
launch grid generate uncorrelated random sequences, many simultaneous
Mersenne Twisters need to run with different initial states in parallel.
But even "very different" (by any definition) initial state values do not
prevent the emission of correlated sequences by generators sharing
identical parameters. To solve this problem, and to enable an efficient
implementation of the Mersenne Twister on parallel architectures, we used
a special offline tool for the dynamic creation of Mersenne Twister
parameters, modified from the algorithm developed by Makoto Matsumoto
and Takuji Nishimura.
In the registration and matching step, when
searching for the nearest neighbor, a line segment that has already been
matched with another should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in
shared memory. To share the flags, all threads in a block would have to
synchronize at every query step; our solution is to use a single thread in a
block to process the matching.
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the
bandwidth between host memory and device memory, and data transfer
between host and device can incur long latency. As shown in Figure 11,
we load the entire target template set from the database without
considering when the templates will be processed; therefore there is no
data transfer from host to device during the matching procedure. In global
memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and
s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that the successive
kernels of Algorithms 2 to 4 can access their data at successive addresses.
Although such coalesced access reduces the latency, frequent global
memory access is still a slow way to get data, so in our kernels we load the
test template into shared memory to accelerate memory access. Because
Algorithms 2 to 4 execute different numbers of iterations on the same data,
bank conflicts do not occur. To maximize our texture memory space, we
set the system cache to the lowest value and bound our target descriptors to
texture memory; using this cacheable memory, data access was accelerated
further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor
primarily applied in target detection; in this paper it is applied as the
feature for human recognition. In the sclera region, the vein patterns are
edges in the image, so HOG is used to determine the gradient and edge
orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small
connected regions called cells. For each cell, compute the histogram of
gradient directions or edge orientations of the pixels. The combination of
the histograms of the different cells then represents the descriptor. To
improve accuracy, the histograms can be contrast-normalized by
calculating the intensity over a block and using this value to normalize all
cells within the block. This normalization makes the result invariant to
geometric and photometric changes. The gradient magnitude m(x, y) and
orientation θ(x, y) are calculated using the x- and y-direction gradients
dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG This method utilized
to create cell histograms Each pixel within the cell used to give a weight to
the orientation which is found in the gradient computation Gradient
magnitude is used as the weight The cells are in the rectangular form The
binning of gradient orientation should be spread over 0 to 180 degrees and
opposite direction counts as the same In the Fig 8 depicts the edge
orientation of picture elements If the images have any illumination and
contrast changes then the gradient strength must be locally normalized For
that cells are grouped together into larger blocks These blocks are
overlapping so that each cell contributes more than once to the final
descriptor Here rectangular HOG (R-HOG) blocks are applied which are
mainly in square grids The performance of HOG is improved by putting
on a Gaussian window into each block
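The cell, orientation-binning, and normalization steps above can be condensed into a short sketch (a minimal, unoptimized HOG core; the cell size and bin count are illustrative defaults, and a single block covering the whole grid stands in for overlapping R-HOG blocks):

```python
import numpy as np

def hog_cells(img, cell=8, bins=9):
    """Minimal HOG core: per-cell histograms of unsigned gradient
    orientation (0-180 degrees), weighted by gradient magnitude,
    followed by a simple L2 normalization over the grid."""
    img = img.astype(float)
    dy, dx = np.gradient(img)                    # y- and x-direction gradients
    mag = np.hypot(dx, dy)                       # m(x, y)
    ang = np.rad2deg(np.arctan2(dy, dx)) % 180   # opposite directions fold together
    h, w = img.shape
    hist = np.zeros((h // cell, w // cell, bins))
    bin_idx = np.minimum((ang / (180 / bins)).astype(int), bins - 1)
    for i in range(h // cell):
        for j in range(w // cell):
            cm = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            cb = bin_idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            hist[i, j] = np.bincount(cb.ravel(), weights=cm.ravel(),
                                     minlength=bins)
    # L2 normalization over the whole grid (one R-HOG block in this sketch)
    return hist / (np.linalg.norm(hist) + 1e-9)

img = np.tile(np.arange(16.0), (16, 1))   # horizontal ramp: purely vertical edges
h = hog_cells(img, cell=8)                # all votes land in the 0-degree bin
```

A real R-HOG descriptor would repeat the normalization per overlapping block and optionally weight each block with a Gaussian window, as the text describes.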
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in vast areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
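The matrix-first point above is easiest to see in code. MATLAB itself is the report's tool, but the same idea is shown here in NumPy terms as a neutral analogue (the MATLAB equivalents are noted in the comments):

```python
import numpy as np

# In a matrix-first language a single expression operates on whole matrices;
# the user writes no element-by-element loop.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([1.0, 1.0])

y = A @ b                   # matrix-vector product, like A*b in MATLAB
s = A + 10                  # scalar broadcast over every element, like A+10
x = np.linalg.solve(A, b)   # linear solve, like MATLAB's backslash A\b
```

The "housekeeping" the text mentions is exactly this: the environment tracks each variable's dimensions and dispatches the operators accordingly.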
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems. It enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology.
This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
GUIDE (Graphical User Interface Development Environment) is an interactive tool for laying out, designing, and editing user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific formats such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
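The binarization step shown in the snapshots (grey-scale to binary via Otsu's thresholding) can be sketched as follows. This is an illustrative NumPy reimplementation of Otsu's method on a synthetic image, not the project's MATLAB code:

```python
import numpy as np

def otsu_threshold(img):
    """Find the Otsu threshold of an 8-bit grayscale image by maximizing
    the between-class variance over all candidate thresholds."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    probs = hist / hist.sum()
    omega = np.cumsum(probs)                  # class-0 probability up to t
    mu = np.cumsum(probs * np.arange(256))    # class-0 cumulative mean
    mu_t = mu[-1]                             # global mean
    # between-class variance; guard the division at the histogram's ends
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b)))

# synthetic bimodal "eye" image: dark background, bright sclera-like region
img = np.full((64, 64), 40, dtype=np.uint8)
img[16:48, 16:48] = 200
t = otsu_threshold(img)
binary = img > t          # the grey-scale-to-binary step of the pipeline
```

In the actual project the same step would be a call to MATLAB's `graythresh`/`im2bw` pair; the sketch just makes the criterion being maximized explicit.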
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Privium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, to hide the memory access latency, was designed to partition the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
LITERATURE SURVEY
1. S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
Face recognition in unconstrained acquisition conditions is one of the most challenging problems that has been actively researched in recent years. It is well known that many state-of-the-art still-image face recognition algorithms perform well when constrained (frontal, well-illuminated, high-resolution, sharp, and full) face images are acquired. However, their performance degrades significantly when the test images contain variations that are not present in the training images. In this paper we highlight some of the key issues in remote face recognition. We define remote face recognition as recognition where faces are several tens of meters (10-250 m) from the cameras. We then describe a remote face database which has been acquired in an unconstrained outdoor maritime environment. The recognition performance of a subset of existing still-image-based face recognition algorithms is evaluated on the remote face data set. Further, we define the remote re-identification problem as matching a subject at one location with candidate sets acquired at a different location and over time, in remote conditions. We provide preliminary experimental results on remote re-identification. It is demonstrated that, in addition to applying a good classification algorithm, finding features that are robust to the variations mentioned above and developing statistical models which can account for these variations are very important for remote face recognition.
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
With the rapidly expanding biometric data collected by various sectors of government and industry for identification and verification purposes, how to manage and process such Big Data draws great concern. Even though modern processors are equipped with more cores and memory capacity, careful design is still required to utilize the hardware resources effectively and the power consumption efficiently. This research addresses this issue by investigating the workload characteristics of a biometric application. Taking Daugman's iris matching algorithm, which has been proven to be the most reliable iris matching method, as a case study, we conduct performance profiling and binary instrumentation on the benchmark to capture its execution behavior. The results show that data loading and memory access incur great performance overhead, and they motivate us to move the biometrics computation to a high-performance architecture.
Modern iris recognition algorithms can be computationally intensive, yet they are designed for traditional sequential processing elements such as a personal computer. However, a parallel processing alternative using field-programmable gate arrays (FPGAs) offers an opportunity to speed up iris recognition. Within the scope of this project, iris template generation with directional filtering, which is a computationally expensive yet parallelizable portion of a modern iris recognition algorithm, is parallelized on an FPGA system. We present a performance comparison of the parallelized algorithm on the FPGA system to a traditional CPU-based version. The parallelized template generation outperforms an optimized C++ version, determining the information content of an iris approximately 324 times faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1-8.
A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure. Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye, and the vein pattern seen in the sclera region is unique to each person. Thus the sclera vein pattern is a well-suited biometric technology for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are suitable for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: the histogram of oriented gradients (HOG) and conversion of the image into polar form using bilinear interpolation. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than previous methods.
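The polar-form conversion with bilinear interpolation mentioned above can be sketched as follows (an illustrative NumPy version; the image size, sampling grid, and centre are assumptions of the sketch). A rotation of the input becomes a circular shift along the theta axis of the output, which is what makes the representation rotation invariant:

```python
import numpy as np

def to_polar(img, cx, cy, n_r=32, n_theta=64):
    """Unwrap an image around (cx, cy) into polar coordinates, sampling
    each (r, theta) point with bilinear interpolation."""
    h, w = img.shape
    r_max = min(cx, cy, w - 1 - cx, h - 1 - cy)
    rs = np.linspace(0, r_max, n_r)
    ts = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    r, t = np.meshgrid(rs, ts, indexing='ij')
    x = cx + r * np.cos(t)                      # Cartesian sample positions
    y = cy + r * np.sin(t)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # bilinear blend of the four surrounding pixels
    return (img[y0, x0] * (1 - fx) * (1 - fy) + img[y0, x1] * fx * (1 - fy)
            + img[y1, x0] * (1 - fx) * fy + img[y1, x1] * fx * fy)

# sanity image: pixel value equals distance from centre, so each polar
# row (fixed radius) should come out approximately constant
img = np.fromfunction(lambda y, x: np.hypot(x - 16.0, y - 16.0), (33, 33))
p = to_polar(img, 16, 16)
```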
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.
The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.
This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image. The global textural feature is extracted using the 1-D log-polar Gabor transform, and the local topological feature is extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method, built on the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.
It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naive parallel approach would not work. We then propose the new sclera descriptor, the Y shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing, to mitigate the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.
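The coarse-to-fine structure, a cheap first-stage score filtering candidates before an expensive second-stage comparison, can be sketched generically (plain feature vectors with Euclidean and cosine scores stand in for the Y shape and WPL descriptors; the keep ratio is an assumed parameter):

```python
import numpy as np

def two_stage_match(test_desc, targets, coarse_keep=0.5):
    """Coarse-to-fine matching: a fast coarse score (no registration)
    filters out low-similarity targets; only the survivors get the
    refined comparison. Mirrors the two-stage process described above,
    with toy descriptors in place of the Y shape / WPL features."""
    # stage 1: cheap coarse score over the whole database
    coarse = np.array([np.linalg.norm(test_desc - t) for t in targets])
    keep = np.argsort(coarse)[: max(1, int(len(targets) * coarse_keep))]

    # stage 2: refined score (cosine similarity here) on survivors only
    def fine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    scores = {int(i): fine(test_desc, targets[i]) for i in keep}
    best = max(scores, key=scores.get)
    return best, scores

rng = np.random.default_rng(0)
targets = rng.normal(size=(10, 16))                # toy template database
test = targets[3] + 0.01 * rng.normal(size=16)     # near-duplicate of target 3
best, _ = two_stage_match(test, targets)
```

The payoff is that the refined comparison, which in the report involves registration and is by far the most expensive step, runs on only a fraction of the database.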
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve the efficiency, in this research we propose a new descriptor, the Y shape descriptor, which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.
2. We propose the coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y shape descriptors, which is very fast because no registration is needed. The matching result of this stage helps filter out image pairs with low similarities.
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds to perform a one-to-one matching. Both speeds were measured on a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient for data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPGPUs (general-purpose graphics processing units, hereafter simply GPUs) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU has been used to extract features, construct descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al
evaluated the performance of image processing algorithms, such as linear
feature extraction and multi-view stereo matching, on GPUs. However, these
approaches were designed for their specific biometric recognition
applications and feature searching methods; therefore, they may not be
efficient for sclera vein recognition. Compute Unified Device Architecture
(CUDA), the computing engine of NVIDIA GPUs, is used in this research. A
CUDA GPU is a highly parallel, multithreaded, many-core processor with
tremendous computational power. It supports not only a traditional
graphics pipeline but also computation on non-graphical data. More
importantly, it offers an easier programming platform that outperforms its
CPU counterparts in terms of peak arithmetic intensity and memory
bandwidth. In this research, the goal is not to develop a unified strategy
to parallelize all sclera matching methods, because each method is quite
different from the others and would need a customized design; an efficient
parallel computing scheme needs different strategies for different sclera
vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
sclera vein recognition using our sequential line-descriptor method on the
CUDA GPU architecture. However, the parallelization strategies developed
in this research can be applied to design parallel approaches for other
sclera vein recognition methods and to help parallelize general pattern
recognition methods. Based on the matching approach, there are three
challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two
sclera templates and to align the templates to the same coordinate system.
But the mask files are large, so they preoccupy the GPU memory and slow
down data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of
several computational stages with different memory and processing
requirements; there is no uniform mapping scheme applicable to all these
stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable
to satisfy the requirement of real-time performance; new designs are
necessary to help narrow down the search range.
In summary, a naïve parallel implementation of the algorithms would not
work efficiently.
Note that it is relatively straightforward to implement our C program for
CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly
converted to OpenCL kernels by accounting for the different syntax of
various keywords and built-in functions. The mapping strategy is also
effective in OpenCL if we regard a thread and a block in CUDA as a
work-item and a work-group in OpenCL. Most of our optimization techniques,
such as coalesced memory access and prefix sum, work in OpenCL too.
Moreover, since CUDA is a data-parallel architecture, an OpenCL
implementation of our approach should be programmed in the data-parallel
model.
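To illustrate one of the optimizations named above, the following sketch shows a work-efficient (Blelloch-style) exclusive prefix sum written sequentially in Python; on a GPU, each pass of the inner loop would execute as one parallel step. The function name and the power-of-two length assumption are ours, not from the report.

```python
def exclusive_scan(data):
    """Blelloch-style work-efficient exclusive prefix sum.

    Written as the sequential equivalent of the GPU algorithm:
    each inner loop over `i` corresponds to one parallel step.
    Assumes len(data) is a power of two.
    """
    a = list(data)
    n = len(a)
    # Up-sweep (reduce) phase: build partial sums in place.
    d = 1
    while d < n:
        for i in range(0, n, 2 * d):
            a[i + 2 * d - 1] += a[i + d - 1]
        d *= 2
    # Down-sweep phase: turn the sum tree into an exclusive scan.
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(0, n, 2 * d):
            t = a[i + d - 1]
            a[i + d - 1] = a[i + 2 * d - 1]
            a[i + 2 * d - 1] += t
        d //= 2
    return a
```

For example, `exclusive_scan([1, 2, 3, 4, 5, 6, 7, 8])` yields `[0, 1, 3, 6, 10, 15, 21, 28]`.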
In this research, we first discuss why the naïve parallel approach would
not work (Section 3). We then propose a new sclera descriptor, the Y-shape
sclera feature, for an efficient registration method to speed up the
mapping scheme (Section 4); introduce the "weighted polar line" (WPL)
descriptor, which is better suited to parallel computing and mitigates the
mask-size issue (Section 5); and develop a coarse-to-fine two-stage
matching process to dramatically improve matching speed (Section 6). These
new approaches make parallel processing possible and efficient. However,
it is non-trivial to implement these algorithms in CUDA, so we then
develop implementation schemes to map our algorithms into CUDA (Section
7). In Section 2, we give a brief introduction to sclera vein recognition;
in Section 8, we report experiments using the proposed system; and in
Section 9, we draw conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera segmentation,
feature enhancement, feature extraction, and feature matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition,
and several methods have been designed for it. Crihalmeanu et al presented
a semi-automated system for sclera segmentation: they used a clustering
algorithm to classify the color eye images into three clusters (sclera,
iris, and background). Later on, Crihalmeanu and Ross designed a
segmentation approach based on a normalized sclera index measure, which
includes coarse sclera segmentation, pupil region segmentation, and fine
sclera segmentation. Zhou et al developed a skin tone plus "white
color"-based voting method for sclera segmentation in color images and an
Otsu's thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the
sclera features, since the sclera vein patterns often lack contrast and
are hard to detect. Zhou et al used a bank of multi-directional Gabor
filters for vascular pattern enhancement. Derakhshani et al used
contrast-limited adaptive histogram equalization (CLAHE) to enhance the
green color plane of the RGB image, and a multi-scale region-growing
approach to identify the sclera veins from the image background.
Crihalmeanu and Ross applied a selective enhancement filter for blood
vessels to extract features from the green component of a color image. In
the feature matching step, Crihalmeanu and Ross proposed three
registration and matching approaches: Speeded-Up Robust Features (SURF),
which is based on interest-point detection; minutiae detection, which is
based on minutiae points on the vasculature structure; and direct
correlation matching, which relies on image registration. Zhou et al
designed a line descriptor-based feature registration and matching method.
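As an illustration of the multi-directional Gabor enhancement mentioned above, the following Python sketch builds the real (even) part of a Gabor kernel and a small orientation bank. The parameter values (kernel size, sigma, wavelength, aspect ratio, number of orientations) are illustrative assumptions, not the values used in the cited work.

```python
import math

def gabor_kernel(size=9, sigma=2.0, lam=4.0, gamma=0.5, theta=0.0):
    """Real (even) part of a Gabor filter oriented at angle `theta`.

    A bank of such kernels at several orientations can enhance
    directional vessel patterns, as described in the text.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr)
                                / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / lam))
        kernel.append(row)
    return kernel

def filter_bank(n_orientations=8, **kwargs):
    """Multi-directional bank: one kernel per orientation in [0, pi)."""
    return [gabor_kernel(theta=k * math.pi / n_orientations, **kwargs)
            for k in range(n_orientations)]
```

Convolving the sclera region with each kernel in the bank and taking the maximum response per pixel is one common way such enhancement is applied.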
The proposed sclera recognition consists of five steps: sclera
segmentation, vein pattern enhancement, feature extraction, feature
matching, and the matching decision. Fig 2 shows the block diagram of
sclera recognition. Two types of feature extraction are used in the
proposed method to achieve good identification accuracy. The
characteristics elicited from the blood vessel structure seen in the
sclera region are the Histogram of Oriented Gradients (HOG) and an
interpolated Cartesian-to-polar conversion. HOG is used to determine the
gradient and edge orientations of the vein pattern in the sclera region of
an eye image. To be more computationally efficient, the image data are
converted to polar form, which is mainly useful for circular or
quasi-circular objects. These two characteristics are extracted from all
the images in the database and compared with the features of the query
image to decide whether the person is correctly identified. This procedure
is done in the feature matching step, which ultimately makes the matching
decision. By using the proposed feature extraction methods and matching
techniques, human identification is more accurate than in existing
studies. In the proposed method, two features of an image are extracted.
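The HOG feature described above can be sketched as follows: central-difference gradients followed by an unsigned-orientation histogram weighted by gradient magnitude. This is a simplified single-cell version (no block normalization); the function name and bin count are our choices.

```python
import math

def hog_orientations(image, n_bins=9):
    """Histogram of gradient orientations for a grayscale image
    (given as a list of rows): central-difference gradients, then an
    unsigned-orientation histogram weighted by gradient magnitude.
    """
    h, w = len(image), len(image[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            # Unsigned orientation in [0, 180) degrees.
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(ang / (180.0 / n_bins)) % n_bins] += mag
    return hist
```

On an image with a single vertical edge, all the gradient energy falls into the first (near-horizontal-gradient) bin.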
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists
of three steps: glare area detection, sclera area estimation, and iris and
eyelid detection and refinement. The figure shows the steps of
segmentation.
FIG
Glare area detection: The glare area is a small bright area near the pupil
or iris; it is an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. The filter
operates only on grayscale images: if the image is in color, it must first
be converted to grayscale, after which the Sobel filter is applied to
detect the glare area. Fig 4 shows the result of the glare area detection.
FIG
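A minimal sketch of the Sobel step, assuming the image is already grayscale (a list of pixel rows); strong responses outline the bright glare spot. The function name is ours.

```python
def sobel_magnitude(gray):
    """Sobel gradient magnitude for a grayscale image (list of rows).

    The text applies a Sobel filter to locate the glare area near the
    pupil/iris; a color image must be converted to grayscale first.
    """
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal-gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical-gradient kernel
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```

A flat region gives zero response, while any bright spot produces strong magnitudes around its rim, which is what makes glare detectable.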
Sclera area estimation: For the estimation of the sclera area, Otsu's
thresholding method is applied. The steps of sclera area detection are:
selection of the region of interest (ROI), Otsu's thresholding, and sclera
area detection. The left and right sclera areas are selected based on the
iris boundaries. Once the region of interest is selected, Otsu's
thresholding is applied to obtain the potential sclera areas. The correct
left sclera area should be placed in the right and center positions, and
the correct right sclera area should be placed in the left and center. In
this way, non-sclera areas are eliminated.
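Otsu's thresholding, used above to separate potential sclera areas within the ROI, can be sketched as follows; it selects the gray level that maximizes the between-class variance of the intensity histogram.

```python
def otsu_threshold(pixels, levels=256):
    """Otsu's method: choose the threshold that maximizes
    between-class variance of a grayscale histogram.
    `pixels` is a flat list of integer gray values in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(levels))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0.0
    for t in range(levels):
        w0 += hist[t]                 # pixel count of class 0 (values <= t)
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        m0 = sum0 / w0                # mean of class 0
        m1 = (sum_all - sum0) / (total - w0)   # mean of class 1
        var_between = w0 * (total - w0) * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

On a cleanly bimodal intensity distribution the returned threshold falls between the two modes, separating sclera-like bright pixels from the rest.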
2.2.3 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the sclera
area. Then the upper eyelid, lower eyelid, and iris boundaries are
refined; these are all unwanted portions for recognition. In order to
eliminate their effects, refinement is performed following the detection
of the sclera area. The figure shows the result after Otsu's thresholding
and the iris and eyelid refinement used to detect the right sclera area;
the left sclera area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented;
hence, feature extraction and matching are needed to reduce the effect of
segmentation faults. The vein patterns in the sclera area are not clearly
visible after segmentation, so vein pattern enhancement is performed to
make them more visible.
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has been
studied in the context of the fingers (Miura et al 2004), the palm (Lin
and Fan 2004), and the retina (Hill 1999). In the case of retinal
biometrics, a special optical device for imaging the back of the eyeball
is needed (Hill 1999). Due to its perceived invasiveness and the required
degree of subject cooperation, the use of retinal biometrics may not be
acceptable to some individuals. The conjunctiva is a thin, transparent,
and moist tissue that covers the outer surface of the eye. The part of the
conjunctiva that covers the inner lining of the eyelids is called the
palpebral conjunctiva, and the part that covers the outer surface of the
eye is called the ocular (or bulbar) conjunctiva, which is the focus of
this study. The ocular conjunctiva is very thin and clear; thus the
vasculature (including that of the episclera) is easily visible through
it. The visible microcirculation of the conjunctiva offers a rich and
complex network of veins and fine microcirculation (Fig 1). The apparent
complexity and specificity of these vascular patterns motivated us to
utilize them for personal identification (Derakhshani and Ross 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its
detailed final structure is mostly stochastic and thus unique. Even though
no comprehensive study on the uniqueness of vascular structures has been
conducted, studies of some targeted areas, such as those of the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein 1935; Tower 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus, the conjunctival vascular
structure is expected to have reasonable permanence (Joussen 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial
off-the-shelf digital cameras under normal lighting conditions, making
this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and
thus more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins
makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the
addition of conjunctival biometrics will enhance the performance of
current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images: for instance, if the
iris information is relegated to the left or right portions of the eye,
the sclera vein patterns will be further exposed. This feature makes
sclera vasculature a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable
(e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks: for instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris
but also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
availability of simultaneous computation. The figure below demonstrates
the possibility of parallel directional filtering. Since the filter is
computed over different portions of the input image, the computation can
be performed in parallel (denoted by Elements below). In addition,
individual parallelization of each element of the filtering can also be
performed. A detailed discussion of our proposed parallelization is
outside the scope of this paper.
FIG
FIG
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching segment of the line-descriptor-based method is a bottleneck
with regard to matching speed. In this section, we briefly describe the
line-descriptor-based sclera vein recognition method. After segmentation,
vein patterns are enhanced by a bank of directional Gabor filters. Binary
morphological operations are used to thin the detected vein structure down
to a single-pixel-wide skeleton and remove the branch points. The line
descriptor is used to describe the segments in the vein structure. Figure
2 shows a visual description of the line descriptor. Each segment is
described by three quantities: the segment's angle to some reference angle
at the iris center, θ; the segment's distance to the iris center, r; and
the dominant angular orientation of the line segment, ɸ. Thus the
descriptor is S = (θ, r, ɸ)^T. The individual components of the line
descriptor are calculated as
FIG
Here f_line(x) is the polynomial approximation of the line segment, (xl,
yl) is the center point of the line segment, (xi, yi) is the center of the
detected iris, and S is the line descriptor. In order to register the
segments of the vascular patterns, a RANSAC-based algorithm is used to
estimate the best-fit parameters for registration between the two sclera
vascular patterns. The registration algorithm randomly chooses two points,
one from the test template and one from the target template, and also
randomly chooses a scaling factor and a rotation value based on a priori
knowledge of the database. Using these values, it calculates a fitness
value for the registration under these parameters.
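A minimal sketch of how the three descriptor components could be computed from a segment's center, its dominant orientation, and the detected iris center; the function and argument names are ours.

```python
import math

def line_descriptor(segment_center, segment_angle, iris_center):
    """Build the line descriptor S = (theta, r, phi) for one vessel
    segment, following the definitions in the text: theta is the
    angle of the segment's center relative to the iris center, r is
    its distance to the iris center, and phi is the segment's own
    dominant orientation.
    """
    (xl, yl), (xi, yi) = segment_center, iris_center
    theta = math.atan2(yl - yi, xl - xi)   # angle to reference axis at iris center
    r = math.hypot(xl - xi, yl - yi)       # distance from the iris center
    phi = segment_angle                    # dominant orientation of the segment
    return (theta, r, phi)
```

For a segment centered at (3, 4) with the iris at the origin, r is 5 and theta is atan2(4, 3).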
After sclera template registration, each line segment in the test template
is compared to the line segments in the target template for matches. In
order to reduce the effect of segmentation errors, we created the
weighting image (Figure 3) from the sclera mask by setting interior pixels
in the sclera mask to 1, pixels within some distance of the boundary of
the mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as shown
below, where Si and Sj are two segment descriptors, m(Si, Sj) is the
matching score between segments Si and Sj, d(Si, Sj) is the Euclidean
distance between the segment descriptors' center points (from Eqs. 6-8),
D_match is the matching distance threshold, and ɸ_match is the matching
angle threshold. The total matching score M is the sum of the individual
matching scores divided by the maximum matching score for the minimal set
between the test and target templates. That is, one of the test or target
templates has fewer points, and the sum of its descriptors' weights sets
the maximum score that can be attained.
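The per-segment score and the normalized total score described above can be sketched as follows; the threshold values and the use of the minimum of the two weights for a matched pair are illustrative assumptions.

```python
import math

def segment_match_score(s_i, s_j, w_i, w_j, d_match=5.0, phi_match=0.2):
    """Matching score m(Si, Sj) of two registered segment descriptors,
    in the spirit of the text: a pair matches when the centers are
    within a distance threshold and the orientations within an angle
    threshold; the score is then the pair's weight.
    """
    (x1, y1, phi1), (x2, y2, phi2) = s_i, s_j
    d = math.hypot(x1 - x2, y1 - y2)
    if d <= d_match and abs(phi1 - phi2) <= phi_match:
        return min(w_i, w_j)
    return 0.0

def total_match_score(test, target, test_w, target_w):
    """Sum of individual scores divided by the maximum attainable
    score of the template with the smaller total weight."""
    score = sum(segment_match_score(si, sj, wi, wj)
                for si, wi in zip(test, test_w)
                for sj, wj in zip(target, target_w))
    max_score = min(sum(test_w), sum(target_w))
    return score / max_score if max_score else 0.0
```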
FIG
FIG
FIG
FIG
Even with movement of the eye, Y-shape branches are observed to be a
stable feature and can be used as a sclera feature descriptor. To detect
the Y-shape branches in the original template, we search the
nearest-neighbor set of every line segment within a regular distance and
classify the angles among these neighbors. If there are two types of angle
values in a line segment set, the set may be inferred to be a Y-shape
structure, and the line segment angles are recorded as a new feature of
the sclera.
There are two ways to measure both the orientation and the relationship of
every branch of Y-shape vessels: one is to use the angles of every branch
to the x-axis; the other is to use the angles between each branch and the
iris radial direction. The first method needs an additional rotation
operation to align the template, so in our approach we employed the second
method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each
branch and the radius from the pupil center. Even when the head tilts, the
eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3
are quite stable. To tolerate errors from the pupil center calculation in
the segmentation step, we also record the center position (x, y) of the
Y-shape branch as auxiliary parameters. Thus our rotation-, shift-, and
scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The
Y-shape descriptor is generated with reference to the iris center;
therefore, it is automatically aligned to the iris center. It is a
rotation- and scale-invariant descriptor.
2.2.6 WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line descriptor is extracted from the
skeleton of the vessel structure in binary images (Figure 7). The skeleton
is then broken into smaller segments. For each segment, a line descriptor
is created to record the center and orientation of the segment. This
descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the
center and ɸ is its orientation. Because of the limitation of segmentation
accuracy, descriptors at the boundary of the sclera area might not be
accurate and may contain spur edges resulting from the iris, eyelid,
and/or eyelashes. To be tolerant of such error, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b)
vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d)
centers of line segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is
challenging, since the mask files are large and will occupy GPU memory and
slow down data transfer. During matching and registration, a RANSAC-type
algorithm is used to randomly select corresponding descriptors, and the
transform parameters between them are used to generate the
template-transform affine matrix. After every template transform, the mask
data must also be transformed and a new boundary calculated to evaluate
the weight of the transformed descriptor. This results in too many
convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the weighted
polar line (WPL) descriptor structure, which includes the mask information
and can be automatically aligned. We extracted the geometric relationships
of the descriptors and stored them as a new descriptor. We use a weighted
image created by setting various weight values according to position: the
weights of descriptors outside the sclera are set to 0, those near the
sclera boundary to 0.5, and interior descriptors to 1. In our work,
descriptor weights were calculated on their own mask by the CPU, and only
once.
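The weight assignment described above (0 outside the mask, 0.5 near the boundary, 1 in the interior) can be sketched as follows; the boundary distance `border` is an assumed value.

```python
def descriptor_weight(x, y, mask, border=2):
    """Weight for a WPL descriptor centered at (x, y), following the
    scheme in the text: 0 outside the sclera mask, 0.5 within
    `border` pixels of the mask boundary, 1 in the interior. `mask`
    is a list of rows of 0/1 values.
    """
    h, w = len(mask), len(mask[0])
    if not (0 <= y < h and 0 <= x < w) or mask[y][x] == 0:
        return 0.0
    # Near-boundary test: any non-sclera pixel (or image edge) within
    # `border` Chebyshev distance marks this descriptor as boundary.
    for dy in range(-border, border + 1):
        for dx in range(-border, border + 1):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w) or mask[ny][nx] == 0:
                return 0.5
    return 1.0
```

Because the weight is computed once on the CPU and stored inside the descriptor, the GPU never needs to touch the mask image itself, which is exactly the saving the text describes.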
The result is saved as a component of the descriptor, which becomes s(x,
y, ɸ, w), where w denotes the weight of the point and takes the value 0,
0.5, or 1. To align two templates, when a template is shifted to another
location along the line connecting their centers, all the descriptors of
that template are transformed. This is faster if the two templates have
similar reference points: if we use the center of the iris as the
reference point, then when two templates are compared, the correspondences
are automatically aligned to each other, since they share a similar
reference point. Every feature vector of the template is a set of line
segment descriptors composed of three variables (Figure 8): the segment's
angle to the reference line through the iris center, denoted θ; the
distance between the segment's center and the pupil center, denoted r; and
the dominant angular orientation of the segment, denoted ɸ. To minimize
GPU computation, we also convert the descriptor values from polar to
rectangular coordinates in CPU preprocessing.
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters;
for example, as an eyeball moves left, the left-part sclera patterns may
be compressed while the right-part patterns are stretched.
In parallel matching, these two parts are assigned to threads in different
warps to allow different deformations. The multiprocessor in CUDA manages
threads in groups of 32 parallel threads called warps. We reorganized the
descriptors from the same side and saved
FIG
FIG
them at continuous addresses. This meets the requirement of coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no
longer needed on the GPU. Matching with this feature is very fast because
the templates do not need to be re-registered every time after shifting;
thus the cost of data transfer and computation on the GPU is reduced.
Matching on the new descriptor, the shift parameter generator of Figure 4
is simplified as shown in Figure 9.
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly more capable, with larger
limits on their size and resource consumption, with more fully featured
instruction sets, and with more flexible control-flow operations. After
many years of separate instruction sets for vertex and fragment
operations, current GPUs support the unified Shader Model 4.0 on both
vertex and fragment shaders:
The hardware must support shader programs of at least 65k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and
32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting
fixed-function units.
General-purpose computing on the GPU maps general-purpose computation onto
the GPU using the graphics hardware in much the same way as any standard
graphics application. Because of this similarity, the process is both
easier and more difficult to explain: on one hand, the actual operations
are the same and are easy to follow; on the other hand, the terminology
differs between graphics and general-purpose use. Harris provides an
excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then
show how the same steps are used in a general-purpose way to author GPGPU
applications, and finally use the same steps to show the more simple and
direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II,
concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination
of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as texture on future passes through
the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves
the exact same steps but different terminology. A motivating example is a
fluid simulation computed over a grid: at each time step, we compute the
next state of the fluid for each grid point from the current state at that
grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation
domain of interest. The rasterizer generates a fragment at each pixel
location covered by that geometry. (In our example, the primitive must
cover a grid of fragments equal to the domain size of our fluid
simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each
grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination
of math operations and "gather" accesses from global memory. (Each grid
point can access the state of its neighbors from the previous time step in
computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes. (The current state of the fluid will be used on the next
time step.)
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has
been that, despite their general-purpose tasks having nothing to do with
graphics, the applications still had to be programmed using graphics APIs.
In addition, the program had to be structured in terms of the graphics
pipeline, with the programmable units only accessible as an intermediate
step in that pipeline, when the programmer would almost certainly prefer
to access the programmable units directly. The programming environments we
describe in detail in Section IV solve this difficulty by providing a more
natural, direct, non-graphics interface to the hardware, and specifically
to the programmable units. Today, GPU computing applications are
structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations
and both "gather" (read) accesses from and "scatter" (write) accesses to
global memory. Unlike in the previous two methods, the same buffer can be
used for both reading and writing, allowing more flexible algorithms (for
example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in
future computation.
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because no
registration is needed. The matching result in this stage helps filter out
image pairs with low similarity; after this step, some false positive
matches may still remain. In the second stage, we use the WPL descriptor
to register the two images for more detailed descriptor matching,
including scale and translation invariance. This stage includes the shift
transform, affine matrix generation, and final WPL descriptor matching.
Overall, we partitioned the registration and matching processing into four
kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster and
achieves a more accurate score.
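The coarse-to-fine control flow can be sketched as below; `coarse_match` and `fine_match` are caller-supplied stand-ins for the Stage I and Stage II routines (the names and the threshold are ours).

```python
def two_stage_match(test, target, coarse_threshold=0.5,
                    coarse_match=None, fine_match=None):
    """Coarse-to-fine control flow described above: a cheap Y-shape
    comparison filters out low-similarity pairs, and only survivors
    pay for registration plus WPL matching.

    Returns (score, reached_fine_stage).
    """
    coarse_score = coarse_match(test, target)
    if coarse_score < coarse_threshold:
        return coarse_score, False        # rejected cheaply in Stage I
    return fine_match(test, target), True  # full Stage II matching
```

In a large-database search this is what narrows the search range: most templates are rejected by the cheap coarse score and never reach the expensive registration step.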
2.4.1 STAGE I: MATCHING WITH THE Y SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features,
registration is unnecessary before matching with the Y shape descriptor.
The whole matching algorithm is listed as Algorithm 1.
FIG
Here y_te^i and y_ta^j are the Y shape descriptors of the test template
Tte and the target template Tta, respectively; dϕ is the Euclidean
distance of the angle elements of the descriptor vectors, defined in (3);
dxy is the Euclidean distance of two descriptor centers, defined in (4);
ni and di are the number of matched descriptor pairs and the distance
between their centers, respectively; tϕ is a distance threshold; and txy
is the threshold that restricts the search area. We set tϕ to 30 and txy
to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y shape
branches. The search area is limited to the corresponding left or right
half of the sclera in order to reduce the search range and time. The
distance between two branches is defined in (3), where ϕij is the angle
between the j-th branch and the polar line from the pupil center in
descriptor i.
The number of matched pairs ni and the distance between Y-shape branch
centers di are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch centers
as in (2), where α is a factor to fuse the matching score, set to 30 in
our study, and Ni and Nj are the total numbers of feature vectors in
templates i and j, respectively. The decision is regulated by the
threshold t: if a sclera's matching score is lower than t, the sclera is
discarded; scleras with high matching scores are passed to the next, more
precise matching stage.
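Algorithm 1's coarse matching can be sketched as a greedy nearest-pair search over Y-shape descriptors; the exact fusion formula in `fuse_score` is our assumption, standing in for Eq. (2).

```python
import math

def match_y_descriptors(test, target, t_phi=30.0, t_xy=675.0):
    """Greedy pairing of Y-shape descriptors (phi1, phi2, phi3, x, y).

    A pair matches when the branch-angle distance d_phi is below
    t_phi and the center distance d_xy below t_xy. Returns the
    matched-pair count n and the summed center distances d.
    """
    n, d = 0, 0.0
    for yt in test:
        best = None
        for ya in target:
            d_xy = math.hypot(yt[3] - ya[3], yt[4] - ya[4])
            if d_xy > t_xy:
                continue              # outside the restricted search area
            d_phi = math.sqrt(sum((yt[k] - ya[k]) ** 2 for k in range(3)))
            if d_phi < t_phi and (best is None or d_xy < best):
                best = d_xy
        if best is not None:
            n, d = n + 1, d + best
    return n, d

def fuse_score(n, d, n_i, n_j, alpha=30.0):
    """Fuse pair count and average center distance into one score;
    this particular combination is an illustrative assumption."""
    if n == 0:
        return 0.0
    return n / min(n_i, n_j) * alpha / (alpha + d / n)
```

Two identical templates give a perfect score of 1.0; extra center distance between matched branches pulls the score down.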
2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel-structure detail of
the sclera than the Y shape descriptor. The variation of the sclera vessel
pattern is nonlinear because: when acquiring an eye image at a different
gaze angle, the vessel structure will appear to shrink or extend
nonlinearly, because the eyeball is spherical in shape; and the sclera is
made up of four layers (episclera, stroma, lamina fusca, and endothelium),
with slight differences among the movements of these layers. Considering
these factors, our registration employs both a single shift transform and
a multi-parameter transform that combines shift, rotation, and scale.
1) SHIFT PARAMETER SEARCH: As discussed before, segmentation may not be
accurate; as a result, the detected iris center may not be very accurate
either. The shift transform is designed to tolerate possible errors in
pupil center detection in the segmentation step. If there is no
deformation, or only very minor deformation, registration with the shift
transform alone would be adequate to achieve an accurate result. We
designed Algorithm 2 to obtain the optimized shift parameter, where Tte is
the test template and s_te^i is the i-th WPL descriptor of Tte; Tta is the
target template and s_ta^i is the i-th WPL descriptor of Tta; and
d(s_te^k, s_ta^j) is the Euclidean distance of descriptors s_te^k and
s_ta^j.
Δs_k is the shift value of the two descriptors, defined as
We first randomly select an equal number of segment descriptors s_te^k in
the test template Tte from each quad and find each one's nearest neighbor
s_ta^j* in the target template Tta. Their offset is recorded as a
candidate registration shift factor Δs_k. The final registration offset is
Δs_optim, which has the smallest standard deviation among these candidate
offsets.
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera
patterns in the matching step. The affine transform algorithm is shown in
Algorithm 3. The shift value in the parameter set is obtained by randomly
selecting a descriptor s_te^(it) and calculating the distance to its nearest
neighbor s_ta_j in Tta. We then transform the test template by the matrix in
(7). At the end of each iteration we count the number of matched descriptor
pairs between the transformed template and the target template. The factor β
determines whether a pair of descriptors is matched; we set it to 20 pixels
in our experiment. After N iterations, the optimized transform parameter set
is determined by selecting the maximum matching number m(it). Here s_te_i,
Tte, s_ta_j, and Tta are defined as in Algorithm 2; tr_shift^(it), θ^(it),
and tr_scale^(it) are the shift, rotation, and scale parameters generated in
the it-th iteration; and R(θ^(it)), T(tr_shift^(it)), and S(tr_scale^(it))
are the transform matrices defined in (7). To search for the optimal
transform parameters we iterate N times; in our experiment we set the
iteration count to 512.
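The randomized search can be sketched as follows. The rotation and scale ranges, and the treatment of the matrix in (7) as scale-rotate-then-shift on 2-D points, are illustrative assumptions rather than the report's exact formulation:

```python
import math
import random

def transform(pts, dx, dy, theta, s):
    """Apply scale, rotation, then shift to 2-D points (one reading of Eq. (7))."""
    c, si = math.cos(theta), math.sin(theta)
    return [(s * (c * x - si * y) + dx, s * (si * x + c * y) + dy) for x, y in pts]

def affine_search(test_pts, target_pts, iters=512, beta=20.0, seed=0):
    """Sketch of Algorithm 3: try random (shift, rotation, scale) parameter
    sets and keep the one that matches the most descriptor pairs, where a
    pair matches when it lies within beta pixels."""
    rng = random.Random(seed)
    best, best_m = (0.0, 0.0, 0.0, 1.0), -1
    for _ in range(iters):
        # Shift candidate: offset from a random test descriptor to its nearest target.
        p = rng.choice(test_pts)
        q = min(target_pts, key=lambda t: math.dist(p, t))
        params = (q[0] - p[0], q[1] - p[1],
                  rng.uniform(-0.2, 0.2),   # small random rotation (assumed range)
                  rng.uniform(0.9, 1.1))    # small random scale (assumed range)
        moved = transform(test_pts, *params)
        m = sum(1 for a in moved
                if min(math.dist(a, t) for t in target_pts) < beta)
        if m > best_m:
            best, best_m = params, m
    return best, best_m
```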
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined by Algorithms 2 and 3, the test
template is registered and matched simultaneously. The registration and
matching algorithm is listed in Algorithm 4. Here s_te_i, Tte, s_ta_j, and
Tta are defined as in Algorithms 2 and 3; θ(optm), tr_shift(optm),
tr_scale(optm), and Δs_optim are the registration parameters obtained from
Algorithms 2 and 3; and R(θ(optm)), T(tr_shift(optm)), and S(tr_scale(optm))
form the descriptor transform matrix defined in Algorithm 3. φ is the angle
between the segment descriptor and the radius direction, and w is the weight
of the descriptor, which indicates whether or not the descriptor lies at the
edge of the sclera. To ensure that the nearest descriptors have a similar
orientation, we use a constant factor α to check the absolute difference of
the two φ values; in our experiment we set α to 5. The total matching score
is the minimal score of the two transformed results divided by the minimal
matching score of the test template and the target template.
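A minimal sketch of the matching rule, assuming each descriptor is reduced to a tuple (x, y, φ, w) and that the raw score simply sums the weights of matched pairs; the report's full score normalization (the division by the minimal matching score) is more involved than shown here:

```python
import math

def match_score(test, target, alpha=5.0, beta=20.0):
    """Sketch of the Algorithm 4 matching rule: a test descriptor matches
    its nearest unmatched target descriptor when it lies within beta pixels
    and the absolute difference of their orientations phi is below alpha.
    Each descriptor is (x, y, phi, w); the score sums the weights of matches."""
    used = set()
    score = 0.0
    for x, y, phi, w in test:
        best_j, best_d = None, beta
        for j, (tx, ty, tphi, tw) in enumerate(target):
            if j in used:
                continue  # a segment already matched must not be reused
            d = math.dist((x, y), (tx, ty))
            if d < best_d and abs(phi - tphi) < alpha:
                best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            score += w + target[best_j][3]
    return score
```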
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single instruction, multiple data (SIMD) style system that works as
a coprocessor with a CPU. A CUDA device consists of many streaming
multiprocessors (SMs), and the parallel part of the program must be
partitioned by the programmer into threads that are mapped onto those
multiprocessors. There are multiple memory spaces in the CUDA memory
hierarchy: registers, local memory, shared memory, global memory, constant
memory, and texture memory. Registers, local memory, and shared memory are
on-chip, so accessing these memories takes little time; of them, only shared
memory can be accessed by other threads within the same block, and only a
limited amount of shared memory is available. Global memory, constant memory,
and texture memory are off-chip memories accessible by all threads, and
accessing them is very time consuming.
Constant memory and texture memory are read-only, cacheable memories. Mapping
algorithms to CUDA to achieve efficient processing is not a trivial task, and
there are several challenges in CUDA programming.
If threads in a warp take different control paths, all the branches are
executed serially. To improve performance, branch divergence within a warp
should be avoided.
Global memory is slower to access than on-chip memory. To hide this latency,
we should prefer on-chip memory over global memory, and when global memory
access does occur, threads in the same warp should access consecutive words
so that the accesses coalesce.
Shared memory is much faster than the local and global memory spaces, but it
is organized into equally sized banks. If two memory requests from different
threads within a warp fall in the same bank, the accesses are serialized. To
get maximum performance, memory requests should be scheduled to minimize bank
conflicts.
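The bank-conflict rule can be made concrete with a little address arithmetic. On most CUDA-capable GPUs, shared memory has 32 banks of 4-byte words, with successive words assigned to successive banks; a small Python helper illustrates the mapping:

```python
NUM_BANKS = 32   # shared-memory banks on most CUDA-capable GPUs
WORD_BYTES = 4   # banks are 4 bytes wide (32-bit words)

def bank_of(byte_addr):
    """Shared-memory bank that a byte address falls into."""
    return (byte_addr // WORD_BYTES) % NUM_BANKS

def has_conflict(byte_addrs):
    """True if two addresses in one warp's request hit the same bank
    while referring to different words (same-word access is a broadcast)."""
    seen = {}
    for a in byte_addrs:
        b, word = bank_of(a), a // WORD_BYTES
        if b in seen and seen[b] != word:
            return True
        seen[b] = word
    return False
```

Stride-1 word accesses by a warp touch 32 distinct banks (no conflict), while stride-2 accesses make pairs of threads collide in the same bank and serialize.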
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent
modules, each module is converted to a different kernel on the GPU. These
kernels differ in computation density, so we map them to the GPU with
different strategies to fully utilize the computing power of CUDA. Figure 11
shows our scheme of CPU-GPU task distribution and the partition among blocks
and threads. Algorithm 1 is partitioned into coarse-grained parallel
subtasks.
We create a number of threads in this kernel equal to the number of templates
in the database. As the upper middle column of Figure 11 shows, each target
template is assigned to one thread, and each thread performs the comparison
of one pair of templates. In our work we use an NVIDIA C2070 as our GPU, with
the thread and block counts each set to 1024. That means we can match our
test template with up to 1024 × 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which each
thread processes a section of descriptors. As the lower portion of the middle
column of Figure 11 shows, we assign a target template to one block, and
inside a block each thread corresponds to a set of descriptors in that
template. This partition lets every block execute independently, with no data
exchange required between different blocks. When all threads complete their
corresponding descriptor fractions, the sum of the intermediate results must
be computed or compared. A parallel prefix sum algorithm, shown on the right
of Figure 11, is used to calculate this sum: first, all odd-numbered threads
compute the sum of consecutive pairs of results; then, recursively, every
i-th thread (i = 4, 8, 16, 32, 64, ...) computes the prefix sum on the new
results. The final result is saved at the first address, which has the same
variable name as the first intermediate result.
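The tree-shaped reduction described above can be sketched sequentially, with each inner loop iteration standing in for one thread of a round (the array length is assumed to be a power of two, as it would be for a fixed block size):

```python
def tree_reduce_sum(vals):
    """Sketch of the block-level reduction: pairwise partial sums are
    combined in log2(n) rounds, leaving the total in slot 0. In the CUDA
    kernel each slot update in a round is done by a separate thread."""
    buf = list(vals)
    stride = 1
    while stride < len(buf):
        # Round k: every (2*stride)-th slot accumulates its neighbour.
        for i in range(0, len(buf), 2 * stride):
            buf[i] += buf[i + stride]
        stride *= 2
    return buf[0]
```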
252 MAPPING INSIDE BLOCK
In shift parameter searching, there are two schemes we could choose to map
the task:
mapping one pair of templates to all the threads in a block, so that every
thread takes charge of a fraction of the descriptors and cooperates with the
other threads; or
assigning a single possible shift offset to a thread, so that all threads
compute independently, except that the final result must be compared with
those of the other possible offsets.
Due to the great number of sum and synchronization operations in every
nearest-neighbor search step, we chose the second method to parallelize the
shift search. In the affine matrix generator, we mapped an entire
parameter-set search to a thread: every thread randomly generates a set of
parameters and tries them independently, and the generated iterations are
distributed over all threads. The challenge of this step is that the randomly
generated numbers might be correlated among threads. In the rotation and
scale parameter generation step we used the Mersenne Twister pseudorandom
number generator, because it can use bitwise arithmetic and has a long
period.
The Mersenne Twister, like most pseudorandom generators, is iterative, so it
is hard to parallelize a single twister state-update step among several
execution threads. To make sure that the thousands of threads in the launch
grid generate uncorrelated random sequences, many simultaneous Mersenne
Twisters need to run with different initial states in parallel. But even
"very different" (by any definition) initial state values do not prevent the
emission of correlated sequences by generators sharing identical parameters.
To solve this problem, and to enable an efficient implementation of the
Mersenne Twister on parallel architectures, we used a special offline tool
for the dynamic creation of Mersenne Twister parameters, modified from the
algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest
neighbor, a line segment that has already been matched should not be used
again. In our approach, a flag variable denoting whether the line has been
matched is stored in shared memory. To share the flags, all the threads in a
block would have to wait on a synchronization operation at every query step;
our solution is instead to use a single thread in a block to process the
matching.
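As a modern stand-in for the dynamic-creation tool (not the tool itself), NumPy's SeedSequence.spawn derives statistically independent child streams from one master seed, which is the same goal served by creating per-thread Mersenne Twisters with distinct parameters:

```python
import numpy as np

# Stand-in for per-thread generator setup: SeedSequence.spawn derives
# statistically independent child streams from one master seed, so the
# "threads" below draw uncorrelated sequences.
master = np.random.SeedSequence(2015)
children = master.spawn(4)                 # one stream per simulated thread
gens = [np.random.default_rng(c) for c in children]
draws = [tuple(g.integers(0, 2**32, size=3).tolist()) for g in gens]
# Streams differ from one another even though they share one master seed.
assert len(set(draws)) == len(draws)
```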
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between
host memory and device memory, so data transfer between host and device can
lead to long latency. As shown in Figure 11, we load the entire target
template set from the database without considering when the templates will be
processed; therefore, there is no data transfer from host to device during
the matching procedure. In global memory, the components of the descriptors
y(φ1, φ2, φ3, x, y) and s(x, y, r, θ, φ, w) are stored separately. This
guarantees that consecutive kernels of Algorithms 2 to 4 can access their
data at successive addresses. Although such coalesced access reduces the
latency, frequent global memory access is still a slow way to get data, so in
our kernels we load the test template into shared memory to accelerate memory
access. Because Algorithms 2 to 4 execute different numbers of iterations on
the same data, bank conflicts do not occur. To maximize our texture memory
space, we set the system cache to the lowest value and bound our target
descriptors to texture memory; using this cacheable memory, our data access
was accelerated further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily
applied in target detection; in this paper it is applied as a feature for
human recognition. In the sclera region, the vein patterns are the edges of
the image, so HOG is used to determine the gradient and edge orientations of
the vein pattern in the sclera region of an eye image. To carry out this
technique, first divide the image into small connected regions called cells.
For each cell, compute the histogram of gradient directions or edge
orientations of its pixels; the combination of the histograms of the
different cells then represents the descriptor. To improve accuracy, the
histograms can be contrast-normalized by calculating the intensity over a
larger block and using this value to normalize all cells within the block;
this normalization makes the descriptor more invariant to geometric and
photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y)
are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y)
as m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and
θ(x, y) = arctan(dy(x, y) / dx(x, y)).
Orientation binning is the second step of HOG. This method is utilized to
create the cell histograms: each pixel within a cell casts a weighted vote
for the orientation bin found in the gradient computation, with the gradient
magnitude used as the weight. The cells are rectangular, and the binning of
gradient orientation is spread over 0 to 180 degrees, with opposite
directions counted as the same. Fig. 8 depicts the edge orientations of the
picture elements. If the image has illumination and contrast changes, the
gradient strengths must be locally normalized; for that, cells are grouped
together into larger blocks. These blocks overlap, so that each cell
contributes more than once to the final descriptor. Here rectangular HOG
(R-HOG) blocks, which are mainly square grids, are applied. The performance
of HOG is improved by applying a Gaussian window to each block.
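The cell-histogram step can be sketched as follows, using central-difference gradients and unsigned (0-180 degree) orientation bins; the 9-bin default and the plain-list image format are illustrative choices, not the report's exact settings:

```python
import math

def cell_hog(gray, bins=9):
    """Illustrative HOG cell: central-difference gradients, then a histogram
    of unsigned orientation (0-180 degrees) weighted by gradient magnitude.
    `gray` is a small 2-D list of intensities representing one cell."""
    h, w = len(gray), len(gray[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = gray[y][x + 1] - gray[y][x - 1]
            dy = gray[y + 1][x] - gray[y - 1][x]
            mag = math.hypot(dx, dy)
            ang = math.degrees(math.atan2(dy, dx)) % 180.0  # fold opposite dirs
            hist[int(ang / 180.0 * bins) % bins] += mag      # magnitude-weighted vote
    return hist
```

For a cell containing a single vertical edge, all the gradient magnitude lands in the 0-degree bin.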
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and
fourth-generation programming language. Developed by MathWorks, MATLAB allows
matrix manipulations, plotting of functions and data, implementation of
algorithms, creation of user interfaces, and interfacing with programs
written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional
toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing
capabilities. An additional package, Simulink, adds graphical multi-domain
simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia.
MATLAB users come from various backgrounds of engineering, science, and
economics, and MATLAB is widely used in academic and research institutions as
well as industrial enterprises. MATLAB was first adopted by researchers and
practitioners in control engineering, Little's specialty, but quickly spread
to many other domains. It is now also used in education, in particular the
teaching of linear algebra and numerical analysis, and is popular amongst
scientists involved in image processing. The MATLAB application is built
around the MATLAB language. The simplest way to execute MATLAB code is to
type it in the Command Window, one of the elements of the MATLAB Desktop;
when code is entered there, MATLAB can be used as an interactive mathematical
shell. Sequences of commands can be saved in a text file, typically using the
MATLAB Editor, as a script, or encapsulated into a function, extending the
commands available.
MATLAB provides a number of features for documenting and sharing your work.
You can integrate your MATLAB code with other languages and applications and
distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis,
filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications
and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image
processing, communications, control design, test and measurement, financial
modeling and analysis, and computational biology. Add-on toolboxes
(collections of special-purpose MATLAB functions) extend the MATLAB
environment to solve particular classes of problems in these application
areas.
MATLAB can be used on personal computers and powerful server systems,
including the Cheaha compute cluster. With the addition of the Parallel
Computing Toolbox, the language can be extended with parallel implementations
for common computational functions, including for-loop unrolling;
additionally, this toolbox supports offloading computationally intensive
workloads to Cheaha, the campus compute cluster. MATLAB is one of a few
languages in which each variable is a matrix (broadly construed) that knows
how big it is. Moreover, the fundamental operators (e.g., addition,
multiplication) are programmed to deal with matrices when required, and the
MATLAB environment handles much of the bothersome housekeeping that makes all
this possible. Since so many of the procedures required for macro-investment
analysis involve matrices, MATLAB proves to be an extremely efficient
language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming
language or Fortran. A wrapper function is created, allowing MATLAB data
types to be passed and returned. The dynamically loadable object files
created by compiling such functions are termed MEX-files (for MATLAB
executable).
Libraries written in Java, ActiveX, or .NET can be directly called from
MATLAB, and many MATLAB libraries (for example, XML or SQL support) are
implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from
Java is more complicated, but can be done with a MATLAB extension, which is
sold separately by MathWorks, or using an undocumented mechanism called JMI
(Java-to-MATLAB Interface), which should not be confused with the unrelated
Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from
MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on
Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading
mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the
MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you
quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are
fundamental to engineering and scientific problems, and it enables fast
development and execution. With the MATLAB language, you can program and
develop algorithms faster than with traditional languages because you do not
need to perform low-level administrative tasks such as declaring variables,
specifying data types, and allocating memory. In many cases MATLAB eliminates
the need for 'for' loops; as a result, one line of MATLAB code can often
replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
MATLAB lets you execute commands or groups of commands one at a time, without
compiling and linking, enabling you to quickly iterate to the optimal
solution. For fast execution of heavy matrix and vector computations, MATLAB
uses processor-optimized libraries. For general-purpose scalar computations,
MATLAB generates machine-code instructions using its JIT (just-in-time)
compilation technology. This technology, which is available on most
platforms, provides execution speeds that rival those of traditional
programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithms
efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints
and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to maximize
performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file
differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
Using the interactive tool GUIDE (Graphical User Interface Development
Environment), you can lay out, design, and edit user interfaces. GUIDE lets
you include list boxes, pull-down menus, push buttons, radio buttons, and
sliders, as well as MATLAB plots and Microsoft ActiveX controls.
Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from
external devices and databases, through preprocessing, visualization, and
numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other
applications, databases, and external devices. You can read data from popular
file formats such as Microsoft Excel; ASCII text or binary files; image,
sound, and video files; and scientific files such as HDF and HDF5. Low-level
binary file I/O functions let you work with data files in any format, and
additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and
scientific data are available in MATLAB. These include 2-D and 3-D plotting
functions, 3-D volume visualization functions, tools for interactively
creating plots, and the ability to export results to all popular graphics
formats. You can customize plots by adding multiple axes; changing line
colors and markers; adding annotation, LaTeX equations, and legends; and
drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and
3-D vector data. You can use these functions to visualize and understand
large, often complex, multidimensional data, specifying plot characteristics
such as camera viewing angle, perspective, lighting effects, light source
locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to
support all common engineering and science operations. These functions,
developed by experts in mathematics, are the foundation of the MATLAB
language. The core math functions use the LAPACK and BLAS linear algebra
subroutine libraries and the FFTW discrete Fourier transform library. Because
these processor-dependent libraries are optimized for the different platforms
that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical
operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including
doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
Matlab is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms for a
wide range of applications. The basic language used is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
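The grayscale-to-binary steps of the snapshots above can be sketched with Otsu's thresholding; the helper names and the plain-list image format are illustrative, and the actual project pipeline continues with ROI selection, enhancement, and Gabor feature extraction:

```python
def otsu_threshold(gray):
    """Otsu's method: pick the threshold maximizing between-class variance
    of the intensity histogram (gray is a 2-D list of 0-255 values)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    total_mean = sum(i * h for i, h in enumerate(hist)) / total
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        m0 = sum0 / w0                                  # mean of class below t
        m1 = (total_mean * total - sum0) / (total - w0)  # mean of class above t
        var = w0 * (total - w0) * (m0 - m1) ** 2         # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(gray, t):
    """Binary image: 1 where the pixel exceeds the threshold, else 0."""
    return [[1 if v > t else 0 for v in row] for row in gray]
```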
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition method which
employs a two-stage parallel approach for registration and matching. Even
though the research focused on developing a parallel sclera matching solution
for the sequential line-descriptor method using the CUDA GPU architecture,
the parallel strategies developed in this research can be applied to design
parallel solutions for other sclera vein recognition methods and for general
pattern recognition methods. We designed the Y-shape descriptor to narrow the
search range and increase the matching efficiency, a new feature extraction
method that takes advantage of the GPU structures. We developed the WPL
descriptor to incorporate mask information and make it more suitable for
parallel computing, which can dramatically reduce data transfer and
computation. We then carefully mapped our algorithms to GPU threads and
blocks, an important step in achieving parallel computation efficiency on a
GPU. A work flow with high arithmetic intensity, designed to hide the memory
access latency, partitions the computation task across the heterogeneous
system of CPU and GPU, and even among the threads in the GPU. The proposed
method dramatically improves the matching efficiency without compromising
recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland:
Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection
for real-time augmented reality applications in a GPGPU," IEEE Trans.
Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep,
big, simple neural nets for handwritten digit recognition," Neural Comput.,
vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris
recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing
neocognitron of face recognition on high performance environment based on
GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High
Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M.
Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU,"
IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb.
2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern
Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognition,"
Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the
Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner,
"Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security,
vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular
biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14,
pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang,
"Robust spatial matching for object retrieval and its parallel
implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6,
pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for
real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control,
vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4,
pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
approach for sclera image quality measure," Int. J. Biometrics, vol. 5,
no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human
identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern.
A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
classification algorithm finding features that are robust to variations
mentioned above and developing statistical models which can account for
these variations are very important for remote face recognition
2. R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner,
"Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security.
With the rapidly expanding biometric data collected by various sectors of
government and industry for identification and verification purposes, how to
manage and process such big data draws great concern. Even though modern
processors are equipped with more cores and memory capacity, careful design
is still required to utilize the hardware resources effectively and the
power consumption efficiently. This research addresses this issue by
investigating the workload characteristics of biometric applications. Taking
Daugman's iris matching algorithm, which has been proven to be the most
reliable iris matching method, as a case study, we conduct performance
profiling and binary instrumentation on the benchmark to capture its
execution behavior. The results show that data loading and memory access
incur great performance overhead, which motivates us to move the biometrics
computation to a high-performance architecture.
Modern iris recognition algorithms can be computationally intensive, yet are
designed for traditional sequential processing elements such as a personal
computer. However, a parallel processing alternative using
field-programmable gate arrays (FPGAs) offers an opportunity to speed up
iris recognition. Within the means of this project, iris template generation
with directional filtering, which is a computationally expensive yet
parallel portion of a modern iris recognition algorithm, is parallelized on
an FPGA system. We present a performance comparison of the parallelized
algorithm on the FPGA system to a traditional CPU-based version. The
parallelized template generation outperforms an optimized C++ version,
determining the information content of an iris approximately 324 times
faster.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality
based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006,
pp. 1-8.
A new biometric indicator based on the patterns of conjunctival vasculature
is proposed. Conjunctival vessels can be observed on the visible part of the
sclera that is exposed to the outside world. These vessels demonstrate rich
and specific details in visible light and can be easily photographed using a
regular digital camera. In this paper, we discuss methods for conjunctival
imaging, preprocessing, and feature extraction in order to derive a suitable
conjunctival vascular template for biometric authentication. Commensurate
classification methods, along with the observed accuracy, are discussed.
Experimental results suggest the potential of using conjunctival vasculature
as a biometric measure. Identification of a person based on some unique set
of features is an important task. Human identification is possible with
several biometric systems, and sclera recognition is one of the promising
biometrics. The sclera is the white portion of the human eye, and the vein
pattern seen in the sclera region is unique to each person; thus the sclera
vein pattern is a well-suited biometric technology for human identification.
The existing methods used for sclera recognition have some drawbacks: only
frontal-looking images are preferred for matching, and rotation variance is
another problem. These problems are completely eliminated in the proposed
system by using two feature extraction techniques: the Histogram of Oriented
Gradients (HOG) and conversion of the image into polar form using the
bilinear interpolation technique. These two features help the proposed
system become illumination invariant and rotation invariant. The
experimentation is done with the help of the UBIRIS database, and the
experimental results show that the proposed sclera recognition method can
achieve better accuracy than the previous methods.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C.
Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May
2008.
The graphics processing unit (GPU) has become an integral part of
todayrsquos mainstream computing systems Over the past six years there has
been a marked increase in the performance and capabilities of GPUs The
modern GPU is not only a powerful graphics engine but also a highly
parallel programmable processor featuring peak arithmetic and memory
bandwidth that substantially outpaces its CPU counterpart. The GPU's
rapid increase in both programmability and capability has spawned a
research community that has successfully mapped a broad range of
computationally demanding complex problems to the GPU This effort in
general purpose computing on the GPU also known as GPU computing
has positioned the GPU as a compelling alternative to traditional
microprocessors in high-performance computer systems of the future We
describe the background hardware and programming model for GPU
computing summarize the state of the art in tools and techniques and
present four GPU computing successes in game physics and computational
biophysics that deliver order-of-magnitude performance gains over
optimized CPU applications
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image
database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-
977.
This paper proposes algorithms for iris segmentation quality
enhancement match score fusion and indexing to improve both the
accuracy and the speed of iris recognition A curve evolution approach is
proposed to effectively segment a nonideal iris image using the modified
MumfordndashShah functional Different enhancement algorithms are
concurrently applied on the segmented iris image to produce multiple
enhanced versions of the iris image A support-vector-machine-based
learning algorithm selects locally enhanced regions from each globally
enhanced image and combines these good-quality regions to create a single
high-quality iris image Two distinct features are extracted from the high-
quality iris image The global textural feature is extracted using the 1-D log
polar Gabor transform and the local topological feature is extracted using
Euler numbers An intelligent fusion algorithm combines the textural and
topological matching scores to further improve the iris recognition
performance and reduce the false rejection rate whereas an indexing
algorithm enables fast and accurate iris identification The verification and
identification performance of the proposed algorithms is validated and
compared with other algorithms using the CASIA Version 3 ICE 2005 and
UBIRIS iris databases
18 PROPOSED METHOD
We propose a new parallel sclera vein recognition method that uses a
two-stage parallel approach for registration and matching: a parallel sclera
matching solution for sclera vein recognition, built on our sequential line-
descriptor method, using the CUDA GPU architecture. CUDA is a highly
parallel, multithreaded, many-core processor architecture with tremendous
computational power.
It supports not only a traditional graphics pipeline but also computation
on non-graphical data. It is relatively straightforward to port our C
program for CUDA to an AMD-based GPU using OpenCL: our CUDA
kernels can be converted directly to OpenCL kernels by accounting for the
different syntax of various keywords and built-in functions. The mapping
strategy is also effective in OpenCL if we regard a thread and a block in
CUDA as a work-item and a work-group in OpenCL. Most of our
optimization techniques, such as coalesced memory access and prefix sum,
work in OpenCL too. Moreover, since CUDA is a data-parallel
architecture, an OpenCL implementation of our approach should likewise
be programmed in the data-parallel model.
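Prefix sum is one of the optimization techniques named above. As a minimal, language-neutral sketch (written in Python standing in for a CUDA/OpenCL kernel), the work-efficient Blelloch scan runs an up-sweep and a down-sweep; every inner-loop iteration is independent of the others and would be one thread's work on the GPU:

```python
def exclusive_scan(data):
    """Work-efficient (Blelloch) exclusive prefix sum.

    Mirrors the up-sweep/down-sweep phases a CUDA or OpenCL kernel would
    execute in parallel; here each phase is a sequential loop. The input
    length must be a power of two.
    """
    a = list(data)
    n = len(a)
    # Up-sweep (reduce): build partial sums in a balanced tree.
    d = 1
    while d < n:
        for i in range(0, n, 2 * d):        # each i is an independent thread
            a[i + 2 * d - 1] += a[i + d - 1]
        d *= 2
    # Down-sweep: propagate prefix sums back down the tree.
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(0, n, 2 * d):
            t = a[i + d - 1]
            a[i + d - 1] = a[i + 2 * d - 1]
            a[i + 2 * d - 1] += t
        d //= 2
    return a
```

For example, `exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3])` yields `[0, 3, 4, 11, 11, 15, 16, 22]`.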
In this research we first discuss why a naïve parallel approach would
not work. We then propose a new sclera descriptor, the Y-shape sclera
feature, for efficient registration to speed up the mapping scheme;
introduce the "weighted polar line" (WPL) descriptor, which is better
suited for parallel computing and mitigates the mask-size issue; and
develop a coarse-to-fine two-stage matching process to dramatically
improve the matching speed. These new approaches make parallel
processing possible and efficient.
191 PROPOSED SYSTEM ADVANTAGES
1. To improve efficiency, we propose a new descriptor, the Y-shape
descriptor, which greatly helps the coarse registration of two images and
can be used to filter out non-matching pairs before refined matching.
2. We propose a coarse-to-fine two-stage matching process. In the first
stage, we match two images coarsely using the Y-shape descriptors, which
is very fast because no registration is needed. The matching result of this
stage helps filter out image pairs with low similarity.
CHAPTER 2
PROJECT DESCRIPTION
21 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each
person, so it can be used for human identification. Several researchers have
designed different sclera vein recognition methods and have shown that it
is promising to use sclera vein recognition for human identification.
Crihalmeanu and Ross proposed three approaches: a Speeded-Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Among these three
methods, the SURF method achieves the best accuracy; it takes an average
of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a
line descriptor-based method for sclera vein recognition. The matching
step (including registration) is the most time-consuming step in this sclera
vein recognition system, costing about 1.2 seconds per one-to-one
matching. Both speeds were measured on a PC with Intel® Core™ 2 Duo
2.4 GHz processors and 4 GB DRAM. Currently, sclera vein recognition
algorithms are designed for central processing unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential
processing devices, which may not be efficient for data processing where
the data can be parallelized. Because of the large time consumption of the
matching step, sclera vein recognition using a sequential method would be
very challenging to implement in a real-time biometric system, especially
when there is a large number of templates in the database for matching.
GPGPUs (general-purpose graphics processing units, hereafter GPUs) are
now popularly used for parallel computing to improve computational
speed and efficiency. The highly parallel structure of GPUs makes them
more effective than CPUs for data processing where the processing can be
performed in parallel. GPUs have been widely used in biometric
recognition, such as speech recognition, text detection, handwriting
recognition, and face recognition. In iris recognition, GPUs have been
used to extract features, construct descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al.
presented a performance evaluation of image processing algorithms, such
as linear feature extraction and multi-view stereo matching, on GPUs.
However, these approaches were designed for their specific biometric
recognition applications and feature searching methods; therefore, they
may not be efficient for sclera vein recognition. Compute Unified Device
Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in
this research. CUDA is a highly parallel, multithreaded, many-core
processor architecture with tremendous computational power. It supports
not only a traditional graphics pipeline but also computation on
non-graphical data. More importantly, it offers an easier programming
platform which outperforms its CPU counterparts in terms of peak
arithmetic intensity and memory bandwidth. In this research the goal is not
to develop a unified strategy to parallelize all sclera matching methods,
because each method is quite different from the others and would need a
customized design; an efficient parallel computing scheme would require
different strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
sclera vein recognition based on our sequential line-descriptor method
using the CUDA GPU architecture. However, the parallelization strategies
developed in this research can be applied to design parallel approaches for
other sclera vein recognition methods and can help parallelize general
pattern recognition methods. Based on the matching approach, there are
three challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large and will preoccupy the GPU memory and slow down
data transfer. Also, some of the processing on the mask files involves
convolution, whose performance is difficult to improve on the scalar
processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable
to satisfy the requirement of real-time performance. New designs are
necessary to help narrow down the search range. In summary, a naïve
parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to port our C program for
CUDA to an AMD-based GPU using OpenCL. Our CUDA kernels can be
converted directly to OpenCL kernels by accounting for the different
syntax of various keywords and built-in functions. The mapping strategy is
also effective in OpenCL if we regard a thread and a block in CUDA as a
work-item and a work-group in OpenCL. Most of our optimization
techniques, such as coalesced memory access and prefix sum, work in
OpenCL too. Moreover, since CUDA is a data-parallel architecture, an
OpenCL implementation of our approach should likewise be programmed
in the data-parallel model.
In this research we first discuss why the naïve parallel approach would
not work (Section 3). We then propose the new sclera descriptor, the
Y-shape sclera feature, for efficient registration to speed up the mapping
scheme (Section 4); introduce the "weighted polar line" (WPL) descriptor,
which is better suited for parallel computing and mitigates the mask-size
issue (Section 5); and develop our coarse-to-fine two-stage matching
process to dramatically improve the matching speed (Section 6). These
new approaches make parallel processing possible and efficient. However,
it is non-trivial to implement these algorithms in CUDA, so we then
develop implementation schemes to map our algorithms into CUDA
(Section 7). In Section 2 we give a brief introduction to sclera vein
recognition, in Section 8 we present experiments using the proposed
system, and in Section 9 we draw conclusions.
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation, feature enhancement, feature extraction, and feature
matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition
Several methods have been designed for sclera segmentation.
Crihalmeanu et al. presented a semi-automated system for sclera
segmentation: they used a clustering algorithm to classify color eye images
into three clusters - sclera, iris, and background. Later on, Crihalmeanu
and Ross designed a segmentation approach based on a normalized sclera
index measure, which includes coarse sclera segmentation, pupil region
segmentation, and fine sclera segmentation. Zhou et al. developed a skin
tone plus "white color"-based voting method for sclera segmentation in
color images, and an Otsu's thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the
sclera features, since the sclera vein patterns often lack contrast and are
hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for
vascular pattern enhancement. Derakhshani et al. used contrast-limited
adaptive histogram equalization (CLAHE) to enhance the green color
plane of the RGB image, and a multi-scale region growing approach to
identify the sclera veins from the image background. Crihalmeanu and
Ross applied a selective enhancement filter for blood vessels to extract
features from the green component of a color image. In the feature
matching step, Crihalmeanu and Ross proposed three registration and
matching approaches: Speeded-Up Robust Features (SURF), which is
based on interest-point detection; minutiae detection, which is based on
minutiae points of the vasculature structure; and direct correlation
matching, which relies on image registration. Zhou et al. designed a line
descriptor-based feature registration and matching method.
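The multi-directional Gabor enhancement mentioned above can be sketched as follows. This is a minimal illustration, not the report's implementation: the kernel size, σ, wavelength, and number of orientations are assumed values chosen for the example.

```python
import numpy as np

def gabor_kernel(theta, sigma=2.0, lam=6.0, size=9):
    """Real part of an oriented Gabor kernel (one direction of the bank)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)
    return g - g.mean()          # zero-mean so flat regions give no response

def enhance(image, n_orient=8):
    """Maximum response over a bank of directional Gabor filters."""
    h, w = image.shape
    out = np.full((h, w), -np.inf)
    for k in range(n_orient):
        ker = gabor_kernel(np.pi * k / n_orient)
        pad = ker.shape[0] // 2
        padded = np.pad(image, pad, mode='edge')
        resp = np.zeros((h, w))
        for i in range(h):
            for j in range(w):
                resp[i, j] = np.sum(padded[i:i + 2*pad+1, j:j + 2*pad+1] * ker)
        out = np.maximum(out, resp)
    return out
```

Taking the per-pixel maximum over orientations keeps the strongest directional response, so vessels of any orientation are enhanced while flat background stays near zero.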
The proposed sclera recognition consists of five steps: sclera
segmentation, vein pattern enhancement, feature extraction, feature
matching, and the matching decision. Fig. 2 shows the block diagram of
sclera recognition. Two types of feature extraction are used in the proposed
method to achieve good identification accuracy. The characteristics
extracted from the blood vessel structure in the sclera region are the
Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-
polar conversion. HOG is used to determine the gradient and edge
orientations of the vein pattern in the sclera region of an eye image. To
become more computationally efficient, the image data are converted to
polar form, which is mainly useful for circular or quasi-circular objects.
These two characteristics are extracted from all the images in the database
and compared with the features of the query image to decide whether the
person is correctly identified. This comparison is done in the feature
matching step, which ultimately makes the matching decision. By using the
proposed feature extraction methods and matching techniques, human
identification is more accurate than in existing studies. In the proposed
method, two features of an image are extracted.
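The Cartesian-to-polar conversion with bilinear interpolation described above can be sketched as follows. This is an illustrative version: the sampling resolution (`n_r`, `n_theta`) and the clamping at the image border are assumptions of the sketch, not details from the report.

```python
import numpy as np

def to_polar(image, center, n_r=32, n_theta=64):
    """Unwrap `image` around `center` into an (r, theta) grid using
    bilinear interpolation at each sampled Cartesian point."""
    cy, cx = center
    h, w = image.shape
    r_max = min(cy, cx, h - 1 - cy, w - 1 - cx)   # stay inside the image
    out = np.zeros((n_r, n_theta))
    for ri in range(n_r):
        r = r_max * ri / (n_r - 1)
        for ti in range(n_theta):
            t = 2 * np.pi * ti / n_theta
            y = cy + r * np.sin(t)
            x = cx + r * np.cos(t)
            # Bilinear interpolation between the four surrounding pixels.
            y0 = max(0, min(int(np.floor(y)), h - 2))
            x0 = max(0, min(int(np.floor(x)), w - 2))
            dy, dx = y - y0, x - x0
            out[ri, ti] = ((1-dy)*(1-dx)*image[y0, x0] + (1-dy)*dx*image[y0, x0+1]
                           + dy*(1-dx)*image[y0+1, x0] + dy*dx*image[y0+1, x0+1])
    return out
```

In polar form, an in-plane rotation of the eye becomes a circular shift along the theta axis, which is what makes the representation rotation invariant.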
222 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists
of three steps: glare area detection, sclera area estimation, and iris and
eyelid detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare area detection: The glare area is a small bright area near the
pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. It operates only
on grayscale images, so a color image must first be converted to grayscale
before applying the Sobel filter to detect the glare area. Fig. 4 shows the
result of glare area detection.
FIG
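The Sobel-based glare detection above can be sketched as thresholding the gradient magnitude; a bright glare spot produces a closed rim of strong gradients. The threshold value and the decision rule here are assumptions of this sketch.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def glare_mask(gray, thresh=3.0):
    """Flag pixels whose Sobel gradient magnitude exceeds `thresh`.

    For a color image, convert to grayscale first (e.g. the channel mean)
    before calling this, as the text notes the filter runs on grayscale only.
    """
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            win = gray[i-1:i+2, j-1:j+2]
            gx = np.sum(win * SOBEL_X)          # horizontal gradient
            gy = np.sum(win * SOBEL_Y)          # vertical gradient
            mask[i, j] = np.hypot(gx, gy) > thresh
    return mask
```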
Sclera area estimation: For the estimation of the sclera area, Otsu's
thresholding method is applied. The steps of sclera area detection are:
selection of the region of interest (ROI), Otsu's thresholding, and sclera
area detection. The left and right sclera areas are selected based on the iris
boundaries. Once the region of interest is selected, Otsu's thresholding is
applied to obtain the potential sclera areas. The correct left sclera area
should lie in the right and center positions, and the correct right sclera area
should lie in the left and center. In this way non-sclera areas are eliminated.
223 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the sclera
area. Then the upper eyelid, lower eyelid, and iris boundaries are refined;
all of these are unwanted portions for recognition. To eliminate their
effects, refinement is done after the detection of the sclera area. Fig. shows
the result after Otsu's thresholding and iris and eyelid refinement to detect
the right sclera area. The left sclera area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the effect of
segmentation faults. The vein patterns in the sclera area are not clearly
visible after segmentation, so vein pattern enhancement must be performed
to make them more visible.
224 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of fingers (Miura et al. 2004), palm (Lin and
Fan 2004), and retina (Hill 1999). In the case of retinal biometrics, a
special optical device for imaging the back of the eyeball is needed (Hill
1999). Due to its perceived invasiveness and the required degree of subject
cooperation, the use of retinal biometrics may not be acceptable to some
individuals. The conjunctiva is a thin, transparent, and moist tissue that
covers the outer surface of the eye. The part of the conjunctiva that covers
the inner lining of the eyelids is called the palpebral conjunctiva, and the
part that covers the outer surface of the eye is called the ocular (or bulbar)
conjunctiva, which is the focus of this study. The ocular conjunctiva is very
thin and clear; thus the vasculature (including that of the episclera) is
easily visible through it. The visible microcirculation of the conjunctiva
offers a rich and complex network of veins and fine microcirculation (Fig.
1). The apparent complexity and specificity of these vascular patterns
motivated us to utilize them for personal identification (Derakhshani and
Ross 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al. 2004):
UNIVERSALITY All normal living tissues including that of the
conjunctiva and episclera have vascular structure
UNIQUENESS Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as those of the
eye fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein 1935; Tower 1955).
PERMANENCE Other than cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus the conjunctival vascular structure
is expected to have reasonable permanence (Joussen 2001).
PRACTICALITY Conjunctival vasculature can be captured with
commercial off-the-shelf digital cameras under normal lighting conditions,
making this modality highly practical.
ACCEPTABILITY Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the
addition of conjunctival biometrics will enhance the performance of
current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features
Facilitating recognition using off-angle iris images For instance if the iris
information is relegated to the left or right portions of the eye the sclera
vein patterns will be further exposed This feature makes sclera vasculature
a natural complement to the iris biometric
Addressing the failure-to-enroll issue when iris patterns are not usable
(e.g., due to surgical procedures)
Reducing vulnerability to spoof attacks For instance when implemented
alongside iris systems an attacker needs to reproduce not only the iris but
also different surfaces of the sclera along with the associated
microcirculation and make them available on commensurate eye surfaces
The first step in parallelizing an algorithm is to determine the
opportunities for simultaneous computation. The figure below
demonstrates the possibility of parallel directional filtering. Since the filter
is computed over different portions of the input image, the computation
can be performed in parallel (denoted by Elements below). In addition,
individual parallelization of each element of the filtering can also be
performed. A detailed discussion of our proposed parallelization is outside
the scope of this paper.
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA
VEIN
2251 RECOGNITION METHOD
The matching stage of the line-descriptor based method is a
bottleneck with regard to matching speed. In this section we briefly
describe the line descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and remove the branch
points. The line descriptor is used to describe the segments in the vein
structure; Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle to a reference
angle at the iris center, θ; the segment's distance to the iris center, r; and
the dominant angular orientation of the line segment, ɸ. Thus the
descriptor is S = (θ, r, ɸ)ᵀ. The individual components of the line
descriptor are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. In order to register the segments of the
vascular patterns, a RANSAC-based algorithm is used to estimate the
best-fit parameters for registration between the two sclera vascular
patterns. The registration algorithm randomly chooses two points - one
from the test template and one from the target template - along with a
scaling factor and a rotation value based on a priori knowledge of the
database, and then calculates a fitness value for the registration under
these parameters.
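The RANSAC-style registration loop above can be sketched as follows. This is an illustrative simplification: the fitness function (count of test points landing near a target point), the tolerance, and the a-priori rotation/scale ranges are assumptions of the sketch, not values from the report.

```python
import math
import random

def transform(pt, shift, angle, scale):
    """Rotate by `angle`, scale by `scale`, then translate by `shift`."""
    x, y = pt
    xr = scale * (x * math.cos(angle) - y * math.sin(angle)) + shift[0]
    yr = scale * (x * math.sin(angle) + y * math.cos(angle)) + shift[1]
    return (xr, yr)

def fitness(test_pts, target_pts, shift, angle, scale, tol=2.0):
    """Count test points that land within `tol` of some target point."""
    hits = 0
    for p in test_pts:
        tp = transform(p, shift, angle, scale)
        if any(math.dist(tp, q) < tol for q in target_pts):
            hits += 1
    return hits

def ransac_register(test_pts, target_pts, iters=500, seed=0):
    """Randomly propose (shift, angle, scale) hypotheses; keep the best one.

    Each hypothesis anchors a random test point onto a random target point,
    with rotation and scale drawn from assumed a-priori ranges.
    """
    rng = random.Random(seed)
    best = (0, (0.0, 0.0), 0.0, 1.0)   # (score, shift, angle, scale)
    for _ in range(iters):
        a = rng.choice(test_pts)
        b = rng.choice(target_pts)
        angle = rng.uniform(-0.2, 0.2)
        scale = rng.uniform(0.9, 1.1)
        ax, ay = transform(a, (0.0, 0.0), angle, scale)
        shift = (b[0] - ax, b[1] - ay)
        score = fitness(test_pts, target_pts, shift, angle, scale)
        if score > best[0]:
            best = (score, shift, angle, scale)
    return best
```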
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. In order to reduce the effect of segmentation errors, we created a
weighting image (Figure 3) from the sclera mask by setting interior pixels
of the sclera mask to 1, pixels within some distance of the boundary of the
mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as follows,
where Si and Sj are two segment descriptors, m(Si, Sj) is the matching
score between segments Si and Sj, d(Si, Sj) is the Euclidean distance
between the segment descriptors' center points (from Eqs. 6-8), Dmatch is
the matching distance threshold, and ɸmatch is the matching angle
threshold. The total matching score M is the sum of the individual
matching scores divided by the maximum matching score for the minimal
set between the test and target templates. That is, one of the test or target
templates has fewer points, and the sum of its descriptors' weights sets the
maximum score that can be attained.
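The scoring rule just described can be sketched as follows. The threshold values, and taking the smaller of the two weights as a pair's score, are assumptions of this sketch for illustration; only the thresholded distance/angle test and the weight-sum normalization come from the text.

```python
import math

D_MATCH = 5.0      # matching distance threshold (assumed value)
PHI_MATCH = 0.2    # matching angle threshold in radians (assumed value)

def segment_match(si, sj):
    """Score one descriptor pair (x, y, phi, w): nonzero only when both the
    center distance and the orientation difference are within threshold."""
    (xi, yi, phi_i, wi), (xj, yj, phi_j, wj) = si, sj
    d = math.hypot(xi - xj, yi - yj)
    if d <= D_MATCH and abs(phi_i - phi_j) <= PHI_MATCH:
        return min(wi, wj)
    return 0.0

def total_score(test, target):
    """Sum of best per-segment scores, normalized by the smaller template's
    total descriptor weight (the maximum attainable score)."""
    raw = sum(max(segment_match(s, t) for t in target) for s in test)
    max_score = min(sum(s[3] for s in test), sum(t[3] for t in target))
    return raw / max_score if max_score else 0.0
```

Two identical templates score 1.0; templates with no segment pairs inside the thresholds score 0.0.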
FIG
Even with movement of the eye, Y-shape branches are observed to be a
stable feature and can be used as a sclera feature descriptor. To detect the
Y-shape branches in the original template, we search for the set of nearest
neighbors of every line segment within a regular distance and classify the
angles among these neighbors. If there are two types of angle values in the
line segment set, the set may be inferred to be a Y-shape structure, and the
line segment angles are recorded as a new feature of the sclera.
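The neighbor-and-angle test described above can be sketched as follows. The neighborhood radius, the angle tolerance, and the greedy clustering of angles are assumptions of this sketch; the report only specifies "two types of angle values" among near neighbors.

```python
import math

def detect_y_branches(segments, radius=10.0, angle_tol=0.15):
    """Find segments whose near neighbors show exactly two distinct
    orientation groups.

    `segments` is a list of (x, y, phi) line-segment descriptors. A segment
    whose neighborhood mixes exactly two orientation clusters is taken as a
    Y-shape branch candidate.
    """
    branches = []
    for x, y, phi in segments:
        neighbors = [s for s in segments
                     if 0 < math.hypot(s[0] - x, s[1] - y) <= radius]
        clusters = []                      # each cluster: list of close angles
        for _, _, a in neighbors:
            for c in clusters:
                if abs(a - c[0]) <= angle_tol:
                    c.append(a)
                    break
            else:
                clusters.append([a])
        if len(clusters) == 2:             # two angle groups -> Y-shape
            branches.append((x, y, phi))
    return branches
```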
There are two ways to measure both the orientation and the relationship
of every branch of the Y-shape vessels: one is to use the angles of each
branch to the x-axis; the other is to use the angles between each branch and
the iris radial direction. The first method needs an additional rotation
operation to align the template, so in our approach we employed the
second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles
between each branch and the radius from the pupil center. Even when the
head tilts, the eye moves, or the camera zooms during image acquisition,
ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center
calculation in the segmentation step, we also record the center position
(x, y) of the Y-shape branches as auxiliary parameters. Thus our rotation-,
shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y).
The Y-shape descriptor is generated with reference to the iris center;
therefore it is automatically aligned to the iris center. It is a rotation- and
scale-invariant descriptor.
226 WPL SCLERA DESCRIPTOR
As we discussed in Section 22, the line
descriptor is extracted from the skeleton of the vessel structure in binary
images (Figure 7). The skeleton is broken into smaller segments, and for
each segment a line descriptor is created to record the center and
orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where
(x, y) is the position of the center and ɸ is its orientation. Because of the
limitation of segmentation accuracy, descriptors at the boundary of the
sclera area might not be accurate and may contain spur edges resulting
from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the
mask file
FIG
The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel
patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line
segments of vessel patterns
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is
challenging, since the mask files are large and will occupy GPU memory
and slow down the data transfer. During matching, a RANSAC-type
registration algorithm randomly selects corresponding descriptors, and the
transform parameters between them are used to generate the template-
transform affine matrix. After each template transform, the mask data
must also be transformed and a new boundary calculated to evaluate the
weight of the transformed descriptor. This results in too many
convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the mask
information and can be automatically aligned. We extracted the geometric
relationships of the descriptors and stored them as a new descriptor. We
use a weighted image created by setting various weight values according
to position: the weights of descriptors outside the sclera are set to 0, those
near the sclera boundary to 0.5, and interior descriptors to 1. In our work,
descriptor weights are calculated on their own mask by the CPU, only
once.
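The one-time CPU weight assignment can be sketched as follows. The boundary band width (`border`) and the window-based boundary test are assumptions of this sketch; the 0 / 0.5 / 1 weight levels come from the text.

```python
import numpy as np

def descriptor_weight(mask, x, y, border=3):
    """Weight for a descriptor at (x, y): 0 outside the sclera mask, 0.5
    within `border` pixels of the mask boundary, 1.0 in the interior."""
    h, w = mask.shape
    xi, yi = int(round(x)), int(round(y))
    if not (0 <= yi < h and 0 <= xi < w) or not mask[yi, xi]:
        return 0.0
    y0, y1 = max(0, yi - border), min(h, yi + border + 1)
    x0, x1 = max(0, xi - border), min(w, xi + border + 1)
    window = mask[y0:y1, x0:x1]
    # Any background pixel nearby, or a window clipped by the image edge,
    # means the point lies near the boundary.
    if not window.all() or window.size < (2 * border + 1) ** 2:
        return 0.5
    return 1.0
```

Baking this weight into the descriptor is what lets the GPU drop the mask file entirely: the boundary test is paid once on the CPU instead of once per shift on the GPU.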
The result is saved as a component of the descriptor, which becomes
s(x, y, ɸ, w), where w denotes the weight of the point and takes the value
0, 0.5, or 1. When a template is shifted to another location along the line
connecting the two template centers, all the descriptors of that template are
transformed to align the templates. This is faster if the two templates have
similar reference points: if we use the center of the iris as the reference
point, then when two templates are compared the correspondences are
automatically aligned to each other, since they share a similar reference
point. Every feature vector of the template is a set of line segment
descriptors composed of three variables (Figure 8): the segment's angle to
the reference line through the iris center, denoted θ; the distance between
the segment's center and the pupil center, denoted r; and the dominant
angular orientation of the segment, denoted ɸ. To minimize the GPU
computation, we also convert the descriptor values from polar coordinates
to rectangular coordinates in a CPU preprocess.
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters. For
example, as an eyeball moves left, the left-part sclera patterns of the eye
may be compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads called warps. We
reorganized the descriptors from the same side and saved
FIG
FIG
them at contiguous addresses, which meets the requirement for coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding mask
information into the new descriptor, computation on the mask file is no
longer needed on the GPU. Matching with this feature is very fast because
the templates do not need to be re-registered after every shift. Thus the
cost of data transfer and computation on the GPU is reduced. Matching on
the new descriptor, the shift parameter generator of Figure 4 is simplified
as shown in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
the more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger limits
on their size and resource consumption, with more fully featured
instruction sets, and with more flexible control-flow operations. After
many years of separate instruction sets for vertex and fragment operations,
current GPUs support the unified Shader Model 4.0 on both vertex and
fragment shaders:
The hardware must support shader programs of at least 65 k static
instructions and unlimited dynamic instructions
The instruction set for the first time supports both 32-bit integers and 32-
bit floating-point numbers
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture)
Finally dynamic flow control in the form of loops and branches must be
supported
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units.
General-purpose computing on the GPU maps general-purpose
computation onto the GPU using the graphics hardware in much the same
way as any standard graphics application. Because of this similarity, the
process is both easier and more difficult to explain: on one hand, the actual
operations are the same and are easy to follow; on the other hand, the
terminology differs between graphics and general-purpose use. Harris
provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology,
then show how the same steps are used in a general-purpose way to author
GPGPU applications, and finally use the same steps to show the more
simple and direct way that today's GPU computing applications are
written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline described in Section II,
concentrating on its programmable aspects:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as texture on future passes through
the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
Coopting this pipeline to perform general-purpose computation involves the exact same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at its grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
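The gather-style grid update described above can be sketched sequentially in plain Python. This is only an illustration of the SPMD pattern: the diffusion-style update rule, the grid size, and the border clamping are assumptions, not the report's actual fluid solver.

```python
# Sequential sketch of the SPMD grid update: every cell's next state is
# computed from its own value and its neighbors' values (a "gather"),
# and the results go into a separate output buffer, as in the pass-based model.
def step(grid):
    n, m = len(grid), len(grid[0])
    new = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Gather the 4-neighborhood, clamping indices at the borders.
            neighbors = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni = min(max(i + di, 0), n - 1)
                nj = min(max(j + dj, 0), m - 1)
                neighbors.append(grid[ni][nj])
            # Each "fragment program" runs the same math on its own cell.
            new[i][j] = 0.5 * grid[i][j] + 0.5 * sum(neighbors) / 4.0
    return new  # the output buffer becomes the input of the next pass

state = [[0.0] * 4 for _ in range(4)]
state[1][1] = 1.0
state = step(state)
```

Each loop iteration plays the role of one fragment; on a GPU all of them would execute in parallel over the rasterized domain.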
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks' having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way.
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
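The difference scatter makes can be shown with a minimal sketch. In the older pass-based model a thread could only gather (read) arbitrary addresses but write to its own fixed output location; with scatter, a thread may write to a data-dependent address. The histogram task below is a hypothetical stand-in, not an example from the report.

```python
# With scatter, a "thread" may write to a data-dependent address,
# e.g. accumulating a histogram in a single shared buffer in place.
def histogram(values, num_bins):
    bins = [0] * num_bins
    for v in values:              # each iteration plays the role of one thread
        bins[v % num_bins] += 1   # scatter: the write address depends on the data
    return bins

counts = histogram([3, 7, 3, 1], 4)
```

On real hardware such data-dependent writes from many threads would need atomic operations; the sequential sketch sidesteps that concern.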
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities. After this step, some false positive matches may still remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
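The coarse-to-fine idea can be sketched generically: a cheap coarse score filters the gallery, and only the survivors pay for the expensive fine match. The score functions and threshold below are placeholders, not the report's equations.

```python
# Two-stage matching skeleton: coarse filter, then fine ranking.
def identify(test, gallery, coarse_score, fine_score, t):
    # Stage I: keep only templates whose coarse similarity passes threshold t.
    candidates = [g for g in gallery if coarse_score(test, g) >= t]
    if not candidates:
        return None  # everything filtered out: no match
    # Stage II: run the expensive fine match only on the survivors.
    return max(candidates, key=lambda g: fine_score(test, g))
```

For example, with a toy similarity `lambda a, b: 1.0 / (1.0 + abs(a - b))` used for both stages and t = 0.3, only near-identical gallery entries reach the fine stage.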
2.4.1 STAGE I: MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here y_te,i and y_ta,j are the Y shape descriptors of the test template T_te and the target template T_ta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); d_xy is the Euclidean distance of two descriptor centers, defined in (4); n_i and d_i are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold, and t_xy is the threshold that restricts the search area. We set tϕ to 30 and t_xy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where ϕ_i,j is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs n_i and the distance between Y shape branch centers d_i are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study; N_i and N_j are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded. A sclera with a high matching score is passed to the next, more precise matching process.
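A rough sequential sketch of the Stage-I idea follows, assuming descriptors of the form (ϕ1, ϕ2, ϕ3, x, y). The fusion formula at the end is a hypothetical stand-in for equation (2), which is not reproduced in the text; only the thresholds tϕ = 30, t_xy = 675, and α = 30 come from the report.

```python
import math

# Sketch of Stage-I matching on Y-shape descriptors.
def y_match_score(test, target, t_phi=30.0, t_xy=675.0, alpha=30.0):
    matched, dist_sum = 0, 0.0
    for p in test:
        for q in target:
            d_phi = math.dist(p[:3], q[:3])  # distance of angle elements, cf. (3)
            d_xy = math.dist(p[3:], q[3:])   # distance of descriptor centers, cf. (4)
            if d_phi < t_phi and d_xy < t_xy:
                matched += 1
                dist_sum += d_xy
                break  # greedily accept the first acceptable pair
    if matched == 0:
        return 0.0
    n = min(len(test), len(target))
    # Assumed fusion: match ratio penalized by the normalized average
    # center distance, weighted by alpha (stand-in for equation (2)).
    return matched / n - (dist_sum / matched) / (alpha * t_xy)
```

Identical templates score 1.0; templates whose branch centers lie farther apart than t_xy contribute no matches and score 0.0.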
2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.
Considering these factors, our registration employs both a single shift transform and a multi-parameter transform which combines shift, rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where T_te is the test template and s_te,i is the i-th WPL descriptor of T_te; T_ta is the target template and s_ta,i is the i-th WPL descriptor of T_ta; d(s_te,k, s_ta,j) is the Euclidean distance of descriptors s_te,k and s_ta,j; and Δs_k is the shift value of the two descriptors.
We first randomly select an equal number of segment descriptors s_te,k in the test template T_te from each quad and find each one's nearest neighbor s_ta,j in the target template T_ta. The shift offset between them is recorded as a possible registration shift factor Δs_k. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.
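The shift search of Algorithm 2 can be sketched sequentially. In this sketch descriptors are reduced to their 2-D centers, the per-quad sampling is simplified to one random sample, and "smallest standard deviation" is approximated by picking the candidate offset closest to the mean of all candidates; all of these simplifications are assumptions.

```python
import random
import statistics

# Sketch of Algorithm 2: sample test descriptors, find each one's nearest
# neighbor in the target, record the center offsets, and keep the offset
# around which the candidates agree best.
def shift_search(test_centers, target_centers, samples=8, seed=0):
    rng = random.Random(seed)
    picks = rng.sample(test_centers, min(samples, len(test_centers)))
    offsets = []
    for (x, y) in picks:
        # Nearest neighbor of (x, y) in the target template.
        nx, ny = min(target_centers,
                     key=lambda c: (c[0] - x) ** 2 + (c[1] - y) ** 2)
        offsets.append((nx - x, ny - y))  # candidate shift factor
    # Keep the candidate closest to the mean offset (lowest-deviation proxy).
    mx = statistics.mean(dx for dx, _ in offsets)
    my = statistics.mean(dy for _, dy in offsets)
    return min(offsets, key=lambda o: (o[0] - mx) ** 2 + (o[1] - my) ** 2)
```

When the target template really is a shifted copy of the test template, every candidate offset agrees, so the recovered shift is exact.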
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance from its nearest neighbor s_ta,j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithm 2; tr_shift^(it), θ^(it), and tr_scale^(it) are the shift, rotation, and scale parameters generated in the it-th iteration; R(θ^(it)), T(tr_shift^(it)), and S(tr_scale^(it)) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment, we set the iteration count to 512.
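Algorithm 3's randomized search resembles the following sketch. The WPL descriptors are reduced to 2-D points, the parameter ranges are assumptions, and the rotate-scale-shift composition order is a guess at the matrix in (7); only the 512 iterations and the β = 20 pixel tolerance come from the report.

```python
import math
import random

# Sketch of Algorithm 3: randomly generate shift/rotation/scale parameter
# sets, transform the test points, and keep the set matching the most
# target points within the tolerance beta.
def affine_search(test_pts, target_pts, iters=512, beta=20.0, seed=1):
    rng = random.Random(seed)
    best, best_matches = None, -1
    for _ in range(iters):
        th = rng.uniform(-0.2, 0.2)   # rotation angle (assumed range)
        s = rng.uniform(0.9, 1.1)     # scale factor (assumed range)
        tx = rng.uniform(-30.0, 30.0) # shift (assumed range)
        ty = rng.uniform(-30.0, 30.0)
        m = 0
        for (x, y) in test_pts:
            # Apply scale and rotation, then translation (assumed order, cf. (7)).
            u = s * (x * math.cos(th) - y * math.sin(th)) + tx
            v = s * (x * math.sin(th) + y * math.cos(th)) + ty
            if any((u - a) ** 2 + (v - b) ** 2 < beta ** 2 for a, b in target_pts):
                m += 1  # this descriptor pair counts as matched
        if m > best_matches:
            best_matches, best = m, (th, s, tx, ty)
    return best, best_matches
```

Over 512 draws the search almost always finds a parameter set aligning most points of a shifted copy, which mirrors the "keep the maximum matching number" selection in the text.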
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithms 2 and 3; θ^(optm), tr_shift^(optm), tr_scale^(optm), and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; R(θ^(optm)) T(tr_shift^(optm)) S(tr_scale^(optm)) is the descriptor transform matrix defined in Algorithm 3; ϕ is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ϕ values; in our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single instruction, multiple data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned into threads by the programmer and mapped onto those SMs. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and accessing them takes very little time. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming.
If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To completely hide the latency of the small instruction set, we should preferentially use on-chip memory rather than global memory. When global memory access does occur, threads in the same warp should access consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
2.5.1 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density; thus we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU, and the thread and block numbers are both set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, each of which processes a section of descriptors in one thread. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results. Then, recursively, the first thread of every group of i (i = 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
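The stride-doubling reduction described above can be emulated sequentially. The sketch shows it as a plain sum reduction over a reused buffer, with the total ending up in the first slot, as in the text; on the GPU each inner loop would run as simultaneous threads.

```python
# Sequential emulation of the tree-style reduction used to combine the
# threads' intermediate results: at stride 2, 4, 8, ... every stride-th
# slot accumulates the slot half a stride to its right, in the same buffer.
def tree_reduce_sum(buf):
    n = len(buf)
    stride = 2
    while stride // 2 < n:
        for i in range(0, n, stride):
            j = i + stride // 2
            if j < n:
                buf[i] += buf[j]  # in CUDA these additions run in parallel
        stride *= 2
    return buf[0]  # the total lands in the first address

total = tree_reduce_sum([1, 2, 3, 4, 5])
```

For n elements this takes ceil(log2 n) parallel steps instead of n - 1 sequential additions.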
2.5.2 MAPPING INSIDE BLOCK
In the shift argument search, there are two schemes we could use to map the task:
Map one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assign a single possible shift offset to a thread, so that all the threads compute independently, except that the final result must be compared with the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore, it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is to use a single thread in a block to process the matching.
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a larger block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell contributes a weighted vote to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination and contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks; these blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
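The two steps above (gradient computation, then magnitude-weighted orientation binning over 0-180°) can be sketched for a single cell as follows. The central-difference gradient, the cell covering the whole interior, and the 9-bin histogram are assumptions chosen for brevity.

```python
import math

# Minimal HOG-style sketch for one cell: central-difference gradients,
# then an unsigned 0-180 degree histogram with magnitude-weighted votes.
def cell_histogram(img, bins=9):
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = img[y][x + 1] - img[y][x - 1]          # dx(x, y)
            dy = img[y + 1][x] - img[y - 1][x]          # dy(x, y)
            mag = math.hypot(dx, dy)                    # m(x, y)
            ang = math.degrees(math.atan2(dy, dx)) % 180.0  # unsigned theta(x, y)
            hist[int(ang / 180.0 * bins) % bins] += mag # vote weighted by magnitude
    return hist
```

A vertical step edge, for instance, produces purely horizontal gradients, so all of its votes land in the 0-degree bin.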
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing.
Development environment for managing code, files, and data.
Interactive tools for iterative exploration, design, and problem solving.
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration.
2-D and 3-D graphics functions for visualizing data.
Tools for building custom graphical user interfaces.
Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, FORTRAN, Java, COM, and Microsoft Excel.
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, and financial modeling and analysis. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated but can be done with a MATLAB extension, sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations.
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data.
Readability and navigation improvements to warning and error messages in the MATLAB command window.
Automatic variable and function renaming in the MATLAB Editor.
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems. It enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithm efficiently. These include the following.
MATLAB Editor: provides standard editing and debugging features, such as setting breakpoints and single stepping.
Code Analyzer: checks your code for problems and recommends modifications to maximize performance and maintainability.
MATLAB Profiler: records the time spent executing each line of code.
Directory Reports: scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
Using the interactive tool GUIDE (Graphical User Interface Development Environment), you can lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh
Image plots
Cone, slice, stream, and isosurface
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW Discrete Fourier Transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency, a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous CPU-GPU system, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," in Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
3. R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. Artif. Neural Netw. Eng., 2006, pp. 1–8.
A new biometric indicator based on the patterns of conjunctival vasculature is proposed. Conjunctival vessels can be observed on the visible part of the sclera that is exposed to the outside world. These vessels demonstrate rich and specific details in visible light and can be easily photographed using a regular digital camera. In this paper, we discuss methods for conjunctival imaging, preprocessing, and feature extraction in order to derive a suitable conjunctival vascular template for biometric authentication. Commensurate classification methods, along with the observed accuracy, are discussed. Experimental results suggest the potential of using conjunctival vasculature as a biometric measure.
Identification of a person based on some unique set of features is an important task. Human identification is possible with several biometric systems, and sclera recognition is one of the promising biometrics. The sclera is the white portion of the human eye, and the vein pattern seen in the sclera region is unique to each person; thus, the sclera vein pattern is a well-suited biometric for human identification. The existing methods used for sclera recognition have some drawbacks: only frontal-looking images are preferred for matching, and rotation variance is another problem. These problems are eliminated in the proposed system by using two feature extraction techniques: Histogram of Oriented Gradients (HOG) and conversion of the image into polar form using the bilinear interpolation technique. These two features help the proposed system become illumination invariant and rotation invariant. The experimentation is done with the help of the UBIRIS database. The experimental results show that the proposed sclera recognition method can achieve better accuracy than the previous methods.
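The Cartesian-to-polar conversion with bilinear interpolation mentioned above can be sketched as follows (an illustrative, pure-Python version, not the cited paper's implementation). A rotation of the eye about the chosen center becomes a circular shift along the angle axis of the output, which is what makes the representation rotation tolerant:

```python
import math

def to_polar(img, center, n_r=8, n_theta=16):
    """Resample a 2-D grayscale image on a polar grid around `center`
    using bilinear interpolation (illustrative sketch)."""
    cy, cx = center
    h, w = len(img), len(img[0])
    r_max = min(cy, cx, h - 1 - cy, w - 1 - cx)
    out = [[0.0] * n_theta for _ in range(n_r)]
    for i in range(n_r):
        r = r_max * i / (n_r - 1)
        for j in range(n_theta):
            t = 2 * math.pi * j / n_theta
            y, x = cy + r * math.sin(t), cx + r * math.cos(t)
            # clamp so the 2x2 neighborhood stays inside the image
            y0 = max(0, min(int(math.floor(y)), h - 2))
            x0 = max(0, min(int(math.floor(x)), w - 2))
            dy, dx = y - y0, x - x0
            # blend the four neighboring pixels (bilinear interpolation)
            out[i][j] = (img[y0][x0] * (1 - dy) * (1 - dx)
                         + img[y0 + 1][x0] * dy * (1 - dx)
                         + img[y0][x0 + 1] * (1 - dy) * dx
                         + img[y0 + 1][x0 + 1] * dy * dx)
    return out
```

In a real system the center would come from the detected iris/pupil location, and the radial range would be restricted to the sclera band.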
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879–899, May 2008.
The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970–977.
This paper proposes algorithms for iris segmentation, quality enhancement, match score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a nonideal iris image using the modified Mumford–Shah functional. Different enhancement algorithms are concurrently applied on the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image: the global textural feature, extracted using the 1-D log-polar Gabor transform, and the local topological feature, extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve the iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.
18 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-stage parallel approach for registration and matching: a parallel sclera matching solution for our sequential line-descriptor method, built on the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data.
It is relatively straightforward to port our C program for CUDA to an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why a naïve parallel approach would not work. We then propose a new sclera descriptor, the Y-shape descriptor, for efficient feature-based registration to speed up the mapping scheme; introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing and mitigates the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make parallel processing possible and efficient.
191 PROPOSED SYSTEM ADVANTAGES
1. To improve the efficiency, we propose a new descriptor, the Y-shape descriptor, which greatly improves the efficiency of the coarse registration of two images and can be used to filter out non-matching pairs before refined matching.
2. We propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result of this stage helps filter out image pairs with low similarity.
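The coarse-to-fine idea can be illustrated with a small, hypothetical pipeline: cheap coarse signatures (such as Y-shape angle sets) are compared first, and the expensive fine matcher runs only on gallery entries that pass a similarity threshold. All names here are illustrative, not the report's code:

```python
def coarse_score(a, b):
    """Cheap similarity between two coarse signatures (e.g., Y-shape angle sets)."""
    return 1.0 - min(1.0, sum(abs(x - y) for x, y in zip(a, b)) / len(a))

def two_stage_match(query, gallery, fine_match, threshold=0.8):
    """Stage 1 filters by coarse score; stage 2 runs the costly fine matcher
    only on the surviving candidates."""
    candidates = [g for g in gallery
                  if coarse_score(query["coarse"], g["coarse"]) >= threshold]
    return [(g["id"], fine_match(query, g)) for g in candidates]
```

The benefit is that most of the gallery is rejected by the cheap stage, so the fine matcher (with registration) runs only a few times per query.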
CHAPTER 2
PROJECT DESCRIPTION
21 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, so it can be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that it is promising to use sclera vein recognition for human identification. Crihalmeanu and Ross proposed three approaches for feature registration and matching: a Speeded Up Robust Features (SURF)-based method, minutiae detection, and direct correlation matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition. The matching step (including registration) is the most time-consuming step in this sclera vein recognition system, costing about 1.2 seconds per one-to-one matching. Both speeds were measured using a PC with an Intel Core 2 Duo 2.4 GHz processor and 4 GB DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient in data processing where the data can be parallelized. Because of the large time consumption in the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, GPUs have been used to extract features, construct descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al. evaluated the performance of image processing algorithms such as linear feature extraction and multi-view stereo matching on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform that outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design; an efficient parallel computing scheme needs different strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition with our sequential line-descriptor method using the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large and will preoccupy the GPU memory and slow down data transfer. Also, some of the processing of the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to port our C program for CUDA to an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature, for efficient registration to speed up the mapping scheme (Section 4); introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we also developed implementation schemes to map our algorithms onto CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we present experiments using the proposed system. In Section 9, we draw conclusions.
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation. They used a clustering algorithm to classify color eye images into three clusters: sclera, iris, and background. Later, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's-thresholding-based method for grayscale images. After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region growing approach to identify the sclera veins against the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points of the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.
The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and a bilinear-interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly useful for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in the existing studies. In the proposed method, two features of an image are extracted.
222 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare Area Detection: The glare area is a small bright area near the pupil or iris. It is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It runs only on grayscale images; if the image is in color, it must first be converted to grayscale before the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection.
FIG
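The Sobel step can be illustrated with a minimal gradient-magnitude sketch (pure Python, illustrative only; a real implementation would use an optimized image library). Glare regions are small and bright, so their borders produce strong responses that can then be thresholded:

```python
def sobel_magnitude(img):
    """Approximate gradient magnitude |Gx| + |Gy| of a 2-D grayscale image."""
    h, w = len(img), len(img[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal Sobel kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical Sobel kernel
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[i][j] * img[y - 1 + i][x - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(ky[i][j] * img[y - 1 + i][x - 1 + j]
                     for i in range(3) for j in range(3))
            out[y][x] = abs(gx) + abs(gy)  # L1 magnitude is a common shortcut
    return out
```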
Sclera Area Estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should lie in the right and center positions, and the correct right sclera area should lie in the left and center. In this way, non-sclera areas are eliminated.
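Otsu's method chooses the gray-level threshold that maximizes the between-class variance of the image histogram. A minimal, self-contained sketch of the classic algorithm (illustrative, not the report's code), operating on a flat list of 8-bit pixel values:

```python
def otsu_threshold(pixels):
    """Return the 0-255 threshold t maximizing between-class variance,
    where class 0 is [0..t] and class 1 is (t..255]."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0     # pixel count of the background class [0..t]
    sum0 = 0   # intensity sum of the background class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0 or w0 == total:
            continue  # one class is empty; variance undefined
        m0 = sum0 / w0                           # background mean
        m1 = (total_sum - sum0) / (total - w0)   # foreground mean
        var_between = w0 * (total - w0) * (m0 - m1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t
```

In practice the threshold would be computed over the selected ROI only, and the resulting binary map post-processed as described above.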
223 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; these are all unwanted portions for recognition. To eliminate their effects, refinement is performed after the detection of the sclera area. Fig. shows the result of the Otsu's thresholding process and the iris and eyelid refinement used to detect the right sclera area; the left sclera area is detected in the same way.
In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the impact of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation; to make them more visible, vein pattern enhancement is performed.
224 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has been studied in the context of the fingers (Miura et al., 2004), palm (Lin and Fan, 2004), and retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus, the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.
FIG
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching stage of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle θ to a reference angle at the iris center, the segment's distance r to the iris center, and the dominant angular orientation ɸ of the line segment. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as
FIG
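The component equations themselves did not survive extraction; a plausible reconstruction, consistent with the quantities defined in the following sentence (and therefore to be read as an assumption, not a verbatim reproduction), is:

θ = arctan((yl − yi) / (xl − xi)),
r = √((xl − xi)² + (yl − yi)²),
ɸ = arctan(d fline(x)/dx evaluated at x = xl).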
Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. To register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration with these parameters.
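The registration loop just described can be sketched as follows. This is a simplified, hypothetical version (not the report's implementation): each iteration hypothesizes a correspondence (one point from each template) plus a random scale and rotation drawn from priors, derives the implied translation, and scores the fit by counting test points that land near some target point.

```python
import math
import random

def register_ransac(test_pts, target_pts, iters=500, tol=1.0, seed=0):
    """Randomly sample transform hypotheses; keep the one matching most points."""
    rng = random.Random(seed)
    best, best_fit = None, -1
    for _ in range(iters):
        p = rng.choice(test_pts)      # correspondence hypothesis (test point)
        q = rng.choice(target_pts)    # correspondence hypothesis (target point)
        s = rng.uniform(0.9, 1.1)     # scale prior from the database
        a = rng.uniform(-0.2, 0.2)    # rotation prior (radians)
        ca, sa = math.cos(a), math.sin(a)
        # translation that maps p exactly onto q under this rotation/scale
        tx = q[0] - s * (ca * p[0] - sa * p[1])
        ty = q[1] - s * (sa * p[0] + ca * p[1])
        fit = 0
        for (x, y) in test_pts:
            u = s * (ca * x - sa * y) + tx
            v = s * (sa * x + ca * y) + ty
            if any(math.hypot(u - gx, v - gy) < tol for gx, gy in target_pts):
                fit += 1
        if fit > best_fit:
            best, best_fit = (s, a, tx, ty), fit
    return best, best_fit
```

In the actual system the fitness would be evaluated on the weighted descriptors rather than raw points, but the sampling structure is the same.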
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. To reduce the effect of segmentation errors, we created a weighting image (Figure 3) from the sclera mask by setting interior pixels of the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
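One plausible way to build such a weighting image from a binary sclera mask (illustrative sketch; the "near the boundary" test and the border width are assumptions):

```python
def weight_image(mask, border=1):
    """Map a binary mask to weights: 0 outside, 0.5 near the boundary, 1 interior.
    A pixel is 'near the boundary' if any pixel within Chebyshev distance
    `border` is outside the mask (or outside the image)."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue  # outside the sclera mask -> weight stays 0
            near_edge = any(
                ny < 0 or ny >= h or nx < 0 or nx >= w or not mask[ny][nx]
                for ny in range(y - border, y + border + 1)
                for nx in range(x - border, x + border + 1)
            )
            out[y][x] = 0.5 if near_edge else 1.0
    return out
```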
The matching score for two segment descriptors is calculated as follows: Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and Φmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
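A hedged sketch of this scoring rule; the exact thresholds, the weight product, and taking the best-scoring counterpart per segment are assumptions based on the description above, not the report's exact formula:

```python
import math

def pair_score(si, sj, d_match=5.0, phi_match=0.3):
    """Score one descriptor pair: the weight product if both the distance
    and orientation tests pass, otherwise zero."""
    d = math.hypot(si["x"] - sj["x"], si["y"] - sj["y"])
    if d <= d_match and abs(si["phi"] - sj["phi"]) <= phi_match:
        return si["w"] * sj["w"]
    return 0.0

def total_score(test, target, **kw):
    """Sum each test segment's best pair score, normalized by the lighter
    template's total weight (the maximum attainable score)."""
    raw = sum(max((pair_score(si, sj, **kw) for sj in target), default=0.0)
              for si in test)
    max_possible = min(sum(s["w"] for s in test), sum(s["w"] for s in target))
    return raw / max_possible if max_possible else 0.0
```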
FIG
FIG
FIG
FIG
movement of the eye. Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.
There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angle of each branch to the x-axis; the other is to use the angle between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employ the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2 and ϕ3 remain quite stable.
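The claimed invariance can be checked numerically. In this hypothetical sketch (function names are illustrative), each branch angle is measured against the radial direction from the iris center to the branch point; rotating all points about that center leaves the ϕ values unchanged:

```python
import math

def branch_angles(center, branch_pt, branch_ends):
    """Angle of each branch relative to the radial direction from the iris
    center through the branch point."""
    radial = math.atan2(branch_pt[1] - center[1], branch_pt[0] - center[0])
    angles = []
    for ex, ey in branch_ends:
        a = math.atan2(ey - branch_pt[1], ex - branch_pt[0]) - radial
        angles.append(math.atan2(math.sin(a), math.cos(a)))  # wrap to (-pi, pi]
    return angles

def rotate(p, center, t):
    """Rotate point p about center by angle t (simulates head tilt)."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (center[0] + dx * math.cos(t) - dy * math.sin(t),
            center[1] + dx * math.sin(t) + dy * math.cos(t))
```

Because a global rotation adds the same offset to the radial direction and to every branch direction, the differences ϕ are unaffected.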
To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of each Y-shape branch as an auxiliary parameter, so our rotation-, shift- and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center, and it is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To tolerate such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of the line segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large and will occupy GPU memory and slow down data transfer. In the registration step of matching, a RANSAC-type algorithm randomly selects corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.
To reduce this heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships among descriptors and stored them as a new descriptor. We use a weighted image created by setting weight values according to position: the weight of descriptors beyond the sclera is set to 0, descriptors near the sclera boundary are weighted 0.5, and interior descriptors are set to 1. In our work, descriptor weights were calculated on their own mask by the CPU, only once.
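The boundary-aware weighting above can be sketched as follows. This is a minimal CPU illustration, assuming a binary sclera mask; the `margin` parameter and the neighborhood test are assumptions for the sketch, not values from the report.

```python
import numpy as np

def descriptor_weight(mask, cx, cy, margin=3):
    """Weight for a line-segment descriptor centered at (cx, cy).

    Returns 0 if the center lies outside the sclera mask, 0.5 if it lies
    within `margin` pixels of the mask boundary, and 1 for interior points.
    """
    h, w = mask.shape
    if not (0 <= cy < h and 0 <= cx < w) or mask[cy, cx] == 0:
        return 0.0
    # Examine the square neighborhood around the center; if any pixel is
    # background, the descriptor is near the boundary.
    y0, y1 = max(cy - margin, 0), min(cy + margin + 1, h)
    x0, x1 = max(cx - margin, 0), min(cx + margin + 1, w)
    if np.all(mask[y0:y1, x0:x1] == 1):
        return 1.0
    return 0.5
```

Computing these weights once on the CPU and storing them in the descriptor is what lets the GPU skip the mask file entirely.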
The calculated result was saved as a component of the descriptor. The sclera descriptor thus becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template must be transformed. This is faster if the two templates share a similar reference point. If we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they have the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment angle to the reference line through the iris center, denoted as θ; the distance between the segment center and the pupil center, denoted as r; and the dominant angular orientation of the segment, denoted as ɸ. To minimize GPU computing, we also convert the descriptor values from polar to rectangular coordinates in a CPU preprocessing step.
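The CPU-side conversion can be sketched in a few lines. This is an illustration assuming angles in radians; the report does not specify the angle unit.

```python
import math

def wpl_descriptor(r, theta, phi, w):
    """Build the extended WPL descriptor s(x, y, r, theta, phi, w).

    (r, theta) locate the segment center in polar coordinates about the
    iris/pupil center; converting to (x, y) once on the CPU spares the
    GPU the trigonometry during matching.
    """
    x = r * math.cos(theta)
    y = r * math.sin(theta)
    return (x, y, r, theta, phi, w)
```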
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved them at contiguous addresses. This meets the requirement of coalesced memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. With matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as shown in Figure 9.
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65 k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and as GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
GENERAL-PURPOSE COMPUTING ON THE GPU
Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline.
The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as texture on future passes through the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves the exact same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
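The thread-grid model above can be sketched on the CPU. In this Python stand-in, one "thread body" runs per grid point; the stencil and grid are illustrative, not from the report.

```python
import numpy as np

def stencil_kernel(state, out, x, y):
    """Body run once per (x, y) thread: average the 4 neighbors.

    A CPU stand-in for the SPMD model described above; on a real GPU
    each (x, y) would be one thread in the launch grid.
    """
    h, w = state.shape
    # "Gather" reads from the previous time step (neighbors clamped at edges).
    up = state[max(y - 1, 0), x]
    down = state[min(y + 1, h - 1), x]
    left = state[y, max(x - 1, 0)]
    right = state[y, min(x + 1, w - 1)]
    out[y, x] = 0.25 * (up + down + left + right)

def launch(state):
    out = np.empty_like(state)
    h, w = state.shape
    for y in range(h):          # the "launch grid": every point runs the
        for x in range(w):      # same program on its own coordinates
            stencil_kernel(state, out, x, y)
    return out
```

Reading from `state` and writing to a separate `out` buffer mirrors the old gather-only model; the new model additionally allows in-place scatter writes.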
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity; after this step, some false positive matches may still remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
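A minimal sketch of this two-stage filtering follows; the function arguments and scoring callables are illustrative stand-ins, not the report's interfaces.

```python
def two_stage_match(test, candidates, coarse_score, fine_score, t):
    """Coarse-to-fine filtering as described above.

    Candidates whose fast Y-shape score falls below threshold t are
    discarded; only the survivors pay for the expensive registration
    plus WPL matching. Returns (candidate, fine score) pairs.
    """
    survivors = [c for c in candidates if coarse_score(test, c) >= t]
    return [(c, fine_score(test, c)) for c in survivors]
```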
2.4.1 STAGE I: MATCHING WITH Y-SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
Here, ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively. dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4). ni and di are the number of matched descriptor pairs and the distance between their centers, respectively. tϕ is a distance threshold, and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the jth branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here, α is a factor used to fuse the matching score, which was set to 30 in our study; Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded. Scleras with high matching scores are passed to the next, more precise matching process.
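The inner loop of Algorithm 1 can be sketched as below. The descriptor layout (three branch angles plus a center) is an assumption; the threshold defaults follow the values stated above (tϕ = 30, txy = 675).

```python
import math

def match_y_descriptors(test, target, t_phi=30.0, t_xy=675.0):
    """Coarse Y-shape matching in the spirit of Algorithm 1.

    Each descriptor is (phi1, phi2, phi3, x, y). A test/target pair
    matches when the Euclidean distance of the angle triples is below
    t_phi and the distance of the centers is below t_xy. Returns the
    matched count and the center distances of the matches.
    """
    matches, dists = 0, []
    for p1, p2, p3, x, y in test:
        for q1, q2, q3, u, v in target:
            d_phi = math.dist((p1, p2, p3), (q1, q2, q3))
            d_xy = math.dist((x, y), (u, v))
            if d_phi < t_phi and d_xy < t_xy:
                matches += 1
                dists.append(d_xy)
                break  # each test branch matches at most one target branch
    return matches, dists
```

The returned count and distances correspond to ni and di, which the score fusion in (2) then combines.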
2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel-structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.
Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte, Tta is the target template and staj is the jth WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value of the two descriptors, defined as the offset between their centers.
We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find each one's nearest neighbor staj in the target template Tta. Their shift offset is recorded as a candidate registration shift factor Δsk. The final registration offset is Δsoptim, the candidate with the smallest standard deviation among these candidate offsets.
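A simplified sketch of this shift search follows. It picks the candidate offset closest to the consensus (mean) of all sampled offsets, which is one reading of the "smallest deviation" criterion; the sample count and descriptor representation (centers only) are assumptions.

```python
import random
import statistics

def shift_parameter_search(test_centers, target_centers, samples=8):
    """Sketch of Algorithm 2's shift search (simplified).

    For randomly sampled test descriptor centers, find the nearest
    target center and record the offset; return the candidate offset
    deviating least from the mean of all sampled offsets.
    """
    candidates = []
    for tx, ty in random.sample(test_centers, min(samples, len(test_centers))):
        nx, ny = min(target_centers,
                     key=lambda c: (c[0] - tx) ** 2 + (c[1] - ty) ** 2)
        candidates.append((nx - tx, ny - ty))
    # Consensus offset: mean over the sampled candidates.
    mx = statistics.mean(dx for dx, _ in candidates)
    my = statistics.mean(dy for _, dy in candidates)
    return min(candidates, key=lambda d: (d[0] - mx) ** 2 + (d[1] - my) ** 2)
```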
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here, stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment we set the iteration count to 512.
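The random parameter search can be sketched as follows. The parameter ranges drawn in each iteration are illustrative assumptions; only the β = 20 pixel match tolerance and the iteration count come from the text.

```python
import math
import random

def affine_search(test_pts, target_pts, iterations=512, beta=20.0):
    """Sketch of Algorithm 3: random search over shift/rotation/scale.

    Each iteration draws a candidate (shift, angle, scale), applies it
    to the test points, and counts target points within beta of a
    transformed point. Returns the best-scoring parameter set.
    """
    def transform(p, dx, dy, ang, s):
        x, y = p
        c, si = math.cos(ang), math.sin(ang)
        return (s * (c * x - si * y) + dx, s * (si * x + c * y) + dy)

    def matches(params):
        dx, dy, ang, s = params
        n = 0
        for p in test_pts:
            tx, ty = transform(p, dx, dy, ang, s)
            if any(math.dist((tx, ty), q) < beta for q in target_pts):
                n += 1
        return n

    # Seed with the identity transform, then try random candidates.
    best = (0.0, 0.0, 0.0, 1.0)
    best_n = matches(best)
    for _ in range(iterations):
        cand = (random.uniform(-30, 30), random.uniform(-30, 30),
                random.uniform(-0.1, 0.1), random.uniform(0.9, 1.1))
        n = matches(cand)
        if n > best_n:
            best, best_n = cand, n
    return best, best_n
```

On the GPU, each iteration of this loop is handed to a separate thread, which is why uncorrelated per-thread random numbers matter (Section 2.5.2).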
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here, stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radius direction; w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and target template.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and take little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them can be very time consuming.
Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower than on-chip memory in terms of access. To hide this latency, we should preferentially use on-chip memory rather than global memory. When global memory access does occur, threads in the same warp should access words in sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. For maximum performance, memory requests should be scheduled to minimize bank conflicts.
2.5.1 MAPPING ALGORITHMS TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one template-pair comparison. In our work, we use an NVIDIA C2070 as our GPU, and the thread and block counts are set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data-exchange requirements between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
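The pairwise reduction described above can be sketched sequentially; each level of the while loop corresponds to one parallel step across threads on the GPU. The power-of-two length requirement is an assumption of this sketch.

```python
def tree_reduce_sum(values):
    """CPU simulation of the pairwise reduction described above.

    At step s, every thread whose index is a multiple of 2*s adds in
    the partial sum s positions away, so after log2(n) steps the total
    sits at index 0 (the "first address").
    """
    vals = list(values)
    n = len(vals)
    assert n & (n - 1) == 0, "sketch assumes power-of-two length"
    step = 1
    while step < n:
        # All these additions are independent and would run in parallel
        # across threads on the GPU.
        for i in range(0, n, 2 * step):
            vals[i] += vals[i + step]
        step *= 2
    return vals[0]
```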
2.5.2 MAPPING INSIDE A BLOCK
In shift parameter searching, there are two schemes we can choose for mapping the task:
Map one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assign a single possible shift offset to a thread, so that all threads compute independently, except that the final result must be compared with the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, with the generation iterations distributed across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
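As a CPU-side analog of the per-thread generator-independence problem above, numpy's SeedSequence spawning produces statistically independent streams from one root seed. This illustrates the goal; it is not the offline Mersenne Twister parameter-creation tool the report used.

```python
import numpy as np

# Each GPU thread needs its own uncorrelated stream. Spawned SeedSequence
# children yield statistically independent generators, playing the role
# of per-thread Mersenne Twisters with distinct parameters.
children = np.random.SeedSequence(1234).spawn(4)   # 4 stand-in "threads"
streams = [np.random.default_rng(c) for c in children]
draws = [rng.random(3) for rng in streams]         # one sequence per "thread"
```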
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can incur long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no host-to-device data transfer during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
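Storing descriptor components separately is a structure-of-arrays layout. A minimal numpy illustration follows; the field names mirror s(x, y, r, θ, ϕ, w), and the toy size is an assumption.

```python
import numpy as np

n = 4  # number of WPL descriptors (toy size)

# Array-of-structures: one record s(x, y, r, theta, phi, w) per descriptor.
aos = np.zeros(n, dtype=[('x', 'f4'), ('y', 'f4'), ('r', 'f4'),
                         ('theta', 'f4'), ('phi', 'f4'), ('w', 'f4')])

# Structure-of-arrays: each component stored contiguously, the layout the
# text describes so consecutive kernels read successive addresses.
soa = {name: np.ascontiguousarray(aos[name]) for name in aos.dtype.names}
```

With the SoA layout, threads in a warp that each read, say, component `r` of consecutive descriptors touch consecutive words, which is exactly the coalescing condition.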
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in the design of target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell contributes a weight to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of picture elements. If the images have any illumination and contrast changes, then the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
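The per-cell binning step can be sketched as below. Gradients come from simple central differences, and the common choice of 9 bins is an assumption here, not a value from the report.

```python
import numpy as np

def cell_hog(cell, bins=9):
    """Unsigned-orientation HOG histogram for one image cell.

    Each pixel votes into an orientation bin over [0, 180) degrees,
    weighted by its gradient magnitude, as described above.
    """
    dy, dx = np.gradient(cell.astype(float))
    mag = np.hypot(dx, dy)
    ang = np.degrees(np.arctan2(dy, dx)) % 180.0  # opposite dirs fold together
    hist = np.zeros(bins)
    bin_width = 180.0 / bins
    for m, a in zip(mag.ravel(), ang.ravel()):
        hist[int(a // bin_width) % bins] += m
    return hist
```

Concatenating these cell histograms over blocks, then normalizing per block, yields the final descriptor.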
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular in the teaching of linear algebra and numerical analysis, and is popular among scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of the few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated but can be done with a MATLAB extension, which is sold separately by MathWorks, or via an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java technology that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, enabling fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages, because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology.
This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
The interactive tool GUIDE (Graphical User Interface Development Environment) lets you lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific file formats such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh
Image plots
Cone, slice, stream, and isosurface
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
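To make the operation classes above concrete, the following Python/NumPy sketch performs a LAPACK-backed linear solve and a discrete Fourier transform, the same classes of computation that MATLAB delegates to LAPACK/BLAS and FFTW. This is illustrative only; NumPy uses its own FFT backend rather than FFTW.

```python
import numpy as np

# Dense linear algebra: NumPy, like MATLAB, dispatches this to LAPACK.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)        # solution of A @ x = b

# Discrete Fourier transform of a short real signal.
spectrum = np.fft.fft([1.0, 0.0, -1.0, 0.0])
print(x)
print(np.round(spectrum.real, 6))
```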
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
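The first two snapshot steps above (grayscale conversion and binarization) can be sketched as follows. This is an illustrative Python/NumPy analogue of the report's MATLAB script, with a fixed threshold standing in for the Otsu step shown in the snapshots.

```python
import numpy as np

def to_gray(rgb):
    """Luminance-weighted grayscale conversion (ITU-R BT.601 weights)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def to_binary(gray, level):
    """Fixed-level binarization; the report uses Otsu's method instead."""
    return (gray >= level).astype(np.uint8)

rgb = np.zeros((2, 2, 3))
rgb[0, 0] = [1.0, 1.0, 1.0]      # one white pixel on a black background
gray = to_gray(rgb)
print(to_binary(gray, 0.5))
```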
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate mask information and make the matching more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," in Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
4. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, "GPU computing," Proc. IEEE, vol. 96, no. 5, pp. 879-899, May 2008.
The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.
5. H. Proença and L. A. Alexandre, "UBIRIS: A noisy iris image database," in Proc. 13th Int. Conf. Image Anal. Process., 2005, pp. 970-977.
This paper proposes algorithms for iris segmentation, quality enhancement, match-score fusion, and indexing to improve both the accuracy and the speed of iris recognition. A curve evolution approach is proposed to effectively segment a non-ideal iris image using the modified Mumford-Shah functional. Different enhancement algorithms are concurrently applied to the segmented iris image to produce multiple enhanced versions of the iris image. A support-vector-machine-based learning algorithm selects locally enhanced regions from each globally enhanced image and combines these good-quality regions to create a single high-quality iris image. Two distinct features are extracted from the high-quality iris image: the global textural feature, extracted using the 1-D log-polar Gabor transform, and the local topological feature, extracted using Euler numbers. An intelligent fusion algorithm combines the textural and topological matching scores to further improve iris recognition performance and reduce the false rejection rate, whereas an indexing algorithm enables fast and accurate iris identification. The verification and identification performance of the proposed algorithms is validated and compared with other algorithms using the CASIA Version 3, ICE 2005, and UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method that uses a two-stage parallel approach for registration and matching: a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method, using the CUDA GPU architecture. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power.
It supports not only a traditional graphics pipeline but also computation on non-graphical data. It is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not work. We then propose a new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme; introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing and mitigates the mask size issue; and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed. These new approaches make the parallel processing possible and efficient.
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve the efficiency, in this research we propose a new descriptor, the Y-shape descriptor, which can greatly improve the efficiency of the coarse registration of two images and can be used to filter out some non-matching pairs before refined matching.
2. We propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage can help filter out image pairs with low similarities.
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood vessel structure of the sclera is formed randomly and is unique to each person, which allows it to be used for human identification. Several researchers have designed different sclera vein recognition methods and have shown that sclera vein recognition is a promising means of human identification. Crihalmeanu and Ross proposed three approaches, Speeded-Up Robust Features (SURF)-based matching, minutiae detection, and direct correlation matching, for feature registration and matching. Among these three methods, the SURF method achieves the best accuracy; it takes an average of 1.5 seconds to perform a one-to-one matching. Zhou et al. proposed a line descriptor-based method for sclera vein recognition, in which the matching step (including registration) is the most time-consuming step, costing about 1.2 seconds per one-to-one matching. Both speeds were measured on a PC with an Intel® Core™ 2 Duo 2.4 GHz processor and 4 GB of DRAM. Currently, sclera vein recognition algorithms are designed using central processing unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential processing devices, which may not be efficient for data processing where the data can be parallelized. Because of the large time consumption of the matching step, sclera vein recognition using a sequential method would be very challenging to implement in a real-time biometric system, especially when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, GPUs have been used to extract features, construct descriptors, and match templates.
GPUs are also used for object retrieval and image search. Park et al. evaluated the performance on GPUs of image processing algorithms such as linear feature extraction and multi-view stereo matching. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform that outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design; an efficient parallel computing scheme needs different strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition based on our sequential line-descriptor method, using the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and to help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large, so they preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve implementation of the algorithms in parallel would not work efficiently.
Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
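The prefix-sum primitive mentioned above can be illustrated with a sequential Python emulation of the Hillis-Steele inclusive scan; on the GPU, the body of the inner loop runs as one thread per element. This is a sketch of the pattern only, not the report's CUDA kernel.

```python
def inclusive_scan(data):
    """Hillis-Steele inclusive prefix sum, emulated sequentially."""
    out = list(data)
    stride = 1
    while stride < len(out):
        # On a GPU, every element j >= stride is updated in parallel;
        # the copy emulates the double buffering that avoids the
        # read/write hazard between threads.
        prev = list(out)
        for j in range(stride, len(out)):
            out[j] = prev[j] + prev[j - stride]
        stride *= 2
    return out

print(inclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# -> [3, 4, 11, 11, 15, 16, 22, 25]
```

The scan finishes in O(log n) parallel steps, which is why it appears so often as a building block in GPU pipelines such as this one.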
In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose a new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the weighted polar line (WPL) descriptor, which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make the parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA, so we then develop implementation schemes to map our algorithms into CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition; in Section 8, we present experiments using the proposed system; and in Section 9, we draw conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation, using a clustering algorithm to classify color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's thresholding-based method for grayscale images. After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region-growing approach to identify the sclera veins against the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded-Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line descriptor-based feature registration and matching method.
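The multi-directional Gabor bank used for vascular enhancement can be sketched as follows in Python/NumPy. The kernel parameters (sigma, wavelength, six orientations) are illustrative assumptions, not the values used in the report.

```python
import numpy as np

def gabor_kernel(size, theta, sigma=2.0, wavelength=4.0):
    """Real (even-symmetric) Gabor kernel at orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the sinusoid runs along direction theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

# A bank of kernels at evenly spaced orientations; an image would be
# convolved with each and the maximum response kept per pixel.
bank = [gabor_kernel(9, t) for t in np.linspace(0, np.pi, 6, endpoint=False)]
print(len(bank), bank[0].shape)
```

Thin, oriented structures such as sclera veins respond strongly to the kernel whose orientation matches theirs, which is what makes the maximum-over-orientations response an enhancement of the vascular pattern.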
The proposed sclera recognition method consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly used for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, the human identification is more accurate than in existing studies. In the proposed method, two features of an image are extracted.
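The Cartesian-to-polar conversion mentioned above amounts to mapping each pixel coordinate to a radius and angle about the iris/pupil center. A minimal Python sketch of the coordinate mapping (with a made-up center) is:

```python
import math

def cart2pol(x, y, cx=0.0, cy=0.0):
    """Map a point to (radius, angle) about a center (cx, cy)."""
    dx, dy = x - cx, y - cy
    return math.hypot(dx, dy), math.atan2(dy, dx)

r, phi = cart2pol(3.0, 4.0)
print(r, round(phi, 4))   # radius 5.0, angle atan2(4, 3)
```

A full polar unwrapping would apply this mapping on a regular (r, phi) grid and interpolate pixel values, which is the interpolation step the report refers to.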
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare Area Detection: The glare area is a small bright area near the pupil or iris, and it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. The filter operates only on grayscale images; if the image is in color, it is first converted to grayscale, and the Sobel filter is then applied to detect the glare area. Fig. 4 shows the result of the glare area detection.
FIG
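A minimal Python/NumPy sketch of the Sobel step is shown below: the gradient magnitude responds strongly at the sharp bright-to-dark boundary around a glare spot. The report applies MATLAB's built-in Sobel filtering, so this is only an illustrative re-implementation.

```python
import numpy as np

# Standard Sobel kernels for horizontal and vertical gradients.
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KY = KX.T

def sobel_magnitude(img):
    """Gradient magnitude via direct 3x3 correlation (valid region)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx = np.sum(patch * KX)
            gy = np.sum(patch * KY)
            out[i, j] = np.hypot(gx, gy)
    return out

img = np.zeros((5, 5))
img[:, 3:] = 1.0                  # vertical edge, like a glare boundary
mag = sobel_magnitude(img)
print(mag)                        # large values along the edge columns
```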
Sclera Area Estimation: To estimate the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be located at the right and center positions, and the correct right sclera area at the left and center. In this way, non-sclera areas are eliminated.
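Otsu's method itself picks the gray level that maximizes the between-class variance of the image histogram. The following Python/NumPy sketch is an illustrative re-implementation, not the report's MATLAB call:

```python
import numpy as np

def otsu_threshold(gray_levels, n_bins=256):
    """Return the gray level maximizing between-class variance."""
    hist, _ = np.histogram(gray_levels, bins=n_bins, range=(0, n_bins))
    p = hist / hist.sum()
    levels = np.arange(n_bins)
    best_t, best_var = 0, -1.0
    for t in range(1, n_bins):
        w0, w1 = p[:t].sum(), p[t:].sum()   # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:t] * p[:t]).sum() / w0  # class means
        mu1 = (levels[t:] * p[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# A bimodal sample: dark background pixels and a bright sclera-like mode.
pixels = np.array([10] * 50 + [12] * 40 + [200] * 30 + [210] * 20)
t = otsu_threshold(pixels)
print(t)                          # lands between the two modes
```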
2.2.3 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined, since all of these are unwanted portions for recognition. To eliminate their effects, refinement is performed after the detection of the sclera area. Fig. shows the result, after Otsu's thresholding and iris and eyelid refinement, of detecting the right sclera area; the left sclera area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement is performed to make them more visible.
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has been studied in the context of the fingers (Miura et al., 2004), palm (Lin and Fan, 2004), and retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999); due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.
FIG
FIG
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching stage of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ϕ. Thus the descriptor is S = (θ, r, ϕ)^T. The individual components of the line descriptor are calculated as
FIG
Here, fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. To register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. To reduce the effect of segmentation errors, we created a weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as below, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eq. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
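Since the score equation itself is not reproduced above, the following sketch uses an assumed per-pair rule (a weighted match under distance and angle thresholds), together with the minimal-set normalization the text describes. Descriptor layout, threshold values, and the weight-averaging rule are assumptions.

```python
import math

def segment_match_score(si, sj, d_match=5.0, phi_match=0.2):
    """Per-pair score m(Si, Sj): descriptors are (x, y, phi, w). The pair
    matches when the centers are within d_match and the orientations within
    phi_match; the mask weights damp boundary segments (assumed rule)."""
    d = math.hypot(si[0] - sj[0], si[1] - sj[1])
    if d <= d_match and abs(si[2] - sj[2]) <= phi_match:
        return 0.5 * (si[3] + sj[3])
    return 0.0

def template_match_score(test, target):
    """Total score M: best per-segment scores summed over the smaller
    template, normalized by that template's total descriptor weight, which
    is the maximum score it can attain."""
    small, large = (test, target) if len(test) <= len(target) else (target, test)
    total = sum(max(segment_match_score(si, sj) for sj in large) for si in small)
    max_score = sum(si[3] for si in small)
    return total / max_score if max_score else 0.0
```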
FIG
FIG
FIG
FIG
Y-shape branches are observed to be a stable feature under movement of the eye and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.
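The neighbour-angle test just described can be sketched as follows; the clustering tolerance, search radius, and segment layout (x, y, angle) are assumed values for illustration.

```python
import math

def cluster_angles(angles, tol=10.0):
    """Group angles (degrees) into clusters whose members lie within `tol`
    of the cluster's first angle; returns the list of clusters."""
    clusters = []
    for a in angles:
        for c in clusters:
            if abs(a - c[0]) <= tol:
                c.append(a)
                break
        else:
            clusters.append([a])
    return clusters

def is_y_branch(segment, neighbors, radius=20.0, tol=10.0):
    """Heuristic sketch of the Y-branch test: collect the orientations of
    segments within a regular distance of `segment` and flag the set as a
    Y structure when the neighbor angles fall into two distinct groups."""
    near = [n for n in neighbors
            if math.hypot(n[0] - segment[0], n[1] - segment[1]) <= radius]
    groups = cluster_angles([n[2] for n in near], tol)
    return len(groups) == 2
```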
There are two ways to measure both the orientation and the relationship of every branch of the Y-shape vessels: one is to use the angle of every branch to the x axis; the other is to use the angles between the branches and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also recorded the center position (x, y) of the Y-shape branches as auxiliary parameters. Thus our rotation-, shift- and scale-invariant feature vector is defined as y = (ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris center, and it is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s = (x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes. To be tolerant of such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large and will occupy GPU memory and slow down data transfer. When matching, a RANSAC-type registration algorithm was used to randomly select corresponding descriptors, and the transform parameters between them were used to generate the template transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.
To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weight image created by setting various weight values according to position: the weights of descriptors beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU, and only once.
The calculated result was saved as a component of the descriptor, which becomes s = (x, y, ɸ, w), where w denotes the weight of the point and may be 0, 0.5 or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template must be transformed. It is faster if the two templates have similar reference points: if we use the center of the iris as the reference point, then when two templates are compared the correspondences will automatically be aligned to each other, since they share a similar reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.
The descriptor vector thus becomes s = (x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.
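A minimal sketch of building one WPL descriptor about the iris center follows; the boundary-band width used for the mask weights is an assumed parameter.

```python
import math

def wpl_descriptor(seg_center, seg_phi, iris_center, weight):
    """Build the WPL descriptor s = (x, y, r, theta, phi, w): polar
    coordinates are taken about the iris center, so two templates sharing
    that reference point are automatically aligned, and the rectangular
    coordinates are precomputed on the CPU to spare the GPU."""
    dx = seg_center[0] - iris_center[0]
    dy = seg_center[1] - iris_center[1]
    r = math.hypot(dx, dy)         # distance from segment center to pupil center
    theta = math.atan2(dy, dx)     # angle to the reference line through the iris
    return (dx, dy, r, theta, seg_phi, weight)

def boundary_weight(dist_to_mask_edge, inside, band=3.0):
    """Weight from the sclera mask: 0 outside, 0.5 near the boundary,
    1 in the interior. The band width is an assumption."""
    if not inside:
        return 0.0
    return 0.5 if dist_to_mask_edge < band else 1.0
```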
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same sides and saved
FIG
FIG
them at contiguous addresses. This meets the requirement of coalesced memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as in Figure 9.
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

GENERAL-PURPOSE COMPUTING ON THE GPU

Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline.
The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as a texture on future passes through the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves the exact same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
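The steps above can be modelled serially. The neighbour-averaging rule below is only a stand-in for a real fluid update, and the separate input/output buffers mirror the render-to-texture restriction of the old GPGPU model (gather-only reads, no scatter).

```python
def step(grid):
    """One SPMD-style time step: every cell ("fragment") runs the same
    program, gathering its neighbors' previous-step values from the input
    buffer and writing to a separate output buffer. The averaging rule is
    an illustrative stand-in for a real fluid update."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, n = 0.0, 0
            for dy, dx in ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:   # clamp at the domain edge
                    acc += grid[ny][nx]
                    n += 1
            out[y][x] = acc / n   # gather-only: reads come from the old buffer
    return out
```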
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage can help filter out image pairs with low similarity. After this step, some false positive matches are still possible. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale- and translation-invariance. This stage includes shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
2.4.1 STAGE I: MATCHING WITH Y-SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching procedure is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and their center distances, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar line from the pupil center in descriptor i.
The number of matched pairs ni and the distances di between the Y-shape branch centers are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise matching process.
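A serial sketch of Stage I follows. Since Eq. (2) is not reproduced in the text, the fusion formula below is an assumed stand-in that rewards more matched pairs and penalizes larger center distances; the threshold values mirror those quoted above.

```python
import math

def d_phi(y1, y2):
    """Euclidean distance between the three branch angles (cf. Eq. 3)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y1[:3], y2[:3])))

def d_xy(y1, y2):
    """Euclidean distance between the two branch centers (cf. Eq. 4)."""
    return math.hypot(y1[3] - y2[3], y1[4] - y2[4])

def y_match(test, target, t_phi=30.0, t_xy=67.5, alpha=30.0):
    """Stage-I matching sketch with descriptors y = (phi1, phi2, phi3, x, y).
    Count branch pairs within both thresholds and fuse the pair count with
    the mean center distance; the fusion rule here is an assumption."""
    n, dist = 0, 0.0
    for yt in test:
        for yg in target:
            if d_phi(yt, yg) < t_phi and d_xy(yt, yg) < t_xy:
                n += 1
                dist += d_xy(yt, yg)
                break                     # at most one match per test branch
    if n == 0:
        return 0.0
    mean_d = dist / n
    # More matches raise the score; larger distances lower it (assumed form).
    return n / ((len(test) + len(target)) / 2) * alpha / (alpha + mean_d)
```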
2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear, because:
When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers (episclera, stroma, lamina fusca and endothelium), and there are slight differences among the movements of these layers.
Considering these factors, our registration employed both a single shift transform and a multi-parameter transform that combines shift, rotation and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the i-th WPL descriptor of Tte; Tta is the target template and staj is the j-th WPL descriptor of Tta; and d(stek, staj) is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value of two descriptors, defined as the difference of their center positions.
We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find each one's nearest neighbor staj in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor stei(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj and Tta are defined as in Algorithm 2; tr(it)shift, θ(it) and tr(it)scale are the shift, rotation and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift) and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters. In our experiment, we set the iteration count to 512.
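Algorithm 3 can be sketched serially as a random search. The rotation and scale ranges are assumed for illustration; matching is judged by the β threshold described above, and descriptors are reduced to their centers.

```python
import math
import random

def transform(pt, theta, scale, shift):
    """Apply rotation R(theta), scale S and shift T to a descriptor center."""
    x = scale * (pt[0] * math.cos(theta) - pt[1] * math.sin(theta)) + shift[0]
    y = scale * (pt[0] * math.sin(theta) + pt[1] * math.cos(theta)) + shift[1]
    return (x, y)

def affine_search(test, target, n_iters=512, beta=20.0):
    """Sketch of Algorithm 3: each iteration draws a rotation and scale,
    derives a shift from a random test descriptor and its nearest target
    neighbor, transforms the test template, and counts descriptor pairs
    within beta pixels; the best-scoring parameter set wins. The parameter
    ranges below are assumptions, not values from the report."""
    best_n, best = -1, None
    for _ in range(n_iters):
        theta = random.uniform(-0.1, 0.1)
        scale = random.uniform(0.9, 1.1)
        s = random.choice(test)
        t = min(target, key=lambda q: math.hypot(s[0] - q[0], s[1] - q[1]))
        p = transform(s, theta, scale, (0.0, 0.0))
        shift = (t[0] - p[0], t[1] - p[1])
        n = 0
        for u in test:
            v = transform(u, theta, scale, shift)
            if min(math.hypot(v[0] - q[0], v[1] - q[1]) for q in target) < beta:
                n += 1                     # pair matched within beta pixels
        if n > best_n:
            best_n, best = n, (theta, scale, shift)
    return best, best_n
```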
3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; R(θ(optm)), T(tr(optm)shift) and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3; ɸ is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single instruction, multiple data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory and texture memory. Registers, local memory and shared memory are on-chip and take little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited quantity. Global memory, constant memory and texture memory are off-chip memories accessible by all threads, and accessing them is very time-consuming.
Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To completely hide the latency of the small instruction set, we should preferentially use on-chip memory rather than global memory. When a global memory access occurs, threads in the same warp should access words in sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but shared memory is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
2.5.1 MAPPING ALGORITHMS TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column in Figure 11 shows, each target template is assigned to one thread, and one thread performs one template-pair comparison. In our work we use an NVIDIA C2070 as our GPU, and the thread and block counts are set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column in Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, which is shown at the right of Figure 11. First, all odd-numbered threads compute the sums of consecutive pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
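The reduction scheme above (pairwise sums, then strides 4, 8, 16, ..., with the result landing at the first address) can be modelled serially:

```python
def block_reduce(values):
    """Serial model of the in-block parallel sum described above: at stride
    2 every paired thread adds its neighbor's value, then strides 4, 8, 16,
    ... combine the partial sums. The total ends up at index 0, mirroring
    how the first thread's address holds the final result."""
    vals = list(values)
    n = len(vals)
    stride = 2
    while stride // 2 < n:
        # Every "thread" whose index is a multiple of `stride` adds the
        # partial sum held `stride // 2` slots away (if it exists).
        for i in range(0, n, stride):
            j = i + stride // 2
            if j < n:
                vals[i] += vals[j]
        stride *= 2
    return vals[0]
```

On real hardware each stride level runs in parallel across threads with a synchronization barrier between levels; the serial loop above preserves only the combining order.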
2.5.2 MAPPING INSIDE A BLOCK
In the shift parameter search, there are two schemes we can choose to map the task:
Map one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assign a single possible shift offset to a thread, so that all threads compute independently, except that the final result must be compared with the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently. The generation iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
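The per-thread stream-independence idea above can be illustrated in Python. The stdlib exposes only Mersenne Twister seeding, not per-generator parameters, so deriving a distinct, reproducible seed per thread from a master seed is only an approximation of the dynamic-creation tool; the hashing scheme is an assumption.

```python
import hashlib
import random

def make_thread_generators(master_seed, n_threads):
    """Give each logical thread its own generator with a reproducible,
    well-separated seed derived from the master seed. (The report's real
    approach creates distinct Mersenne Twister *parameters* per thread;
    here we can only vary the seed of Python's built-in Mersenne Twister.)"""
    gens = []
    for tid in range(n_threads):
        digest = hashlib.sha256(f"{master_seed}:{tid}".encode()).digest()
        gens.append(random.Random(int.from_bytes(digest, "big")))
    return gens
```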
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, so data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y = (ϕ1, ϕ2, ϕ3, x, y) and s = (x, y, r, θ, ɸ, w) are stored separately. This guarantees that the successive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a larger block and using this value to normalize all cells within the block; this normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y): m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y)/dx(x, y)).
Orientation binning is the second step of HOG and is used to create the cell histograms. Each pixel within a cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The gradient orientation bins should be spread over 0 to 180 degrees, with opposite directions counting as the same; Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination and contrast changes, the gradient strengths must be locally normalized, for which the cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
FIG
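The gradient and binning steps above can be sketched for a single cell. Central-difference gradients and a 9-bin unsigned (0-180 degree) histogram are assumed defaults, matching the conventional HOG setup rather than any parameters stated in this report.

```python
import math

def hog_cell_histogram(cell, n_bins=9):
    """Orientation binning for one cell: central-difference gradients give
    dx and dy; each interior pixel votes for its orientation bin, weighted
    by the gradient magnitude m = sqrt(dx^2 + dy^2). Angles are folded to
    0-180 degrees so opposite directions count as the same."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = cell[y][x + 1] - cell[y][x - 1]
            dy = cell[y + 1][x] - cell[y - 1][x]
            m = math.hypot(dx, dy)                            # magnitude
            theta = math.degrees(math.atan2(dy, dx)) % 180.0  # unsigned angle
            hist[min(int(theta / (180.0 / n_bins)), n_bins - 1)] += m
    return hist
```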
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, which is one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing.
Development environment for managing code, files and data.
Interactive tools for iterative exploration, design and problem solving.
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization and numerical integration.
2-D and 3-D graphics functions for visualizing data.
Tools for building custom graphical user interfaces.
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM and Microsoft Excel.
MATLAB is used in vast areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on
Windows XP, and for network installations.
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data.
Readability and navigation improvements to warning and error messages in
the MATLAB command window.
Automatic variable and function renaming in the MATLAB Editor.
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems, and it enables
fast development and execution. With the MATLAB language, you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables, specifying data types, and allocating memory. In many
cases, MATLAB eliminates the need for 'for' loops. As a result, one line of
MATLAB code can often replace several lines of C or C++ code.
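The contrast the paragraph describes can be illustrated outside MATLAB as well. Below is a small Python/NumPy sketch (not from the report) of the same idea: an explicit element-by-element loop versus the one-line vectorized form, which is the analog of MATLAB's sum(a .* b).

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([5.0, 6.0, 7.0, 8.0])

# Loop version: several lines of explicit indexing and accumulation.
total = 0.0
for i in range(len(a)):
    total += a[i] * b[i]

# Vectorized version: one line, as in MATLAB's sum(a .* b).
total_vec = float(np.sum(a * b))

assert total == total_vec  # both give 70.0
```

The vectorized form also lets the library dispatch to optimized routines, which is the point the surrounding text makes about MATLAB's processor-optimized libraries.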
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
MATLAB lets you execute commands or groups of commands one
at a time, without compiling and linking, enabling you to quickly iterate to
the optimal solution. For fast execution of heavy matrix and vector
computations, MATLAB uses processor-optimized libraries. For general-purpose
scalar computations, MATLAB generates machine-code
instructions using its JIT (just-in-time) compilation technology.
This technology, which is available on most platforms, provides
execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting
breakpoints and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file
differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
Using the interactive tool GUIDE (Graphical User Interface
Development Environment), you can lay out, design, and edit user interfaces.
GUIDE lets you include list boxes, pull-down menus, push buttons, radio
buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX® controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring
data from external devices and databases, through preprocessing,
visualization, and numerical analysis, to producing presentation-quality
output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats such as Microsoft Excel; ASCII text or binary
files; image, sound, and video files; and scientific files such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format. Additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes;
changing line colors and markers; adding annotations, LaTeX equations, and
legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D
scalar data, and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting effects,
light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW discrete Fourier
transform library. Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports, they execute
faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language used is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
4.2 SNAPSHOTS
FIG: Original sclera image converted into grayscale image.
FIG: Grayscale image converted into binary image.
FIG: Edge detection done by Otsu's thresholding.
FIG: Selecting the region of interest (sclera part).
FIG: Selected ROI part.
FIG: Enhancement of sclera image.
FIG: Feature extraction of sclera image using Gabor filters.
FIG: Matching with images in database.
FIG: Displaying the result (matched or not matched).
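The first two snapshot steps (grayscale conversion, then binarization) can be sketched outside MATLAB as well. Below is a rough Python/NumPy sketch on a synthetic image, not the report's code; the luminance weights are the standard ones that MATLAB's rgb2gray also uses, and the fixed 0.5 threshold stands in for the Otsu-selected threshold used in the report.

```python
import numpy as np

# Synthetic 4x4 RGB image: bright left half (sclera-like), dark right half.
rgb = np.zeros((4, 4, 3))
rgb[:, :2, :] = 0.9
rgb[:, 2:, :] = 0.1

# Step 1: RGB -> grayscale using standard luminance weights.
gray = 0.2989 * rgb[..., 0] + 0.5870 * rgb[..., 1] + 0.1140 * rgb[..., 2]

# Step 2: grayscale -> binary with a global threshold (fixed here;
# the report picks the threshold automatically with Otsu's method).
binary = gray > 0.5

assert binary[:, :2].all() and not binary[:, 2:].any()
```

The remaining snapshots (ROI selection, Gabor enhancement, matching) build on this binary sclera map.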
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this project, we proposed a new parallel sclera vein recognition
method which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions to other sclera vein recognition methods
and general pattern recognition methods. We designed the Y-shape
descriptor to narrow the search range and increase the matching efficiency;
it is a new feature extraction method that takes advantage of the GPU
structures. We developed the WPL descriptor to incorporate mask
information and make the method more suitable for parallel computing, which
can dramatically reduce data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, which is an important
step to achieve parallel computation efficiency using a GPU. A workflow
with high arithmetic intensity, to hide the memory access latency, was
designed to partition the computation task across the heterogeneous system of
CPU and GPU, and even across the threads in the GPU. The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland:
Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object
detection for real-time augmented reality applications in a GPGPU," IEEE
Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep,
big, simple neural nets for handwritten digit recognition," Neural Comput.,
vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris
recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput.
Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari,
"Nonnegative tensor factorization accelerated using GPGPU," IEEE
Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks,"
Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognition,"
Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of
the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner,
"Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4,
no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular
biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869,
Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G.
Xiaoguang, "Robust spatial matching for object retrieval and its parallel
implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp.
1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for
real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control,
vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4,
pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2,
pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human
identification method: Sclera recognition," IEEE Trans. Syst., Man,
Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
high-quality iris image. Two distinct features are extracted from the high-quality
iris image: the global textural feature, extracted using the 1-D log-polar
Gabor transform, and the local topological feature, extracted using
Euler numbers. An intelligent fusion algorithm combines the textural and
topological matching scores to further improve the iris recognition
performance and reduce the false rejection rate, whereas an indexing
algorithm enables fast and accurate iris identification. The verification and
identification performance of the proposed algorithms is validated and
compared with other algorithms using the CASIA Version 3, ICE 2005, and
UBIRIS iris databases.
1.8 PROPOSED METHOD
We propose a new parallel sclera vein recognition method using a two-stage
parallel approach for registration and matching: a parallel sclera
matching solution for sclera vein recognition, based on our sequential line-descriptor
method, using the CUDA GPU architecture. CUDA is a highly
parallel, multithreaded, many-core processor architecture with tremendous
computational power.
It supports not only a traditional graphics pipeline but also computation
on non-graphical data. It is relatively straightforward to implement our C
program for CUDA on an AMD-based GPU using OpenCL. Our CUDA
kernels can be directly converted to OpenCL kernels by addressing the
different syntax for various keywords and built-in functions. The mapping
strategy is also effective in OpenCL if we regard the thread and block in
CUDA as the work-item and work-group in OpenCL. Most of our optimization
techniques, such as coalesced memory access and prefix sum, work in
OpenCL too. Moreover, since CUDA is a data-parallel architecture, the
implementation of our approach in OpenCL should be programmed in the
data-parallel model.
In this research, we first discuss why the naïve parallel approach would
not work. We then propose the new sclera descriptor, the Y-shape sclera
feature-based efficient registration method, to speed up the mapping scheme;
introduce the "weighted polar line (WPL) descriptor," which is better
suited for parallel computing and mitigates the mask size issue; and develop
our coarse-to-fine two-stage matching process to dramatically improve the
matching speed. These new approaches make the parallel processing
possible and efficient.
1.9.1 PROPOSED SYSTEM ADVANTAGES
1. To improve efficiency, in this research we propose a new descriptor,
the Y-shape descriptor, which can greatly improve the efficiency of
the coarse registration of two images and can be used to filter out some
non-matching pairs before refined matching.
2. We propose the coarse-to-fine two-stage matching process. In the first
stage, we match two images coarsely using the Y-shape descriptors,
which is very fast because no registration is needed. The
matching result in this stage helps filter out image pairs with low
similarities.
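The two-stage idea can be sketched in a few lines of Python. This is an illustrative skeleton, not the report's implementation: the score functions and the 0.5 threshold are invented stand-ins for the Y-shape comparison (stage 1) and the registration-plus-WPL matching (stage 2).

```python
# Stand-in for the fast Y-shape descriptor comparison (no registration).
def coarse_score(a, b):
    return 1.0 - abs(a - b)

# Stand-in for the expensive registration + fine matching.
def fine_score(a, b):
    return 1.0 - abs(a - b) ** 2

def two_stage_match(test, targets, coarse_threshold=0.5):
    """Only pairs surviving the cheap coarse stage pay for fine matching."""
    results = {}
    for name, target in targets.items():
        if coarse_score(test, target) < coarse_threshold:
            continue                                  # filtered out in stage 1
        results[name] = fine_score(test, target)      # stage 2
    return results

scores = two_stage_match(0.8, {"A": 0.75, "B": 0.1})
assert "A" in scores and "B" not in scores
```

The benefit is that the per-pair cost of stage 2 is only paid for the small fraction of database entries whose coarse similarity is high.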
CHAPTER 2
PROJECT DESCRIPTION
2.1 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each person,
so it can be used for human identification. Several researchers have
designed different sclera vein recognition methods and have shown that it
is promising to use sclera vein recognition for human identification.
Crihalmeanu and Ross proposed three approaches: a Speeded-Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Among these three methods,
the SURF method achieves the best accuracy; it takes an average of 1.5
seconds using the SURF method to perform a one-to-one matching. Zhou
et al. proposed a line descriptor-based method for sclera vein recognition.
The matching step (including registration) is the most time-consuming step
in this sclera vein recognition system, costing about 1.2 seconds to
perform a one-to-one matching. Both speeds were measured using a PC with
Intel® Core™ 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently,
sclera vein recognition algorithms are designed using central processing
unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential
processing devices, which may not be efficient in data processing where the
data can be parallelized. Because of the large time consumption of the matching
step, sclera vein recognition using a sequential method would be very
challenging to implement in a real-time biometric system, especially
when there is a large number of templates in the database for matching. GPUs
(general-purpose graphics processing units, GPGPUs)
are now popularly used for parallel computing to improve
computational processing speed and efficiency. The highly parallel
structure of GPUs makes them more effective than CPUs for data
processing where processing can be performed in parallel. GPUs have been
widely used in biometric recognition, such as speech recognition, text
detection, handwriting recognition, and face recognition. In iris
recognition, GPUs have been used to extract features, construct descriptors,
and match templates.
GPUs are also used for object retrieval and image search. Park et al.
evaluated the performance of image processing algorithms, such
as linear feature extraction and multi-view stereo matching, on GPUs.
However, these approaches were designed for their specific biometric
recognition applications and feature searching methods; therefore, they may
not be efficient for sclera vein recognition. Compute Unified Device
Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in
this research. CUDA is a highly parallel, multithreaded, many-core
processor architecture with tremendous computational power. It supports not only a
traditional graphics pipeline but also computation on non-graphical data.
More importantly, it offers an easier programming platform which
outperforms its CPU counterparts in terms of peak arithmetic intensity and
memory bandwidth. In this research, the goal is not to develop a unified
strategy to parallelize all sclera matching methods, because each method is
quite different from the others and would need a customized design; an
efficient parallel computing scheme needs different
strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
sclera vein recognition, based on our sequential line-descriptor method, using
the CUDA GPU architecture. However, the parallelization strategies
developed in this research can be applied to design parallel approaches for
other sclera vein recognition methods and to help parallelize general pattern
recognition methods. Based on the matching approach, there are three
challenges in mapping the task of sclera feature matching to a GPU:
1) Mask files are used to calculate valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large in size; they will preoccupy the GPU memory and slow
down the data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance. New designs are
necessary to help narrow down the search range. In summary, a naïve
implementation of the algorithms in parallel would not work efficiently.
Note that it is relatively straightforward to implement our C program for
CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be
directly converted to OpenCL kernels by addressing the different syntax for
various keywords and built-in functions. The mapping strategy is also
effective in OpenCL if we regard the thread and block in CUDA as the work-item
and work-group in OpenCL. Most of our optimization techniques, such as
coalesced memory access and prefix sum, work in OpenCL too.
Moreover, since CUDA is a data-parallel architecture, the implementation
of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not
work (Section 3). We then propose the new sclera descriptor, the Y-shape
sclera feature-based efficient registration method, to speed up the mapping
scheme (Section 4); introduce the "weighted polar line (WPL) descriptor,"
which is better suited for parallel computing and mitigates the mask size
issue (Section 5); and develop our coarse-to-fine two-stage matching
process to dramatically improve the matching speed (Section 6). These new
approaches make the parallel processing possible and efficient. However, it
is non-trivial to implement these algorithms in CUDA, so we then developed
the implementation schemes to map our algorithms into CUDA (Section 7).
In Section 2, we give a brief introduction to sclera vein recognition. In
Section 8, we report experiments using the proposed system.
In Section 9, we draw conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation, feature enhancement, feature extraction, and feature
matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition,
and several methods have been designed for it. Crihalmeanu
et al. presented a semi-automated system for sclera segmentation: they
used a clustering algorithm to classify color eye images into three
clusters (sclera, iris, and background). Later on, Crihalmeanu and Ross
designed a segmentation approach based on a normalized sclera index
measure, which includes coarse sclera segmentation, pupil region
segmentation, and fine sclera segmentation. Zhou et al. developed a skin
tone plus "white color"-based voting method for sclera segmentation in
color images and an Otsu's thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera
features, since the sclera vein patterns often lack contrast and are hard to
detect. Zhou et al. used a bank of multi-directional Gabor filters for
vascular pattern enhancement. Derakhshani et al. used contrast-limited
adaptive histogram equalization (CLAHE) to enhance the green color plane
of the RGB image, and a multi-scale region growing approach to identify
the sclera veins from the image background. Crihalmeanu and Ross applied
a selective enhancement filter for blood vessels to extract features from the
green component of a color image. In the feature matching step,
Crihalmeanu and Ross proposed
three registration and matching approaches: Speeded-Up Robust
Features (SURF), which is based on interest-point detection; minutiae
detection, which is based on minutiae points on the vasculature structure;
and direct correlation matching, which relies on image registration. Zhou et
al. designed a line descriptor-based feature registration and matching
method.
The proposed sclera recognition consists of five steps: sclera
segmentation, vein pattern enhancement, feature extraction, feature
matching, and matching decision. Fig. 2 shows the block diagram of sclera
recognition. Two types of feature extraction are used in the proposed
method to achieve good identification accuracy. The characteristics
elicited from the blood vessel structure seen in the sclera region are the
Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar
conversion. HOG is used to determine the gradient orientation and
edge orientations of the vein pattern in the sclera region of an eye image. To
become more computationally efficient, the image data are converted
to polar form, which is mainly suited to circular or quasi-circular
objects. These two characteristics are extracted from all the images in the
database and compared with the features of the query image to decide whether
the person is correctly identified. This is done in the feature
matching step, which ultimately makes the matching decision. By using the
proposed feature extraction methods and matching techniques, human
identification is more accurate than in existing studies. In the proposed
method, two features of an image are drawn out.
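The HOG feature described above can be sketched compactly. The following Python/NumPy snippet is a minimal illustration on a synthetic patch, not the report's implementation: gradient magnitudes are accumulated into a histogram over gradient orientations, which is the core of HOG.

```python
import numpy as np

def hog_histogram(patch, n_bins=9):
    """Accumulate gradient magnitude into orientation bins over a patch."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    # Orientations folded into [0, 180) degrees, as in standard HOG.
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((orientation / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    for b, m in zip(bins.ravel(), magnitude.ravel()):
        hist[b] += m
    return hist

# A patch whose intensity increases purely downward has a vertical gradient,
# so all gradient energy lands in the bin containing 90 degrees.
patch = np.tile(np.arange(8.0)[:, None], (1, 8))
h = hog_histogram(patch)
assert h.argmax() == 4  # 90 degrees falls in bin 4 of 9 (20-degree bins)
```

A full HOG descriptor would additionally divide the sclera region into cells and normalize histograms over blocks; this sketch shows only the orientation-binning step the text refers to.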
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists
of three steps: glare area detection, sclera area estimation, and iris and eyelid
detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare Area Detection: A glare area is a small bright area near the
pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. It operates
only on grayscale images, so a color image must first be converted
to grayscale before the Sobel filter is applied to
detect the glare area. Fig. 4 shows the result of the glare area detection.
FIG
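The Sobel-based glare detection can be sketched as follows. This is an illustrative Python/NumPy sketch with a synthetic image and an invented response threshold, not the report's code: the Sobel kernels respond strongly at the boundary of a small bright region, which is how the glare blob is localized.

```python
import numpy as np

# Standard Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def convolve2d_valid(image, kernel):
    """Naive 'valid' 2-D convolution, enough for a 3x3 kernel sketch."""
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

gray = np.zeros((9, 9))
gray[3:6, 3:6] = 1.0                 # small bright "glare" blob

gx = convolve2d_valid(gray, SOBEL_X)
gy = convolve2d_valid(gray, SOBEL_Y)
edges = np.hypot(gx, gy) > 1.0       # strong response around the blob only

assert edges.any()                   # glare boundary detected
assert not edges[0].any()            # flat background stays quiet
```

In practice the detected boundary would be filled to mask out the glare pixels before the sclera area is estimated.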
Sclera Area Estimation: For the estimation of the sclera area, Otsu's
thresholding method is applied. The stages of sclera area detection are
selection of the region of interest (ROI), Otsu's thresholding, and sclera area
detection. The left and right sclera areas are selected based on the iris boundaries.
When the region of interest is selected, Otsu's thresholding is applied to
obtain the potential sclera areas. The correct left sclera area should be
placed in the right and center positions, and the correct right sclera area should
be placed in the left and center. In this way, non-sclera areas are eliminated.
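Otsu's method, named above, picks the gray level that maximizes the between-class variance of the two resulting pixel classes. Here is a from-scratch Python sketch of that rule on synthetic data (not sclera images):

```python
import numpy as np

def otsu_threshold(gray_levels):
    """Return the 8-bit threshold maximizing between-class variance."""
    hist = np.bincount(gray_levels.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:t] * prob[:t]).sum() / w0  # class means
        mu1 = (levels[t:] * prob[t:]).sum() / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t

# Two well-separated populations: dark iris-like and bright sclera-like pixels.
pixels = np.concatenate([np.full(100, 40), np.full(100, 200)])
t = otsu_threshold(pixels)
assert 40 < t <= 200  # the threshold falls between the two populations
```

Pixels above the returned threshold inside the ROI are kept as candidate sclera, matching the estimation step described above.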
2.2.3 IRIS AND EYELID REFINEMENT
The top and underside of the sclera regions are the limits of the
sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are
refined; these are all unwanted portions for recognition. To eliminate their
effects, refinement is done following the detection of the sclera area. Fig.
shows the result after Otsu's thresholding and after iris and eyelid refinement
to detect the right sclera area; the left sclera area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the
segmentation fault. The vein patterns in the sclera area are not clearly visible
after segmentation, so vein pattern enhancement must be performed to make
them more visible.
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of fingers (Miura et al., 2004), palm (Lin and
Fan, 2004), and retina (Hill, 1999). In the case of retinal biometrics, a
special optical device for imaging the back of the eyeball is needed (Hill,
1999). Due to its perceived invasiveness and the required degree of subject
cooperation, the use of retinal biometrics may not be acceptable to some
individuals. The conjunctiva is a thin, transparent, and moist tissue that
covers the outer surface of the eye. The part of the conjunctiva that covers
the inner lining of the eyelids is called the palpebral conjunctiva, and the part
that covers the outer surface of the eye is called the ocular (or bulbar)
conjunctiva, which is the focus of this study. The ocular conjunctiva is very
thin and clear; thus the vasculature (including that of the episclera) is
easily visible through it. The visible microcirculation of the conjunctiva offers a
rich and complex network of veins and fine microcirculation (Fig. 1). The
apparent complexity and specificity of these vascular patterns motivated us
to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have a vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as those of the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus, the conjunctival vascular
structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial
off-the-shelf digital cameras under normal lighting conditions, making this
modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the
addition of conjunctival biometrics will enhance the performance of current
iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portions of the eye, the sclera
vein patterns will be further exposed. This feature makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g.,
due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
opportunities for simultaneous computation. The figure below demonstrates the
possibility of parallel directional filtering. Since the filter is computed over
different portions of the input image, the computation can be performed in
parallel (denoted by "Elements" below). In addition, individual parallelization
of each element of the filtering can also be performed. A detailed discussion of
our proposed parallelization is outside the scope of this report.
FIG
FIG
2.2.5 OVERVIEW OF THE LINE-DESCRIPTOR-BASED SCLERA VEIN
RECOGNITION METHOD
The matching segment of the line-descriptor-based method is a
bottleneck with regard to matching speed. In this section, we briefly
describe the line-descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and remove the branch
points. The line descriptor is used to describe the segments in the vein
structure; Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle to some
reference angle at the iris center, θ; the segment's distance to the iris center, r;
and the dominant angular orientation of the line segment, φ. Thus the
descriptor is S = (θ, r, φ)ᵀ. The individual components of the line descriptor
are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. To register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database, and then calculates a fitness value for the registration under these parameters.
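The sampling loop described above can be sketched as follows. This is a minimal illustration, not the report's implementation: the a-priori scale and rotation ranges, the tolerance `tol`, and the fitness definition (the count of test points landing near some target point after the candidate transform) are all assumptions.

```python
import random
import math

def registration_fitness(test_pts, target_pts, scale, angle, shift, tol=10.0):
    """Count test points that land within `tol` of some target point
    after applying the candidate scale/rotation/shift."""
    matched = 0
    for (x, y) in test_pts:
        # apply the candidate similarity transform
        xr = scale * (x * math.cos(angle) - y * math.sin(angle)) + shift[0]
        yr = scale * (x * math.sin(angle) + y * math.cos(angle)) + shift[1]
        if any(math.hypot(xr - tx, yr - ty) < tol for (tx, ty) in target_pts):
            matched += 1
    return matched

def ransac_register(test_pts, target_pts, iters=100, seed=0):
    rng = random.Random(seed)
    best = (0, None)
    for _ in range(iters):
        p = rng.choice(test_pts)        # one point from the test template
        q = rng.choice(target_pts)      # one point from the target template
        scale = rng.uniform(0.9, 1.1)   # a-priori scale range (assumed)
        angle = rng.uniform(-0.1, 0.1)  # a-priori rotation range (assumed)
        # choose the shift so that p maps onto q under this scale/rotation
        xr = scale * (p[0] * math.cos(angle) - p[1] * math.sin(angle))
        yr = scale * (p[0] * math.sin(angle) + p[1] * math.cos(angle))
        shift = (q[0] - xr, q[1] - yr)
        fit = registration_fitness(test_pts, target_pts, scale, angle, shift)
        if fit > best[0]:
            best = (fit, (scale, angle, shift))
    return best
```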
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. To reduce the effect of segmentation errors, we created a weighting image (Figure 3) from the sclera mask by setting interior pixels of the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
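A minimal sketch of building such a weighting image from a binary mask. The report does not specify the boundary distance, so an erosion-based border width is assumed; the 0/0.5/1 values follow the text.

```python
import numpy as np

def weighting_image(mask, border=2):
    """Weight map from a binary sclera mask: 1 for interior pixels,
    0.5 for pixels within `border` erosions of the mask boundary,
    0 outside the mask (the border width is an assumption)."""
    mask = mask.astype(bool)
    interior = mask.copy()
    for _ in range(border):
        # erode by one pixel: a pixel stays interior only if all four
        # of its neighbours are inside the mask (np.roll wraps at the
        # image border, which is harmless for a mask away from the edge)
        up    = np.roll(interior,  1, axis=0)
        down  = np.roll(interior, -1, axis=0)
        left  = np.roll(interior,  1, axis=1)
        right = np.roll(interior, -1, axis=1)
        interior &= up & down & left & right
    w = np.zeros(mask.shape, dtype=float)
    w[mask] = 0.5          # near-boundary pixels
    w[interior] = 1.0      # interior pixels overwrite the 0.5
    return w
```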
The matching score for two segment descriptors is calculated as follows: Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and θmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score of the minimal set between the test and target templates; that is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
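The scoring structure can be illustrated as below. The threshold values and the exact per-pair score are assumptions standing in for the report's equations; only the shape of the computation (distance and angle threshold test, weight-limited pair score, normalisation by the smaller template's total weight) follows the text.

```python
import math

def segment_match_score(si, sj, d_match=5.0, phi_match=0.2):
    """Pairwise score for two segment descriptors s = (x, y, phi, w).
    Threshold values are illustrative, not the report's."""
    dist = math.hypot(si[0] - sj[0], si[1] - sj[1])
    if dist < d_match and abs(si[2] - sj[2]) < phi_match:
        return min(si[3], sj[3])   # weight contributed by the match
    return 0.0

def template_match_score(test, target, **kw):
    """Total score M: best match per test segment, normalised by the
    maximum attainable score of the smaller template."""
    total = sum(max(segment_match_score(s, t, **kw) for t in target)
                for s in test)
    smaller = test if len(test) <= len(target) else target
    max_score = sum(s[3] for s in smaller)
    return total / max_score if max_score else 0.0
```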
[Figures]
Y-shaped branches are observed to be a stable feature under movement of the eye and can be used as a sclera feature descriptor. To detect the Y-shaped branches in the original template, we search the set of nearest neighbors of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line-segment set, the set may be inferred to be a Y-shaped structure, and the line-segment angles are recorded as a new feature of the sclera.
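The neighbour-angle classification described above can be sketched as follows. The search radius, the angle tolerance, and the greedy clustering of orientations are illustrative assumptions, not the report's exact procedure.

```python
import math

def find_y_branches(segments, radius=20.0, angle_tol=0.3):
    """segments: list of (x, y, phi) line-segment descriptors.
    A segment whose neighbours (within `radius`) show exactly two
    distinct orientation values is flagged as a Y-shaped branch.
    Parameter values are illustrative."""
    branches = []
    for i, (x, y, phi) in enumerate(segments):
        neighbour_angles = [p for j, (nx, ny, p) in enumerate(segments)
                            if j != i and math.hypot(nx - x, ny - y) <= radius]
        # greedily cluster the neighbour orientations
        clusters = []
        for a in neighbour_angles:
            for c in clusters:
                if abs(a - c[0]) < angle_tol:
                    c.append(a)
                    break
            else:
                clusters.append([a])
        if len(clusters) == 2:   # two angle populations -> Y structure
            branches.append((x, y, phi, [c[0] for c in clusters]))
    return branches
```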
There are two ways to measure both the orientation and the relationship of every branch of the Y-shaped vessels: one is to use the angle of every branch to the x axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2 and ϕ3 remain quite stable. To tolerate errors in the pupil-center calculation from the segmentation step, we also record the center position (x, y) of the Y-shaped branch as auxiliary parameters. Our rotation-, shift- and scale-invariant feature vector is therefore defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris center, and it is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments, and for each segment a line descriptor is created to record the center and orientation of that segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of limited segmentation accuracy, descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes. To tolerate such errors, the mask file
[Figure: The line descriptor of the sclera vessel pattern. (a) An eye image. (b) Vessel patterns in sclera. (c) Enhanced sclera vessel patterns. (d) Centers of line segments of vessel patterns.]
is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application using the mask is challenging, since the mask files are large, occupy GPU memory, and slow down data transfer. During matching, a RANSAC-type registration algorithm randomly selects corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighting image created by setting weight values according to position: the weight of descriptors beyond the sclera is set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU, and only once.
The calculated weight is saved as a component of the descriptor, which becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the value 0, 0.5 or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. Alignment is faster if the two templates have a similar reference point: if we use the center of the iris as the reference point, then when two templates are compared their correspondences are automatically aligned to each other. Every feature vector of the template is a set of line-segment descriptors composed of three variables (Figure 8): the segment's angle θ to the reference line through the iris center, the distance r between the segment's center and the pupil center, and the dominant angular orientation ɸ of the segment. To minimize GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocessing step.
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.
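The CPU-side polar-to-rectangular precomputation can be sketched as below; the function name and the iris-center parameter are illustrative.

```python
import math

def wpl_descriptor(r, theta, phi, w, iris_center=(0.0, 0.0)):
    """Build the extended WPL descriptor s = (x, y, r, theta, phi, w).
    The polar coordinates (r, theta) relative to the iris centre are
    converted to rectangular (x, y) once on the CPU, so the GPU kernels
    never repeat the trigonometry."""
    cx, cy = iris_center
    x = cx + r * math.cos(theta)
    y = cy + r * math.sin(theta)
    return (x, y, r, theta, phi, w)
```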
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations; the multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved them at contiguous addresses, which meets the requirement for coalesced memory access on the GPU.

[Figures]
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift-parameter generator in Figure 4 is simplified as shown in Figure 9.
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
more complicated shading and lighting operations that are essential for
complex effects The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment Over the past six years these vertex programs and
fragment programs have become increasingly more capable with larger
limits on their size and resource consumption with more fully featured
instruction sets and with more flexible control-flow operations After many
years of separate instruction sets for vertex and fragment operations current
GPUs support the unified Shader Model 4.0 on both vertex and fragment
shaders
The hardware must support shader programs of at least 65 k static
instructions and unlimited dynamic instructions
The instruction set for the first time supports both 32-bit integers and 32-
bit floating-point numbers
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture)
Finally dynamic flow control in the form of loops and branches must be
supported
As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

GENERAL-PURPOSE COMPUTING ON THE GPU

Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand the actual operations are the same and easy to follow; on the other hand the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II
concentrating on the programmable aspects of this pipeline
The programmer specifies geometry that covers a region on the screen
The rasterizer generates a fragment at each pixel location covered by that
geometry
Each fragment is shaded by the fragment program
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory
The resulting image can then be used as texture on future passes through
the graphics pipeline
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves the exact same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest The rasterizer generates a fragment at each
pixel location covered by that geometry (In our example our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation)
Each fragment is shaded by an SPMD general-purpose fragment
program (Each grid point runs the same program to update the state of its
fluid)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory (each grid point can access the state of its neighbors from the previous time step in computing its current value)
The resulting buffer in global memory can then be used as an input on
future passes (The current state of the fluid will be used on the next time
step)
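The gather-only grid update in this fluid example can be mimicked on the CPU as below. The particular update rule, a weighted average of each cell and its four neighbours, is a stand-in for a real fluid solver; the point is the structure: every grid point reads neighbours from the previous state and writes into a separate output buffer, exactly as in the render-to-texture model.

```python
import numpy as np

def step(state):
    """One SPMD-style time step: every grid point computes its next
    value from its own value and its four neighbours (a pure 'gather').
    The output buffer is separate from the input, as in the old
    render-to-texture GPGPU model; np.roll gives periodic boundaries."""
    up    = np.roll(state,  1, axis=0)
    down  = np.roll(state, -1, axis=0)
    left  = np.roll(state,  1, axis=1)
    right = np.roll(state, -1, axis=1)
    # weights sum to 1, so the total quantity on the grid is conserved
    return 0.5 * state + 0.125 * (up + down + left + right)
```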
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments described in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads
An SPMD general-purpose program computes the value of each thread
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in
future computation
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity, although some false positive matches may remain after this step. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine-matrix generation and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine-matrix generation and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
2.4.1 STAGE I: MATCHING WITH Y-SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
[Algorithm 1]
Here ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and their center distances, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the jth branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs ni and the distance di between the centers of the Y-shape branches are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2), where α is a factor used to fuse the matching score, set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise matching process.
2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel-structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because: when acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, since the eyeball is spherical in shape; and the sclera is made up of four layers (episclera, stroma, lamina fusca and endothelium), and there are slight differences among the movements of these layers. Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate, so the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil-center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template, stei is the ith WPL descriptor of Tte, Tta is the target template, staj is the jth WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek and staj. Δsk is the shift value of the two descriptors. We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find each one's nearest neighbor staj in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, the candidate with the smallest standard deviation among these candidate offsets.
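The shift search can be sketched as below. The descriptors are reduced to their centre points, the per-quad sampling is simplified to plain random sampling, and the smallest-standard-deviation criterion is replaced by a stand-in (keep the candidate offset that deviates least from the mean of all candidates); all of these are assumptions.

```python
import math
import random

def shift_search(test, target, samples=8, seed=0):
    """Sketch of the shift-parameter search: sample descriptor centres
    from the test template, pair each with its nearest neighbour in the
    target template, and keep the candidate offset closest to the mean
    of all candidate offsets (a stand-in for the smallest-std rule)."""
    rng = random.Random(seed)
    picks = [rng.choice(test) for _ in range(samples)]
    offsets = []
    for (x, y) in picks:
        # nearest neighbour of this centre in the target template
        tx, ty = min(target, key=lambda p: math.hypot(p[0] - x, p[1] - y))
        offsets.append((tx - x, ty - y))
    mx = sum(o[0] for o in offsets) / len(offsets)
    my = sum(o[1] for o in offsets) / len(offsets)
    # candidate offset that deviates least from the mean offset
    return min(offsets, key=lambda o: math.hypot(o[0] - mx, o[1] - my))
```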
2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor stei(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj and Tta are defined as in Algorithm 2; tr(it)shift, θ(it) and tr(it)scale are the shift, rotation and scale parameters generated in the itth iteration; and R(θ(it)), T(tr(it)shift) and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters; in our experiment we set the iteration count to 512.
3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift) and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and target template.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory and texture memory. Registers, local memory and shared memory are on-chip, and accessing them takes little time. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in a limited amount. Global memory, constant memory and texture memory are off-chip memories accessible by all threads, and accessing them is very time-consuming.
Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:

If threads in a warp have different control paths, all the branches are executed serially; to improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide the latency of the small instruction set, on-chip memory should be used preferentially rather than global memory; when global memory access does occur, threads in the same warp should access words in sequence to achieve coalescing.

Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same bank, the accesses are serialized; to get maximum performance, memory requests should be scheduled to minimize bank conflicts.
2.5.1 MAPPING THE ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computational density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partitioning among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks. We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one template-pair comparison. In our work we use an NVIDIA C2070 as our GPU, with the thread and block counts set to 1024; this means we can match our test template with up to 1024 × 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in that template. This partition lets every block execute independently, with no data-exchange requirements between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results must be computed or compared. A parallel prefix-sum algorithm is used to calculate this sum, as shown on the right of Figure 11: first, all odd-numbered threads compute the sums of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
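A sequential simulation of that block-level reduction (each loop iteration corresponds to one synchronized round of the GPU threads):

```python
def tree_sum(values):
    """Simulate the in-block reduction: at each round, the surviving
    'threads' add the partial result `stride` slots away, halving the
    number of active slots until slot 0 holds the total."""
    buf = list(values)
    stride = 1
    while stride < len(buf):
        for i in range(0, len(buf) - stride, 2 * stride):
            buf[i] += buf[i + stride]   # pairwise partial sums
        stride *= 2
    return buf[0]                       # total lands at the first address
```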
2.5.2 MAPPING INSIDE A BLOCK
In the shift-parameter search there are two schemes we can choose to map the task:

Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assigning a single possible shift offset to a thread, so that all threads compute independently except that the final result must be compared against the other possible offsets.

Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine-matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, with the iterations distributed across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation- and scale-registration generation step we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
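The per-thread stream problem has a convenient CPU-side analogue in numpy, shown below: `SeedSequence.spawn` derives statistically independent child seeds, so the per-"thread" generators are uncorrelated by construction. This is only an illustration of the goal; the report's GPU solution instead creates distinct Mersenne Twister parameter sets, as described next.

```python
import numpy as np

def make_streams(n_threads, root_seed=1234):
    """One independent random stream per simulated thread.
    SeedSequence.spawn guarantees the children are statistically
    independent, which is the property the per-thread GPU twisters
    must also have."""
    root = np.random.SeedSequence(root_seed)
    return [np.random.default_rng(s) for s in root.spawn(n_threads)]
```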
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched should not be used again. In our approach, a flag
[Figures]
variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is to use a single thread in a block to process the matching.
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, so data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no host-to-device data transfer during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
[Figure]
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in the design of target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block; this normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y):

m(x, y) = √(dx(x, y)² + dy(x, y)²)
θ(x, y) = tan⁻¹(dy(x, y) / dx(x, y))
Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination or contrast changes, then the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
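The gradient computation and orientation binning described above can be sketched as a minimal HOG, omitting block normalisation and the Gaussian window; the cell size and bin count are illustrative.

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    """Minimal HOG sketch: central-difference gradients, unsigned
    orientation spread over 0-180 degrees (opposite directions fold
    together), magnitude-weighted votes per cell. No block
    normalisation or Gaussian windowing is shown."""
    img = img.astype(float)
    dy, dx = np.gradient(img)                       # y- and x-direction gradients
    mag = np.hypot(dx, dy)                          # m(x, y)
    ang = np.rad2deg(np.arctan2(dy, dx)) % 180.0    # theta(x, y), unsigned
    h, w = img.shape
    hists = np.zeros((h // cell, w // cell, bins))
    bin_w = 180.0 / bins
    for r in range(h // cell):
        for c in range(w // cell):
            m = mag[r*cell:(r+1)*cell, c*cell:(c+1)*cell].ravel()
            a = ang[r*cell:(r+1)*cell, c*cell:(c+1)*cell].ravel()
            idx = np.minimum((a // bin_w).astype(int), bins - 1)
            for i, wgt in zip(idx, m):
                hists[r, c, i] += wgt               # magnitude-weighted vote
    return hists
```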
[Figure]
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java and Fortran. Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop; when code is entered there, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, either as a script or encapsulated into a function, extending the commands available. MATLAB provides a number of features for documenting and sharing your work: you can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas. MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or Fortran. A wrapper function is created,
allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed MEX-files
(for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called
from MATLAB, and many MATLAB libraries (for
example, XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries. Calling MATLAB from Java is more complicated, but
can be done with a MATLAB extension sold separately by MathWorks,
or using an undocumented mechanism called JMI (Java-to-MATLAB
Interface), which should not be confused with the unrelated Java Metadata
Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox
available from MathWorks, MATLAB can be connected
to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on
Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems, and it enables
fast development and execution. With the MATLAB language, you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables, specifying data types, and allocating memory. In many
cases, MATLAB eliminates the need for 'for' loops. As a result, one line of
MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
MATLAB lets you execute commands or groups of commands one
at a time, without compiling and linking, enabling you to quickly iterate to
the optimal solution. For fast execution of heavy matrix and vector
computations, MATLAB uses processor-optimized libraries. For general-purpose
scalar computations, MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology.
This technology, which is available on most platforms, provides
execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
Use the interactive tool GUIDE (Graphical User Interface
Development Environment) to lay out, design, and edit user interfaces.
GUIDE lets you include list boxes, pull-down menus, push buttons, radio
buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring
data from external devices and databases, through preprocessing,
visualization, and numerical analysis, to producing presentation-quality
output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats such as Microsoft Excel; ASCII text or binary
files; image, sound, and video files; and scientific files such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format. Additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes;
changing line colors and markers; adding annotations, LaTeX equations, and
legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter and bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D
scalar data, and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting effects,
light source locations, and transparency.
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW discrete Fourier
transform library. Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports, they execute
faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language used is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition
method, which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions for other sclera vein recognition methods
and for general pattern recognition methods. We designed the Y-shape
descriptor, a new feature extraction method that takes advantage of the GPU
structures, to narrow the search range and increase the matching efficiency.
We developed the WPL descriptor to incorporate mask
information and make the data more suitable for parallel computing, which can
dramatically reduce data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, an important
step in achieving parallel computation efficiency on a GPU. A workflow
with high arithmetic intensity, designed to hide memory access latency,
partitions the computation task across the heterogeneous CPU-GPU system,
and even across the threads in the GPU. The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland:
Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object
detection for real-time augmented reality applications in a GPGPU," IEEE
Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep,
big, simple neural nets for handwritten digit recognition," Neural Comput.,
vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris
recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput.
Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari,
"Nonnegative tensor factorization accelerated using GPGPU," IEEE
Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks,"
Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognition,"
Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of
the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner,
"Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4,
no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular
biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869,
Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G.
Xiaoguang, "Robust spatial matching for object retrieval and its parallel
implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp.
1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for
real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control,
vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4,
pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2,
pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human
identification method: Sclera recognition," IEEE Trans. Syst. Man
Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
introduce the "weighted polar line" (WPL) descriptor, which is better
suited for parallel computing, to mitigate the mask size issue; and develop
our coarse-to-fine two-stage matching process to dramatically improve the
matching speed. These new approaches make the parallel processing
possible and efficient.
191 PROPOSED SYSTEM ADVANTAGES
1. To improve efficiency, in this research we propose a new descriptor,
the Y shape descriptor, which can greatly improve the efficiency of
the coarse registration of two images and can be used to filter out some
non-matching pairs before refined matching.
2. We propose a coarse-to-fine two-stage matching process. In the first
stage, we match two images coarsely using the Y-shape descriptors,
which is very fast because no registration is needed. The
matching result in this stage helps filter out image pairs with low
similarities.
CHAPTER 2
PROJECT DESCRIPTION
21 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each person,
so it can be used for human identification. Several researchers have
designed different sclera vein recognition methods and have shown that it
is promising to use sclera vein recognition for human identification.
Crihalmeanu and Ross proposed three approaches: a Speeded-Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Of these three methods,
the SURF method achieves the best accuracy; it takes an average of 15
seconds using the SURF method to perform a one-to-one matching. Zhou
et al. proposed a line descriptor-based method for sclera vein recognition.
The matching step (including registration) is the most time-consuming step
in this sclera vein recognition system, costing about 12 seconds to
perform a one-to-one matching. Both speeds were measured on a PC with
Intel Core 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently,
sclera vein recognition algorithms are designed using central processing
unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential
processing devices, which may not be efficient for data processing where the
data can be parallelized. Because of the large time consumption of the matching
step, sclera vein recognition using a sequential method would be very
challenging to implement in a real-time biometric system, especially
when there is a large number of templates in the database for matching. GPUs
(general-purpose graphics processing units, GPGPUs)
are now popularly used for parallel computing to improve
computational processing speed and efficiency. The highly parallel
structure of GPUs makes them more effective than CPUs for data
processing where processing can be performed in parallel. GPUs have been
widely used in biometric recognition, such as speech recognition, text
detection, handwriting recognition, and face recognition. In iris
recognition, GPUs have been used to extract features, construct descriptors,
and match templates.
GPUs are also used for object retrieval and image search. Park et al.
evaluated the performance of image processing algorithms such
as linear feature extraction and multi-view stereo matching on GPUs.
However, these approaches were designed for their specific biometric
recognition applications and feature searching methods; therefore, they may
not be efficient for sclera vein recognition. Compute Unified Device
Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in
this research. CUDA is a highly parallel, multithreaded, many-core
processor architecture with tremendous computational power. It supports not only a
traditional graphics pipeline but also computation on non-graphical data.
More importantly, it offers an easier programming platform that
outperforms its CPU counterparts in terms of peak arithmetic intensity and
memory bandwidth. In this research, the goal is not to develop a unified
strategy to parallelize all sclera matching methods, because each method is
quite different from the others and would need a customized design; an
efficient parallel computing scheme would require different
strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
sclera vein recognition based on our sequential line-descriptor method using
the CUDA GPU architecture. However, the parallelization strategies
developed in this research can be applied to design parallel approaches for
other sclera vein recognition methods and help parallelize general pattern
recognition methods. Based on the matching approach, there are three
challenges in mapping the task of sclera feature matching to a GPU:
1) Mask files are used to calculate valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large in size and will preoccupy the GPU memory and slow
down the data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance. New designs are
necessary to help narrow down the search range. In summary, a naïve
implementation of the algorithms in parallel would not work efficiently.
Note that it is relatively straightforward to implement our C program for
CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be
directly converted to OpenCL kernels by accounting for the different syntax of
various keywords and built-in functions. The mapping strategy is also
effective in OpenCL if we regard a thread and a block in CUDA as a work-item
and a work-group in OpenCL. Most of our optimization techniques, such as
coalesced memory access and prefix sum, can work in OpenCL too.
Moreover, since CUDA is a data-parallel architecture, the implementation
of our approach in OpenCL should be programmed in the data-parallel model.
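Of the optimization techniques just mentioned, the prefix sum (scan) is a standard data-parallel primitive. The sequential recurrence it parallelizes can be sketched in Python; this is illustrative only, since a CUDA or OpenCL kernel computes the same result with a tree of partial sums across threads.

```python
def inclusive_prefix_sum(values):
    """Inclusive scan: out[i] = values[0] + values[1] + ... + values[i]."""
    out = []
    total = 0
    for v in values:
        total += v
        out.append(total)
    return out
```
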
In this research, we first discuss why the naïve parallel approach would not
work (Section 3). We then propose the new sclera descriptor, the Y-shape
sclera feature-based efficient registration method, to speed up the mapping
scheme (Section 4); introduce the "weighted polar line" (WPL) descriptor,
which is better suited for parallel computing, to mitigate the mask size
issue (Section 5); and develop our coarse-to-fine two-stage matching
process to dramatically improve the matching speed (Section 6). These new
approaches make the parallel processing possible and efficient. However, it
is non-trivial to implement these algorithms in CUDA, so we then develop
the implementation schemes to map our algorithms into CUDA (Section 7).
In Section 2, we give a brief introduction to sclera vein recognition. In
Section 8, we present experiments using the proposed system.
In Section 9, we draw conclusions.
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation, feature enhancement, feature extraction, and feature
matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition,
and several methods have been designed for it. Crihalmeanu
et al. presented a semi-automated system for sclera segmentation; they
used a clustering algorithm to classify the color eye images into three
clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross
designed a segmentation approach based on a normalized sclera index
measure, which includes coarse sclera segmentation, pupil region
segmentation, and fine sclera segmentation. Zhou et al. developed a skin
tone plus "white color"-based voting method for sclera segmentation in
color images, and an Otsu's thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera
features, since the sclera vein patterns often lack contrast and are hard to
detect. Zhou et al. used a bank of multi-directional Gabor filters for
vascular pattern enhancement. Derakhshani et al. used contrast-limited
adaptive histogram equalization (CLAHE) to enhance the green color plane
of the RGB image, and a multi-scale region growing approach to identify
the sclera veins from the image background. Crihalmeanu and Ross applied
a selective enhancement filter for blood vessels to extract features from the
green component of a color image. In the feature matching step,
Crihalmeanu and Ross proposed
three registration and matching approaches: Speeded-Up Robust
Features (SURF), which is based on interest-point detection; minutiae
detection, which is based on minutiae points on the vasculature structure;
and direct correlation matching, which relies on image registration. Zhou et
al. designed a line descriptor-based feature registration and matching
method.
The proposed sclera recognition consists of five steps: sclera
segmentation, vein pattern enhancement, feature extraction, feature
matching, and matching decision. Fig. 2 shows the block diagram of sclera
recognition. Two types of feature extraction are used in the proposed
method to achieve good identification accuracy. The characteristics
elicited from the blood vessel structure seen in the sclera region are the
Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar
conversion. HOG is used to determine the gradient orientation and
edge orientations of the vein pattern in the sclera region of an eye image. To
become more computationally efficient, the image data are converted
to polar form, which is mainly useful for circular or quasi-circular
objects. These two characteristics are extracted from all the images in the
database and compared with the features of the query image to decide whether the
person is correctly identified. This procedure is done in the feature
matching step, which ultimately makes the matching decision. By using the
proposed feature extraction methods and matching techniques, the human
identification is more accurate than in existing studies. In the proposed
method, two features of an image are extracted.
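The Cartesian-to-polar conversion used as the second feature maps each pixel coordinate to a radius and angle about a centre point (here taken to be the iris/pupil centre). A minimal Python sketch, with the report's interpolation step omitted:

```python
import math

def to_polar(points, center):
    """Map (x, y) points to (r, theta) about the given center point."""
    cx, cy = center
    return [(math.hypot(x - cx, y - cy), math.atan2(y - cy, x - cx))
            for x, y in points]
```
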
222 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists
of three steps: glare area detection, sclera area estimation, and iris and eyelid
detection and refinement. The figure shows the steps of segmentation.
FIG
Glare area detection: the glare area is a small bright area near the
pupil or iris; this is an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. It operates
only on grayscale images, so a color image must first be converted
to grayscale before the Sobel filter is applied to
detect the glare area. Fig. 4 shows the result of the glare area detection.
FIG
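The Sobel filter responds strongly at intensity edges, so a bright glare spot produces a ring of high gradient magnitude around it. A pure-Python sketch of the 3x3 Sobel gradient magnitude (an illustrative version, not the report's implementation; border pixels are left at zero):

```python
def sobel_magnitude(img):
    """Gradient magnitude of a 2-D grey-scale image (list of rows) using
    the 3x3 Sobel kernels; border pixels are left at zero."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # horizontal gradient: right column minus left column
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            # vertical gradient: bottom row minus top row
            gy = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```
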
Sclera area estimation: for the estimation of the sclera area, Otsu's
thresholding method is applied. The stages of sclera area detection are
selection of the region of interest (ROI), Otsu's thresholding, and sclera area
detection. The left and right sclera areas are selected based on the iris boundaries.
Once the region of interest is selected, Otsu's thresholding is applied to
obtain the potential sclera areas. The correct left sclera area should be
placed in the right and center positions, and the correct right sclera area should
be placed in the left and center. In this way, non-sclera areas are eliminated.
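Otsu's method picks the threshold that maximises the between-class variance of the intensity histogram, separating bright sclera-like pixels from the rest. A pure-Python sketch for 8-bit intensities (illustrative, not the report's implementation):

```python
def otsu_threshold(gray):
    """Return the 0-255 threshold that maximises the between-class
    variance of the intensity histogram (Otsu's method)."""
    hist = [0] * 256
    for v in gray:
        hist[v] += 1
    total = len(gray)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0.0          # running sum of intensities in the background class
    weight_bg = 0         # running count of background pixels
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

For a strongly bimodal histogram the maximiser sits at the lower mode, cleanly splitting the two pixel populations.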
223 IRIS AND EYELID REFINEMENT
The top and underside of the sclera regions are the limits of the
sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are
refined; these are altogether unwanted portions for recognition. In order
to eliminate these effects, refinement is done after the detection
of the sclera area. The figure shows the result after Otsu's thresholding and the
iris and eyelid refinement used to detect the right sclera area. In the same way,
the left sclera area is detected using this method.
FIG
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the
segmentation fault. The vein patterns in the sclera area are not clearly visible
after segmentation, so vein pattern enhancement is performed to make
them more visible.
224 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of fingers (Miura et al., 2004), the palm (Lin and
Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a
special optical device for imaging the back of the eyeball is needed (Hill,
1999). Due to its perceived invasiveness and the required degree of subject
cooperation, the use of retinal biometrics may not be acceptable to some
individuals. The conjunctiva is a thin, transparent, and moist tissue that
covers the outer surface of the eye. The part of the conjunctiva that covers
the inner lining of the eyelids is called the palpebral conjunctiva, and the part
that covers the outer surface of the eye is called the ocular (or bulbar)
conjunctiva, which is the focus of this study. The ocular conjunctiva is very
thin and clear; thus, the vasculature (including that of the episclera) is
easily visible through it. The visible microcirculation of the conjunctiva offers a
rich and complex network of veins and fine microcirculation (Fig. 1). The
apparent complexity and specificity of these vascular patterns motivated us
to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as those of the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus, the conjunctival vascular
structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial
off-the-shelf digital cameras under normal lighting conditions, making this
modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact. Besides being a
stand-alone biometric modality, we anticipate that the addition of
conjunctival biometrics will enhance the performance of current iris-based
biometric systems in the following ways:
Improving accuracy by the addition of vascular features
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portion of the eye, the sclera
vein patterns will be further exposed. This feature makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g.,
due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
availability of simultaneous computation. The figure below demonstrates the
possibility of parallel directional filtering. Since the filter is computed over
different portions of the input image, the computation can be performed in
parallel (denoted by Elements below). In addition, individual parallelization
of each element of the filtering can also be performed. A detailed discussion of
our proposed parallelization is outside the scope of this paper.
FIG
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA
VEIN RECOGNITION METHOD
The matching segment of the line-descriptor-based method is a
bottleneck with regard to matching speed. In this section, we briefly
describe the line-descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and remove the branch
points. The line descriptor is used to describe the segments in the vein
structure; Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle to some
reference angle at the iris center, θ; the segment's distance to the iris center, r;
and the dominant angular orientation of the line segment, ɸ. Thus the
descriptor is S = (θ, r, ɸ)ᵀ. The individual components of the line descriptor
are calculated as
FIG
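Following these definitions, θ and r come directly from the segment centre (xl, yl) and the iris centre (xi, yi); the dominant orientation ɸ is derived from the polynomial fit fline(x), which this hypothetical sketch simply accepts as an input rather than recomputing:

```python
import math

def line_descriptor(seg_center, dominant_angle, iris_center):
    """Build the S = (theta, r, phi) descriptor for one vein segment.
    dominant_angle stands in for the orientation phi obtained from the
    polynomial approximation f_line(x) of the segment (assumed given)."""
    xl, yl = seg_center
    xi, yi = iris_center
    theta = math.atan2(yl - yi, xl - xi)   # angle of centre about iris centre
    r = math.hypot(xl - xi, yl - yi)       # distance to iris centre
    return (theta, r, dominant_angle)
```
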
Here fline(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. In order to register the segments of the
vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit
parameters for registration between the two sclera vascular patterns. The
registration algorithm randomly chooses two points, one from the
test template and one from the target template, and randomly chooses a
scaling factor and a rotation value based on a priori knowledge of the
database. Using these values, it calculates a fitness value for the registration
under these parameters.
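The random search described above can be sketched as follows. This is a simplified illustration, not the report's implementation: the scale and rotation ranges are invented stand-ins for the database's a priori knowledge, and the fitness here is just the count of test points that land near some target point after the candidate transform.

```python
import math
import random

def register_ransac(test_pts, target_pts, trials=200, tol=2.0, seed=0):
    """RANSAC-style registration sketch: repeatedly pick one point from each
    template plus a random scale and rotation, and keep the transform whose
    fitness (number of aligned test points) is highest."""
    rng = random.Random(seed)
    best_fit, best_params = 0, None
    for _ in range(trials):
        p = rng.choice(test_pts)            # anchor point in the test template
        q = rng.choice(target_pts)          # anchor point in the target template
        s = rng.uniform(0.9, 1.1)           # assumed a priori scale range
        a = rng.uniform(-0.2, 0.2)          # assumed a priori rotation range
        ca, sa = math.cos(a), math.sin(a)
        fit = 0
        for x, y in test_pts:
            # rotate and scale about p, then translate p onto q
            tx = q[0] + s * (ca * (x - p[0]) - sa * (y - p[1]))
            ty = q[1] + s * (sa * (x - p[0]) + ca * (y - p[1]))
            if any(math.hypot(tx - u, ty - v) <= tol for u, v in target_pts):
                fit += 1
        if fit > best_fit:
            best_fit, best_params = fit, (p, q, s, a)
    return best_fit, best_params
```
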
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. In order to reduce the effect of segmentation errors, we created the
weighting image (Figure 3) from the sclera mask by setting interior pixels
in the sclera mask to 1, pixels within some distance of the boundary of the
mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as follows: Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
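A plausible sketch of this normalization is given below. The per-pair score rule (weights multiplying when centers are close and orientations agree) is our assumption, since the report does not reproduce the equation; only the normalization by the smaller template's total weight follows the text.

```python
import math

def segment_match_score(si, sj, d_match=5.0, phi_match=0.2):
    """Assumed per-pair rule: each descriptor is (x, y, phi, w); weights
    multiply when centers are within Dmatch and orientations within
    phi-match, otherwise the pair scores zero."""
    xi, yi, pi_, wi = si
    xj, yj, pj_, wj = sj
    if math.hypot(xi - xj, yi - yj) <= d_match and abs(pi_ - pj_) <= phi_match:
        return wi * wj
    return 0.0

def total_score(test, target):
    """M = sum of the best per-segment scores over the template with fewer
    points, divided by that template's maximum attainable score (the sum
    of its descriptor weights), as the text describes."""
    small, big = (test, target) if len(test) <= len(target) else (target, test)
    matched = sum(max(segment_match_score(s, t) for t in big) for s in small)
    max_score = sum(s[3] for s in small)  # sum of the weights
    return matched / max_score if max_score else 0.0
```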
Y-shape branches are observed to be a stable feature under movement of the eye and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search the set of nearest neighbors of every line segment within a fixed distance and classify the angles among these neighbors. If there are two distinct angle values in the line-segment set, the set may be inferred to be a Y-shape structure, and the line-segment angles are recorded as a new feature of the sclera.
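The neighbor-angle test described above might be sketched like this; the search radius, angle tolerance, and data layout are illustrative assumptions, not values from the report.

```python
import math

def find_y_branches(segments, radius=10.0, angle_tol=0.15):
    """segments: list of (x, y, phi) line-segment descriptors.  For each
    segment, gather its neighbors within `radius` and group their
    orientations; if the neighborhood contains exactly two distinct
    orientation groups, flag it as a Y-shape branch point and record the
    two branch angles as a new feature."""
    branches = []
    for (x, y, phi) in segments:
        neigh = [s for s in segments
                 if s != (x, y, phi) and math.hypot(s[0] - x, s[1] - y) <= radius]
        groups = []
        for (_, _, a) in neigh:
            for g in groups:
                if abs(a - g[0]) <= angle_tol:
                    g.append(a)
                    break
            else:
                groups.append([a])
        if len(groups) == 2:   # two distinct angle values -> Y shape
            branches.append((x, y, phi, groups[0][0], groups[1][0]))
    return branches
```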
There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angle of every branch to the x-axis; the other is to use the angles between the branches and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius
from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 remain quite stable. To tolerate errors from the pupil-center calculation in the segmentation step, we also recorded the center position (x, y) of the Y-shape branch as auxiliary parameters. Thus our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris center, and it is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments, and for each segment a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To tolerate such errors, the mask file
[Figure: The line descriptor of the sclera vessel pattern. (a) An eye image. (b) Vessel patterns in the sclera. (c) Enhanced sclera vessel patterns. (d) Centers of line segments of the vessel patterns.]
is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large, occupy GPU memory, and slow down data transfer. During matching and registration, a RANSAC-type algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.
To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighting image created by setting various weight values according to position: the weights of descriptors outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU, only once.
The calculated result was saved as a component of the descriptor, which becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the values 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates have similar reference points: if we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they share a similar reference point. Every feature vector of the template is a set of line-segment descriptors composed of three variables (Figure 8): the segment angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in the CPU preprocessing.
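This CPU-side precomputation amounts to a plain polar-to-rectangular conversion; the function name and tuple layout below are ours, matching the s(x, y, r, θ, ɸ, w) form used in the text.

```python
import math

def to_wpl(theta, r, phi, w):
    """Precompute rectangular coordinates on the CPU so that GPU kernels
    can work on s(x, y, r, theta, phi, w) directly, without converting
    from polar coordinates on every access."""
    x = r * math.cos(theta)
    y = r * math.sin(theta)
    return (x, y, r, theta, phi, w)
```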
The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched. In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same sides and saved them at contiguous addresses, which meets the requirement of coalesced memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as in Figure 9.
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and as GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units. Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, the process is both easier and more difficult to explain: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process. We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline.
The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as a texture on future passes through the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves exactly the same steps, but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
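The grid-of-threads model can be illustrated with a sequential sketch of the fluid-grid example, where each (i, j) cell plays the role of one thread gathering from its neighbors; the diffusion-style averaging update is an arbitrary stand-in for a real fluid step.

```python
def step(grid):
    """SPMD-style update sketch: every 'thread' (i, j) computes its new
    value by gathering its neighbors' previous state, and the results are
    scattered into a separate output buffer (one full time step)."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):            # each (i, j) models one thread
        for j in range(w):
            nbrs = [grid[i][j]]   # gather: own cell plus 4-neighborhood
            if i > 0:     nbrs.append(grid[i - 1][j])
            if i < h - 1: nbrs.append(grid[i + 1][j])
            if j > 0:     nbrs.append(grid[i][j - 1])
            if j < w - 1: nbrs.append(grid[i][j + 1])
            out[i][j] = sum(nbrs) / len(nbrs)   # scatter to output buffer
    return out
```

On a GPU the double loop disappears: every (i, j) runs as its own thread over the structured grid, all executing this same program.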
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity; after this step, some false-positive matches are still possible. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transformation, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
2.4.1 STAGE I: MATCHING WITH THE Y-SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching procedure is listed as Algorithm 1.
Here ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); and dxy is the Euclidean distance of two descriptor centers, defined in (4). ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold, and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar line from the pupil center in descriptor i.
The number of matched pairs ni and the distance between the Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2), where α is a factor to fuse the matching score, set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise matching process.
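Algorithm 1 and Eq. (2) are not reproduced in this chunk, but the matching-and-fusion logic can be sketched as follows. The thresholds tϕ = 30, txy = 675, and α = 30 come from the text; the fusion formula itself is an assumption standing in for Eq. (2), combining the normalized matched count with a distance discount.

```python
import math

T_PHI, T_XY, ALPHA = 30.0, 675.0, 30.0   # thresholds and alpha from the text

def d_phi(y1, y2):
    # Euclidean distance of the three branch angles (role of Eq. 3)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y1[:3], y2[:3])))

def d_xy(y1, y2):
    # Euclidean distance of the branch centers (role of Eq. 4)
    return math.hypot(y1[3] - y2[3], y1[4] - y2[4])

def stage1_score(test, target):
    """Match Y-shape descriptors y = (phi1, phi2, phi3, x, y) and fuse the
    matched count with the mean center distance.  The fusion below is an
    assumed stand-in for Eq. (2), which the report cites but does not
    restate."""
    pairs = [d_xy(a, b) for a in test for b in target
             if d_phi(a, b) < T_PHI and d_xy(a, b) < T_XY]
    if not pairs:
        return 0.0
    n, d_avg = len(pairs), sum(pairs) / len(pairs)
    # assumed fusion: normalized count, discounted by the mean distance
    return (2.0 * n / (len(test) + len(target))) * ALPHA / (ALPHA + d_avg)
```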
2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel-structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers.
Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil-center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the i-th WPL descriptor of Tte, Tta is the target template and staj is the j-th WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value of two descriptors, defined as follows. We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find each one's nearest neighbor staj in the target template Tta. Their shift offset is recorded as a candidate registration shift factor Δsk. The final offset registration factor is Δsoptim, the candidate with the smallest standard deviation among these candidate offsets.
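A sequential sketch of this shift-parameter search (Algorithm 2) follows. The per-quad stratified sampling is simplified to plain random sampling, and "smallest standard deviation" is approximated by picking the candidate offset that deviates least from the other candidates; both simplifications are ours.

```python
import math
import random

def shift_search(test, target, n_samples=8, seed=1):
    """Algorithm 2 sketch: sample descriptor centers from the test
    template, find each one's nearest neighbor in the target template,
    record the offset as a candidate shift, and keep the candidate that
    agrees best with the other candidates."""
    rng = random.Random(seed)
    cands = []
    for _ in range(n_samples):
        x, y = rng.choice(test)
        nx, ny = min(target, key=lambda t: math.hypot(t[0] - x, t[1] - y))
        cands.append((nx - x, ny - y))
    def spread(c):
        # total deviation of candidate c from all candidate offsets
        return sum(math.hypot(c[0] - o[0], c[1] - o[1]) for o in cands)
    return min(cands, key=spread)
```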
2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform procedure is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor stei(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment, we set the iteration count to 512.
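The composed transform can be illustrated with a plain homogeneous-matrix sketch; the multiplication order R·T·S is an assumption, since Eq. (7) is not reproduced in this chunk.

```python
import math

def affine(theta, shift, scale):
    """Compose rotation R(theta), translation T(shift), and scaling
    S(scale) into one 3x3 homogeneous matrix (order R*T*S assumed)."""
    c, s = math.cos(theta), math.sin(theta)
    R = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    T = [[1, 0, shift[0]], [0, 1, shift[1]], [0, 0, 1]]
    S = [[scale, 0, 0], [0, scale, 0], [0, 0, 1]]
    def mul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]
    return mul(mul(R, T), S)

def apply(M, x, y):
    """Apply the homogeneous matrix to a descriptor center (x, y)."""
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])
```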
3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching procedure is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned by the programmer and mapped into threads. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and take little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is of limited size. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time-consuming.
Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches are executed serially; to improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this latency, on-chip memory should be preferred over global memory, and when global memory access does occur, threads in the same warp should access words in sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into equally sized banks. If two memory requests from different threads within a warp fall in the same bank, the accesses are serialized; to get maximum performance, memory requests should be scheduled to minimize bank conflicts.
2.5.1 MAPPING THE ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computational density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU, and the thread and block numbers are set to 1024; that means we can match our test template with up to 1024 × 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in that template. This partition lets every block execute independently, with no data-exchange requirements between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate this sum, as shown at the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
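The pairwise reduction described above can be simulated sequentially; the stride doubling mirrors the thread pattern (pairs first, then strides 4, 8, 16, ...), and the total ends up at index 0, just as the first thread holds the final sum.

```python
def block_reduce(vals):
    """Sequential simulation of the tree reduction: at each pass, the
    surviving positions add in their partner one stride away; after
    log2(n) passes, the grand total sits at index 0."""
    a = list(vals)
    stride = 1
    while stride < len(a):
        for i in range(0, len(a) - stride, 2 * stride):
            a[i] += a[i + stride]
        stride *= 2
    return a[0]
```

On the GPU each pass is done by a diminishing subset of threads in the block, with a synchronization barrier between passes.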
2.5.2 MAPPING INSIDE A BLOCK
In the shift-argument search there are two schemes we can choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to each thread, so that all threads compute independently and only the final results are compared across offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generation iterations are distributed across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
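A CPU-side sketch of per-thread generators is given below. Note that plain per-thread seeding, as used here for illustration, is exactly the weaker scheme that motivates the dynamic creation of distinct Mersenne Twister parameters on the GPU; the parameter ranges drawn are also only assumptions.

```python
import random

def make_thread_rngs(n_threads, base_seed=1234):
    """Sketch only: one generator per simulated thread, each with a
    distinct integer seed.  The GPU implementation instead gives each
    twister *different parameters*, created offline, to avoid correlated
    sequences; distinct seeds alone do not guarantee that."""
    return [random.Random(base_seed + 1_000_003 * tid) for tid in range(n_threads)]

def draw_params(rng):
    # each "thread" draws an independent (shift, rotation, scale) candidate
    return (rng.uniform(-5, 5), rng.uniform(-0.1, 0.1), rng.uniform(0.9, 1.1))
```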
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when each template will be processed; therefore there is no host-to-device data transfer during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that the contiguous kernels of Algorithms 2-4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2-4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in the design of target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG; this method is used to create the cell histograms. Each pixel within the cell contributes a weighted vote to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Figure 8 depicts the edge orientations of the picture elements. If the images have illumination and contrast changes, then the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
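The per-cell computation can be sketched in Python as follows; the cell size, bin count, and central-difference gradient are illustrative choices, not the report's exact parameters.

```python
import math

def hog_cell(gray, x0, y0, cell=8, bins=9):
    """Histogram of oriented gradients for one cell: central-difference
    gradients (clamped at the image border), unsigned orientation spread
    over 0-180 degrees, and the gradient magnitude used as the vote
    weight, as described in the text."""
    hist = [0.0] * bins
    h, w = len(gray), len(gray[0])
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            dx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
            dy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
            mag = math.hypot(dx, dy)
            ang = math.degrees(math.atan2(dy, dx)) % 180.0  # unsigned
            hist[int(ang / (180.0 / bins)) % bins] += mag
    return hist
```

Concatenating the (block-normalized) histograms of all cells yields the final HOG descriptor.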
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop; when code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing.
Development environment for managing code, files, and data.
Interactive tools for iterative exploration, design, and problem solving.
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration.
2-D and 3-D graphics functions for visualizing data.
Tools for building custom graphical user interfaces.
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel.
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or Fortran. A wrapper function is created,
allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed MEX-
files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called
from MATLAB, and many MATLAB libraries (for
example, XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries. Calling MATLAB from Java is more complicated, but
can be done with the MATLAB Extension, which is sold separately by
MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB
Interface), which should not be confused with the unrelated Java Metadata
Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox
available from MathWorks, MATLAB can be connected
to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on
Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems, and it enables
fast development and execution. With the MATLAB language, you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables, specifying data types, and allocating memory. In many
cases, MATLAB eliminates the need for 'for' loops. As a result, one line of
MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
MATLAB lets you execute commands or groups of commands one
at a time, without compiling and linking, enabling you to quickly iterate to
the optimal solution. For fast execution of heavy matrix and vector
computations, MATLAB uses processor-optimized libraries. For general-
purpose scalar computations, MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology.
This technology, which is available on most platforms, provides
execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency, file
differences, file dependencies, and code coverage
Designing Graphical User Interfaces
Use the interactive tool GUIDE (Graphical User Interface
Development Environment) to lay out, design, and edit user interfaces.
GUIDE lets you include list boxes, pull-down menus, push buttons, radio
buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX® controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring
data from external devices and databases, through preprocessing,
visualization, and numerical analysis, to producing presentation-quality
output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats such as Microsoft Excel; ASCII text or binary
files; image, sound, and video files; and scientific files such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format. Additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes;
changing line colors and markers; adding annotations, LaTeX equations, and
legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D
scalar data, and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting effects,
light source locations, and transparency.
3-D plotting functions include
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW Discrete Fourier
Transform library. Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports, they execute
faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language used is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this project, we proposed a new parallel sclera vein recognition
method which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions to other sclera vein recognition methods
and to general pattern recognition methods. We designed the Y-shape
descriptor to narrow the search range and increase matching efficiency;
it is a new feature extraction method that takes advantage of the GPU's
structure. We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing, which can
dramatically reduce data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, which is an important
step in achieving parallel computation efficiency on a GPU. A work flow
with high arithmetic intensity, designed to hide memory access latency,
partitions the computation task across the heterogeneous system of
CPU and GPU, down to the threads in the GPU. The proposed method
dramatically improves matching efficiency without compromising
recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland:
Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object
detection for real-time augmented reality applications in a GPGPU," IEEE
Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Cireşan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep
big simple neural nets for handwritten digit recognition," Neural Comput.,
vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris
recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput.
Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-
Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE
Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks,"
Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognition,"
Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of
the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner,
"Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4,
no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular
biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–
1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G.
Xiaoguang, "Robust spatial matching for object retrieval and its parallel
implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp.
1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for
real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control,
vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4,
pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2,
pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human
identification method: Sclera recognition," IEEE Trans. Syst. Man
Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
CHAPTER 2
PROJECT DESCRIPTION
21 INTRODUCTION
The sclera is the opaque, white outer layer of the eye. The blood
vessel structure of the sclera is formed randomly and is unique to each person,
so it can be used for human identification. Several researchers have
designed different sclera vein recognition methods and have shown that it
is promising to use sclera vein recognition for human identification.
Crihalmeanu and Ross proposed three approaches: a Speeded Up Robust
Features (SURF)-based method, minutiae detection, and direct correlation
matching for feature registration and matching. Among these three methods,
the SURF method achieves the best accuracy. It takes an average of 1.5
seconds using the SURF method to perform a one-to-one matching. Zhou
et al. proposed a line descriptor-based method for sclera vein recognition.
The matching step (including registration) is the most time-consuming step
in this sclera vein recognition system, costing about 1.2 seconds to
perform a one-to-one matching. Both speeds were measured using a PC with
Intel® Core™ 2 Duo 2.4 GHz processors and 4 GB DRAM. Currently,
sclera vein recognition algorithms are designed using central processing
unit (CPU)-based systems.
As discussed, CPU-based systems are designed as sequential
processing devices, which may not be efficient for data processing where the
data can be parallelized. Because of the large time consumption in the matching
step, sclera vein recognition using a sequential method would be very
challenging to implement in a real-time biometric system, especially
when there is a large number of templates in the database for matching. GPUs
(or, for general-purpose computing, GPGPUs: General-Purpose Graphics
Processing Units) are now popularly used for parallel computing to improve
computational processing speed and efficiency. The highly parallel
structure of GPUs makes them more effective than CPUs for data
processing where processing can be performed in parallel. GPUs have been
widely used in biometric recognition, such as speech recognition, text
detection, handwriting recognition, and face recognition. In iris
recognition, GPUs have been used to extract features, construct descriptors,
and match templates.
GPUs are also used for object retrieval and image search. Park et al.
evaluated the performance of image processing algorithms such
as linear feature extraction and multi-view stereo matching on GPUs.
However, these approaches were designed for their specific biometric
recognition applications and feature searching methods; therefore, they may
not be efficient for sclera vein recognition. Compute Unified Device
Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in
this research. CUDA is a highly parallel, multithreaded, many-core
processor architecture with tremendous computational power. It supports not only a
traditional graphics pipeline but also computation on non-graphical data.
More importantly, it offers an easier programming platform that
outperforms its CPU counterparts in terms of peak arithmetic intensity and
memory bandwidth. In this research, the goal is not to develop a unified
strategy to parallelize all sclera matching methods, because each method is
quite different from the others and would need a customized design; an
efficient parallel computing scheme would need different
strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for
sclera vein recognition based on our sequential line-descriptor method, using
the CUDA GPU architecture. However, the parallelization strategies
developed in this research can be applied to design parallel approaches for
other sclera vein recognition methods and to help parallelize general pattern
recognition methods. Based on the matching approach, there are three
challenges in mapping the task of sclera feature matching to a GPU:
1) Mask files are used to calculate valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large in size, and will preoccupy the GPU memory and slow
down data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance. New designs are
necessary to help narrow down the search range. In summary, a naïve
parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to implement our C program for
CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be
directly converted to OpenCL kernels by accounting for the different syntax of
various keywords and built-in functions. The mapping strategy is also
effective in OpenCL if we regard the thread and block in CUDA as the work-item
and work-group in OpenCL. Most of our optimization techniques, such as
coalesced memory access and prefix sum, can work in OpenCL too.
Moreover, since CUDA is a data-parallel architecture, the implementation
of our approach in OpenCL should be programmed in the data-parallel model.
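The prefix sum mentioned above is a standard data-parallel building block. The following is a minimal Python sketch of the work-efficient (Blelloch) exclusive scan; on a GPU, each inner-loop iteration would run as a separate thread, but here the "threads" are simulated sequentially. The function name and power-of-two restriction are illustrative assumptions, not part of the report's implementation.

```python
# Work-efficient exclusive prefix sum (Blelloch scan) sketch.
def exclusive_scan(data):
    n = len(data)
    assert n and (n & (n - 1)) == 0, "sketch assumes power-of-two length"
    tree = list(data)
    # Up-sweep (reduce) phase: build partial sums in place
    step = 1
    while step < n:
        for i in range(step * 2 - 1, n, step * 2):  # each i = one GPU thread
            tree[i] += tree[i - step]
        step *= 2
    # Down-sweep phase: propagate prefix sums back down
    tree[n - 1] = 0
    step = n // 2
    while step >= 1:
        for i in range(step * 2 - 1, n, step * 2):  # each i = one GPU thread
            left = tree[i - step]
            tree[i - step] = tree[i]
            tree[i] += left
        step //= 2
    return tree

print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # [0, 3, 4, 11, 11, 15, 16, 22]
```

Both phases touch each element O(log n) times in total, which is why this pattern maps well onto GPU thread blocks.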
In this research, we first discuss why the naïve parallel approach would not
work (Section 3). We then propose the new sclera descriptor – the Y-shape
sclera feature-based efficient registration method – to speed up the mapping
scheme (Section 4), introduce the "weighted polar line (WPL) descriptor,"
which is better suited for parallel computing, to mitigate the mask size
issue (Section 5), and develop our coarse-to-fine two-stage matching
process to dramatically improve the matching speed (Section 6). These new
approaches make parallel processing possible and efficient. However, it
is non-trivial to implement these algorithms in CUDA. We therefore developed
implementation schemes to map our algorithms onto CUDA (Section 7).
In Section 2, we give a brief introduction to sclera vein recognition. In
Section 8, we present experiments using the proposed system.
In Section 9, we draw conclusions.
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation, feature enhancement, feature extraction, and feature
matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition.
Several methods have been designed for sclera segmentation. Crihalmeanu
et al. presented a semi-automated system for sclera segmentation. They
used a clustering algorithm to classify color eye images into three
clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross
designed a segmentation approach based on a normalized sclera index
measure, which includes coarse sclera segmentation, pupil region
segmentation, and fine sclera segmentation. Zhou et al. developed a skin
tone plus "white color"-based voting method for sclera segmentation in
color images, and an Otsu's thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera
features, since the sclera vein patterns often lack contrast and are hard to
detect. Zhou et al. used a bank of multi-directional Gabor filters for
vascular pattern enhancement. Derakhshani et al. used contrast-limited
adaptive histogram equalization (CLAHE) to enhance the green color plane
of the RGB image, and a multi-scale region growing approach to identify
the sclera veins from the image background. Crihalmeanu and Ross applied
a selective enhancement filter for blood vessels to extract features from the
green component of a color image. In the feature matching step,
Crihalmeanu and Ross proposed
three registration and matching approaches: Speeded Up Robust
Features (SURF), which is based on interest-point detection; minutiae
detection, which is based on minutiae points on the vasculature structure;
and direct correlation matching, which relies on image registration. Zhou et
al. designed a line descriptor-based feature registration and matching
method.
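The multi-directional Gabor filter bank used for vein enhancement can be sketched as below: one small real-valued kernel is generated per orientation and each is convolved with the image. The kernel size, sigma, and wavelength values here are illustrative assumptions, not the parameters actually used in the report.

```python
import math

# Sketch: build a bank of directional (real) Gabor kernels.
def gabor_kernel(size, theta, sigma=2.0, wavelength=4.0):
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            gauss = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
            row.append(gauss * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

# A bank of 4 orientations: 0, 45, 90, 135 degrees
bank = [gabor_kernel(7, i * math.pi / 4) for i in range(4)]
print(len(bank), len(bank[0]), len(bank[0][0]))
```

Taking the per-pixel maximum response over all orientations is one common way such a bank is applied to emphasize vessels running in any direction.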
The proposed sclera recognition consists of five steps: sclera
segmentation, vein pattern enhancement, feature extraction, feature
matching, and matching decision. Fig. 2 shows the block diagram of sclera
recognition. Two types of feature extraction are used in the proposed
method to achieve good identification accuracy. The characteristics
elicited from the blood vessel structure seen in the sclera region are the
Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-
polar conversion. HOG is used to determine the gradient orientation and
edge orientations of the vein pattern in the sclera region of an eye image. To
become more computationally efficient, the image data are converted
to polar form, which is mainly suited to circular or quasi-circular shapes.
These two characteristics are extracted from all the images in the
database and compared with the features of the query image to decide whether
the person is correctly identified. This procedure is done in the feature
matching step, which ultimately makes the matching decision. By using the
proposed feature extraction methods and matching techniques, human
identification is more accurate than in existing studies. In the proposed
method, two features of an image are drawn out.
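The Cartesian-to-polar conversion described above can be sketched as follows: each sclera point is re-expressed as a (radius, angle) pair about the iris center, which suits the quasi-circular geometry of the eye. The function name and example coordinates are illustrative, and the interpolation step mentioned in the text is omitted.

```python
import math

# Sketch: re-express points as (radius, angle) about the iris center.
def to_polar(points, center):
    cx, cy = center
    return [(math.hypot(x - cx, y - cy),   # radial distance from center
             math.atan2(y - cy, x - cx))   # angle about the center
            for x, y in points]

r, theta = to_polar([(13, 14)], (10, 10))[0]
print(round(r, 2), round(math.degrees(theta), 2))  # 5.0 53.13
```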
222 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It involves
three steps: glare area detection, sclera area estimation, and iris and eyelid
detection and refinement. The figure shows the steps of segmentation.
FIG
Glare area detection: The glare area is a small bright area near the
pupil or iris. This is an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. It runs
only on grayscale images: if the image is in color, it must first be
converted to grayscale, after which the Sobel filter is applied to
detect the glare area. Fig. 4 shows the result of the glare area detection.
FIG
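The Sobel-based glare detection above can be sketched in a few lines: compute the gradient magnitude of the grayscale image, and treat the strong gradients ringing a small bright region as glare. The image representation (a list of rows) and the tiny test image are illustrative assumptions.

```python
import math

KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal Sobel kernel
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical Sobel kernel

# Sketch: Sobel gradient magnitude of a grayscale image (border left at 0).
def sobel_magnitude(img):
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(KX[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(KY[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out

# A bright glare pixel in a dark region yields strong gradients around it:
img = [[10] * 5 for _ in range(5)]
img[2][2] = 250                      # simulated glare pixel
mag = sobel_magnitude(img)
print(mag[2][1] > 0 and mag[1][1] > 0)
```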
Sclera area estimation: For the estimation of the sclera area, Otsu's
thresholding method is applied. The steps of sclera area detection are:
selection of the region of interest (ROI), Otsu's thresholding, and sclera area
detection. The left and right sclera areas are selected based on the iris
boundaries. Once the region of interest is selected, Otsu's thresholding is
applied to obtain the potential sclera areas. The correct left sclera area should
be placed in the right and center positions, and the correct right sclera area
should be placed in the left and center. In this way, non-sclera areas are wiped out.
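Otsu's thresholding, used above to separate the bright sclera from darker surrounding pixels, picks the gray level that maximizes the between-class variance of the histogram. A minimal Python sketch (the function name and the synthetic pixel data are illustrative assumptions):

```python
# Sketch of Otsu's method: choose the threshold maximizing
# the between-class variance of the grayscale histogram.
def otsu_threshold(pixels, levels=256):
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(levels):
        w0 += hist[t]                  # background (dark class) weight
        if w0 == 0:
            continue
        w1 = total - w0                # foreground (bright class) weight
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0 = sum0 / w0                # background mean
        mu1 = (total_sum - sum0) / w1  # foreground mean
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Two clearly separated clusters: dark (~30) vs sclera-bright (~220)
pixels = [28, 29, 30, 31, 32] * 20 + [218, 219, 220, 221, 222] * 20
t = otsu_threshold(pixels)
print(30 <= t < 218)   # threshold falls between the two clusters
```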
223 IRIS AND EYELID REFINEMENT
The top and underside of the sclera regions are the limits of the
sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are
refined; all of these are unwanted portions for recognition. In order
to eliminate these effects, refinement is done after the detection
of the sclera area. The figure shows the result after the Otsu's thresholding
process and the iris and eyelid refinement to detect the right sclera area. In
the same way, the left sclera area is detected using this method.
FIG
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the
segmentation fault. The vein patterns in the sclera area are not clearly visible
after the segmentation process, so vein pattern enhancement is performed to
make the vein patterns more visible.
224 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of the fingers (Miura et al., 2004), the palm (Lin and
Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a
special optical device for imaging the back of the eyeball is needed (Hill,
1999). Due to its perceived invasiveness and the required degree of subject
cooperation, the use of retinal biometrics may not be acceptable to some
individuals. The conjunctiva is a thin, transparent, and moist tissue that
covers the outer surface of the eye. The part of the conjunctiva that covers
the inner lining of the eyelids is called the palpebral conjunctiva, and the part
that covers the outer surface of the eye is called the ocular (or bulbar)
conjunctiva, which is the focus of this study. The ocular conjunctiva is very
thin and clear; thus the vasculature (including that of the episclera) is
easily visible through it. The visible microcirculation of the conjunctiva offers a
rich and complex network of veins and fine microcirculation (Fig. 1). The
apparent complexity and specificity of these vascular patterns motivated us
to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as those of the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus, the conjunctival vascular
structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial
off-the-shelf digital cameras under normal lighting conditions, making this
modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact.
Besides its use as a stand-alone biometric modality, we anticipate that the
addition of conjunctival biometrics will enhance the performance of current
iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portions of the eye, the sclera
vein patterns will be further exposed. This feature makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g.,
due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
availability of simultaneous computation. The figure below demonstrates the
possibility of parallel directional filtering. Since the filter is computed over
different portions of the input image, the computation can be performed in
parallel (denoted by Elements below). In addition, individual parallelization
of each element of the filtering can also be performed. A detailed discussion of
our proposed parallelization is outside the scope of this report.
FIG
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN
RECOGNITION METHOD
The matching stage of the line-descriptor based method is a
bottleneck with regard to matching speed. In this section, we briefly
describe the line descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and remove the branch
points. The line descriptor is used to describe the segments in the vein
structure. Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle to some
reference angle at the iris center, θ; the segment's distance to the iris center, r;
and the dominant angular orientation of the line segment, ɸ. Thus the
descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor
are calculated as
FIG
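The three descriptor components can be sketched as below. Note one simplification: the report derives ɸ from a polynomial fit of the segment, while this sketch approximates the orientation from the segment's center and an endpoint; the function name and coordinates are illustrative.

```python
import math

# Sketch: compute S = (theta, r, phi) for one vessel segment.
def line_descriptor(seg_center, seg_end, iris_center):
    (xl, yl), (xe, ye), (xi, yi) = seg_center, seg_end, iris_center
    theta = math.atan2(yl - yi, xl - xi)   # angle of segment center about iris center
    r = math.hypot(xl - xi, yl - yi)       # distance of segment center from iris center
    phi = math.atan2(ye - yl, xe - xl)     # segment orientation (endpoint approximation)
    return theta, r, phi

theta, r, phi = line_descriptor((40, 30), (44, 33), (10, 10))
print(round(r, 3))  # sqrt(30^2 + 20^2)
```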
Here, f_line(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. In order to register the segments of the
vascular patterns, a RANSAC-based algorithm is used to estimate the best-
fit parameters for registration between the two sclera vascular patterns. The
registration algorithm randomly chooses two points – one from the
test template and one from the target template. It also randomly chooses a
scaling factor and a rotation value, based on a priori knowledge of the
database. Using these values, it calculates a fitness value for the registration
under these parameters.
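A much-simplified sketch of this RANSAC-style search: repeatedly guess a (shift, rotation, scale) hypothesis from a random point pair and keep the hypothesis with the best fitness, here simply the count of test points landing within a tolerance of some target point. The parameter ranges, tolerance, and fitness definition are illustrative assumptions, not the report's.

```python
import math, random

# Sketch: RANSAC-like registration of two point sets.
def register(test_pts, target_pts, iters=500, tol=1.5, seed=0):
    rng = random.Random(seed)

    def fitness(dx, dy, rot, s):
        hits = 0
        for x, y in test_pts:
            xr = s * (x * math.cos(rot) - y * math.sin(rot)) + dx
            yr = s * (x * math.sin(rot) + y * math.cos(rot)) + dy
            if any(math.hypot(xr - tx, yr - ty) <= tol for tx, ty in target_pts):
                hits += 1
        return hits

    best = (0, (0.0, 0.0, 0.0, 1.0))
    for _ in range(iters):
        p = rng.choice(test_pts)               # random point from test template
        q = rng.choice(target_pts)             # random point from target template
        rot = rng.uniform(-0.2, 0.2)           # a-priori rotation range (assumed)
        s = rng.uniform(0.9, 1.1)              # a-priori scaling range (assumed)
        dx, dy = q[0] - s * p[0], q[1] - s * p[1]  # shift aligning p to q
        f = fitness(dx, dy, rot, s)
        if f > best[0]:
            best = (f, (dx, dy, rot, s))
    return best

test = [(0, 0), (10, 0), (0, 10), (10, 10)]
target = [(x + 5, y + 3) for x, y in test]     # pure translation of the test set
score, params = register(test, target)
print(score)   # ideally all 4 test points are aligned
```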
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. In order to reduce the effect of segmentation errors, we created the
weighting image (Figure 3) from the sclera mask by setting interior pixels
in the sclera mask to 1, pixels within some distance of the boundary of the
mask to 0.5, and pixels outside the mask to 0.
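The weighting image just described can be sketched directly from a binary mask; the one-pixel boundary band and the list-of-rows representation are illustrative assumptions.

```python
# Sketch: weight image from a binary sclera mask.
# Interior pixels -> 1.0, pixels near the mask boundary -> 0.5, outside -> 0.0.
def weight_image(mask, border=1):
    h, w = len(mask), len(mask[0])

    def near_boundary(y, x):
        # True if any pixel in the border-neighbourhood is outside the mask
        for dy in range(-border, border + 1):
            for dx in range(-border, border + 1):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or mask[ny][nx] == 0:
                    return True
        return False

    return [[0.0 if mask[y][x] == 0 else (0.5 if near_boundary(y, x) else 1.0)
             for x in range(w)] for y in range(h)]

mask = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
w = weight_image(mask)
print(w[2][2], w[1][1], w[0][0])  # 1.0 0.5 0.0
```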
The matching score for two segment descriptors is calculated as follows:
Si and Sj are two segment descriptors, m(Si, Sj) is the matching
score between segments Si and Sj, d(Si, Sj) is the Euclidean distance
between the segment descriptors' center points (from Eqs. 6-8), Dmatch is
the matching distance threshold, and ɸmatch is the matching angle threshold.
The total matching score, M, is the sum of the individual matching scores
divided by the maximum matching score for the minimal set between the
test and target templates. That is, one of the test or target templates has fewer
points, and the sum of its descriptors' weights sets the maximum score
that can be attained.
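A simplified sketch of this scoring scheme follows. Two assumptions are labeled here: all descriptor weights are taken as 1 (the report weights descriptors by the mask-based weighting image), and the threshold values are illustrative.

```python
import math

D_MATCH = 5.0                      # illustrative distance threshold
PHI_MATCH = math.radians(10)       # illustrative angle threshold

# Sketch: a pair of descriptors (x, y, phi) matches when their centers are
# within D_MATCH and their orientations within PHI_MATCH.
def pair_score(si, sj):
    (xi, yi, pi), (xj, yj, pj) = si, sj
    d = math.hypot(xi - xj, yi - yj)
    return 1.0 if d <= D_MATCH and abs(pi - pj) <= PHI_MATCH else 0.0

# Total score: sum of best pairwise scores, normalized by the size of the
# smaller template (unit weights stand in for the descriptor weights).
def total_score(test, target):
    m = sum(max(pair_score(s, t) for t in target) for s in test)
    return m / min(len(test), len(target))

test = [(10, 10, 0.10), (20, 15, 0.50), (40, 40, 1.00)]
target = [(11, 10, 0.12), (21, 16, 0.52), (80, 80, 2.00)]
print(total_score(test, target))   # 2 of 3 descriptors match
```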
FIG
FIG
FIG
FIG
Y-shape branches are observed to be a stable feature under eye movement and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred as a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.
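Recording the branch angles relative to the radius from the iris center, as the Y-shape descriptor does, makes the feature rotation invariant; a small geometric sketch (all names hypothetical, 2-D geometry assumed):

```python
import math

def y_descriptor(center, branch_dirs, iris_center):
    """Angles between each branch direction and the radius from the iris
    center through the branch point. Rotating the whole eye image about the
    iris center leaves these angles unchanged."""
    rx, ry = center[0] - iris_center[0], center[1] - iris_center[1]
    radial = math.atan2(ry, rx)
    phis = []
    for (bx, by) in branch_dirs:
        phi = math.atan2(by, bx) - radial
        phi = (phi + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
        phis.append(phi)
    return (*phis, center[0], center[1])

def rotate(p, theta):
    """Rotate a point or direction vector by theta about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])
```

Rotating both the branch point and the branch directions leaves the three ϕ values unchanged, which is exactly the invariance the descriptor relies on.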
There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angle of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2 and ϕ3 remain quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also recorded the center position (x, y) of the Y-shape branches as auxiliary parameters. Our rotation-, shift- and scale-invariant feature vector is therefore defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center, so it is automatically aligned to the iris center; it is a rotation- and scale-invariant descriptor.
2.2 WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid and/or eyelashes. To tolerate such errors, the mask file
[Figure: The line descriptor of the sclera vessel pattern. (a) An eye image. (b) Vessel patterns in sclera. (c) Enhanced sclera vessel patterns. (d) Centers of line segments of vessel patterns.]
is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large and would occupy GPU memory and slow down data transfer. For matching registration, a RANSAC-type algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data would also have to be transformed and a new boundary calculated to evaluate the weight of each transformed descriptor. This results in too many convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU, and only once.
The calculation result was saved as a component of the descriptor, which becomes s(x, y, ɸ, w), where w denotes the weight of the point and may be 0, 0.5 or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template must be transformed. This is faster if the two templates have similar reference points: if we use the center of the iris as the reference point, the correspondences are automatically aligned when two templates are compared, since they share the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment angle to the reference line through the iris center, denoted θ; the distance between the segment center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize GPU computation, we also convert the descriptor values from polar to rectangular coordinates during CPU preprocessing.
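The CPU-side preprocessing described above, polar coordinates about the pupil center plus a mask-derived weight baked into the descriptor, might look like the following sketch; the boundary band width and the mask-distance helper `sclera_mask_dist` are assumptions:

```python
import math

def wpl_descriptor(x, y, phi, pupil, sclera_mask_dist):
    """Build s(x, y, r, theta, phi, w): polar coordinates are taken about the
    pupil center, and the weight w encodes the mask (1 interior, 0.5 near the
    boundary, 0 outside), so the mask file itself never has to travel to the
    GPU. `sclera_mask_dist` is an assumed helper returning the signed
    distance from (x, y) to the sclera boundary (negative = outside)."""
    dx, dy = x - pupil[0], y - pupil[1]
    r = math.hypot(dx, dy)
    theta = math.atan2(dy, dx)
    d = sclera_mask_dist(x, y)
    if d < 0:
        w = 0.0        # outside the sclera
    elif d < 3.0:      # boundary band width (assumed)
        w = 0.5
    else:
        w = 1.0
    return (x, y, r, theta, phi, w)
```

Because r, θ and w are precomputed once on the CPU, the GPU kernels never need the mask image at match time.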
The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched. In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same sides and saved
them at contiguous addresses. This meets the requirement of coalesced memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask information to the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator of Figure 4 is then simplified as shown in Figure 9.
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
GENERAL-PURPOSE COMPUTING ON THE GPU
Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, the process is both easier and more difficult to explain: on one hand, the actual operations are the same and easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as texture on future passes through the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves the exact same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
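The steps above can be sketched in plain Python; the averaging stencil below stands in for a real fluid update, and the grid size and values are arbitrary:

```python
def step(grid):
    """One SPMD-style time step: every cell's next value is computed from its
    own state and its four neighbours (a simple averaging stencil standing in
    for the fluid update), exactly the 'gather' pattern described above."""
    n, m = len(grid), len(grid[0])
    nxt = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total, cnt = grid[i][j], 1
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if 0 <= i + di < n and 0 <= j + dj < m:
                    total += grid[i + di][j + dj]
                    cnt += 1
            nxt[i][j] = total / cnt   # write only to the *next* buffer
    return nxt
```

Note that reads come from the previous buffer and writes go to a separate next buffer, mirroring the restriction that, in the old GPGPU model, the same texture could not be read and written in one pass.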
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks' having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV are solving this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity; after this step, some false positive matches are still possible. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
2.4.1 STAGE I: MATCHING WITH THE Y-SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
Here ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where ϕij is the angle between the jth branch and the polar line from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study; Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise matching process.
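A sketch of this coarse Y-descriptor matching follows. The thresholds and α take the values stated in the text, but the fusion formula (Eq. 2) is not reproduced in the text, so the one below (matched fraction damped by the mean center distance) is an assumption:

```python
import math

T_PHI, T_XY, ALPHA = 30.0, 675.0, 30.0   # thresholds and fusion factor (from the text)

def d_phi(yi, yj):
    """Euclidean distance of the three branch angles (the role of Eq. 3)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(yi[:3], yj[:3])))

def d_xy(yi, yj):
    """Euclidean distance of two descriptor centers (the role of Eq. 4)."""
    return math.hypot(yi[3] - yj[3], yi[4] - yj[4])

def coarse_score(test_y, target_y):
    """Count Y-descriptor pairs whose angle and center distances fall under
    the thresholds, then fuse the count with the mean center distance.
    No registration is needed, which is what makes this stage fast."""
    n, dist_sum = 0, 0.0
    for yi in test_y:
        for yj in target_y:
            if d_phi(yi, yj) < T_PHI and d_xy(yi, yj) < T_XY:
                n += 1
                dist_sum += d_xy(yi, yj)
                break
    if n == 0:
        return 0.0
    return n / min(len(test_y), len(target_y)) * ALPHA / (ALPHA + dist_sum / n)
```

Image pairs scoring below the threshold t would be discarded before the expensive WPL stage.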
2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca and endothelium. There are slight differences among the movements of these layers.
Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte; Tta is the target template and staj is the jth WPL descriptor of Tta; d(stek, staj) is the Euclidean distance of descriptors stek and staj; and Δsk is the shift value between the two descriptors.
We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find their nearest neighbors staj in the target template Tta. The shift offset of each pair is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, the candidate with the smallest standard deviation among these offsets.
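Algorithm 2's shift search might be sketched as follows; the sample count and the consistency measure (the text selects the offset with the smallest standard deviation, approximated here by the smallest total distance to the other candidates) are assumptions:

```python
import math
import random

def shift_search(test_pts, target_pts, samples=16, rng=random):
    """Sample test descriptors, pair each with its nearest target descriptor,
    record the offsets, and keep the offset most consistent with the rest."""
    candidates = []
    for _ in range(samples):
        tx, ty = rng.choice(test_pts)
        gx, gy = min(target_pts, key=lambda p: math.hypot(p[0] - tx, p[1] - ty))
        candidates.append((gx - tx, gy - ty))

    def spread(c):
        # total distance from this candidate to all candidates: small when
        # the candidate agrees with the consensus offset
        return sum(math.hypot(c[0] - o[0], c[1] - o[1]) for o in candidates)

    return min(candidates, key=spread)
```

When the target template really is a shifted copy of the test template, every candidate offset agrees and the consensus offset is recovered exactly.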
2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor stei(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj and Tta are defined as in Algorithm 2; tr(it)shift, θ(it) and tr(it)scale are the shift, rotation and scale parameters generated in the itth iteration; and R(θ(it)), T(tr(it)shift) and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters; in our experiment we set the iteration count to 512.
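Algorithm 3's random parameter search can be sketched like this; β = 20 pixels follows the text, while the parameter ranges are assumptions:

```python
import math
import random

BETA = 20.0   # match radius in pixels (from the text)

def apply_affine(params, pts):
    """Apply scale, rotation, then shift to every point."""
    (sx, sy), theta, scale = params
    c, s = math.cos(theta), math.sin(theta)
    return [(scale * (c * x - s * y) + sx, scale * (s * x + c * y) + sy)
            for x, y in pts]

def count_matches(pts_a, pts_b):
    """Descriptor pairs closer than BETA count as matched."""
    return sum(1 for (x, y) in pts_a
               if min(math.hypot(x - u, y - v) for u, v in pts_b) < BETA)

def affine_search(test_pts, target_pts, iters=512, rng=random):
    """Draw random shift/rotation/scale sets, transform the test template with
    each, and keep the parameter set with the most matched pairs."""
    best_n, best_p = -1, None
    for _ in range(iters):
        params = ((rng.uniform(-10, 10), rng.uniform(-10, 10)),   # shift (assumed range)
                  rng.uniform(-0.1, 0.1),                         # rotation (assumed range)
                  rng.uniform(0.95, 1.05))                        # scale (assumed range)
        n = count_matches(apply_affine(params, test_pts), target_pts)
        if n > best_n:
            best_n, best_p = n, params
    return best_n, best_p
```

In the CUDA version, each thread evaluates one randomly generated parameter set, so the 512 iterations run concurrently rather than in this sequential loop.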
3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; R(θ(optm)), T(tr(optm)shift) and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3; ɸ is the angle between the segment descriptor and the radial direction; and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned by the programmer and mapped onto threads. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory and texture memory. Registers, local memory and shared memory are on-chip, and accessing them takes little time. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory and texture memory are off-chip and accessible by all threads, but accessing these memories is very time consuming.
Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To completely hide the latency of the small instruction sets, we should preferentially use on-chip memory rather than global memory. When global memory access occurs, threads in the same warp should access words in sequence to achieve coalescence.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
2.5.1 MAPPING THE ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU, with the thread and block counts set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
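The pairwise reduction just described (neighbors first, then strides of 4, 8, 16, ...) can be sketched sequentially:

```python
def tree_sum(vals):
    """Tree reduction mirroring the prefix-sum kernel: each pass combines
    values one stride apart, doubling the stride each time, until the total
    sits in the first slot, just as the per-thread partial matching scores
    are combined in Figure 11."""
    vals = list(vals)
    stride = 1
    while stride < len(vals):
        # in the CUDA kernel each of these additions is done by one thread
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]
```

On the GPU, each pass is one synchronized step with the additions running in parallel, so n partial results are combined in O(log n) steps rather than the O(n) of this sequential loop.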
2.5.2 MAPPING INSIDE A BLOCK
In shift parameter searching, there are two schemes we can choose to map the task:
Map one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assign a single possible shift offset to a thread, so that all threads compute independently, except that the final result must be compared with the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently. The generated iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag
variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is to use a single thread in a block to process the matching.
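The per-thread seeding problem discussed above can be illustrated with a small sketch: instead of seeding each thread's generator with nearby integers, derive each seed independently. Hashing stands in here for the offline dynamic-creation tool, and all names are hypothetical:

```python
import hashlib
import random

def thread_rng(run_seed, thread_id):
    """One generator per simulated CUDA thread. Seeding thousands of Mersenne
    Twisters with nearby integers risks correlated streams, so each seed is
    derived by hashing the run seed together with the thread id."""
    digest = hashlib.sha256(f"{run_seed}:{thread_id}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))

# each simulated thread draws its own parameter set independently
streams = [thread_rng(42, tid) for tid in range(4)]
draws = [[rng.random() for _ in range(3)] for rng in streams]
```

The real solution in the text goes further: the dynamic-creation tool gives each generator its own Mersenne Twister *parameters*, not just its own seed, which is a stronger guarantee against correlation.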
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper, it is applied as a feature for human recognition. In the sclera region, the vein patterns are edges in the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination or contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks; these blocks overlap, so each cell contributes more than once to the final descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
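The gradient computation and orientation binning described in this section can be sketched per cell; the bin count and the central-difference gradient are conventional choices, not values from the text:

```python
import math

def hog_cell_histogram(cell, bins=9):
    """Per-cell HOG: central-difference gradients, unsigned orientation
    folded into 0-180 degrees, and each pixel voting into its bin with the
    gradient magnitude as the weight."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = cell[y][x + 1] - cell[y][x - 1]   # x-direction gradient
            dy = cell[y + 1][x] - cell[y - 1][x]   # y-direction gradient
            mag = math.hypot(dx, dy)               # gradient magnitude m(x, y)
            ang = math.degrees(math.atan2(dy, dx)) % 180.0  # unsigned orientation
            hist[int(ang // (180.0 / bins)) % bins] += mag
    return hist
```

A vertical edge produces purely horizontal gradients (orientation 0°), and a horizontal edge lands in the 90° bin, matching the unsigned 0-180° binning described above.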
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and is popular among scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations.
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data.
Readability and navigation improvements to warning and error messages in the MATLAB command window.
Automatic variable and function renaming in the MATLAB Editor.
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, and it enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks, such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
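The vectorization claim above can be made concrete. The report's own code is MATLAB; as a stand-in, here is a minimal Python/NumPy sketch contrasting an element-by-element loop with the equivalent one-line vectorized form (the function names are invented for illustration):

```python
import numpy as np

# Loop version: scale each element and accumulate, as one might write in C.
def saxpy_loop(a, x, y):
    out = []
    for i in range(len(x)):
        out.append(a * x[i] + y[i])
    return out

# Vectorized version: the whole computation in one line, MATLAB-style.
def saxpy_vec(a, x, y):
    return a * np.asarray(x, dtype=float) + np.asarray(y, dtype=float)

print(saxpy_vec(2.0, [1, 2, 3], [10, 20, 30]))  # [12. 24. 36.]
```

Both produce the same result; the vectorized form simply moves the loop into the optimized array library, which is the same trade MATLAB makes.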
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
You can lay out, design, and edit user interfaces with the interactive tool GUIDE (Graphical User Interface Development Environment). GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
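A few of the listed operations (smoothing, thresholding, basic statistics) can be sketched in a few lines. This illustrative example uses Python/NumPy rather than MATLAB, and the signal values are made up:

```python
import numpy as np

# An invented 1-D signal with one clear peak.
signal = np.array([0.1, 0.4, 0.35, 0.8, 0.95, 0.7, 0.2, 0.05])

# Smoothing: 3-point moving average, same length as the input.
smoothed = np.convolve(signal, np.ones(3) / 3, mode="same")

# Thresholding: keep only samples above a chosen amplitude.
mask = signal > 0.5

# Basic statistics.
mean, std = signal.mean(), signal.std()
```

Each step is one library call, mirroring how the corresponding MATLAB functions (`conv`, logical indexing, `mean`, `std`) would be used.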
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files, such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotation, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, and to specify plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
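The processing chain shown in the snapshots (grayscale conversion, Otsu binarization, ROI selection) can be sketched as follows. This is an illustrative Python/NumPy version, not the project's MATLAB code; the synthetic eye image and the crude ROI rule are invented for the demonstration:

```python
import numpy as np

def otsu_threshold(gray):
    """Exhaustive Otsu: choose the threshold maximizing between-class variance."""
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        fg, bg = gray[gray >= t], gray[gray < t]
        if fg.size == 0 or bg.size == 0:
            continue
        w_f, w_b = fg.size / gray.size, bg.size / gray.size
        var_b = w_f * w_b * (fg.mean() - bg.mean()) ** 2
        if var_b > best_var:
            best_var, best_t = var_b, t
    return best_t

# Synthetic eye image: dark iris disc on a bright sclera background.
gray = np.full((32, 32), 200, dtype=np.uint8)
yy, xx = np.mgrid[:32, :32]
gray[(yy - 16) ** 2 + (xx - 16) ** 2 < 64] = 40   # iris region

t = otsu_threshold(gray)
binary = gray >= t          # True = candidate sclera pixels
roi = binary[:, 24:]        # crude ROI: the strip to the right of the iris
```

On this clean two-level image, any threshold between the two gray levels separates iris from sclera, which is exactly what Otsu's criterion selects.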
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Privium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor, a new feature extraction method that takes advantage of the GPU structures, to narrow the search range and increase the matching efficiency. We developed the WPL descriptor to incorporate the mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, down to the individual threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Cireşan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
when there is a large number of templates in the database for matching. GPUs (general-purpose graphics processing units, GPGPUs) are now popularly used for parallel computing to improve computational processing speed and efficiency. The highly parallel structure of GPUs makes them more effective than CPUs for data processing where the processing can be performed in parallel. GPUs have been widely used in biometric recognition, such as speech recognition, text detection, handwriting recognition, and face recognition. In iris recognition, the GPU has been used to extract features, construct descriptors, and match templates.
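The data-parallel pattern described here, scoring one query against many stored templates at once, can be illustrated on the CPU with a vectorized computation. This Python/NumPy sketch uses an invented random feature database and Euclidean distance, not the paper's actual sclera descriptors:

```python
import numpy as np

# Hypothetical database: N templates, each a fixed-length feature vector.
rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 64))
query = db[42] + 0.01 * rng.normal(size=64)   # a noisy copy of template 42

# Data-parallel scoring: one fused distance computation over all templates,
# the same shape of work a GPU would split across its threads.
dists = np.linalg.norm(db - query, axis=1)
best = int(np.argmin(dists))
print(best)  # 42
```

Each row's distance is independent of every other row's, which is exactly the property that lets a GPU assign one template (or one template pair) to each thread.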
GPUs are also used for object retrieval and image search. Park et al. evaluated the performance of image processing algorithms, such as linear feature extraction and multi-view stereo matching, on GPUs. However, these approaches were designed for their specific biometric recognition applications and feature searching methods; therefore, they may not be efficient for sclera vein recognition. The Compute Unified Device Architecture (CUDA), the computing engine of NVIDIA GPUs, is used in this research. CUDA is a highly parallel, multithreaded, many-core processor architecture with tremendous computational power. It supports not only a traditional graphics pipeline but also computation on non-graphical data. More importantly, it offers an easier programming platform that outperforms its CPU counterparts in terms of peak arithmetic intensity and memory bandwidth. In this research, the goal is not to develop a unified strategy to parallelize all sclera matching methods, because each method is quite different from the others and would need a customized design; an efficient parallel computing scheme requires different strategies for different sclera vein recognition methods.
Rather, the goal is to develop a parallel sclera matching solution for sclera vein recognition using our sequential line-descriptor method and the CUDA GPU architecture. However, the parallelization strategies developed in this research can be applied to design parallel approaches for other sclera vein recognition methods and to help parallelize general pattern recognition methods. Based on the matching approach, there are three challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate the valid overlapping areas of two sclera templates and to align the templates to the same coordinate system. But the mask files are large in size; they preoccupy the GPU memory and slow down the data transfer. Also, some of the processing on the mask files involves convolution, whose performance is difficult to improve on the scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several computational stages with different memory and processing requirements. There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of processing units on the GPU, parallel matching on the GPU is still unable to satisfy the requirement of real-time performance. New designs are necessary to help narrow down the search range. In summary, a naïve parallel implementation of the algorithms would not work efficiently.
Note that it is relatively straightforward to implement our C program for CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be directly converted to OpenCL kernels by accounting for the different syntax of various keywords and built-in functions. The mapping strategy is also effective in OpenCL if we regard the thread and block in CUDA as the work-item and work-group in OpenCL. Most of our optimization techniques, such as coalesced memory access and prefix sum, work in OpenCL too. Moreover, since CUDA is a data-parallel architecture, the implementation of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not work (Section 3). We then propose the new sclera descriptor, the Y-shape sclera feature-based efficient registration method, to speed up the mapping scheme (Section 4); introduce the "weighted polar line (WPL) descriptor," which is better suited for parallel computing and mitigates the mask size issue (Section 5); and develop our coarse-to-fine two-stage matching process to dramatically improve the matching speed (Section 6). These new approaches make parallel processing possible and efficient. However, it is non-trivial to implement these algorithms in CUDA. We then develop the implementation schemes to map our algorithms into CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we report experiments using the proposed system. In Section 9, we draw some conclusions.
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera segmentation, feature enhancement, feature extraction, and feature matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition. Several methods have been designed for sclera segmentation. Crihalmeanu et al. presented a semi-automated system for sclera segmentation. They used a clustering algorithm to classify the color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin-tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line-descriptor-based feature registration and matching method.
The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form, which is mainly useful for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, the human identification is more accurate than in existing studies. In the proposed method, two features of an image are drawn out.
222 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It consists of three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare Area Detection: The glare area is a small bright area near the pupil or iris; it is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It runs only on grayscale images: if the image is in color, it must first be converted to grayscale, after which the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection.
FIG
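A minimal version of the glare-detection step, a 3x3 Sobel gradient over a grayscale image, can be sketched as follows (Python/NumPy for illustration; the toy image with a bright spot on a dark background is invented):

```python
import numpy as np

def sobel_magnitude(gray):
    """Apply 3x3 Sobel kernels and return the gradient magnitude (valid region only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T  # the vertical-gradient Sobel kernel
    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = gray[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)

# A dark iris region with a small bright glare spot: strong edges ring the spot.
img = np.full((9, 9), 30.0)
img[3:6, 3:6] = 250.0
mag = sobel_magnitude(img)
```

High values of `mag` ring the bright spot, so thresholding the magnitude isolates the glare boundary; in flat regions (inside the spot or far from it) the gradient is zero.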
Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are wiped out.
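The ROI-then-threshold logic described above can be sketched as follows. This Python/NumPy toy uses a fixed global threshold standing in for Otsu's method, and the synthetic eye strip and iris column boundaries are invented:

```python
import numpy as np

# Toy grayscale eye strip: skin background, a dark iris band in the middle,
# and bright sclera on both sides of it.
eye = np.full((6, 12), 60.0)      # skin / background
eye[:, 5:7] = 20.0                # dark iris band
eye[:, 2:5] = 220.0               # bright left sclera
eye[:, 7:10] = 210.0              # bright right sclera

iris_left, iris_right = 5, 7      # assumed iris column boundaries
roi_left = eye[:, :iris_left]     # ROI to the left of the iris
roi_right = eye[:, iris_right:]   # ROI to the right of the iris

# Fixed threshold standing in for Otsu's method on each ROI.
left_sclera = roi_left > 128
right_sclera = roi_right > 128
```

Thresholding each side separately mirrors the text's rule that candidate sclera pixels are validated by their position relative to the iris boundaries.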
223 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; these are altogether unwanted portions for recognition. To eliminate their effects, refinement is done following the detection of the sclera area. Fig. shows the result after Otsu's thresholding and iris and eyelid refinement to detect the right sclera area; the left sclera area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the segmentation fault. The vein patterns in the sclera area are not clearly visible after segmentation; to make them more visible, vein pattern enhancement is performed.
224 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has been studied in the context of fingers (Miura et al., 2004), the palm (Lin and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a special optical device for imaging the back of the eyeball is needed (Hill, 1999). Due to its perceived invasiveness and the required degree of subject cooperation, the use of retinal biometrics may not be acceptable to some individuals. The conjunctiva is a thin, transparent, and moist tissue that covers the outer surface of the eye. The part of the conjunctiva that covers the inner lining of the eyelids is called the palpebral conjunctiva, and the part that covers the outer surface of the eye is called the ocular (or bulbar) conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus, the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study of the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns, even between identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus, the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.
Besides its use as a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.
FIG
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching segment of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section, we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus, the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as
FIG
Here, fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. To register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration with these parameters.
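The RANSAC-style registration described above can be sketched in simplified form. This Python/NumPy example estimates only a translation (the paper's method also samples a scaling factor and a rotation value) on invented point sets with a few deliberate outliers:

```python
import numpy as np

rng = np.random.default_rng(1)
src = rng.uniform(0, 100, size=(30, 2))          # test-template points
true_shift = np.array([5.0, -3.0])
dst = src + true_shift                           # target-template points
dst[:5] += rng.uniform(20, 40, size=(5, 2))      # corrupt 5 correspondences

best_shift, best_inliers = None, -1
for _ in range(100):
    i = int(rng.integers(len(src)))
    cand = dst[i] - src[i]                       # candidate transform from one pair
    # fitness: how many correspondences this candidate explains
    inliers = int(np.sum(np.linalg.norm(dst - (src + cand), axis=1) < 0.5))
    if inliers > best_inliers:
        best_inliers, best_shift = inliers, cand
```

Sampling a minimal set, scoring the implied transform by its inlier count, and keeping the best candidate is the same hypothesize-and-verify loop the paper's registration uses, just with a smaller parameter space.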
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. To reduce the effect of segmentation errors, we created a weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
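Building the weighting image from a mask can be sketched as follows. This Python/NumPy version uses a simple window test for "near the boundary"; the actual distance rule is not specified here, so the one-pixel margin is an assumption:

```python
import numpy as np

def weight_image(mask, margin=1):
    """Interior of mask -> 1.0, within `margin` of the boundary -> 0.5, outside -> 0.0."""
    h, w = mask.shape
    weights = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            if not mask[i, j]:
                continue  # outside the mask: weight stays 0.0
            # boundary test: any pixel in the surrounding window lies outside the mask
            i0, i1 = max(0, i - margin), min(h, i + margin + 1)
            j0, j1 = max(0, j - margin), min(w, j + margin + 1)
            weights[i, j] = 0.5 if not mask[i0:i1, j0:j1].all() else 1.0
    return weights

mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True            # a 3x3 sclera region
w = weight_image(mask)
```

On this toy mask the single interior pixel gets weight 1.0, the ring around it gets 0.5, and everything outside gets 0.0, matching the three-level scheme in the text.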
The matching score for two segment descriptors is calculated as follows, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
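Since the matching-score equation itself appears only as a figure, the following Python sketch implements just the thresholding logic as described (center distance below Dmatch, orientation difference below ɸmatch), with an assumed per-pair score; the descriptor tuples are invented:

```python
import numpy as np

# Each descriptor: (x, y, phi, weight), the form described in the text.
def pair_score(si, sj, d_match=5.0, phi_match=0.3):
    """Nonzero score only when both thresholds pass.
    The averaged-weight score used here is an assumption, not the paper's formula."""
    d = np.hypot(si[0] - sj[0], si[1] - sj[1])
    dphi = abs(si[2] - sj[2])
    return (si[3] + sj[3]) / 2 if d < d_match and dphi < phi_match else 0.0

test_t = [(0, 0, 0.10, 1.0), (10, 10, 1.0, 0.5)]     # test template (fewer points)
target = [(1, 0, 0.15, 1.0), (40, 40, 2.0, 1.0)]     # target template

# Best match per test segment, normalized by the smaller template's weight sum.
total = sum(max(pair_score(si, sj) for sj in target) for si in test_t)
max_score = sum(s[3] for s in test_t)
M = total / max_score
```

Here only the first test segment finds a match, so M = 1.0 / 1.5, illustrating how the minimal set's weight sum caps the attainable score.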
FIG
FIG
FIG
FIG
movement of the eye. Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.
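Measuring each branch's angle against the iris radial direction, as described below for the second method, can be sketched as follows (Python; the branch point and endpoint coordinates are invented):

```python
import numpy as np

def branch_angles(center, iris_center, endpoints):
    """Angle between each branch (center -> endpoint) and the iris radial direction."""
    radial = np.arctan2(center[1] - iris_center[1], center[0] - iris_center[0])
    angles = []
    for ex, ey in endpoints:
        a = np.arctan2(ey - center[1], ex - center[0]) - radial
        angles.append((a + np.pi) % (2 * np.pi) - np.pi)   # wrap to (-pi, pi]
    return angles

# A branch point with three outgoing segments (a Y shape), iris center at the origin.
phis = branch_angles((10.0, 0.0), (0.0, 0.0),
                     [(15.0, 0.0), (12.0, 4.0), (12.0, -4.0)])
```

Because every angle is taken relative to the radius from the iris center, rotating the whole eye image rotates `radial` and the branch directions together, leaving the three angles unchanged; that is the invariance the descriptor relies on.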
There are two ways to measure both the orientation and the relationship of every branch of the Y-shape vessels: one is to use the angle of each branch to the x-axis; the other is to use the angle between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris centers. It is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, the descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image, (b) vessel patterns in the sclera, (c) enhanced sclera vessel patterns, (d) centers of line segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size; they occupy GPU memory and slow down the data transfer. During matching, a RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data must also be transformed, and a new boundary must be calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the mask
information and can be automatically aligned. We extracted the geometric
relationships of the descriptors and stored them as a new descriptor. We use
a weighted image created by setting various weight values according to
descriptor positions: the weight of descriptors beyond the sclera is set to
0, descriptors near the sclera boundary are set to 0.5, and interior
descriptors are set to 1. In our work, descriptor weights were calculated on
their own mask by the CPU, only once.
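As an illustration of this one-time CPU step, the weight assignment might look like the sketch below. The function name, the boolean-mask representation, and the `border` margin that defines the "near boundary" band are assumptions for illustration; the report gives no code.

```python
import numpy as np

def descriptor_weight(x, y, mask, border=1):
    """Weight a line-segment descriptor by its position in the sclera mask.

    mask is a 2-D boolean array (True inside the sclera); border is a
    hypothetical pixel margin defining the "near boundary" band.
    Returns 0 outside the sclera, 0.5 near its boundary, 1 in the interior.
    """
    h, w = mask.shape
    if not (0 <= x < w and 0 <= y < h) or not mask[y, x]:
        return 0.0                      # descriptor beyond the sclera
    y0, y1 = max(0, y - border), min(h, y + border + 1)
    x0, x1 = max(0, x - border), min(w, x + border + 1)
    if not mask[y0:y1, x0:x1].all():
        return 0.5                      # near the sclera boundary
    return 1.0                          # interior descriptor
```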
The calculated result was saved as a component of the descriptor. The
sclera descriptor thus becomes s(x, y, ɸ, w), where w denotes the weight
of the point and takes the value 0, 0.5, or 1. To align two templates, when a
template is shifted to another location along the line connecting their
centers, all the descriptors of that template must be transformed. This is
faster if the two templates share a similar reference point. If we use the
center of the iris as the reference point, the correspondences are
automatically aligned when two templates are compared, since they have
the same reference point. Every feature vector of the template is a set of
line segment descriptors composed of three variables (Figure 8): the
segment angle to the reference line through the iris center,
denoted θ; the distance between the segment center and the pupil center,
denoted r; and the dominant angular orientation of the segment,
denoted ɸ. To minimize GPU computation, we also convert the
descriptor values from polar coordinates to rectangular coordinates in a CPU
preprocessing step.
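A minimal sketch of that CPU preprocessing step, assuming θ is given in radians and the iris center is at (cx, cy):

```python
import math

def polar_to_rect(r, theta, cx=0.0, cy=0.0):
    """Convert a descriptor centre from polar (r, theta) about the iris
    centre (cx, cy) to rectangular coordinates (x, y)."""
    return cx + r * math.cos(theta), cy + r * math.sin(theta)
```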
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters. For
example, as an eyeball moves left, the left-part sclera patterns of the eye
may be compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads, called warps. We
reorganized the descriptors from the same sides and saved
FIG
FIG
them in contiguous addresses. This meets the requirement for coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no
longer needed on the GPU. Matching with this feature is very fast because
the templates do not need to be re-registered after every shift. Thus, the
cost of data transfer and computation on the GPU is reduced. With matching on
the new descriptor, the shift parameter generator in Figure 4 is simplified
as shown in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
the more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger
limits on their size and resource consumption, more fully featured
instruction sets, and more flexible control-flow operations. After many
years of separate instruction sets for vertex and fragment operations, current
GPUs support the unified Shader Model 4.0 on both vertex and fragment
shaders:
The hardware must support shader programs of at least 65k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and
32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units.
GENERAL-PURPOSE COMPUTING ON THE GPU
Mapping general-purpose computation onto the GPU uses the graphics
hardware in much the same way as any standard graphics application. Because
of this similarity, it is both easier and more difficult to explain the
process: on one hand, the actual operations are the same and are easy to
follow; on the other hand, the terminology is different between graphics and
general-purpose use. Harris provides an excellent description of this
mapping process.
We begin by describing GPU programming using graphics terminology,
then show how the same steps are used in a general-purpose way to author
GPGPU applications, and finally use the same steps to show the simpler
and more direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II,
concentrating on the programmable aspects of this pipeline.
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as a texture on future passes through
the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation
involves the exact same steps, but different terminology. A motivating
example is a fluid simulation computed over a grid: at each time step, we
compute the next state of the fluid for each grid point from the current
state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest. The rasterizer generates a fragment at each
pixel location covered by that geometry. (In our example, our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation.)
Each fragment is shaded by an SPMD general-purpose fragment
program. (Each grid point runs the same program to update the state of its
fluid.)
The fragment program computes the value of the fragment by a
combination of math operations and "gather" accesses from global
memory. (Each grid point can access the state of its neighbors from the
previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes. (The current state of the fluid will be used on the next time
step.)
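The fluid example can be sketched in array form. The following NumPy stand-in (not a fragment program, and not the report's actual fluid equations) shows the gather pattern: each grid point's next value is computed from its neighbors' previous values.

```python
import numpy as np

def step(state):
    """One gather-style time step over a periodic grid: each grid point
    reads its four neighbours from the PREVIOUS state and averages them
    (a placeholder update standing in for the fluid equations)."""
    up    = np.roll(state,  1, axis=0)
    down  = np.roll(state, -1, axis=0)
    left  = np.roll(state,  1, axis=1)
    right = np.roll(state, -1, axis=1)
    return 0.25 * (up + down + left + right)   # new buffer, old one untouched
```

Note that the output is a fresh buffer: in the old GPGPU model described here, the same texture cannot be read and written in one pass.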
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications
has been that, despite their general-purpose tasks' having nothing to do with
graphics, the applications still had to be programmed using graphics APIs.
In addition, the program had to be structured in terms of the graphics
pipeline, with the programmable units only accessible as an intermediate
step in that pipeline, when the programmer would almost certainly prefer to
access the programmable units directly. The programming environments we
describe in detail in Section IV solve this difficulty by providing a
more natural, direct, non-graphics interface to the hardware and,
specifically, the programmable units. Today, GPU computing applications
are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math
operations and both "gather" (read) accesses from and "scatter" (write)
accesses to global memory. Unlike in the previous two
methods, the same buffer can be used for both reading and writing,
allowing more flexible algorithms (for example, in-place algorithms that
use less memory).
The resulting buffer in global memory can then be used as an input in
future computation.
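A toy sketch of the gather/scatter pattern on a single buffer (illustrative only; the function name, index array, and doubling operation are invented for the example):

```python
import numpy as np

def spmd_step(buf, idx):
    """Each logical thread i gathers buf[idx[i]] from 'global memory' and
    scatters the doubled value back into the SAME buffer at position i.
    The gather happens before any write, so the in-place update is safe."""
    gathered = buf[idx]        # "gather" (read): fancy indexing copies values
    buf[:] = gathered * 2      # "scatter" (write) into the same buffer
    return buf
```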
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine,
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because
no registration is needed. The matching result in this stage helps filter
out image pairs with low similarity; after this step, some false positive
matches are still possible. In the second stage, we use the WPL descriptor
to register the two images for more detailed descriptor matching, including
scale and translation invariance. This stage includes shift transform, affine
matrix generation, and final WPL descriptor matching. Overall, we
partitioned the registration and matching processing into four kernels in
CUDA (Figure 10): matching on the Y shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster
and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features,
registration is unnecessary before matching on the Y shape descriptor. The
whole matching algorithm is listed as Algorithm 1.
FIG
Here, ytei and ytaj are the Y shape descriptors of the test template Tte
and the target template Tta, respectively. dϕ is the Euclidean distance of the
angle elements of the descriptor vectors, defined in (3), and dxy is the
Euclidean distance of two descriptor centers, defined in (4). ni and di are
the number of matched descriptor pairs and their center distances,
respectively. tϕ is a distance threshold and txy is the threshold that
restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we searched the areas near all
the Y shape branches. The search area is limited to the corresponding left or
right half of the sclera, in order to reduce the search range and time. The
distance between two branches is defined in (3), where ϕij is the angle
between the jth branch and the polar axis from the pupil center in
descriptor i.
The number of matched pairs ni and the distance between Y-shape
branch centers di are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch
centers as in (2). Here, α is a factor to fuse the matching score, which was
set to 30 in our study, and Ni and Nj are the total numbers of feature vectors
in templates i and j, respectively. The decision is regulated by the threshold
t: if the sclera's matching score is lower than t, the sclera is discarded. A
sclera with a high matching score is passed to the next, more precise
matching process.
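A rough sequential sketch of the Stage-I search, paraphrasing Algorithm 1 with the thresholds stated above (tϕ = 30, txy = 675). The descriptor layout and the distance formulas standing in for Eqs. (3)-(4) are simplified assumptions; the score fusion of Eq. (2) is not reproduced.

```python
import math

def match_y_descriptors(tte, tta, t_phi=30.0, t_xy=675.0):
    """Coarse Y-shape matching sketch. Each descriptor is a dict with
    'phi' (three branch angles, degrees) and 'xy' (centre coordinates).
    Returns the matched-pair count and the mean centre distance."""
    matches, dist_sum = 0, 0.0
    for yi in tte:
        for yj in tta:
            d_xy = math.dist(yi['xy'], yj['xy'])
            if d_xy > t_xy:                      # restrict the search area
                continue
            d_phi = math.sqrt(sum((a - b) ** 2
                                  for a, b in zip(yi['phi'], yj['phi'])))
            if d_phi < t_phi:                    # branch angles agree
                matches += 1
                dist_sum += d_xy
                break                            # one match per descriptor
    avg = dist_sum / matches if matches else float('inf')
    return matches, avg
```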
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of
the sclera than the Y shape descriptor. The variation of the sclera vessel
pattern is nonlinear because: when acquiring an eye image at a different gaze
angle, the vessel structure appears to shrink or extend nonlinearly, since
the eyeball is spherical in shape; and the sclera is made up of four layers
(episclera, stroma, lamina fusca, and endothelium), and there are slight
differences among the movements of these layers. Considering these factors,
our registration employed both a single shift transform and a
multi-parameter transform that combines shift, rotation, and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before,
segmentation may not be accurate; as a result, the detected iris center may
not be very accurate. The shift transform is designed to tolerate possible
errors in pupil center detection in the segmentation step. If there is no
deformation, or only very minor deformation, registration with the shift
transform alone would be adequate to achieve an accurate result. We designed
Algorithm 2 to obtain the optimized shift parameter, where Tte is the test
template and stei is the ith WPL descriptor of Tte; Tta is the target
template and staj is the jth WPL descriptor of Tta; and d(stek, staj) is the
Euclidean distance of descriptors stek and staj.
Δsk is the shift value of two descriptors, defined as
We first randomly select an equal number of segment descriptors
stek in the test template Tte from each quad and find each one's nearest
neighbor staj in the target template Tta. Their shift offset is recorded as a
candidate registration shift factor Δsk. The final offset registration factor
is Δsoptim, which has the smallest standard deviation among these candidate
offsets.
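A simplified sequential sketch of Algorithm 2. The quad-based sampling and descriptor weights are omitted, and "smallest standard deviation" is approximated by closeness to the consensus (mean) offset; all names are illustrative.

```python
import math
import random
import statistics

def shift_search(tte, tta, n_samples=8, seed=0):
    """Sample test descriptors (here plain (x, y) centres), pair each with
    its nearest target descriptor, record the offsets, and return the
    candidate offset closest to the consensus of all candidates."""
    rng = random.Random(seed)
    samples = rng.sample(tte, min(n_samples, len(tte)))
    offsets = []
    for sx, sy in samples:
        tx, ty = min(tta, key=lambda t: math.dist((sx, sy), t))  # nearest neighbour
        offsets.append((tx - sx, ty - sy))
    mx = statistics.mean(dx for dx, _ in offsets)
    my = statistics.mean(dy for _, dy in offsets)
    # keep the offset with the smallest joint deviation from the consensus
    return min(offsets, key=lambda o: (o[0] - mx) ** 2 + (o[1] - my) ** 2)
```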
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera
patterns in the matching step. The affine transform algorithm is shown in
Algorithm 3. The shift value in the parameter set is obtained by randomly
selecting a descriptor ste(it) and calculating the distance from its nearest
neighbor staj in Tta. We transform the test template by the matrix in (7).
At the end of each iteration, we count the number of matched descriptor pairs
between the transformed template and the target template. The factor β
determines whether a pair of descriptors is matched; we set it to
20 pixels in our experiment. After N iterations, the optimized transform
parameter set is determined by selecting the maximum matching number
m(it). Here, stei, Tte, staj, and Tta are defined as in Algorithm 2;
tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale
parameters generated in the itth iteration; and R(θ(it)), T(tr(it)shift), and
S(tr(it)scale) are the transform matrices defined in (7). To search for the
optimized transform parameters, we iterated N times to generate these
parameters. In our experiment, we set the iteration count to 512.
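A sequential sketch of Algorithm 3 with β = 20 and the iteration count as stated above. The sampling ranges for shift, rotation, and scale are illustrative assumptions, since the report does not give them, and descriptors are reduced to plain (x, y) centres.

```python
import math
import random

def affine_search(tte, tta, n_iter=512, beta=20.0, seed=0):
    """Randomly generate (rotation, scale, shift) parameter sets, apply
    them to the test template, and keep the set that yields the most
    descriptor pairs within beta pixels of a target descriptor."""
    rng = random.Random(seed)
    best, best_count = None, -1
    for _ in range(n_iter):
        theta = rng.uniform(-0.2, 0.2)            # rotation (radians), assumed range
        scale = rng.uniform(0.9, 1.1)             # assumed range
        dx, dy = rng.uniform(-10, 10), rng.uniform(-10, 10)
        c, s = math.cos(theta), math.sin(theta)
        count = 0
        for x, y in tte:
            xt = scale * (c * x - s * y) + dx     # rotate, scale, then shift
            yt = scale * (s * x + c * y) + dy
            if any(math.dist((xt, yt), t) < beta for t in tta):
                count += 1                         # matched pair found
        if count > best_count:
            best, best_count = (theta, scale, dx, dy), count
    return best, best_count
```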
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined from Algorithms 2
and 3, the test template is registered and matched simultaneously. The
registration and matching algorithm is listed in Algorithm 4. Here, stei,
Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm),
tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters
attained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift), and
S(tr(optm)scale) form the descriptor transform matrix defined in
Algorithm 3. ɸ is the angle between the segment descriptor and the radius
direction, and w is the weight of the descriptor, which indicates whether the
descriptor is at the edge of the sclera or not. To ensure that the nearest
descriptors have a similar orientation, we use a constant factor α to check
the absolute difference of the two ɸ values; in our experiment, we set α
to 5. The total matching score is the minimal score of the two transformed
results divided by the minimal matching score for the test template and the
target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system and
works as a coprocessor with a CPU. A CUDA device consists of many streaming
multiprocessors (SMs), and the parallel part of the program should be
partitioned into threads by the programmer and mapped onto them.
There are multiple memory spaces in the CUDA memory hierarchy:
registers, local memory, shared memory, global memory, constant memory,
and texture memory. Registers, local memory, and shared memory are on-chip,
and accessing these memories takes little time. Only
shared memory can be accessed by other threads within the same block;
however, shared memory is available only in limited amounts. Global
memory, constant memory, and texture memory are off-chip memories,
accessible by all threads, and accessing them is very time-consuming.
Constant memory and texture memory are read-only, cacheable
memories. Mapping algorithms to CUDA to achieve efficient processing is
not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp take different control paths, all the branches are
executed serially. To improve performance, branch divergence within a
warp should be avoided.
Global memory is slower to access than on-chip memory. To
hide the latency of the small instruction set, we should preferentially use
on-chip memory rather than global memory. When global
memory access occurs, threads in the same warp should access words in
sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces,
but shared memory is organized into banks of equal size. If two
memory requests from different threads within a warp fall in the
same memory bank, the accesses are serialized. To get maximum
performance, memory requests should be scheduled to minimize bank
conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four
independent modules, all the modules are converted to different kernels
on the GPU. These kernels differ in computational density; thus, we
map them to the GPU with various mapping strategies to fully utilize the
computing power of CUDA. Figure 11 shows our scheme of CPU-GPU
task distribution and the partition among blocks and threads. Algorithm 1 is
partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of
templates in the database. As the upper middle column in Figure 11 shows,
each target template is assigned to one thread, and one thread performs one
pair-of-templates comparison. In our work, we use an NVIDIA C2070 as our
GPU; the numbers of threads and blocks are both set to 1024. That means we
can match our test template with up to 1024 × 1024 target templates at the
same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one
thread processes a section of descriptors. As the lower portion of the
middle column in Figure 11 shows, we assign a target template to one
block; inside a block, one thread corresponds to a set of descriptors in this
template. This partition makes every block execute independently, and there
is no data exchange required between different blocks. When all
threads complete their corresponding descriptor fractions, the sum of the
intermediate results needs to be computed or compared. A parallel prefix
sum algorithm is used to calculate the sum of the intermediate results, as
shown on the right of Figure 11. First, all odd-numbered threads compute the
sum of consecutive pairs of results; then, recursively, every first of i
(= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new
results. The final result is saved at the first address, which has the same
variable name as the first intermediate result.
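Sequentially, the pairwise reduction described above behaves like the sketch below, where each while-loop level stands in for one synchronized round of threads (a single-threaded stand-in, not CUDA code):

```python
def tree_sum(vals):
    """Pairwise (tree) reduction: at stride 1 every other slot absorbs its
    neighbour, then strides 2, 4, 8, ...; the total ends up in slot 0,
    mirroring the in-place GPU reduction of Figure 11."""
    vals = list(vals)
    stride = 1
    while stride < len(vals):
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]   # one "thread" combines a pair
        stride *= 2
    return vals[0]                        # final sum at the first address
```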
252 MAPPING INSIDE BLOCK
In the shift-argument search, there are two schemes we can choose to
map the task:
Mapping one pair of templates to all the threads in a block, so that every
thread takes charge of a fraction of the descriptors and cooperates with the
other threads.
Assigning a single possible shift offset to a thread, so that all the threads
compute independently, except that the final result must be compared with
the other possible offsets.
Due to the great number of sum and synchronization operations in every
nearest-neighbor search step, we chose the second method to parallelize
the shift search. In the affine matrix generator, we mapped an entire
parameter-set search to a thread: every thread randomly generates a set of
parameters and tries them independently, and the generation iterations are
assigned across all threads. The challenge in this step is that the randomly
generated numbers might be correlated among threads. In the rotation and
scale registration generation step, we used the Mersenne Twister
pseudorandom number generator because it can use bitwise arithmetic and has
a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative;
therefore, it is hard to parallelize a single twister state update step among
several execution threads. To make sure that the thousands of threads in the
launch grid generate uncorrelated random sequences, many simultaneous
Mersenne Twisters need to run with different initial states in parallel.
But even "very different" (by any definition) initial state values do not
prevent the emission of correlated sequences by generators sharing
identical parameters. To solve this problem, and to enable an efficient
implementation of the Mersenne Twister on parallel architectures, we used a
special offline tool for the dynamic creation of Mersenne Twister
parameters, modified from the algorithm developed by Makoto Matsumoto
and Takuji Nishimura. In the registration and matching step, when
searching for the nearest neighbor, a line segment that has already been
matched with another should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in
shared memory. To share the flags, all the threads in a block would have to
synchronize at every query step; our solution is to use a single
thread in a block to process the matching.
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the
bandwidth between host memory and device memory, and data transfer
between host and device can incur long latency. As shown in Figure 11,
we load the entire target template set from the database without considering
when the templates will be processed; therefore, there is no host-to-device
data transfer during the matching procedure. In global memory, the
components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w)
are stored separately. This guarantees that consecutive kernels of
Algorithms 2 to 4 can access their data at successive addresses. Although
such coalesced access reduces latency, frequent global memory access is
still a slow way to get data, so in our kernels we load the test template
into shared memory to accelerate memory access. Because Algorithms 2 to 4
execute different numbers of iterations on the same data, bank conflicts do
not occur. To maximize our texture memory space, we set the system
cache to its lowest value and bound our target descriptors to texture
memory; using this cacheable memory, data access was accelerated
further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily
applied in target detection; in this paper, it is applied as the
feature for human recognition. In the sclera region, the vein patterns are
the edges of the image, so HOG is used to determine the gradient orientations
and edge orientations of the vein pattern in the sclera region of an eye
image. To carry out this technique, first divide the image into small
connected regions called cells. For each cell, compute the histogram of
gradient directions or edge orientations of the pixels; the combination
of the histograms of the different cells then represents the descriptor. To
improve accuracy, the histograms can be contrast-normalized by calculating
the intensity over a block and then using this value to normalize all cells
within the block. This normalization makes the descriptor invariant to
geometric and photometric changes. The gradient magnitude m(x, y) and
orientation θ(x, y) are calculated using the x- and y-direction gradients
dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used
to create the cell histograms: each pixel within the cell casts a weighted
vote for the orientation bin found in the gradient computation, with the
gradient magnitude used as the weight. The cells are rectangular. The
binning of gradient orientation is spread over 0 to 180 degrees, with
opposite directions counting as the same. Fig. 8 depicts the edge
orientations of the picture elements. If the images have illumination and
contrast changes, then the gradient strength must be locally normalized; for
that, cells are grouped together into larger blocks. These blocks
overlap, so that each cell contributes more than once to the final
descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are
mainly square grids. The performance of HOG is improved by applying
a Gaussian window to each block.
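The gradient and orientation-binning steps above can be sketched as follows: central-difference gradients via NumPy, 9 bins over 0-180 degrees (opposite directions folded together), and 8 × 8-pixel cells. Block normalization and the Gaussian window are omitted, and the cell and bin counts are illustrative defaults.

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    """Per-cell HOG histograms: compute gradients, fold orientation into
    0-180 degrees, then accumulate magnitude-weighted votes per cell."""
    img = img.astype(float)
    dy, dx = np.gradient(img)                      # y (rows), x (cols) gradients
    mag = np.hypot(dx, dy)                         # gradient magnitude m(x, y)
    ang = np.degrees(np.arctan2(dy, dx)) % 180.0   # unsigned orientation
    h, w = img.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    for i in range(ch):
        for j in range(cw):
            sl = np.s_[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            hist[i, j] = np.bincount(bin_idx[sl].ravel(),
                                     weights=mag[sl].ravel(),
                                     minlength=bins)
    return hist
```

For a vertical step edge, the gradient points horizontally, so the votes concentrate in the 0-degree bin.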
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical
computing environment and fourth-generation programming language.
Developed by MathWorks, MATLAB allows matrix manipulations,
plotting of functions and data, implementation of algorithms, creation
of user interfaces, and interfacing with programs written in other languages,
including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an
optional toolbox uses the MuPAD symbolic engine, allowing access
to symbolic computing capabilities. An additional package, Simulink, adds
graphical multi-domain simulation and Model-Based
Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry
and academia. MATLAB users come from various backgrounds
of engineering, science, and economics. MATLAB is widely used in
academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners
in control engineering, Little's specialty, but quickly spread to many other
domains. It is now also used in education, in particular the teaching
of linear algebra and numerical analysis, and is popular amongst scientists
involved in image processing. The MATLAB application is built around the
MATLAB language. The simplest way to execute MATLAB code is to type
it in the Command Window, which is one of the elements of the MATLAB
Desktop. When code is entered in the Command Window, MATLAB can
be used as an interactive mathematical shell. Sequences of commands can
be saved in a text file, typically using the MATLAB Editor, as a script, or
encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and
sharing your work. You can integrate your MATLAB code with other
languages and applications, and distribute your MATLAB algorithms and
applications.
32 FEATURES OF MATLAB
High-level language for technical computing.
Development environment for managing code, files, and data.
Interactive tools for iterative exploration, design, and problem solving.
Mathematical functions for linear algebra, statistics, Fourier analysis,
filtering, optimization, and numerical integration.
2-D and 3-D graphics functions for visualizing data.
Tools for building custom graphical user interfaces.
Functions for integrating MATLAB-based algorithms with external
applications and languages, such as C, C++, Fortran, Java™, COM,
and Microsoft Excel.
MATLAB is used in a vast range of areas, including signal and image
processing, communications, control design, test and measurement,
financial modeling and analysis, and computational biology. Add-on
toolboxes (collections of special-purpose MATLAB functions) extend the
MATLAB environment to solve particular classes of problems in these
application areas.
MATLAB can be used on personal computers and powerful
server systems, including the Cheaha compute cluster. With the addition of
the Parallel Computing Toolbox, the language can be extended with parallel
implementations for common computational functions, including for-loop
unrolling. Additionally, this toolbox supports offloading computationally
intensive workloads to Cheaha, the campus compute cluster. MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) that knows how big it is. Moreover, the fundamental operators
(e.g., addition, multiplication) are programmed to deal with matrices when
required, and the MATLAB environment handles much of the bothersome
housekeeping that makes all this possible. Since so many of the procedures
required for Macro-Investment Analysis involve matrices, MATLAB
proves to be an extremely efficient language for both communication and
implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or Fortran. A wrapper function is created,
allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed MEX-
files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called
from MATLAB, and many MATLAB libraries (for
example, XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries. Calling MATLAB from Java is more complicated, but
can be done with a MATLAB extension, which is sold separately by
MathWorks, or by using an undocumented mechanism called JMI (Java-to-
MATLAB Interface), which should not be confused with the unrelated Java
Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox
available from MathWorks, MATLAB can be connected
to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on
Windows XP, and for network installations.
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data.
Readability and navigation improvements to warning and error messages in
the MATLAB command window.
Automatic variable and function renaming in the MATLAB Editor.
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems. It enables fast
development and execution. With the MATLAB language, you can
program and develop algorithms faster than with traditional languages,
because you do not need to perform low-level administrative tasks, such as
declaring variables, specifying data types, and allocating memory. In many
cases, MATLAB eliminates the need for 'for' loops. As a result, one line of
MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
MATLAB lets you execute commands or groups of commands one
at a time, without compiling and linking, enabling you to quickly iterate to
the optimal solution. For fast execution of heavy matrix and vector
computations, MATLAB uses processor-optimized libraries. For general-
purpose scalar computations, MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology.
This technology, which is available on most platforms, provides
execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
By using the interactive tool GUIDE (Graphical User Interface
Development Environment), you can lay out, design, and edit user
interfaces. GUIDE lets you include list boxes, pull-down menus, push
buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX® controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring
data from external devices and databases, through preprocessing,
visualization, and numerical analysis, to producing presentation-quality
output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats, such as Microsoft Excel; ASCII text or binary
files; image, sound, and video files; and scientific files such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format. Additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes;
changing line colors and markers; adding annotations, LaTeX equations, and
legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-
D scalar data, and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data. You can specify plot
characteristics, such as camera viewing angle, perspective, lighting effects,
light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW Discrete Fourier
Transform library. Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports, they execute
faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
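For illustration, NumPy builds on the same LAPACK/BLAS and FFT machinery described above. This hedged sketch, with toy matrices that are not from the report, shows a LAPACK-backed linear solve and a DFT round trip:

```python
import numpy as np

# Solve A x = b via LAPACK, and check the residual via a BLAS product.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)        # exact solution is x = (2, 3)
residual = A @ x - b

# Discrete Fourier transform and its inverse (the role FFTW plays in MATLAB).
signal = np.array([1.0, 2.0, 3.0, 4.0])
spectrum = np.fft.fft(signal)
recovered = np.fft.ifft(spectrum).real
```

The round trip through the transform returns the original samples, and the zero-frequency bin equals the sum of the signal.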
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language used is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition
method, which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions for other sclera vein recognition methods
and for general pattern recognition methods. We designed the Y-shape
descriptor to narrow the search range and increase the matching efficiency;
it is a new feature extraction method that takes advantage of the GPU
structures. We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing, which can
dramatically reduce data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, which is an important
step in achieving parallel computation efficiency on a GPU. A workflow
with high arithmetic intensity, to hide the memory access latency, was
designed to partition the computation task across the heterogeneous system of
CPU and GPU, and even across the threads in the GPU. The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," in Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
Based on the matching approach, there are three
challenges in mapping the task of sclera feature matching to the GPU:
1) Mask files are used to calculate valid overlapping areas of two sclera
templates and to align the templates to the same coordinate system. But the
mask files are large in size and will preoccupy the GPU memory and slow
down the data transfer. Also, some of the processing on the mask files
involves convolution, whose performance is difficult to improve on the
scalar processing units of CUDA.
2) The procedure of sclera feature matching consists of a pipeline of several
computational stages with different memory and processing requirements.
There is no uniform mapping scheme applicable to all these stages.
3) When the scale of the sclera database is far larger than the number of
processing units on the GPU, parallel matching on the GPU is still unable to
satisfy the requirement of real-time performance. New designs are
necessary to help narrow down the search range. In summary, a naïve
implementation of the algorithms in parallel would not work efficiently.
Note that it is relatively straightforward to implement our C program for
CUDA on an AMD-based GPU using OpenCL. Our CUDA kernels can be
directly converted to OpenCL kernels by accounting for the different syntax of
various keywords and built-in functions. The mapping strategy is also
effective in OpenCL if we regard a thread and a block in CUDA as a work-item
and a work-group in OpenCL. Most of our optimization techniques, such as
coalesced memory access and prefix sum, can work in OpenCL too.
Moreover, since CUDA is a data-parallel architecture, the implementation
of our approach in OpenCL should be programmed in the data-parallel model.
In this research, we first discuss why the naïve parallel approach would not
work (Section 3). We then propose the new sclera descriptor, the Y-shape
sclera feature-based efficient registration method, to speed up the mapping
scheme (Section 4); introduce the "weighted polar line (WPL) descriptor,"
which is better suited for parallel computing, to mitigate the mask size
issue (Section 5); and develop our coarse-to-fine two-stage matching
process to dramatically improve the matching speed (Section 6). These new
approaches make parallel processing possible and efficient. However, it
is non-trivial to implement these algorithms in CUDA. We then develop
the implementation schemes to map our algorithms into CUDA (Section 7).
In Section 2, we give a brief introduction to sclera vein recognition. In
Section 8, we present experiments using the proposed system.
In Section 9, we draw conclusions.
2.2 BACKGROUND OF SCLERA VEIN RECOGNITION
2.2.1 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation, feature enhancement, feature extraction, and feature
matching (Figure 1).
FIG
Sclera image segmentation is the first step in sclera vein recognition.
Several methods have been designed for sclera segmentation. Crihalmeanu
et al. presented a semi-automated system for sclera segmentation. They
used a clustering algorithm to classify the color eye images into three
clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross
designed a segmentation approach based on a normalized sclera index
measure, which includes coarse sclera segmentation, pupil region
segmentation, and fine sclera segmentation. Zhou et al. developed a skin-
tone plus "white color"-based voting method for sclera segmentation in
color images, and an Otsu's thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera
features, since the sclera vein patterns often lack contrast and are hard to
detect. Zhou et al. used a bank of multi-directional Gabor filters for
vascular pattern enhancement. Derakhshani et al. used contrast-limited
adaptive histogram equalization (CLAHE) to enhance the green color plane
of the RGB image, and a multi-scale region-growing approach to identify
the sclera veins from the image background. Crihalmeanu and Ross applied
a selective enhancement filter for blood vessels to extract features from the
green component in a color image. In the feature matching step,
Crihalmeanu and Ross proposed
three registration and matching approaches: Speeded-Up Robust
Features (SURF), which is based on interest-point detection; minutiae
detection, which is based on minutiae points on the vasculature structure;
and direct correlation matching, which relies on image registration. Zhou et
al. designed a line-descriptor-based feature registration and matching
method.
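As a rough illustration of the multi-directional Gabor enhancement mentioned above, the sketch below builds a small bank of even (cosine) Gabor kernels in Python. The kernel size, sigma, and wavelength are assumed values for illustration, not the parameters actually used by Zhou et al.:

```python
import numpy as np

def gabor_kernel(size, sigma, wavelength, theta):
    """Even (cosine) Gabor kernel at orientation theta (assumed parameters)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate into the filter frame
    yr = -x * np.sin(theta) + y * np.cos(theta)
    gauss = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    return gauss * np.cos(2 * np.pi * xr / wavelength)

# A four-orientation bank (0°, 45°, 90°, 135°); a real system would convolve
# the sclera image with each kernel and combine the responses.
orientations = [k * np.pi / 4 for k in range(4)]
bank = [gabor_kernel(15, sigma=3.0, wavelength=8.0, theta=t)
        for t in orientations]
```

Each kernel is point-symmetric (even), which is what makes this variant respond to ridge-like vein structures rather than edges.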
The proposed sclera recognition consists of five steps, which include
sclera segmentation, vein pattern enhancement, feature extraction, feature
matching, and the matching decision. Fig. 2 shows the block diagram of sclera
recognition. Two types of feature extraction are used in the proposed
method to achieve good accuracy for identification. The characteristics
that are elicited from the blood vessel structure seen in the sclera region are
the Histogram of Oriented Gradients (HOG) and interpolation of Cartesian-to-
polar conversion. HOG is used to determine the gradient orientation and
edge orientations of the vein pattern in the sclera region of an eye image. To
become more computationally efficient, the image data are converted
to polar form; this is mainly useful for circular or quasi-circular shapes of
objects. These two characteristics are extracted from all the images in the
database and compared with the features of the query image to decide whether the
person is correctly identified or not. This procedure is done in the feature
matching step, which ultimately makes the matching decision. By using the
proposed feature extraction methods and matching techniques, the human
identification is more accurate than in existing studies. In the proposed
method, two features of an image are drawn out.
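The Cartesian-to-polar conversion step can be sketched as follows; the pixel and centre coordinates below are invented for the example:

```python
import math

def to_polar(x, y, cx, cy):
    """Re-express pixel (x, y) as (r, theta) about a centre (cx, cy),
    e.g. the detected iris/pupil centre."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)        # radial distance from the centre
    theta = math.atan2(dy, dx)    # angle in (-pi, pi]
    return r, theta

# Pixel at (13, 14) with an assumed centre at (10, 10): a 3-4-5 triangle.
r, theta = to_polar(13.0, 14.0, 10.0, 10.0)
```

Sampling the image on a regular (r, theta) grid (with interpolation, as the text mentions) is what makes quasi-circular structures convenient to process.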
2.2.2 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It involves
three steps: glare area detection, sclera area estimation, and iris and eyelid
detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare Area Detection: A glare area is a small, bright area near the
pupil or iris. This is an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. It operates
only on grayscale images; if the image is in color, it must first be
converted to grayscale, after which the Sobel filter is applied to
detect the glare area. Fig. 4 shows the result of the glare area detection.
FIG
Sclera Area Estimation: For the estimation of the sclera area, Otsu's
thresholding method is applied. The stages of sclera area detection are:
selection of the region of interest (ROI), Otsu's thresholding, and sclera area
detection. The left and right sclera areas are selected based on the iris boundaries.
Once the region of interest is selected, Otsu's thresholding is applied to
obtain the potential sclera areas. The correct left sclera area should be
placed in the right and center positions, and the correct right sclera area should
be placed in the left and center. In this way, non-sclera areas are wiped out.
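A minimal sketch of Otsu's thresholding as used for sclera area estimation is given below: the method picks the gray level that maximises the between-class variance. The bimodal toy "image" is invented, not real eye data:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level maximising between-class variance (Otsu)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0 = 0.0          # weight (pixel count) of the class below the threshold
    sum0 = 0.0        # intensity sum of that class
    for t in range(256):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        m0 = sum0 / w0                        # mean of the lower class
        m1 = (sum_all - sum0) / (total - w0)  # mean of the upper class
        var_between = w0 * (total - w0) * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Bimodal toy "image": dark background near 40, bright sclera near 200.
img = np.concatenate([np.full(500, 40), np.full(500, 200)]).astype(np.uint8)
binary = img > otsu_threshold(img)   # True where the (toy) sclera is
```

On this two-level histogram the threshold lands between the modes, cleanly separating the bright "sclera" pixels from the background.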
2.2.3 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the
sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are
refined. These are all unwanted portions for recognition. In order
to eliminate these effects, refinement is performed following the detection
of the sclera area. Fig. shows the result, after Otsu's thresholding and iris and
eyelid refinement, of detecting the right sclera area. In the same way, the left sclera
area is detected using this method.
FIG
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the effect of
segmentation faults. The vein patterns in the sclera area are not clearly visible
after segmentation. To make the vein patterns more visible, vein pattern
enhancement is performed.
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of fingers (Miura et al., 2004), the palm (Lin and
Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics, a
special optical device for imaging the back of the eyeball is needed (Hill,
1999). Due to its perceived invasiveness and the required degree of subject
cooperation, the use of retinal biometrics may not be acceptable to some
individuals. The conjunctiva is a thin, transparent, and moist tissue that
covers the outer surface of the eye. The part of the conjunctiva that covers
the inner lining of the eyelids is called the palpebral conjunctiva, and the part
that covers the outer surface of the eye is called the ocular (or bulbar)
conjunctiva, which is the focus of this study. The ocular conjunctiva is very
thin and clear; thus, the vasculature (including that of the episclera) is
easily visible through it. The visible microcirculation of the conjunctiva offers a
rich and complex network of veins and fine microcirculation (Fig. 1). The
apparent complexity and specificity of these vascular patterns motivated us
to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as those of the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus, the conjunctival vascular
structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial off-
the-shelf digital cameras under normal lighting conditions, making this
modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the addition of
conjunctival biometrics will enhance the performance of current iris-based
biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portions of the eye, the sclera
vein patterns will be further exposed. This feature makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g.,
due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
availability of simultaneous computation. The figure below demonstrates the
possibility of parallel directional filtering. Since the filter is computed over
different portions of the input image, the computation can be performed in
parallel (denoted by "Elements" below). In addition, individual parallelization
of each element of the filtering can also be performed. A detailed discussion of
our proposed parallelization is outside the scope of this paper.
FIG
FIG
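The tile-level parallelism described above can be sketched in Python: the image is split into independent row bands (the "Elements"), each filtered by a separate worker. A simple horizontal averaging filter stands in for a real directional filter, and the image values are invented:

```python
from concurrent.futures import ThreadPoolExecutor

def filter_rows(rows):
    """Apply a 3-tap horizontal average to each row (border replicated)."""
    out = []
    for row in rows:
        padded = [row[0]] + row + [row[-1]]
        out.append([(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
                    for i in range(len(row))])
    return out

# Toy 8x6 "image"; each row is filtered independently, so row bands are
# independent work items that can run in parallel.
image = [[float(r * 10 + c) for c in range(6)] for r in range(8)]
bands = [image[0:4], image[4:8]]

with ThreadPoolExecutor(max_workers=2) as pool:
    parts = list(pool.map(filter_rows, bands))
parallel_result = parts[0] + parts[1]
```

Because the filter never reads across band boundaries here, the parallel result is bit-identical to filtering the whole image sequentially; filters with vertical support would instead need overlapping (haloed) bands.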
2.2.5 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN
RECOGNITION METHOD
The matching segment of the line-descriptor-based method is a
bottleneck with regard to matching speed. In this section, we briefly
describe the line-descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and remove the branch
points. The line descriptor is used to describe the segments in the vein
structure. Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle to some
reference angle at the iris center, θ; the segment's distance to the iris center, r;
and the dominant angular orientation of the line segment, ɸ. Thus the
descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor
are calculated as
FIG
Here, fline(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. In order to register the segments of the
vascular patterns, a RANSAC-based algorithm is used to estimate the best-
fit parameters for registration between the two sclera vascular patterns. For
the registration, the algorithm randomly chooses two points (one from the
test template and one from the target template). It also randomly chooses a
scaling factor and a rotation value, based on a priori knowledge of the
database. Using these values, it calculates a fitness value for the registration
under these parameters.
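Since the component equations referred to above survive only as a figure in this copy, the sketch below is a reconstruction from the surrounding definitions of how one descriptor S = (θ, r, ɸ)T might be computed; the coordinates and the slope of the fitted line are invented:

```python
import math

def line_descriptor(xl, yl, xi, yi, slope):
    """Reconstruct S = (theta, r, phi) for one skeleton segment.
    (xl, yl): segment centre; (xi, yi): iris centre;
    slope: d f_line / dx of the segment's polynomial fit at its centre."""
    theta = math.atan2(yl - yi, xl - xi)  # angle of the centre w.r.t. iris centre
    r = math.hypot(xl - xi, yl - yi)      # distance to the iris centre
    phi = math.atan(slope)                # dominant orientation of the segment
    return theta, r, phi

# Segment centre at (4, 3) with the iris centre at the origin, unit slope.
theta, r, phi = line_descriptor(xl=4.0, yl=3.0, xi=0.0, yi=0.0, slope=1.0)
```

Anchoring both θ and r at the iris centre is what later lets two templates be compared in a shared, centre-relative coordinate system.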
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. In order to reduce the effect of segmentation errors, we created the
weighting image (Figure 3) from the sclera mask by setting interior pixels
in the sclera mask to 1, pixels within some distance of the boundary of the
mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as follows,
where Si and Sj are two segment descriptors: m(Si, Sj) is the matching
score between segments Si and Sj, d(Si, Sj) is the Euclidean distance
between the segment descriptors' center points (from Eqs. 6-8), Dmatch is
the matching distance threshold, and ɸmatch is the matching angle threshold.
The total matching score, M, is the sum of the individual matching scores
divided by the maximum matching score for the minimal set between the
test and target templates. That is, one of the test or target templates has fewer
points, and the sum of its descriptors' weights sets the maximum score
that can be attained.
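The matching rule just described can be sketched as follows. The thresholds and descriptor values are invented, and descriptors are simplified to (x, y, ɸ, w) with w the mask weight:

```python
import math

# Assumed thresholds for the sketch (not the paper's actual values).
D_MATCH = 5.0                      # centre-distance threshold
PHI_MATCH = math.radians(10)       # orientation-difference threshold

def pair_score(si, sj):
    """Weighted match of two descriptors: w_i * w_j if both the distance
    and the orientation tests pass, else 0."""
    (xi, yi, phii, wi), (xj, yj, phij, wj) = si, sj
    d = math.hypot(xi - xj, yi - yj)
    if d <= D_MATCH and abs(phii - phij) <= PHI_MATCH:
        return wi * wj
    return 0.0

def total_score(test, target):
    """Sum of best per-segment scores, normalised by the smaller
    template's total descriptor weight."""
    score = sum(max(pair_score(s, t) for t in target) for s in test)
    max_score = min(sum(s[3] for s in test), sum(t[3] for t in target))
    return score / max_score

test = [(10.0, 10.0, 0.10, 1.0), (40.0, 12.0, 1.20, 0.5)]
target = [(11.0, 10.5, 0.12, 1.0), (80.0, 15.0, 1.00, 1.0)]
```

Here only the first test segment finds a match, so the total is 1.0 divided by the smaller weight sum (1.5).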
FIG
FIG
FIG
FIG
Y-shape branches are observed to be a stable feature, even with the
movement of the eye, and can be used as a sclera feature descriptor. To detect
the Y-shape branches in the original template, we search for the nearest-
neighbor set of every line segment within a regular distance and classify
the angles among these neighbors. If there are two types of angle values in
the line segment set, this set may be inferred as a Y-shape structure, and the
line segment angles are recorded as a new feature of the sclera.
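The branch-angle idea can be sketched as follows: each branch direction is measured against the radial direction from the pupil centre, which is what makes the feature rotation invariant. All coordinates below are invented:

```python
import math

def branch_angles(center, pupil, branch_dirs):
    """Angles of each branch relative to the radial direction from the
    pupil centre to the branch point (wrapped to (-pi, pi])."""
    cx, cy = center
    px, py = pupil
    radial = math.atan2(cy - py, cx - px)   # pupil-to-branch-point ray
    angles = []
    for bx, by in branch_dirs:              # branch direction vectors
        a = math.atan2(by, bx) - radial
        angles.append(math.atan2(math.sin(a), math.cos(a)))  # wrap
    return angles

# Branch point on the +x axis from the pupil; three toy branch directions.
phis = branch_angles(center=(10.0, 0.0), pupil=(0.0, 0.0),
                     branch_dirs=[(1.0, 0.0), (0.0, 1.0), (0.0, -1.0)])
```

Rotating the whole eye rotates both the branches and the radial ray by the same amount, so these relative angles (the ϕ1, ϕ2, ϕ3 of the text) are unchanged.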
There are two ways to measure both the orientation and the relationship of
every branch of the Y-shape vessels: one is to use the angle of every branch to the
x-axis; the other is to use the angle between each branch and the iris radial
direction. The first method needs an additional rotation operation to align the
template. In our approach, we employed the second method. As Figure 6
shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius
from the pupil center. Even when the head tilts, the eye moves, or the camera
zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite stable.
To tolerate errors from the pupil center calculation in the segmentation step,
we also record the center position (x, y) of the Y-shape branches as
auxiliary parameters. So our rotation-, shift-, and scale-invariant feature
vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated
with reference to the iris center; therefore, it is automatically aligned to the
iris center. It is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary images
(Figure 7). The skeleton is then broken into smaller segments. For each
segment, a line descriptor is created to record the center and orientation of
the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the
position of the center and ɸ is its orientation. Because of the limitation of
segmentation accuracy, the descriptors near the boundary of the sclera area might
not be accurate and may contain spur edges resulting from the iris, eyelid,
and/or eyelashes. To be tolerant of such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel
patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line
segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in GPU applications, using the mask is challenging,
since the mask files are large in size and will occupy GPU memory and
slow down the data transfer. In matching, a RANSAC-type registration
algorithm is used to randomly select corresponding descriptors,
and the transform parameters between them are used to generate the
template-transform affine matrix. After every template transform, the mask
data must also be transformed, and a new boundary must be calculated to
evaluate the weight of the transformed descriptors. This results in too many
convolutions in the processor unit.
To reduce heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the
mask information and can be automatically aligned. We extracted the
geometric relationships of the descriptors and stored them as a new
descriptor. We use a weighted image created by setting various weight
values according to position: the weight of descriptors that lie
beyond the sclera is set to 0, those near the sclera
boundary are set to 0.5, and interior descriptors are set to 1. In our work,
descriptor weights were calculated on their own mask by the CPU, only
once.
The calculated result is saved as a component of the descriptor. The
sclera descriptor then becomes s(x, y, ɸ, w), where w denotes the weight
of the point and may take the values 0, 0.5, or 1. To align two templates, when a
template is shifted to another location along the line connecting their
centers, all the descriptors of that template are transformed. This is
faster if the two templates share similar reference points. If we use the center of
the iris as the reference point, then when two templates are compared, the
correspondences are automatically aligned to each other, since they have
the same reference point. Every feature vector of the template is a set of
line segment descriptors composed of three variables (Figure 8): the
segment's angle to the reference line that passes through the iris center,
denoted θ; the distance between the segment's center and the pupil center,
denoted r; and the dominant angular orientation of the segment,
denoted ɸ. To minimize the GPU computation, we also convert the
descriptor values from polar coordinates to rectangular coordinates in a CPU
preprocess.
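Assembling one WPL descriptor s(x, y, r, θ, ɸ, w) on the CPU can be sketched as below; the boundary margin and all coordinates are assumed values for the example:

```python
import math

# Assumed margin: descriptors closer than this to the mask boundary get w = 0.5.
BOUNDARY_MARGIN = 3.0

def wpl_descriptor(x, y, phi, iris_center, dist_to_mask_boundary, inside_mask):
    """Precompute polar coordinates about the iris centre and fold the mask
    weight w into the descriptor, so the GPU never touches the mask file."""
    cx, cy = iris_center
    r = math.hypot(x - cx, y - cy)
    theta = math.atan2(y - cy, x - cx)
    if not inside_mask:
        w = 0.0                                # outside the sclera mask
    elif dist_to_mask_boundary < BOUNDARY_MARGIN:
        w = 0.5                                # near the boundary: down-weighted
    else:
        w = 1.0                                # interior descriptor
    return (x, y, r, theta, phi, w)

desc = wpl_descriptor(13.0, 14.0, 0.3, iris_center=(10.0, 10.0),
                      dist_to_mask_boundary=10.0, inside_mask=True)
```

Since w travels with the descriptor, a shifted or compared template needs no mask convolution at match time, which is the data-transfer saving the text describes.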
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters. For
example, as an eyeball moves left, the left-part sclera patterns of the eye may be
compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads, called warps. We
reorganized the descriptors from the same sides and saved
FIG
FIG
them at continuous addresses. This meets the requirement of coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no longer
needed on the GPU. Matching with this feature is very fast because the
templates do not need to be re-registered every time after shifting. Thus, the cost of
data transfer and computation on the GPU is reduced. For matching on the
new descriptor, the shift parameter generator in Figure 4 is then simplified
as in Figure 9.
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger
limits on their size and resource consumption, with more fully featured
instruction sets, and with more flexible control-flow operations. After many
years of separate instruction sets for vertex and fragment operations, current
GPUs support the unified Shader Model 4.0 on both vertex and fragment
shaders:
The hardware must support shader programs of at least 65 k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-
bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units.
General-Purpose Computing on the GPU
Mapping general-purpose computation onto the GPU uses the graphics hardware in much the
same way as any standard graphics application. Because of this similarity, it
is both easier and more difficult to explain the process. On one hand, the
actual operations are the same and are easy to follow; on the other hand, the
terminology is different between graphics and general-purpose use. Harris
provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology,
then show how the same steps are used in a general-purpose way to author
GPGPU applications, and finally use the same steps to show the simpler
and more direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II,
concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as texture on future passes through
the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves the exact same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at its grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
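The grid-update pattern described above can be sketched in a few lines. This is a generic, hypothetical diffusion-style stencil (not the report's actual fluid solver), in which one vectorized array update plays the role of the SPMD fragment program run at every grid point:

```python
import numpy as np

def step(state):
    """One SPMD-style time step: every grid point computes its next value
    from itself and its four neighbours (a simple diffusion stencil)."""
    up    = np.roll(state, -1, axis=0)
    down  = np.roll(state,  1, axis=0)
    left  = np.roll(state, -1, axis=1)
    right = np.roll(state,  1, axis=1)
    return 0.2 * (state + up + down + left + right)

grid = np.zeros((8, 8))
grid[4, 4] = 1.0          # a single initial disturbance
grid = step(grid)         # the "fragment program" applied at every point
```

Every cell of the next state is computed the same way from the previous buffer, which is exactly the gather-only pattern the fragment-program model supports.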
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks' having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV are solving this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
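A toy illustration of this thread-grid model with gather and scatter into the same buffer (hypothetical code, not from the report); the "threads" are run serially here, so each one observes the previous thread's write, the kind of in-place behavior the old texture-based model could not express:

```python
import numpy as np

buf = np.arange(8, dtype=float)   # one buffer used for both reads and writes

def thread_body(tid, data):
    """Per-thread SPMD body: one "gather" read, one "scatter" write into the
    SAME buffer -- the in-place pattern the old texture-based model forbade."""
    left = data[max(tid - 1, 0)]  # gather from a neighbouring address
    data[tid] = data[tid] + left  # scatter back into the same buffer

# Stand-in for the launch grid; run serially here, so each "thread" sees the
# previous thread's write (on real hardware this ordering needs care).
for tid in range(len(buf)):
    thread_body(tid, buf)
```

On real hardware the threads run concurrently, so such read-after-write dependences require synchronization; the point here is only that read and write target one buffer.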
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we matched two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities. After this step, some false positive matches may still remain. In the second stage, we used the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster and achieve a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta, respectively. dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined as (3); dxy is the Euclidean distance of two descriptor centers, defined as (4). ni and di are the number of matched descriptor pairs and the distance between their centers, respectively. tϕ is a distance threshold, and txy is the threshold restricting the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we searched the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where ϕij is the angle between the jth branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as (2). Here α is a factor to fuse the matching score, which was set to 30 in our study. Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera will be discarded. A sclera with a high matching score will be passed to the next, more precise matching process.
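As a concrete, simplified sketch of Stage I, the hypothetical code below matches Y-shape descriptors of the form (ϕ1, ϕ2, ϕ3, x, y) and fuses the matched-pair count with the mean center distance. The fusion formula merely stands in for Eq. (2), whose exact form is not reproduced in the text; the default thresholds echo the values quoted above:

```python
import math

def d_phi(a, b):
    # Euclidean distance between the three angle elements (role of Eq. (3))
    return math.dist(a[:3], b[:3])

def d_xy(a, b):
    # Euclidean distance between descriptor centers (role of Eq. (4))
    return math.dist(a[3:], b[3:])

def coarse_match(test, target, t_phi=30.0, t_xy=675.0, alpha=30.0):
    """Count Y-descriptor pairs that agree in angle and position, then fuse
    the count and the mean center distance into one similarity score."""
    n, dist_sum = 0, 0.0
    for yte in test:
        best = min(target, key=lambda yta: d_phi(yte, yta), default=None)
        if best and d_phi(yte, best) < t_phi and d_xy(yte, best) < t_xy:
            n += 1
            dist_sum += d_xy(yte, best)
    if n == 0:
        return 0.0
    mean_d = dist_sum / n
    # Hypothetical fusion standing in for Eq. (2): more matches and smaller
    # mean distance both raise the score.
    return (n / min(len(test), len(target))) * alpha / (alpha + mean_d)
```

Because no registration precedes this stage, the whole comparison is a single pass over the descriptors, which is what makes the coarse filter cheap.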
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.
Considering these factors, our registration employed both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to get the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte; Tta is the target template and staj is the jth WPL descriptor of Tta; d(stek, staj) is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value of two descriptors, defined as
We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find each one's nearest neighbor staj in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
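A host-side sketch of this shift search (hypothetical Python, with descriptors reduced to their (x, y) centers; the "smallest standard deviation" selection is interpreted here as picking the consensus offset that deviates least from the other candidates):

```python
import math
import random

def nearest(desc, template):
    # nearest neighbour by descriptor center (x, y)
    return min(template, key=lambda d: math.dist(d, desc))

def search_shift(test, target, n_samples=8, seed=0):
    """Algorithm-2-style sketch: sample descriptors from the test template,
    pair each with its nearest neighbour in the target template, and keep
    the candidate offset that agrees best with the other candidates."""
    rng = random.Random(seed)
    samples = rng.sample(test, min(n_samples, len(test)))
    offsets = [(t[0] - s[0], t[1] - s[1])
               for s, t in ((s, nearest(s, target)) for s in samples)]
    # consensus choice: offset with the smallest total deviation from the rest
    return min(offsets, key=lambda c: sum(math.dist(c, o) for o in offsets))
```

In the parallel version described later, each candidate offset is evaluated by its own thread, so the per-candidate work here maps directly onto a thread body.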
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the itth iteration; R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterated N times to generate these parameters. In our experiment, we set the number of iterations to 512.
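The random parameter search can be sketched as follows (hypothetical code: descriptors reduced to 2-D centers, the rotation and scale ranges are assumptions, and the shift is taken from a randomly chosen descriptor's nearest neighbour, as in Algorithm 3):

```python
import math
import random

def transform(pts, theta, scale, shift):
    """Rotate, scale, then translate 2-D points (the role of Eq. (7))."""
    c, s = math.cos(theta), math.sin(theta)
    return [((x * c - y * s) * scale + shift[0],
             (x * s + y * c) * scale + shift[1]) for x, y in pts]

def affine_search(test, target, n_iter=512, beta=20.0, seed=1):
    """Randomly propose (theta, scale, shift) parameter sets, count descriptor
    pairs landing within beta pixels of a target descriptor, keep the best."""
    rng = random.Random(seed)
    best, best_m = (0.0, 1.0, (0.0, 0.0)), -1
    for _ in range(n_iter):
        theta = rng.uniform(-0.1, 0.1)   # small rotation range (assumption)
        scale = rng.uniform(0.9, 1.1)    # small scale range (assumption)
        anchor = rng.choice(test)        # shift from a random descriptor's NN
        nn = min(target, key=lambda t: math.dist(t, anchor))
        shift = (nn[0] - anchor[0], nn[1] - anchor[1])
        moved = transform(test, theta, scale, shift)
        m = sum(1 for p in moved
                if min(math.dist(p, t) for t in target) < beta)
        if m > best_m:
            best_m, best = m, (theta, scale, shift)
    return best, best_m
```

Each iteration is independent of the others, which is why the text can later assign one entire parameter-set trial to each GPU thread.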
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined from Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters attained from Algorithms 2 and 3; R(θ(optm)) T(tr(optm)shift) S(tr(optm)scale) is the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of two ɸ values. In our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and the target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned by the programmer into threads that are mapped onto those multiprocessors. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and accessing these memories takes little time. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To completely hide this latency with the small instruction set, we should preferentially use on-chip memory rather than global memory. When global memory access occurs, threads in the same warp should access consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks that are equal in size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density; thus we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread compares one pair of templates. In our work, we use an NVIDIA C2070 as our GPU. The thread and block counts are each set to 1024, which means we can match our test template with up to 1024×1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assigned one target template to each block. Inside a block, each thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate this sum, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of the results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
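The tree-structured combination of intermediate results described above can be sketched sequentially (hypothetical code; each inner-loop iteration corresponds to one active thread, and the total ends up in the first slot, mirroring the shared-memory layout):

```python
def tree_reduce(vals):
    """In-place tree combination: at stride s, slot i absorbs slot i+s; after
    about log2(n) rounds the total sits in the first slot, mirroring the
    shared-memory layout described in the text."""
    data = list(vals)
    n, s = len(data), 1
    while s < n:
        for i in range(0, n - s, 2 * s):   # each iteration = one active thread
            data[i] += data[i + s]
        s *= 2
    return data[0]
```

On the GPU, every round halves the number of active threads, so the combination finishes in logarithmically many synchronization steps rather than a serial pass.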
252 MAPPING INSIDE BLOCK
In shift argument searching, there are two schemes we can choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to a thread, so that all threads compute independently, except that the final result must be compared with the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generated a set of parameters and tried them independently. The generation iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.
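NumPy offers an analogous host-side facility for generating uncorrelated per-thread streams: SeedSequence.spawn derives statistically independent child states, each of which can seed its own MT19937 instance. This is illustrative only; the report's offline tool generates distinct Mersenne Twister parameter sets, not merely distinct states:

```python
import numpy as np

# One independent child stream per "thread": SeedSequence.spawn derives
# statistically independent states, each seeding its own MT19937 generator.
root = np.random.SeedSequence(42)
gens = [np.random.Generator(np.random.MT19937(child))
        for child in root.spawn(4)]
draws = [g.random(3) for g in gens]   # each stream yields a distinct sequence
```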
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when it will be processed; therefore, no data transfer from host to device occurs during the matching procedure. In global memory, the components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access frequently reduces the latency, global memory access is still a slow way to get data, so in our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor. It was primarily applied to the design of target detection; in this paper, it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination or contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
FIG
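A minimal NumPy sketch of the cell-histogram stage described above (hypothetical code: central-difference gradients, unsigned 0-180 degree bins, magnitude weighting; block normalization and the Gaussian window are omitted for brevity):

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    """Per-pixel gradients, unsigned orientation (0-180 deg, opposite
    directions merged), magnitude-weighted histogram per cell."""
    img = np.asarray(img, dtype=float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]      # dx(x, y)
    gy[1:-1, :] = img[2:, :] - img[:-2, :]      # dy(x, y)
    mag = np.hypot(gx, gy)                      # m(x, y)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned orientation
    bin_idx = np.minimum((ang / (180 / bins)).astype(int), bins - 1)
    h, w = img.shape
    cells = np.zeros((h // cell, w // cell, bins))
    for i in range(h // cell * cell):           # magnitude-weighted voting
        for j in range(w // cell * cell):
            cells[i // cell, j // cell, bin_idx[i, j]] += mag[i, j]
    return cells
```

Concatenating (and normalizing) the per-cell histograms yields the final descriptor vector.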
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, FORTRAN, Java™, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, and financial modeling and analysis. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems. It enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages, because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time MATLAB provides all the features of a traditional
programming language including arithmetic operators flow control data
structures data types object-oriented programming (OOP) and debugging
features
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology.
This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
By using the interactive tool GUIDE (Graphical User Interface Development Environment), you can lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB These include 2-D and 3-D
plotting functions 3-D volume visualization functions tools for
interactively creating plots and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effect, light source locations, and transparency.
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW Discrete Fourier Transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types
including doubles singles and integers
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
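The snapshots above were produced in MATLAB. As a language-neutral illustration of the grayscale-to-binary step, here is a hypothetical NumPy implementation of Otsu's thresholding (histogram-based maximization of between-class variance), not the report's actual code:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the gray level that maximizes the between-class
    variance of the image histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                      # class-0 probability
    mu = np.cumsum(p * np.arange(256))        # class-0 cumulative mean
    mu_t = mu[-1]                             # global mean
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0      # empty classes carry no variance
    return int(np.argmax(sigma_b))

# binary = gray > otsu_threshold(gray)   # the grayscale -> binary step
```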
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Privium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
62 REFERENCES
[1] C W Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C Cuevas, D Berjon, F Moran, and N Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D C Ciresan, U Meier, L M Gambardella, and J Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F Z Sakr, M Taher, and A M Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G Poli, J H Saito, J F Mari, and M R Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J Antikainen, J Havel, R Josth, A Herout, P Zemcik, and M Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, 2011.
[7] K-S Oh and K Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P R Dixon, T Oonishi, and S Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P Kaufman and A Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R N Rakvic, B J Ulis, R P Broussard, R W Ives, and N Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S Crihalmeanu and A Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W Wenying, Z Dongming, Z Yongdong, L Jintao, and G Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y Xu, S Deka, and R Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z Zhou, E Y Du, N L Thomas, and E J Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z Zhou, E Y Du, N L Thomas, and E J Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z Zhou, E Y Du, N L Thomas, and E J Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
is non-trivial to implement these algorithms in CUDA. We then developed the implementation schemes to map our algorithms into CUDA (Section 7). In Section 2, we give a brief introduction to sclera vein recognition. In Section 8, we present experiments performed using the proposed system, and in Section 9 we draw conclusions.
22 BACKGROUND OF SCLERA VEIN RECOGNITION
221 OVERVIEW OF SCLERA VEIN RECOGNITION
A typical sclera vein recognition system includes sclera
segmentation feature enhancement feature extraction and feature
matching (Figure 1)
FIG
Sclera image segmentation is the first step in sclera vein recognition, and several methods have been designed for it. Crihalmeanu et al. presented a semi-automated system for sclera segmentation. They used a clustering algorithm to classify the color eye images into three clusters: sclera, iris, and background. Later on, Crihalmeanu and Ross designed a segmentation approach based on a normalized sclera index measure, which includes coarse sclera segmentation, pupil region segmentation, and fine sclera segmentation. Zhou et al. developed a skin tone plus "white color"-based voting method for sclera segmentation in color images, and an Otsu's thresholding-based method for grayscale images.
After sclera segmentation, it is necessary to enhance and extract the sclera features, since the sclera vein patterns often lack contrast and are hard to detect. Zhou et al. used a bank of multi-directional Gabor filters for vascular pattern enhancement. Derakhshani et al. used contrast-limited adaptive histogram equalization (CLAHE) to enhance the green color plane of the RGB image, and a multi-scale region growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points on the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line-descriptor-based feature registration and matching method.
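The multi-directional Gabor filter bank used for vascular enhancement can be sketched as follows. This is a minimal sketch: the kernel size, sigma, and frequency values are illustrative assumptions, not the parameters used in the cited works.

```python
import numpy as np

def gabor_kernel(size=15, sigma=3.0, theta=0.0, freq=0.15):
    """Real part of a Gabor kernel oriented at `theta` radians.

    The kernel is a Gaussian envelope modulated by a cosine wave
    along the rotated x-axis; vessel-like ridges perpendicular to
    `theta` respond strongly when the image is convolved with it.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    gauss = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    return gauss * np.cos(2 * np.pi * freq * xr)

# A bank of multi-directional filters, as in the enhancement step;
# four orientations are an arbitrary illustrative choice.
bank = [gabor_kernel(theta=k * np.pi / 4) for k in range(4)]
```

In a full pipeline, the enhanced image would be the per-pixel maximum response over the convolutions with each kernel in the bank.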
The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and interpolation by Cartesian-to-polar conversion. HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image. To make the computation more efficient, the image data are converted to polar form; this is mainly useful for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to determine whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, the human identification is more accurate than in existing studies. In the proposed method, two features of an image are drawn out.
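The Cartesian-to-polar conversion step can be sketched as follows. The pupil center, grid resolution, and nearest-neighbour sampling are simplifying assumptions made for brevity; a real system would interpolate.

```python
import numpy as np

def to_polar(image, center, n_r=32, n_theta=64):
    """Resample an image onto a polar (r, theta) grid about `center`.

    Row i holds samples at radius rs[i]; column j holds samples at
    angle thetas[j]. Nearest-neighbour rounding keeps the sketch short.
    """
    cy, cx = center
    # Largest radius that stays inside the image from this center.
    r_max = min(cy, cx, image.shape[0] - cy - 1, image.shape[1] - cx - 1)
    rs = np.linspace(0, r_max, n_r)
    thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    polar = np.empty((n_r, n_theta), dtype=image.dtype)
    for i, r in enumerate(rs):
        for j, t in enumerate(thetas):
            y = int(round(cy + r * np.sin(t)))
            x = int(round(cx + r * np.cos(t)))
            polar[i, j] = image[y, x]
    return polar

# Hypothetical 10x10 "image" with the pupil center assumed at (5, 5).
img = np.arange(100.0).reshape(10, 10)
pol = to_polar(img, center=(5, 5))
```

After this transform, rotations of the eye about the pupil center become simple column shifts of the polar grid, which is what makes the representation convenient for matching.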
222 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig shows the steps of segmentation.
FIG
Glare Area Detection: The glare area is a small bright area near the pupil or iris. This is an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images; if the image is color, it first needs to be converted to grayscale, after which the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of the glare area detection.
FIG
Sclera area estimation: For the estimation of the sclera area, Otsu's thresholding method is applied. The steps of sclera area detection are: selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be placed in the right and center positions, and the correct right sclera area should be placed in the left and center. In this way, non-sclera areas are wiped out.
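Otsu's threshold choice (maximising between-class variance over the ROI histogram) can be sketched as follows; the bimodal histogram standing in for dark iris pixels versus bright sclera pixels is hypothetical.

```python
import numpy as np

def otsu_threshold(gray_levels, counts):
    """Return the threshold maximising between-class variance (Otsu)."""
    total = counts.sum()
    best_t, best_var = gray_levels[0], -1.0
    for t in gray_levels[:-1]:
        bg = counts[gray_levels <= t]   # "background" class
        fg = counts[gray_levels > t]    # "foreground" class
        w0, w1 = bg.sum() / total, fg.sum() / total
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (gray_levels[gray_levels <= t] * bg).sum() / bg.sum()
        mu1 = (gray_levels[gray_levels > t] * fg).sum() / fg.sum()
        var = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# Hypothetical bimodal ROI histogram: dark iris pixels vs bright sclera.
levels = np.arange(256)
hist = np.zeros(256); hist[20:40] = 100; hist[200:220] = 100
t = otsu_threshold(levels, hist)
mask_is_sclera = levels > t  # bright side of the threshold
```

Pixels on the bright side of the threshold form the candidate sclera mask that the left/right placement rules then filter.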
223 IRIS AND EYELID REFINEMENT
The top and underside of the sclera regions are the limits of the sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are refined; these are all unwanted portions for recognition. In order to eliminate their effects, refinement is done following the detection of the sclera area. Fig shows the result after Otsu's thresholding and iris and eyelid refinement to detect the right sclera area. In the same way, the left sclera area is detected using this method.
FIG
In the segmentation process, not all images are perfectly segmented. Hence, feature extraction and matching are needed to reduce the effect of segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation; to make them more visible, vein pattern enhancement has to be performed.
224 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of fingers (Miura et al 2004) palm (Lin and
Fan 2004) and retina (Hill 1999) In the case of retinal biometrics a special optical device for imaging the back of the eyeball is needed (Hill
1999) Due to its perceived invasiveness and the required degree of subject
cooperation the use of retinal biometrics may not be acceptable to some
individuals The conjunctiva is a thin transparent and moist tissue that
covers the outer surface of the eye The part of the conjunctiva that covers
the inner lining of the eyelids is called palpebral conjunctiva and the part
that covers the outer surface of the eye is called ocular (or the bulbar)
conjunctiva which is the focus of this study The ocular conjunctiva is very
thin and clear thus the vasculature (including those of the episclera) is
easily visible through it The visible microcirculation of conjunctiva offers a
rich and complex network of veins and fine microcirculation (Fig 1) The
apparent complexity and specificity of these vascular patterns motivated us
to utilize them for personal identification (Derakhshani and Ross 2006)
FIG
We have found conjunctival vasculature to be a suitable biometric as it
conforms to the following criteria (Jain et al 2004)
UNIVERSALITY All normal living tissues including that of the
conjunctiva and episclera have vascular structure
UNIQUENESS Vasculature is created during embryonic vasculogenesis
Its detailed final structure is mostly stochastic and thus unique Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted study of some targeted areas such as those of the eye
fundus confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein 1935 Tower 1955)
PERMANENCE Other than cases of significant trauma pathology or
chemical intervention spontaneous adult ocular vasculogenesis and
angiogenesis does not easily occur Thus the conjunctival vascular
structure is expected to have reasonable permanence (Joussen 2001)
PRACTICALITY Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical
ACCEPTABILITY Since the subject is not required to stare directly into
the camera lens and given the possibility of capturing the conjunctival
vasculature from several feet away this modality is non-intrusive and thus
more acceptable
SPOOF-PROOFNESS The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact. Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features
Facilitating recognition using off-angle iris images For instance if the iris
information is relegated to the left or right portions of the eye the sclera
vein patterns will be further exposed This feature makes sclera vasculature
a natural complement to the iris biometric
Addressing the failure-to-enroll issue when iris patterns are not usable (eg
due to surgical procedures)
Reducing vulnerability to spoof attacks For instance when implemented
alongside iris systems an attacker needs to reproduce not only the iris but
also different surfaces of the sclera along with the associated
microcirculation and make them available on commensurate eye surfaces
The first step in parallelizing an algorithm is to determine the opportunities for simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering. Since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.
FIG
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching segment of the line-descriptor based method is a
bottleneck with regard to matching speed In this section we briefly
describe the Line Descriptor-based sclera vein recognition method After
segmentation vein patterns were enhanced by a bank of directional Gabor
filters Binary morphological operations are used to thin the detected vein
structure down to a single pixel wide skeleton and remove the branch
points The line descriptor is used to describe the segments in the vein
structure. Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated by: where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and thus the sum of its descriptors' weights sets the maximum score that can be attained.
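The weighted segment matching just described can be sketched as follows. The distance and angle thresholds are hypothetical values, and the normalization by the smaller template's total weight follows the text; the exact form of m(Si, Sj) in Eqs. 6-8 is not reproduced.

```python
import numpy as np

def match_score(test, target, d_match=15.0, phi_match=np.pi / 12):
    """Sum weighted matches of test segments against target segments.

    Each descriptor is a (x, y, phi, w) tuple. A test segment matches
    the first target segment within both the distance and angle
    thresholds; the pair contributes the smaller of the two weights.
    """
    total = 0.0
    for (x1, y1, p1, w1) in test:
        for (x2, y2, p2, w2) in target:
            d = np.hypot(x1 - x2, y1 - y2)
            if d <= d_match and abs(p1 - p2) <= phi_match:
                total += min(w1, w2)
                break
    # Normalize by the smaller template's total descriptor weight.
    max_score = min(sum(s[3] for s in test), sum(s[3] for s in target))
    return total / max_score if max_score else 0.0

# Hypothetical two-segment template; matching it against itself
# should give a perfect score.
a = [(10, 10, 0.1, 1.0), (40, 40, 0.5, 0.5)]
score = match_score(a, a)
```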
FIG
FIG
FIG
FIG
movement of eye. Y shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y shape structure, and the line segment angles are recorded as a new feature of the sclera.
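The "two types of angle values" test above can be sketched as a small heuristic; the 10-degree clustering tolerance is an assumption, not a value from the source.

```python
def is_y_branch(neighbor_angles, tol=10.0):
    """Heuristic: a neighbourhood whose segment angles fall into exactly
    two distinct clusters may form a Y-shaped branch point.

    Angles are in degrees; each angle joins the first cluster whose
    seed angle is within `tol`, otherwise it starts a new cluster.
    """
    groups = []
    for a in neighbor_angles:
        for g in groups:
            if abs(a - g[0]) <= tol:
                g.append(a)
                break
        else:
            groups.append([a])
    return len(groups) == 2

# Two angle types -> Y-shape candidate; one type -> straight vessel.
y1 = is_y_branch([30, 32, 78, 80])
y2 = is_y_branch([30, 31, 32])
```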
There are two ways to measure both the orientation and the relationship of every branch of Y-shape vessels: one is to use the angle of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employed the second method. As Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angle between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zoom changes at the image acquisition step, ϕ1, ϕ2 and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also recorded the center position (x, y) of the Y shape branches as auxiliary parameters. So our rotation-, shift- and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris centers. It is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 22, the line
descriptor is extracted from the skeleton of vessel structure in binary images
(Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, the descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel
patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line
segments of vessel patterns
is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large and will occupy GPU memory and slow down the data transfer. For matching, a RANSAC-type registration algorithm was used to randomly select the corresponding descriptors, and the transform parameters between them were used to generate the template transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary should be calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processor unit.
To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary are set to 0.5, and interior descriptors are set to 1. In our work, the descriptors' weights were calculated on their own mask by the CPU, only once.
The calculated result was saved as a component of the descriptor. The descriptor of the sclera then becomes s(x, y, ɸ, w), where w denotes the weight of the point and its value may be 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates have similar reference points. If we use the center of the iris as the reference point, then when two templates are compared the correspondences will automatically be aligned to each other, since they share the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted as θ; the distance between the segment's center and the pupil center, denoted as r; and the dominant angular orientation of the segment, denoted as ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left part of the eye's sclera patterns may be compressed while the right part's sclera patterns are stretched.
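Building one WPL descriptor s(x, y, r, θ, ɸ, w) can be sketched as follows. This is a simplified sketch: for brevity, distance to the image border stands in for distance to the mask boundary when assigning the 0 / 0.5 / 1 weights, and the band width is an assumption.

```python
import numpy as np

def wpl_descriptor(x, y, phi, iris_center, mask, edge_band=5):
    """Build s(x, y, r, theta, phi, w) for one line segment center.

    The polar terms (r, theta) relative to the iris center are
    precomputed on the CPU, and the mask is folded into a scalar
    weight so the GPU never needs to touch the mask file itself.
    """
    cx, cy = iris_center
    r = np.hypot(x - cx, y - cy)
    theta = np.arctan2(y - cy, x - cx)
    h, w_ = mask.shape
    if not (0 <= int(y) < h and 0 <= int(x) < w_) or mask[int(y), int(x)] == 0:
        w = 0.0          # outside the sclera
    elif (x < edge_band or y < edge_band or
          x >= w_ - edge_band or y >= h - edge_band):
        w = 0.5          # near the boundary -> down-weighted
    else:
        w = 1.0          # interior descriptor
    return (x, y, r, theta, phi, w)

# Hypothetical 20x20 all-sclera mask with the iris center at (10, 10).
mask = np.ones((20, 20))
d_in = wpl_descriptor(10, 10, 0.3, (10, 10), mask)   # interior point
d_edge = wpl_descriptor(1, 10, 0.3, (10, 10), mask)  # boundary point
```

Because the weight travels with the descriptor, a template transform on the GPU needs no mask lookup at all, which is the data-transfer saving the text describes.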
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved
FIG
FIG
them at contiguous addresses. This meets the requirement of coalesced memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, the computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
more complicated shading and lighting operations that are essential for
complex effects The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment Over the past six years these vertex programs and
fragment programs have become increasingly more capable with larger
limits on their size and resource consumption with more fully featured
instruction sets and with more flexible control-flow operations After many
years of separate instruction sets for vertex and fragment operations current
GPUs support the unified Shader Model 4.0 on both vertex and fragment
shaders
The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions
The instruction set for the first time supports both 32-bit integers and 32-
bit floating-point numbers
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture)
Finally dynamic flow control in the form of loops and branches must be
supported
As the shader model has evolved and become more powerful and GPU
applications of all types have increased vertex and fragment program
complexity GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline Indeed while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units General-Purpose Computing on the GPU Mapping general-
purpose computation onto the GPU uses the graphics hardware in much the
same way as any standard graphics application Because of this similarity it
is both easier and more difficult to explain the process On one hand the
actual operations are the same and are easy to follow on the other hand the
terminology is different between graphics and general-purpose use Harris
provides an excellent description of this mapping process
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II
concentrating on the programmable aspects of this pipeline
The programmer specifies geometry that covers a region on the screen
The rasterizer generates a fragment at each pixel location covered by that
geometry
Each fragment is shaded by the fragment program
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory
The resulting image can then be used as texture on future passes through
the graphics pipeline
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves the exact same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest The rasterizer generates a fragment at each
pixel location covered by that geometry (In our example our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation)
Each fragment is shaded by an SPMD general-purpose fragment
program (Each grid point runs the same program to update the state of its
fluid)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory (Each grid point can access the state of its neighbors from the previous time step in computing its current value)
The resulting buffer in global memory can then be used as an input on
future passes (The current state of the fluid will be used on the next time
step)
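The grid-point update described above can be sketched in array form: every point gathers its four neighbours' previous values. A simple averaging stencil stands in for the actual fluid update, and the grid is hypothetical.

```python
import numpy as np

def step(state):
    """One SPMD-style time step over the whole grid.

    Every interior point gathers its four neighbours' values from the
    *previous* state and averages them; the boundary is held fixed.
    On a GPU, each grid point would be one thread running this same
    program.
    """
    nxt = state.copy()
    nxt[1:-1, 1:-1] = 0.25 * (state[:-2, 1:-1] + state[2:, 1:-1] +
                              state[1:-1, :-2] + state[1:-1, 2:])
    return nxt

# A point disturbance diffuses outward to its neighbours in one step.
grid = np.zeros((5, 5))
grid[2, 2] = 4.0
after = step(grid)
```

Reading from `state` while writing to a separate `nxt` buffer mirrors the old GPGPU restriction described in the text: a pass gathers from one buffer and writes to another.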
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs
In addition the program had to be structured in terms of the graphics
pipeline with the programmable units only accessible as an intermediate
step in that pipeline when the programmer would almost certainly prefer to
access the programmable units directly The programming environments we
describe in detail in Section IV are solving this difficulty by providing a
more natural direct non-graphics interface to the hardware and
specifically the programmable units Today GPU computing applications
are structured in the following way
The programmer directly defines the computation domain of interest as a
structured grid of threads
An SPMD general-purpose program computes the value of each thread
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two
methods the same buffer can be used for both reading and writing
allowing more flexible algorithms (for example in-place algorithms that
use less memory)
The resulting buffer in global memory can then be used as an input in
future computation
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we matched two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity. After this step, some false positive matches may still remain. In the second stage, we used the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and their center distance respectively; tϕ is a distance threshold and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we searched the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where ϕij is the angle between the jth branch and the polar from the pupil center in descriptor i.
The number of matched pairs ni and the distance between the Y shape branches' centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study; Ni and Nj are the total numbers of feature vectors in templates i and j respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded. A sclera with a high matching score is passed to the next, more precise matching process.
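The Stage-I matching can be sketched as follows. The thresholds tϕ = 30 and txy = 675 come from the text, but the exact form of Eq. (2) is not reproduced here, so the fusion of match count and mean centre distance below is a hypothetical stand-in.

```python
import numpy as np

def y_shape_score(test, target, t_phi=30.0, t_xy=675.0, alpha=30.0):
    """Match Y-shape descriptors and fuse count with mean distance.

    Each descriptor is ((phi1, phi2, phi3), (x, y)). A pair matches
    when both the branch-angle distance and the centre distance are
    under their thresholds. The fusion below (count normalised by the
    smaller template, damped by mean centre distance via alpha) is an
    illustrative assumption, not the published Eq. (2).
    """
    n, dists = 0, []
    for (phis_i, ci) in test:
        for (phis_j, cj) in target:
            d_phi = np.linalg.norm(np.subtract(phis_i, phis_j))
            d_xy = np.hypot(ci[0] - cj[0], ci[1] - cj[1])
            if d_phi <= t_phi and d_xy <= t_xy:
                n += 1
                dists.append(d_xy)
                break
    if n == 0:
        return 0.0
    return (n / min(len(test), len(target))) * alpha / (alpha + np.mean(dists))

# A template matched against itself scores 1.0.
a = [((10.0, 40.0, 80.0), (100.0, 50.0))]
score = y_shape_score(a, a)
```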
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers.
Considering these factors, our registration employed both a single shift transform and a multi-parameter transform which combines shift, rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte; Tta is the target template and staj is the jth WPL descriptor of Tta; and d(stek, staj) is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value of two descriptors, defined as
We first randomly select an equal number of segment descriptors stek in the test template Tte from each quadrant and find each one's nearest neighbor staj in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
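Algorithm 2's shift search can be sketched as follows. The point sets are hypothetical, and picking the candidate offset closest to the per-axis mean stands in for the smallest-standard-deviation rule in the text.

```python
import numpy as np

def shift_search(test_pts, target_pts, n_candidates=8, seed=0):
    """Algorithm-2-style sketch of the shift parameter search.

    Sample test descriptors at random, record the offset to each one's
    nearest target neighbour as a candidate shift, then keep the
    candidate closest to the candidates' mean (a stand-in for the
    smallest-deviation selection).
    """
    rng = np.random.default_rng(seed)
    test_pts = np.asarray(test_pts, float)
    target_pts = np.asarray(target_pts, float)
    idx = rng.integers(0, len(test_pts), size=n_candidates)
    offsets = []
    for i in idx:
        d = np.linalg.norm(target_pts - test_pts[i], axis=1)
        offsets.append(target_pts[d.argmin()] - test_pts[i])
    offsets = np.array(offsets)
    dev = np.linalg.norm(offsets - offsets.mean(axis=0), axis=1)
    return offsets[dev.argmin()]

# A target that is a pure translation of the test recovers the shift.
pts = np.array([[0.0, 0.0], [10.0, 5.0], [3.0, 8.0]])
shift = shift_search(pts, pts + [2.0, -1.0])
```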
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters we iterate N times to generate these parameters; in our experiment we set the iteration count to 512.
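The iterative random search of Algorithm 3 can be sketched as follows. This is a hedged Python illustration: the parameter ranges, the point-based descriptors, and the uniform sampling are our assumptions; only the structure (N random trials, count pairs within β = 20 pixels, keep the best count) follows the text.

```python
import math
import random

def transform(points, shift, theta, scale):
    # Apply scale and rotation, then shift, to 2-D points (cf. matrix (7))
    c, s = math.cos(theta), math.sin(theta)
    return [(scale * (c * x - s * y) + shift[0],
             scale * (s * x + c * y) + shift[1]) for x, y in points]

def affine_search(test_pts, target_pts, beta=20.0, iters=512, seed=0):
    """Randomized search: keep the (shift, theta, scale) set that
    yields the most descriptor pairs closer than beta pixels."""
    rng = random.Random(seed)
    best, best_count = None, -1
    for _ in range(iters):
        params = ((rng.uniform(-10, 10), rng.uniform(-10, 10)),  # shift
                  rng.uniform(-0.1, 0.1),                        # rotation
                  rng.uniform(0.9, 1.1))                         # scale
        moved = transform(test_pts, *params)
        count = sum(1 for p in moved
                    if min(math.dist(p, q) for q in target_pts) < beta)
        if count > best_count:
            best, best_count = params, count
    return best, best_count
```

In the CUDA version each iteration runs in its own thread, which is why uncorrelated per-thread random numbers matter later in the text.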
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3; ɸ is the angle between the segment descriptor and the radial direction; and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system that works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program must be partitioned into threads by the programmer and mapped onto those SMs. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and take very little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Local memory, global memory, constant memory, and texture memory reside off-chip; global, constant, and texture memory are accessible by all threads, but accessing these off-chip memories is very time consuming.
Constant memory and texture memory are read-only and cached. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:
If threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this latency, on-chip memory should be used preferentially over global memory. When global memory accesses do occur, the threads in a warp should access consecutive words so that the accesses coalesce.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computational density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU, and both the thread and block counts are set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in that template. This partition lets every block execute independently, with no data-exchange requirements between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate this sum, as shown at the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
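The tree-style reduction described above can be sketched sequentially; each index i below plays the role of one active CUDA thread, and the stride doubles each round exactly as described.

```python
def tree_reduce_sum(vals):
    """Pairwise tree reduction: round 1 sums consecutive pairs, later
    rounds combine at strides 2, 4, 8, ...; the total ends up at
    index 0, mirroring how the kernel leaves the final result in the
    first intermediate-result address."""
    vals = list(vals)
    n = len(vals)
    stride = 1
    while stride < n:
        for i in range(0, n - stride, 2 * stride):  # each i = one thread
            vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]
```

On the GPU the rounds run in parallel with a barrier between them; here the inner loop stands in for the threads that are active in a given round.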
252 MAPPING INSIDE BLOCK
In shift-parameter searching there are two schemes we could choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to each thread, so that all threads compute independently, except that the final result must be compared against the other possible offsets.
Because of the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator we mapped an entire parameter-set search to one thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are distributed over all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation- and scale-parameter generation step we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
FIG
FIG
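The idea of one generator state per thread can be mimicked with the standard library, as a rough sketch only: Python's random module also uses a Mersenne Twister, but seeding many generators differently does not provide the decorrelation guarantees of the dynamically created parameter sets (DCMT) described above.

```python
import random

def make_thread_rngs(n_threads, base_seed=12345):
    """One independent Mersenne Twister state per simulated thread.
    Distinct seeds keep the streams reproducible and separate, but,
    as noted in the text, seeding alone does not guarantee
    uncorrelated sequences the way per-generator DCMT parameters do."""
    return [random.Random(base_seed + tid) for tid in range(n_threads)]
```

Each simulated thread draws from its own state, so advancing one stream never perturbs another.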
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can incur long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no host-to-device data transfer during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
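The separate storage of descriptor components is a structure-of-arrays layout. A minimal sketch (the field names follow the descriptor s(x, y, r, θ, ϕ, w); the helper name is ours): with one array per component, thread i reading x[i] and thread i+1 reading x[i+1] touch consecutive addresses, which is what lets the accesses coalesce.

```python
def to_soa(descs):
    """Convert array-of-structures descriptor records, one tuple
    (x, y, r, theta, phi, w) per descriptor, to a structure-of-arrays
    layout where every component becomes its own contiguous array."""
    xs, ys, rs, thetas, phis, ws = map(list, zip(*descs))
    return {"x": xs, "y": ys, "r": rs,
            "theta": thetas, "phi": phis, "w": ws}
```

In CUDA the same idea is expressed with separate device arrays rather than a dictionary; the point is only the memory ordering.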
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in the design of target detection; in this paper it is applied as a feature for human recognition. In the sclera region the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of its pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y): m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y) / dx(x, y)).
Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell contributes a weighted vote to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination and contrast changes, the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
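The two HOG steps above (gradient computation, then magnitude-weighted orientation binning over 0 to 180 degrees) can be sketched for a single cell. This is a generic illustration with central-difference gradients on a grayscale cell given as nested lists, not the exact filter used in the report.

```python
import math

def hog_cell_histogram(cell, n_bins=9):
    """Unsigned-orientation histogram of one cell: compute dx, dy by
    central differences, fold the angle into [0, 180), and vote into a
    bin weighted by the gradient magnitude."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = cell[y][x + 1] - cell[y][x - 1]
            dy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(dx, dy)
            ang = math.degrees(math.atan2(dy, dx)) % 180.0
            hist[int(ang / 180.0 * n_bins) % n_bins] += mag
    return hist
```

A vertical edge, for example, produces purely horizontal gradients, so all of its votes land in the first (0-degree) bin.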
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004 MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, either as a script or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be called directly from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension that is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, enabling fast development and execution. With the MATLAB language you can program and develop algorithms faster than with traditional languages, because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology.
This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithms efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
Using the interactive tool GUIDE (Graphical User Interface Development Environment), you can lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data-analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter and bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light-source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase matching efficiency; it is a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU-GPU system, and even across the threads within the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
of the RGB image and a multi-scale region-growing approach to identify the sclera veins from the image background. Crihalmeanu and Ross applied a selective enhancement filter for blood vessels to extract features from the green component of a color image. In the feature matching step, Crihalmeanu and Ross proposed three registration and matching approaches: Speeded Up Robust Features (SURF), which is based on interest-point detection; minutiae detection, which is based on minutiae points of the vasculature structure; and direct correlation matching, which relies on image registration. Zhou et al. designed a line-descriptor-based feature registration and matching method.
The proposed sclera recognition consists of five steps: sclera segmentation, vein pattern enhancement, feature extraction, feature matching, and the matching decision. Fig. 2 shows the block diagram of sclera recognition. Two types of feature extraction are used in the proposed method to achieve good identification accuracy. The characteristics elicited from the blood vessel structure seen in the sclera region are the Histogram of Oriented Gradients (HOG) and an interpolated Cartesian-to-polar conversion. HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To become more computationally efficient, the image data are converted to polar form; this is mainly useful for circular or quasi-circular objects. These two characteristics are extracted from all the images in the database and compared with the features of the query image to decide whether the person is correctly identified or not. This procedure is done in the feature matching step, which ultimately makes the matching decision. By using the proposed feature extraction methods and matching techniques, human identification is more accurate than in existing studies. In the proposed method, two features of an image are drawn out.
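The Cartesian-to-polar conversion mentioned above can be sketched as follows; the helper function and the choice of the iris centre as the origin are illustrative assumptions, not the report's exact implementation.

```python
import math

def to_polar(points, center):
    """Map (x, y) pixel coordinates to (radius, angle) about a
    reference point, e.g. the detected iris centre."""
    cx, cy = center
    return [(math.hypot(x - cx, y - cy), math.atan2(y - cy, x - cx))
            for x, y in points]
```

In polar form the quasi-circular sclera band becomes a roughly rectangular strip, which is what makes the subsequent comparison cheaper.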
222 SCLERA SEGMENTATION
Sclera segmentation is the first step in sclera recognition. It involves three steps: glare area detection, sclera area estimation, and iris and eyelid detection and refinement. Fig. shows the steps of segmentation.
FIG
Glare area detection: the glare area is a small bright area near the pupil or iris, an unwanted portion of the eye image. A Sobel filter is applied to detect the glare area present in the iris or pupil. It operates only on grayscale images, so a color image must first be converted to grayscale before the Sobel filter is applied to detect the glare area. Fig. 4 shows the result of glare area detection.
FIG
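The Sobel step can be sketched in pure Python over a grayscale image given as nested lists. This is a generic illustration (border pixels are left at zero, and the glare threshold itself is not shown, since the report does not state one):

```python
def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels; a bright glare
    spot shows up as a ring of high values around its boundary."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```

Flat regions produce zero response; only intensity transitions, such as the edge of a glare spot, produce large magnitudes.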
Sclera area estimation: for the estimation of the sclera area, Otsu's thresholding method is applied. The stages of sclera area detection are selection of the region of interest (ROI), Otsu's thresholding, and sclera area detection. The left and right sclera areas are selected based on the iris boundaries. Once the region of interest is selected, Otsu's thresholding is applied to obtain the potential sclera areas. The correct left sclera area should be located in the right and center positions, and the correct right sclera area should be located in the left and center; in this way, non-sclera areas are eliminated.
223 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the sclera area. Then the upper-eyelid, lower-eyelid, and iris boundaries are refined; all of these are unwanted portions for recognition. In order to eliminate their effects, refinement is done following the detection of the sclera area. Fig. shows the right sclera area detected after Otsu's thresholding and iris and eyelid refinement; the left sclera area is detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented; hence feature extraction and matching are needed to reduce segmentation faults. The vein patterns in the sclera area are not clearly visible after segmentation, so vein pattern enhancement must be performed to make them more visible.
224 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of fingers (Miura et al 2004), palm (Lin and
Fan 2004), and retina (Hill 1999). In the case of retinal biometrics, a
special optical device for imaging the back of the eyeball is needed (Hill
1999). Due to its perceived invasiveness and the required degree of subject
cooperation, the use of retinal biometrics may not be acceptable to some
individuals. The conjunctiva is a thin, transparent, and moist tissue that
covers the outer surface of the eye. The part of the conjunctiva that covers
the inner lining of the eyelids is called the palpebral conjunctiva, and the part
that covers the outer surface of the eye is called the ocular (or bulbar)
conjunctiva, which is the focus of this study. The ocular conjunctiva is very
thin and clear; thus the vasculature (including that of the episclera) is
easily visible through it. The visible microcirculation of the conjunctiva offers a
rich and complex network of veins and fine microcirculation (Fig 1). The
apparent complexity and specificity of these vascular patterns motivated us
to utilize them for personal identification (Derakhshani and Ross 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric as it
conforms to the following criteria (Jain et al 2004):
UNIVERSALITY All normal living tissues, including those of the
conjunctiva and episclera, have vascular structure.
UNIQUENESS Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as those of the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein 1935; Tower 1955).
PERMANENCE Other than cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus the conjunctival vascular
structure is expected to have reasonable permanence (Joussen 2001).
PRACTICALITY Conjunctival vasculature can be captured with commercial
off-the-shelf digital cameras under normal lighting conditions, making this
modality highly practical.
ACCEPTABILITY Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the
addition of conjunctival biometrics will enhance the performance of current
iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portions of the eye, the sclera
vein patterns will be further exposed. This feature makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (eg
due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker would need to reproduce not only the iris
but also different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
potential for simultaneous computation. The figure below demonstrates the
possibility for parallel directional filtering. Since the filter is computed over
different portions of the input image, the computation can be performed in
parallel (denoted by Elements below). In addition, individual parallelization
of each element of the filtering can also be performed. A detailed discussion
of our proposed parallelization is outside the scope of this paper.
FIG
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA
VEIN RECOGNITION METHOD
The matching stage of the line descriptor-based method is a
bottleneck with regard to matching speed. In this section we briefly
describe the line descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and remove the branch
points. The line descriptor is used to describe the segments in the vein
structure. Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle to some
reference angle at the iris center, θ; the segment's distance to the iris center, r;
and the dominant angular orientation of the line segment, ɸ. Thus the
descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor
are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. In order to register the segments of the
vascular patterns, a RANSAC-based algorithm is used to estimate the best-
fit parameters for registration between the two sclera vascular patterns. The
registration algorithm randomly chooses two points, one from the
test template and one from the target template. It also randomly chooses a
scaling factor and a rotation value based on a priori knowledge of the
database. Using these values, it calculates a fitness value for the registration
under these parameters.
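The three components S = (θ, r, ɸ) described above can be computed from a segment's pixel coordinates and the iris center. The following NumPy sketch is illustrative only: the degree-1 polynomial fit and the angle conventions are assumptions, not necessarily those of the original method.

```python
import numpy as np

def line_descriptor(xs, ys, iris_center):
    """S = (theta, r, phi): polar position of the segment center w.r.t. the
    iris center, plus the segment's dominant orientation."""
    xi, yi = iris_center
    xl, yl = xs.mean(), ys.mean()            # segment center (xl, yl)
    theta = np.arctan2(yl - yi, xl - xi)     # angle to the reference axis
    r = np.hypot(xl - xi, yl - yi)           # distance to the iris center
    slope = np.polyfit(xs, ys, 1)[0]         # f_line(x) as a degree-1 fit
    phi = np.arctan(slope)                   # dominant orientation
    return theta, r, phi
```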
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. In order to reduce the effect of segmentation errors, we created a
weighting image (Figure 3) from the sclera mask by setting interior pixels
in the sclera mask to 1, pixels within some distance of the boundary of the
mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as follows,
where Si and Sj are two segment descriptors, m(Si, Sj) is the matching
score between segments Si and Sj, d(Si, Sj) is the Euclidean distance
between the segment descriptors' center points (from Eqs 6-8), Dmatch is
the matching distance threshold, and ɸmatch is the matching angle threshold.
The total matching score M is the sum of the individual matching scores
divided by the maximum matching score for the minimal set between the
test and target templates. That is, one of the test or target templates has fewer
points, and the sum of its descriptors' weights sets the maximum score
that can be attained.
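The scoring scheme above can be sketched as follows. The threshold values, the (x, y, ɸ, w) tuple layout, and the use of the smaller weight as the pair score are assumptions for this sketch, not the exact formulas of Eqs 6-8.

```python
import numpy as np

def segment_match_score(si, sj, d_match=5.0, phi_match=0.2):
    """si, sj: (x, y, phi, w). A pair scores its smaller weight when both
    the center distance and orientation difference fall inside the thresholds."""
    d = np.hypot(si[0] - sj[0], si[1] - sj[1])
    dphi = abs(si[2] - sj[2])
    if d <= d_match and dphi <= phi_match:
        return min(si[3], sj[3])
    return 0.0

def template_score(test, target, **kw):
    """M = sum of best per-segment scores, normalized by the maximum score
    attainable by the smaller template (sum of its weights)."""
    matched = sum(max((segment_match_score(s, t, **kw) for t in target),
                      default=0.0)
                  for s in test)
    smaller = min(test, target, key=len)
    max_score = sum(s[3] for s in smaller)
    return matched / max_score if max_score else 0.0
```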
FIG
FIG
FIG
FIG
Even with movement of the eye, Y-shape branches are observed to be a stable
feature and can be used as a sclera feature descriptor. To detect the Y-shape
branches in the original template, we search the set of nearest neighbors of
every line segment within a regular distance and classify the angles among
these neighbors. If there are two types of angle values in the line segment set,
the set may be inferred as a Y-shape structure, and the line segment angles are
recorded as a new feature of the sclera.
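The neighbor-angle test can be sketched as below. The neighborhood radius and the angle-spread tolerance are illustrative values, not the project's parameters.

```python
import numpy as np

def find_y_branches(centers, angles, radius=10.0, tol=0.3):
    """Flag line segments whose neighborhood (within `radius`) contains at
    least two clearly different segment orientations - the signature of a
    Y-shaped branch point."""
    centers = np.asarray(centers, dtype=float)
    angles = np.asarray(angles, dtype=float)
    y_flags = []
    for c in centers:
        near = np.linalg.norm(centers - c, axis=1) <= radius
        local = angles[near]
        # "two types of angle values": orientation spread exceeds tolerance
        y_flags.append(bool(local.max() - local.min() > tol))
    return y_flags
```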
There are two ways to measure both the orientation and relationship of
every branch of Y-shape vessels: one is to use the angle of every branch to
the x axis; the other is to use the angles between each branch and the iris
radial direction. The first method needs an additional rotation operation to
align the template. In our approach we employed the second method. As
Figure 6 shows, ϕ1, ϕ2 and ϕ3 denote the angle between each branch and the
radius from the pupil center. Even when the head tilts, the eye moves, or the
camera zooms at the image acquisition step, ϕ1, ϕ2 and ϕ3 are quite stable.
To tolerate errors from the pupil center calculation in the segmentation step,
we also recorded the center position (x, y) of the Y-shape branches as
auxiliary parameters. Thus our rotation-, shift- and scale-invariant feature
vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated
with reference to the iris center; therefore it is automatically aligned to the
iris centers. It is a rotation- and scale-invariant descriptor.

WPL SCLERA DESCRIPTOR
As we discussed in Section 22, the line
descriptor is extracted from the skeleton of the vessel structure in binary
images (Figure 7). The skeleton is then broken into smaller segments. For
each segment, a line descriptor is created to record the center and orientation
of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the
position of the center and ɸ is its orientation. Because of the limitation of
segmentation accuracy, the descriptors at the boundary of the sclera area
might not be accurate and may contain spur edges resulting from the iris,
eyelid and/or eyelashes. To tolerate such errors, a mask file
FIG
The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel
patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line
segments of vessel patterns
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is challenging
since the mask files are large and would occupy GPU memory and
slow down data transfer. During matching and registration, a RANSAC-
type algorithm was used to randomly select corresponding descriptors,
and the transform parameters between them were used to generate the
template transform affine matrix. After every template transform, the mask
data would also have to be transformed and a new boundary calculated to
evaluate the weight of the transformed descriptor. This results in too many
convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the
information of the mask and can be automatically aligned. We extracted the
geometric relationships of the descriptors and stored them as a new
descriptor. We use a weighted image created by setting various weight
values according to position. The weights of descriptors that lie
outside the sclera are set to 0, those near the sclera boundary to 0.5,
and interior descriptors to 1. In our work, descriptor weights were
calculated on their own mask by the CPU only once.
The calculated result was saved as a component of the descriptor. The
sclera descriptor then becomes s(x, y, ɸ, w), where w denotes the weight
of the point and takes the value 0, 0.5, or 1. To align two templates, when a
template is shifted to another location along the line connecting their
centers, all the descriptors of that template must be transformed. It would be
faster if the two templates had similar reference points. If we use the center
of the iris as the reference point, then when two templates are compared, the
correspondences will automatically be aligned to each other, since they share
the same reference point. Every feature vector of the template is a set of
line segment descriptors composed of three variables (Figure 8): the
segment's angle to the reference line through the iris center, denoted as θ;
the distance between the segment's center and the pupil center, denoted as r;
and the dominant angular orientation of the segment, denoted as ɸ. To
minimize the GPU computation, we also convert the descriptor values from
polar coordinates to rectangular coordinates in a CPU preprocess.
The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters. For
example, as an eyeball moves left, the left-part sclera patterns of the eye may
be compressed while the right-part sclera patterns are stretched.
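The CPU-side precomputation that augments each descriptor with both polar and rectangular coordinates might look like the sketch below; the function and parameter names are illustrative.

```python
import math

def wpl_descriptor(x, y, phi, w, iris_center):
    """Extend s(x, y, phi, w) to s(x, y, r, theta, phi, w): the polar position
    is taken relative to the iris center, so two templates sharing that
    reference point are automatically aligned."""
    xi, yi = iris_center
    r = math.hypot(x - xi, y - yi)        # distance to the pupil/iris center
    theta = math.atan2(y - yi, x - xi)    # angle to the reference line
    return (x, y, r, theta, phi, w)
```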
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads called warps. We
reorganized the descriptors from the same side and saved
FIG
FIG
them at contiguous addresses. This meets the requirement of coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding mask
information into the new descriptor, computation on the mask file is no
longer needed on the GPU. Matching with this feature is very fast because it
does not need to re-register the templates every time after shifting. Thus the
cost of data transfer and computation on the GPU is reduced. Matching on
the new descriptor, the shift parameter generator in Figure 4 is then
simplified as in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
the more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger
limits on their size and resource consumption, with more fully featured
instruction sets, and with more flexible control-flow operations. After many
years of separate instruction sets for vertex and fragment operations, current
GPUs support the unified Shader Model 4.0 on both vertex and fragment
shaders:
The hardware must support shader programs of at least 65k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-
bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units.

GENERAL-PURPOSE COMPUTING ON THE GPU
Mapping general-purpose computation onto the GPU uses the graphics
hardware in much the same way as any standard graphics application.
Because of this similarity, it is both easier and more difficult to explain the
process. On one hand, the actual operations are the same and are easy to
follow; on the other hand, the terminology differs between graphics and
general-purpose use. Harris provides an excellent description of this
mapping process.
We begin by describing GPU programming using graphics terminology,
then show how the same steps are used in a general-purpose way to author
GPGPU applications, and finally use the same steps to show the simpler
and more direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II,
concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as texture on future passes through
the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation
involves the exact same steps but different terminology. A motivating
example is a fluid simulation computed over a grid: at each time step, we
compute the next state of the fluid for each grid point from the current state
at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest. The rasterizer generates a fragment at each
pixel location covered by that geometry. (In our example, our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation.)
Each fragment is shaded by an SPMD general-purpose fragment
program. (Each grid point runs the same program to update the state of its
fluid.)
The fragment program computes the value of the fragment by a
combination of math operations and "gather" accesses from global
memory. (Each grid point can access the state of its neighbors from the
previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes. (The current state of the fluid will be used on the next time
step.)
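The grid-update pattern in these steps, where each point gathers from its neighbors' previous state and writes a fresh output buffer, can be mimicked on the CPU. This is a toy averaging stencil for illustration, not a real fluid solver:

```python
import numpy as np

def step(state):
    """One SPMD-style pass: every grid point reads its four neighbors
    from the previous time step and writes a new output buffer."""
    p = np.pad(state, 1, mode="edge")   # replicate the border values
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] +
                   p[1:-1, :-2] + p[1:-1, 2:])
```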
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications
has been that, despite their general-purpose tasks' having nothing to do with
graphics, the applications still had to be programmed using graphics APIs.
In addition, the program had to be structured in terms of the graphics
pipeline, with the programmable units only accessible as an intermediate
step in that pipeline, when the programmer would almost certainly prefer to
access the programmable units directly. The programming environments we
describe in detail in Section IV solve this difficulty by providing a
more natural, direct, non-graphics interface to the hardware, and
specifically to the programmable units. Today, GPU computing applications
are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math
operations and both "gather" (read) accesses from and "scatter" (write)
accesses to global memory. Unlike in the previous two
methods, the same buffer can be used for both reading and writing,
allowing more flexible algorithms (for example, in-place algorithms that
use less memory).
The resulting buffer in global memory can then be used as an input in
future computation.
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because
no registration is needed. The matching result in this stage helps filter
out image pairs with low similarities. After this step, some false positive
matches may remain. In the second stage, we use the WPL descriptor
to register the two images for more detailed descriptor matching, including
scale and translation invariance. This stage includes shift transform, affine
matrix generation, and final WPL descriptor matching. Overall, we
partitioned the registration and matching processing into four kernels in
CUDA (Figure 10): matching on the Y-shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster
and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features,
registration is unnecessary before matching on the Y-shape descriptor. The
whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte
and target template Tta respectively, dϕ is the Euclidean distance of the
angle elements of the descriptor vectors defined in (3), and dxy is the
Euclidean distance of two descriptor centers defined in (4). ni and di are the
number of matched descriptor pairs and their center distances respectively,
tϕ is a distance threshold, and txy is the threshold restricting the search area.
We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all
the Y-shape branches. The search area is limited to the corresponding left or
right half of the sclera in order to reduce the search range and time. The
distance between two branches is defined in (3), where ϕij is the angle
between the jth branch and the polar axis from the pupil center in
descriptor i.
The number of matched pairs ni and the distance between Y-shape
branch centers di are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch
centers as in (2). Here α is a factor to fuse the matching score, which was set
to 30 in our study, and Ni and Nj are the total numbers of feature vectors in
templates i and j respectively. The decision is regulated by the threshold t:
if the sclera's matching score is lower than t, the sclera is discarded. A
sclera with a high matching score is passed to the next, more precise
matching process.
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of the
sclera than the Y-shape descriptor. The variation of the sclera vessel pattern
is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure
will appear nonlinearly shrunk or extended, because the eyeball is spherical
in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca and
endothelium. There are slight differences in the movement of these layers.
Considering these factors, our registration employs both a single
shift transform and a multi-parameter transform which combines shift,
rotation and scale together.
1) SHIFT PARAMETER SEARCH As we discussed before,
segmentation may not be accurate; as a result, the detected iris center may
not be very accurate. The shift transform is designed to tolerate possible
errors in pupil center detection in the segmentation step. If there is no
deformation, or only very minor deformation, registration with the shift
transform alone would be adequate to achieve an accurate result. We
designed Algorithm 2 to obtain the optimized shift parameter, where Tte is
the test template and stei is the ith WPL descriptor of Tte, Tta is the target
template and staj is the jth WPL descriptor of Tta, and d(stek, staj) is the
Euclidean distance of descriptors stek and staj.
Δsk is the shift value of two descriptors, defined as
We first randomly select an equal number of segment descriptors
stek in the test template Tte from each quad and find their nearest neighbors
staj in the target template Tta. The shift offset between them is recorded as a
possible registration shift factor Δsk. The final registration offset is Δsoptim,
which has the smallest standard deviation among these candidate offsets.
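The shift search of Algorithm 2 can be sketched with descriptor centers only. Choosing the candidate with the tightest residuals stands in for the standard-deviation criterion; the sample count and names are illustrative.

```python
import numpy as np

def best_shift(test_pts, target_pts, n_samples=8, seed=0):
    """Candidate shifts come from nearest-neighbor offsets of sampled test
    descriptors; return the candidate whose residuals are tightest."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(test_pts), size=min(n_samples, len(test_pts)),
                     replace=False)
    candidates = []
    for i in idx:
        d = np.linalg.norm(target_pts - test_pts[i], axis=1)
        candidates.append(target_pts[np.argmin(d)] - test_pts[i])
    best, best_spread = None, np.inf
    for c in candidates:
        shifted = test_pts + c
        res = [np.min(np.linalg.norm(target_pts - p, axis=1))
               for p in shifted]
        spread = float(np.mean(res))
        if spread < best_spread:
            best, best_spread = c, spread
    return best
```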
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera
patterns in the matching step. The affine transform algorithm is shown in
Algorithm 3. The shift value in the parameter set is obtained by randomly
selecting a descriptor ste(it) and calculating the distance from its nearest
neighbor staj in Tta. We transform the test template by the matrix in (7).
At the end of each iteration, we count the number of matched descriptor
pairs between the transformed template and the target template. The factor β
is used to determine whether a pair of descriptors is matched; we set it to
20 pixels in our experiment. After N iterations, the optimized transform
parameter set is determined by selecting the maximum matching number
m(it). Here stei, Tte, staj and Tta are defined as in Algorithm 2; tr(it)shift,
θ(it) and tr(it)scale are the shift, rotation and scale transform parameters
generated in the itth iteration; and R(θ(it)), T(tr(it)shift) and S(tr(it)scale)
are the transform matrices defined in (7). To search for the optimized
transform parameters, we iterate N times to generate these parameters. In
our experiment we set the iteration count to 512.
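The iterate-and-count structure of Algorithm 3 can be sketched over descriptor centers. The rotation and scale ranges stand in for the "a priori knowledge of the database" and are illustrative, as are all names.

```python
import numpy as np

def search_affine(test_pts, target_pts, n_iter=256, beta=2.0, seed=1):
    """Each iteration draws rotation/scale, anchors the shift on a random
    descriptor's nearest neighbor, then counts pairs matched within beta.
    Returns (best_match_count, (theta, scale, shift))."""
    rng = np.random.default_rng(seed)
    best = (0, None)
    for _ in range(n_iter):
        theta = rng.uniform(-0.05, 0.05)   # assumed a-priori rotation range
        scale = rng.uniform(0.95, 1.05)    # assumed a-priori scale range
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        i = rng.integers(len(test_pts))
        moved = scale * test_pts @ R.T
        d = np.linalg.norm(target_pts - moved[i], axis=1)
        shift = target_pts[np.argmin(d)] - moved[i]
        moved = moved + shift
        matched = sum(np.min(np.linalg.norm(target_pts - p, axis=1)) < beta
                      for p in moved)
        if matched > best[0]:
            best = (matched, (theta, scale, shift))
    return best
```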
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined by Algorithms 2
and 3, the test template is registered and matched simultaneously. The
registration and matching algorithm is listed in Algorithm 4. Here stei, Tte,
staj and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift,
tr(optm)scale and Δsoptim are the registration parameters obtained from
Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift) and S(tr(optm)scale)
form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle
between the segment descriptor and the radial direction; w is the weight of
the descriptor, which indicates whether the descriptor is at the edge of the
sclera or not. To ensure that the nearest descriptors have a similar
orientation, we used a constant factor α to check the absolute difference of
the two ɸ values. In our experiment we set α to 5. The total matching score
is the minimal score of the two transformed results divided by the minimal
matching score for the test template and target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and
works as a coprocessor with a CPU. A CUDA device consists of many
streaming multiprocessors (SMs); the parallel part of the program should be
partitioned into threads by the programmer and mapped onto those SMs.
There are multiple memory spaces in the CUDA memory hierarchy:
registers, local memory, shared memory, global memory, constant memory
and texture memory. Registers, local memory and shared memory are
on-chip, and it takes little time to access these memories. Only
shared memory can be accessed by other threads within the same block;
however, only a limited amount of shared memory is available. Global
memory, constant memory and texture memory are off-chip memories
accessible by all threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only, cacheable
memories. Mapping algorithms to CUDA to achieve efficient processing is
not a trivial task. There are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be
executed serially. To improve performance, branch divergence within a
warp should be avoided.
Global memory is slower than on-chip memory in terms of access. To
completely hide the latency of the small instruction set, we should
preferentially use on-chip memory rather than global memory. When global
memory access occurs, threads in the same warp should access the words in
sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces.
But shared memory is organized into banks which are equal in size. If two
memory requests from different threads within a warp fall into the same
memory bank, the accesses are serialized. To get maximum performance,
memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four
independent modules, all the modules are converted to different kernels
on the GPU. These kernels differ in computation density; thus we
map them to the GPU with various mapping strategies to fully utilize the
computing power of CUDA. Figure 11 shows our scheme of CPU-GPU
task distribution and the partition among blocks and threads. Algorithm 1 is
partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of
templates in the database. As the upper middle column of Figure 11 shows,
each target template is assigned to one thread; one thread performs one
pair-of-templates comparison. In our work we use an NVIDIA C2070 as
our GPU. The thread and block numbers are set to 1024, which means we
can match our test template with up to 1024×1024 target templates at the
same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which
a section of descriptors is processed by one thread. As the lower portion of
the middle column of Figure 11 shows, we assign a target template to one
block. Inside a block, one thread corresponds to a set of descriptors in this
template. This partition lets every block execute independently, and there
is no data exchange requirement between different blocks. When all
threads complete their corresponding descriptor fractions, the sum of the
intermediate results needs to be computed or compared. A parallel prefix
sum algorithm is used to calculate the sum of intermediate results, as shown
on the right of Figure 11. First, all odd-numbered threads compute the sum
of consecutive pairs of results. Then, recursively, every first of i (= 4, 8,
16, 32, 64, ...) threads computes the prefix sum on the new results. The final
result is saved at the first address, which has the same variable name as the
first intermediate result.
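The pairwise reduction described above can be simulated sequentially; each `stride` level corresponds to one round of thread cooperation, and element 0 ends up holding the total.

```python
def tree_reduce_sum(vals):
    """Tree reduction as the threads would perform it: at stride 1, 2, 4, ...
    element i accumulates element i + stride; element 0 holds the total."""
    a = list(vals)
    stride = 1
    while stride < len(a):
        for i in range(0, len(a) - stride, 2 * stride):
            a[i] += a[i + stride]
        stride *= 2
    return a[0]
```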
252 MAPPING INSIDE BLOCK
In the shift parameter search there are two schemes we can choose to
map the task:
Mapping one pair of templates to all the threads in a block, so that every
thread takes charge of a fraction of the descriptors and cooperates with
other threads.
Assigning a single possible shift offset to a thread, so that all the threads
compute independently, except that the final result must be compared with
the other possible offsets.
Due to the great number of sum and synchronization operations in every
nearest neighbor search step, we chose the second method to parallelize the
shift search. In the affine matrix generator, we mapped an entire parameter
set search to a thread, and every thread randomly generated a set of
parameters and tried them independently. The generation iterations were
assigned to all threads. The challenge of this step is that the randomly
generated numbers might be correlated among threads. In the rotation and
scale registration generation step, we used the Mersenne Twister
pseudorandom number generator because it can use bitwise arithmetic and
has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative;
therefore it is hard to parallelize a single twister state update step among
several execution threads. To make sure that the thousands of threads in the
launch grid generate uncorrelated random sequences, many simultaneous
Mersenne Twisters need to process different initial states in parallel.
But even "very different" (by any definition) initial state values do not
prevent the emission of correlated sequences by generators sharing
identical parameters. To solve this problem, and to enable efficient
implementation of the Mersenne Twister on parallel architectures, we used
a special offline tool for the dynamic creation of Mersenne Twister
parameters, modified from the algorithm developed by Makoto Matsumoto
and Takuji Nishimura. In the registration and matching step, when
searching for the nearest neighbor, a line segment that has already been
matched with others should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in
shared memory. To share the flags, all the threads in a block would have to
wait for a synchronization operation at every query step. Our solution is to
use a single thread in a block to process the matching.
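The same requirement, one independent and uncorrelated stream per thread, can be illustrated with NumPy, whose SeedSequence spawning plays a role analogous to the offline dynamic-creation tool. This is an analogy, not the project's implementation:

```python
import numpy as np

# One root seed; spawn a child SeedSequence per simulated "thread".
root = np.random.SeedSequence(42)
children = root.spawn(4)

# Each thread gets its own Mersenne Twister state derived from its child
# seed, so the four streams are statistically independent.
streams = [np.random.Generator(np.random.MT19937(c)) for c in children]
draws = [g.integers(0, 2**32, size=8) for g in streams]
```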
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the
bandwidth between host memory and device memory, so data transfer
between host and device can lead to long latency. As shown in Figure 11,
we load the entire target template set from the database without considering
when they will be processed; therefore there is no data transfer from
host to device during the matching procedure. In global memory, the
components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are
stored separately. This guarantees that the contiguous kernels of
Algorithms 2 to 4 can access their data at successive addresses. Although
such coalesced access reduces latency, frequent global memory access is
still a slow way to get data. In our kernels we load the test template into
shared memory to accelerate memory access. Because Algorithms 2 to 4
execute different numbers of iterations on the same data, bank conflicts do
not occur. To maximize our texture memory space, we set the system
cache to the lowest value and bound our target descriptors to texture
memory. Using this cacheable memory, our data access was accelerated
further.
FIG
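The component-wise storage described above is a structure-of-arrays
layout. The NumPy sketch below illustrates the idea on the CPU side; the
descriptor field names and random data are illustrative stand-ins, not the
paper's actual templates:

```python
import numpy as np

# Hypothetical template of n descriptors s(x, y, r, theta, phi, w).
n = 1024
aos = np.random.default_rng(0).random((n, 6))   # array-of-structures

# Structure-of-arrays: each component stored contiguously, as the text
# describes for global memory, so consecutive threads read successive
# addresses (the CPU-side analogue of coalesced access).
fields = ("x", "y", "r", "theta", "phi", "w")
soa = {name: np.ascontiguousarray(aos[:, i])
       for i, name in enumerate(fields)}
```

With this layout, thread k of a kernel that only needs r touches
`soa["r"][k]`, and neighboring threads read neighboring addresses.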
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily
applied to object detection. In this paper it is applied as the
feature for human recognition. In the sclera region, the vein patterns are the
edges of the image, so HOG is used to determine the gradient orientations
and edge orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small
connected regions called cells. For each cell, compute the histogram of
gradient directions or edge orientations of the pixels. The combination
of the histograms of the different cells then represents the descriptor. To
improve accuracy, histograms can be contrast-normalized by calculating the
intensity over a block and then using this value to normalize all cells within
the block. This normalization makes the result invariant to geometric and
photometric changes. The gradient magnitude m(x, y) and orientation
θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and
dy(x, y) as m(x, y) = sqrt(dx(x, y)² + dy(x, y)²) and
θ(x, y) = arctan(dy(x, y) / dx(x, y)).
Orientation binning is the second step of HOG. This method is used
to create the cell histograms. Each pixel within the cell contributes a weight
to the orientation bin found in the gradient computation, with the gradient
magnitude used as the weight. The cells are rectangular. The
binning of gradient orientation is spread over 0 to 180 degrees, with
opposite directions counted as the same. Fig 8 depicts the edge
orientations of the picture elements. If the images have illumination or
contrast changes, then the gradient strength must be locally normalized. For
that, cells are grouped together into larger blocks. These blocks are
overlapping, so each cell contributes more than once to the final
descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are
mainly square grids. The performance of HOG is improved by applying
a Gaussian window to each block.
FIG
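The steps above can be sketched in a minimal pure-NumPy HOG; the cell
size and bin count are common defaults, not values taken from the paper:

```python
import numpy as np

def hog(image, cell=8, bins=9):
    """Minimal HOG sketch of the steps described in the text."""
    img = image.astype(float)
    # Step 1: gradients, magnitude, and unsigned orientation
    # (0-180 degrees, opposite directions counted as the same).
    dy, dx = np.gradient(img)
    mag = np.hypot(dx, dy)
    ang = np.rad2deg(np.arctan2(dy, dx)) % 180.0
    # Step 2: per-cell orientation histograms, weighted by magnitude.
    h, w = img.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            b = idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            np.add.at(hist[i, j], b.ravel(), m.ravel())
    # Step 3: contrast-normalize overlapping 2x2 blocks of cells
    # (R-HOG), so each cell contributes to more than one block.
    blocks = []
    for i in range(ch - 1):
        for j in range(cw - 1):
            v = hist[i:i+2, j:j+2].ravel()
            blocks.append(v / (np.linalg.norm(v) + 1e-6))
    return np.concatenate(blocks)

feature = hog(np.arange(1024, dtype=float).reshape(32, 32))
```

A 32x32 input with 8-pixel cells yields a 4x4 cell grid, hence 3x3
overlapping blocks of 36 values each in the final descriptor.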
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical
computing environment and fourth-generation programming language
developed by MathWorks. MATLAB allows matrix manipulations,
plotting of functions and data, implementation of algorithms, creation
of user interfaces, and interfacing with programs written in other languages,
including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an
optional toolbox uses the MuPAD symbolic engine, allowing access
to symbolic computing capabilities. An additional package, Simulink, adds
graphical multi-domain simulation and Model-Based
Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry
and academia. MATLAB users come from various backgrounds
in engineering, science, and economics. MATLAB is widely used in
academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners
in control engineering, Little's specialty, but quickly spread to many other
domains. It is now also used in education, in particular the teaching
of linear algebra and numerical analysis, and is popular amongst scientists
involved in image processing. The MATLAB application is built around the
MATLAB language. The simplest way to execute MATLAB code is to type
it in the Command Window, which is one of the elements of the MATLAB
Desktop. When code is entered in the Command Window, MATLAB can
be used as an interactive mathematical shell. Sequences of commands can
be saved in a text file, typically using the MATLAB Editor, as a script or
encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and
sharing your work. You can integrate your MATLAB code with other
languages and applications, and distribute your MATLAB algorithms and
applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis,
filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external
applications and languages, such as C, C++, FORTRAN, Java, COM,
and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image
processing, communications, control design, test and measurement,
financial modeling and analysis, and computational biology. Add-on
toolboxes (collections of special-purpose MATLAB functions) extend the
MATLAB environment to solve particular classes of problems in these
application areas.
MATLAB can be used on personal computers and powerful
server systems, including the Cheaha compute cluster. With the addition of
the Parallel Computing Toolbox, the language can be extended with parallel
implementations of common computational functions, including for-loop
unrolling. Additionally, this toolbox supports offloading computationally
intensive workloads to Cheaha, the campus compute cluster. MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) that knows how big it is. Moreover, the fundamental operators
(e.g., addition, multiplication) are programmed to deal with matrices when
required. The MATLAB environment handles much of the bothersome
housekeeping that makes all this possible. Since so many of the procedures
required for Macro-Investment Analysis involve matrices, MATLAB
proves to be an extremely efficient language for both communication and
implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or FORTRAN. A wrapper function is created,
allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed MEX-
files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called
from MATLAB, and many MATLAB libraries (for
example, XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries. Calling MATLAB from Java is more complicated, but
can be done with a MATLAB extension, which is sold separately by
MathWorks, or using an undocumented mechanism called JMI (Java-to-
MATLAB Interface), which should not be confused with the unrelated Java
Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox
available from MathWorks, MATLAB can be connected
to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on
Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems, enabling fast
development and execution. With the MATLAB language, you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables, specifying data types, and allocating memory. In many
cases MATLAB eliminates the need for 'for' loops. As a result, one line of
MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
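The vectorized style described above can be illustrated outside MATLAB
as well; the NumPy sketch below mirrors MATLAB's matrix semantics
(the variable names are illustrative):

```python
import numpy as np

a = np.arange(12, dtype=float).reshape(3, 4)

# One vectorized line, with no declarations, no explicit allocation,
# and no loop; the MATLAB equivalent would be B = 2*A + 1.
b = 2.0 * a + 1.0

# The scalar C-style version would need nested loops:
#   for i: for j: b[i][j] = 2*a[i][j] + 1
```

The single expression replaces the nested-loop version, which is exactly
the kind of saving the paragraph above attributes to MATLAB.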
MATLAB lets you execute commands or groups of commands one
at a time, without compiling and linking, enabling you to quickly iterate to
the optimal solution. For fast execution of heavy matrix and vector
computations, MATLAB uses processor-optimized libraries. For general-
purpose scalar computations, MATLAB generates machine-code
instructions using its JIT (just-in-time) compilation technology.
This technology, which is available on most platforms, provides
execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithms efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency, file
differences, file dependencies, and code coverage
Designing Graphical User Interfaces
Use the interactive tool GUIDE (Graphical User Interface
Development Environment) to lay out, design, and edit user interfaces.
GUIDE lets you include list boxes, pull-down menus, push buttons, radio
buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring
data from external devices and databases, through preprocessing,
visualization, and numerical analysis, to producing presentation-quality
output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats such as Microsoft Excel; ASCII text or binary
files; image, sound, and video files; and scientific files such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format. Additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes;
changing line colors and markers; adding annotations, LaTeX equations,
and legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D
scalar data, and 3-D vector data. You can use these functions to visualize
and understand large, often complex, multidimensional data, specifying
plot characteristics such as camera viewing angle, perspective, lighting
effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW discrete Fourier
transform library. Because these processor-dependent libraries are
optimized for the different platforms that MATLAB supports, they execute
faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language used is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Privium scheme at the Amsterdam Schiphol airport, for example, employs iris scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition
method, which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the
CUDA GPU architecture, the parallel strategies developed in this research
can be applied to design parallel solutions to other sclera vein recognition
methods and to general pattern recognition methods. We designed the
Y-shape descriptor to narrow the search range and increase the matching
efficiency; it is a new feature extraction method that takes advantage of the
GPU structures. We developed the WPL descriptor to incorporate mask
information and make the method more suitable for parallel computing,
which can dramatically reduce data transfer and computation. We then
carefully mapped our algorithms to GPU threads and blocks, which is an
important step in achieving parallel computation efficiency on a GPU. A
work flow with high arithmetic intensity, designed to hide memory access
latency, was used to partition the computation task across the
heterogeneous system of CPU and GPU, and even across the threads in the
GPU. The proposed method dramatically improves the matching efficiency
without compromising recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland:
Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object
detection for real-time augmented reality applications in a GPGPU," IEEE
Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber,
"Deep, big, simple neural nets for handwritten digit recognition," Neural
Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris
recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput.
Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-
Kasari, "Nonnegative tensor factorization accelerated using GPGPU,"
IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks,"
Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognition,"
Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of
the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner,
"Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4,
no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular
biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-
1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G.
Xiaoguang, "Robust spatial matching for object retrieval and its parallel
implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp.
1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach
for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq.
Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4,
pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
approach for sclera image quality measure," Int. J. Biometrics, vol. 5,
no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human
identification method: Sclera recognition," IEEE Trans. Syst., Man,
Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
Sclera segmentation is the first step in sclera recognition. It consists
of three steps: glare area detection, sclera area estimation, and iris and
eyelid detection and refinement. The figure shows the steps of segmentation.
FIG
Glare Area Detection: The glare area is a small bright area near the
pupil or iris. It is an unwanted portion of the eye image. A Sobel filter is
applied to detect the glare area present in the iris or pupil. The filter runs
only on grayscale images: if the image is in color, it must first be converted
to grayscale, and the Sobel filter is then applied to detect the glare area.
Fig 4 shows the result of the glare area detection.
FIG
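The Sobel step can be illustrated with a pure-NumPy sketch; the synthetic
bright spot stands in for glare, and a real pipeline would use an
image-processing library rather than explicit loops:

```python
import numpy as np

def sobel_magnitude(gray):
    """Sobel gradient magnitude of a grayscale image."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    padded = np.pad(gray.astype(float), 1, mode="edge")
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):          # small fixed kernel: plain loops suffice
        for j in range(3):
            win = padded[i:i + h, j:j + w]
            gx += kx[i, j] * win
            gy += ky[i, j] * win
    return np.hypot(gx, gy)

gray = np.zeros((16, 16))
gray[6:10, 6:10] = 1.0                 # synthetic glare spot
edges = sobel_magnitude(gray)          # strong response on the spot's rim
```

The filter responds on the rim of the bright spot and is zero over its flat
interior, which is what makes it usable for outlining a glare region.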
Sclera Area Estimation: For the estimation of the sclera area, Otsu's
thresholding method is applied. The steps of sclera area detection are:
selection of the region of interest (ROI), Otsu's thresholding, and sclera
area detection. The left and right sclera areas are selected based on the iris
boundaries. Once the region of interest is selected, Otsu's thresholding is
applied to obtain the potential sclera areas. The correct left sclera area
should be placed in the right and center positions, and the correct right
sclera area should be placed in the left and center. In this way, non-sclera
areas are wiped out.
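Otsu's method selects the threshold that best separates the two intensity
classes of the histogram. A minimal pure-NumPy sketch (the bimodal
sample data is illustrative, not a real eye image):

```python
import numpy as np

def otsu_threshold(gray, nbins=256):
    """Otsu's method: pick the gray level that maximizes the
    between-class variance of the histogram."""
    hist, edges = np.histogram(np.asarray(gray).ravel(), bins=nbins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(p)              # probability of class 0
    mu = np.cumsum(p * centers)    # cumulative mean
    mu_t = mu[-1]                  # global mean
    w1 = 1.0 - w0
    ok = (w0 > 0) & (w1 > 0)
    sigma_b = np.zeros(nbins)
    sigma_b[ok] = (mu_t * w0[ok] - mu[ok]) ** 2 / (w0[ok] * w1[ok])
    return centers[np.argmax(sigma_b)]

sample = np.concatenate([np.full(100, 0.2), np.full(100, 0.8)])
t = otsu_threshold(sample)   # lands between the two intensity clusters
```

Pixels above the returned threshold form the bright (candidate sclera)
class; pixels below it form the dark class.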
223 IRIS AND EYELID REFINEMENT
The top and bottom of the sclera regions are the limits of the
sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are
refined. These are all unwanted portions for recognition. In order
to eliminate their effects, refinement is done after the detection of the
sclera area. The figure shows the result after Otsu's thresholding and iris
and eyelid refinement to detect the right sclera area. The left sclera area is
detected in the same way.
FIG
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the effect of
segmentation faults. The vein patterns in the sclera area are not clearly
visible after the segmentation process, so to make them more visible, vein
pattern enhancement is performed.
224 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of the fingers (Miura et al., 2004), the palm (Lin
and Fan, 2004), and the retina (Hill, 1999). In the case of retinal biometrics,
a special optical device for imaging the back of the eyeball is needed (Hill,
1999). Due to its perceived invasiveness and the required degree of subject
cooperation, the use of retinal biometrics may not be acceptable to some
individuals. The conjunctiva is a thin, transparent, and moist tissue that
covers the outer surface of the eye. The part of the conjunctiva that covers
the inner lining of the eyelids is called the palpebral conjunctiva, and the
part that covers the outer surface of the eye is called the ocular (or bulbar)
conjunctiva, which is the focus of this study. The ocular conjunctiva is very
thin and clear; thus the vasculature (including that of the episclera) is
easily visible through it. The visible microcirculation of the conjunctiva
offers a rich and complex network of veins and fine microcirculation
(Fig 1). The apparent complexity and specificity of these vascular patterns
motivated us to utilize them for personal identification (Derakhshani and
Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as those of the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus, the conjunctival vascular
structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with
commercial off-the-shelf digital cameras under normal lighting conditions,
making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact. Besides being a
stand-alone biometric modality, we anticipate that the addition of
conjunctival biometrics will enhance the performance of current iris-based
biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portions of the eye, the sclera
vein patterns will be further exposed. This feature makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable
(e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
opportunities for simultaneous computation. The figure below demonstrates
the possibility of parallel directional filtering. Since the filter is computed
over different portions of the input image, the computation can be
performed in parallel (denoted by Elements below). In addition, individual
parallelization of each element of the filtering can also be performed. A
detailed discussion of our proposed parallelization is outside the scope of
this paper.
FIG
FIG
225 OVERVIEW OF THE LINE-DESCRIPTOR-BASED SCLERA
VEIN RECOGNITION METHOD
The matching segment of the line-descriptor-based method is a
bottleneck with regard to matching speed. In this section, we briefly
describe the line-descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and remove the branch
points. The line descriptor is used to describe the segments in the vein
structure. Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle to some
reference angle at the iris center, θ; the segment's distance to the iris
center, r; and the dominant angular orientation of the line segment, ɸ. Thus
the descriptor is S = (θ, r, ɸ)T. The individual components of the line
descriptor are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. In order to register the segments of the
vascular patterns, a RANSAC-based algorithm is used to estimate the
best-fit parameters for registration between the two sclera vascular
patterns. The registration algorithm randomly chooses two points, one from
the test template and one from the target template. It also randomly chooses
a scaling factor and a rotation value, based on a priori knowledge of the
database. Using these values, it calculates a fitness value for the
registration with these parameters.
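The descriptor components above can be sketched directly from their
definitions; the segment orientation here stands in for the dominant angle
that the paper obtains from the polynomial fit fline(x):

```python
import math

def line_descriptor(seg_center, seg_orientation, iris_center):
    """S = (theta, r, phi) for one vein segment, per the description
    above (an illustrative sketch, not the paper's exact code)."""
    xl, yl = seg_center        # center point of the line segment
    xi, yi = iris_center       # center of the detected iris
    theta = math.atan2(yl - yi, xl - xi)   # angle w.r.t. the iris center
    r = math.hypot(xl - xi, yl - yi)       # distance to the iris center
    phi = seg_orientation                  # dominant segment orientation
    return theta, r, phi

S = line_descriptor((13.0, 14.0), math.pi / 4, (10.0, 10.0))
```

Because θ and r are taken relative to the iris center, two templates that
share the same reference point are automatically aligned.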
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. In order to reduce the effect of segmentation errors, we created
the weighting image (Figure 3) from the sclera mask by setting interior
pixels in the sclera mask to 1, pixels within some distance of the boundary
of the mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as follows,
where Si and Sj are two segment descriptors, m(Si, Sj) is the matching
score between segments Si and Sj, d(Si, Sj) is the Euclidean distance
between the segment descriptors' center points (from Eqs. 6-8), Dmatch is
the matching distance threshold, and ɸmatch is the matching angle
threshold. The total matching score, M, is the sum of the individual
matching scores divided by the maximum matching score for the minimal
set between the test and target templates. That is, one of the test or target
templates has fewer points, and the sum of its descriptors' weights sets the
maximum score that can be attained.
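A sketch of this scoring scheme follows. The threshold values, and the
assumption that a weighted match requires both the distance and angle
conditions to hold, are illustrative rather than taken from the paper:

```python
import math

D_MATCH = 5.0     # matching distance threshold (illustrative value)
A_MATCH = 0.2     # matching angle threshold (illustrative value)

def pair_score(si, sj, wi, wj):
    """m(Si, Sj): weighted match if centers and orientations agree."""
    (xi, yi, phi_i), (xj, yj, phi_j) = si, sj
    d = math.hypot(xi - xj, yi - yj)
    if d <= D_MATCH and abs(phi_i - phi_j) <= A_MATCH:
        return wi * wj
    return 0.0

def total_score(test, target, w_test, w_target):
    """M: sum of best per-segment scores, normalized by the maximum
    attainable score of the minimal (lighter-weighted) template."""
    raw = sum(max(pair_score(s, t, ws, wt)
                  for t, wt in zip(target, w_target))
              for s, ws in zip(test, w_test))
    return raw / min(sum(w_test), sum(w_target))

template = [(0.0, 0.0, 0.0), (10.0, 10.0, 1.0)]
score = total_score(template, template, [1.0, 1.0], [1.0, 1.0])
```

Matching a template against itself yields the maximum normalized score,
while segments outside the distance or angle thresholds contribute nothing.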
FIG
FIG
FIG
FIG
movement of the eye, Y-shape branches are observed to be a stable feature
and can be used as a sclera feature descriptor. To detect the Y-shape
branches in the original template, we searched the nearest-neighbor set of
every line segment within a regular distance and classified the angles
among these neighbors. If there were two types of angle values in the line
segment set, the set could be inferred to be a Y-shape structure, and the line
segment angles would be recorded as a new feature of the sclera.
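The two-angle-types rule can be sketched as follows; the clustering
tolerance is an assumed parameter, since the paper does not specify one:

```python
def is_y_branch(neighbor_angles, tol=0.1):
    """Infer a Y-shape structure when the segment angles in a
    neighbor set fall into exactly two distinct angle clusters."""
    clusters = []
    for a in sorted(neighbor_angles):
        if clusters and abs(a - clusters[-1][-1]) <= tol:
            clusters[-1].append(a)     # same angle type
        else:
            clusters.append([a])       # a new angle type
    return len(clusters) == 2
```

For example, two nearly parallel segments plus one crossing segment give
two angle types and are flagged as a Y-branch, while a set with one or
three angle types is not.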
There are two ways to measure both the orientation and the relationship of
every branch of the Y-shape vessels: one is to use the angle of each branch
to the x axis; the other is to use the angles between each branch and the iris
radial direction. The first method needs an additional rotation operation to
align the template. In our approach, we employed the second method. As
Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and
the radius from the pupil center. Even when the head tilts, the eye moves,
or the camera zoom changes at the image acquisition step, ϕ1, ϕ2, and ϕ3
are quite stable. To tolerate errors from the pupil center calculation in the
segmentation step, we also recorded the center position (x, y) of the
Y-shape branches as auxiliary parameters. So our rotation-, shift-, and
scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape
descriptor is generated with reference to the iris center. Therefore, it is
automatically aligned to the iris center. It is a rotation- and scale-invariant
descriptor.

WPL SCLERA DESCRIPTOR

As we discussed in Section 22, the line
descriptor is extracted from the skeleton of the vessel structure in binary
images (Figure 7). The skeleton is then broken into smaller segments. For
each segment, a line descriptor is created to record the center and
orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where
(x, y) is the position of the center and ɸ is its orientation. Because of the
limitation of segmentation accuracy, the descriptors at the boundary of the
sclera area might not be accurate and may contain spur edges resulting
from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the
mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel
patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of
line segments of the vessel patterns
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is
challenging, since the mask files are large in size and will occupy GPU
memory and slow down the data transfer. When matching, a RANSAC-
type registration algorithm was used to randomly select the corresponding
descriptors, and the transform parameters between them were used to
generate the template-transform affine matrix. After every template
transform, the mask data should also be transformed and a new boundary
calculated to evaluate the weight of the transformed descriptor. This results
in too many convolutions in the processing unit.
To reduce heavy data transfer and computation we designed the
weighted polar line (WPL) descriptor structure which includes the
information of mask and can be automatically aligned We extracted the
relationship of geometric feature of descriptors and store them as a new
descriptor We use a weighted image created via setting various weight
values according to their positions The weight of those descriptors who are
beyond the sclera are set to be 0 and those who are near the sclera
boundary are 05 and interior descriptors are set to be 1 In our work
descriptors weights were calculated on their own mask by the CPU only
once
The calculated result was saved as a component of the descriptor. The sclera descriptor then becomes s(x, y, ϕ, w), where w denotes the weight of the point and takes the value 0, 0.5, or 1. To align two templates, when one template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates have similar reference points. If we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned, since they share a similar reference point. Every feature vector of the template is a set of line-segment descriptors composed of three variables (Figure 8): the segment angle to the reference line through the iris center, denoted θ; the distance between the segment center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ϕ. To minimize GPU computing, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocess.
The descriptor vector becomes s(x, y, r, θ, ϕ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.
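As a hedged illustration of the WPL structure described above, one segment's descriptor might be assembled as follows. The helper names and the "near-boundary" distance threshold are assumptions for this sketch, not values from the report:

```python
import math

# Hypothetical helper: 1.0 for interior points, 0.5 near the sclera
# boundary, 0.0 outside -- mirroring the weighting rule in the text.
# The 'near' threshold of 5 pixels is an illustrative assumption.
def segment_weight(dist_to_boundary, inside, near=5.0):
    if not inside:
        return 0.0
    return 0.5 if dist_to_boundary < near else 1.0

def wpl_descriptor(cx, cy, iris_x, iris_y, phi, dist_to_boundary, inside):
    """Build s(x, y, r, theta, phi, w) for one line segment.

    (cx, cy) is the segment center, (iris_x, iris_y) the iris center used
    as the common reference point, phi the segment's dominant orientation.
    """
    dx, dy = cx - iris_x, cy - iris_y
    r = math.hypot(dx, dy)      # distance from segment center to iris center
    theta = math.atan2(dy, dx)  # angle to the reference line
    w = segment_weight(dist_to_boundary, inside)
    # Both the polar values and the precomputed rectangular (x, y) are
    # stored, so the GPU never converts coordinates at match time.
    return (cx, cy, r, theta, phi, w)
```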
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved them at contiguous addresses. This meets the coalesced memory access requirement of the GPU.
After reorganizing the structure of the descriptors and adding the mask information to the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered after every shift; thus, the cost of data transfer and computation on the GPU is reduced. With matching on the new descriptor, the shift-parameter generator in Figure 4 simplifies to Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65 k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
General-Purpose Computing on the GPU: Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II
concentrating on the programmable aspects of this pipeline
The programmer specifies geometry that covers a region on the screen
The rasterizer generates a fragment at each pixel location covered by that
geometry
Each fragment is shaded by the fragment program
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as texture on future passes through
the graphics pipeline
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
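The grid-update pattern above can be sketched outside any graphics API. In this Python stand-in, the same "fragment program" runs at every grid point and gathers only previous-pass values; the diffusion-style update rule and the names are illustrative, not taken from the report:

```python
# SPMD-style sketch: one function invocation per grid point, reading only
# the previous time step's buffer (gather) and writing a fresh buffer.
def step(prev, n):
    new = [[0.0] * n for _ in range(n)]
    for y in range(n):          # the rasterizer would generate one
        for x in range(n):      # fragment per (x, y) location
            up    = prev[(y - 1) % n][x]
            down  = prev[(y + 1) % n][x]
            left  = prev[y][(x - 1) % n]
            right = prev[y][(x + 1) % n]
            # gather-only update: average of the four neighbors
            new[y][x] = 0.25 * (up + down + left + right)
    return new                  # becomes the input of the next pass
```

Note that `prev` is never written, matching the old GPGPU constraint that a pass reads one buffer and writes another.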
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV are solving this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads
An SPMD general-purpose program computes the value of each thread
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in
future computation
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity, although some false positive matches may remain after this step. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of test template Tte and target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera to reduce the search range and time. The distance of two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar line from the pupil center in descriptor i.
The number of matched pairs ni and the distance between the Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2), where α is a factor to fuse the matching score, set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise matching process.
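Equation (2) itself is not reproduced in this excerpt, so the fusion below is only a hedged sketch: it combines the matched-branch count (normalized by the smaller template size) with the average center distance, using a factor alpha as in the text. The exact functional form is an assumption:

```python
def fuse_score(n_matched, dists, N_i, N_j, alpha=30.0):
    """Illustrative fusion of match count and mean center distance.

    n_matched -- number of matched Y-shape branch pairs
    dists     -- center distances of the matched pairs
    N_i, N_j  -- total feature-vector counts of the two templates
    The precise form of equation (2) is not in this excerpt; this sketch
    rewards many matched branches and penalizes large average distances.
    """
    if n_matched == 0:
        return 0.0
    avg_d = sum(dists) / n_matched
    coverage = n_matched / min(N_i, N_j)   # fraction of branches matched
    return coverage * alpha / (alpha + avg_d)

def decide(score, t):
    # Templates scoring below threshold t are discarded; the rest go on
    # to the Stage II fine matching.
    return score >= t
```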
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because: when an eye image is acquired at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, since the eyeball is spherical in shape; and the sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), with slight differences among the movements of these layers. Considering these factors, our registration employed both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale.
1) SHIFT PARAMETER SEARCH: As discussed before, segmentation may not be accurate, so the detected iris center may not be very precise. The shift transform is designed to tolerate possible errors in pupil center detection from the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter. Here Tte is the test template and stei is the i-th WPL descriptor of Tte; Tta is the target template and staj is the j-th WPL descriptor of Tta; d(stek, staj) is the Euclidean distance of descriptors stek and staj; and Δsk is the shift value of the two descriptors.
We first randomly select an equal number of segment descriptors stek in test template Tte from each quad and find their nearest neighbors staj in target template Tta. Their shift offsets are recorded as the possible registration shift factors Δsk. The final offset registration factor is Δsoptim, the candidate offset with the smallest standard deviation.
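Algorithm 2 is not listed in this excerpt, so the sketch below only follows the prose: sample test descriptors, pair each with its nearest neighbor in the target, and keep the candidate offset that deviates least from the consensus. Descriptors are reduced to bare (x, y) centers, and the names `nearest` and `shift_search` are hypothetical:

```python
import math, random

def nearest(d, targets):
    # nearest target descriptor by (x, y) center distance
    return min(targets, key=lambda t: math.hypot(d[0] - t[0], d[1] - t[1]))

def shift_search(test, target, samples=8, rng=None):
    """Sketch of the shift-parameter search; descriptors are (x, y) pairs.

    Randomly sample test descriptors, record each one's offset to its
    nearest target neighbor, and return the offset closest to the mean
    (i.e. the one with the smallest spread over the candidate set).
    """
    rng = rng or random.Random(0)      # fixed seed for reproducibility
    picks = rng.sample(test, min(samples, len(test)))
    offsets = [(nearest(d, target)[0] - d[0],
                nearest(d, target)[1] - d[1]) for d in picks]
    mx = sum(o[0] for o in offsets) / len(offsets)
    my = sum(o[1] for o in offsets) / len(offsets)
    return min(offsets, key=lambda o: math.hypot(o[0] - mx, o[1] - my))
```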
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor stei(it) and calculating the distance to its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. A factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters. In our experiment we set the iteration count to 512.
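Algorithm 3's listing and the matrix in (7) are not reproduced in this excerpt, so this is a hedged RANSAC-style sketch of the search just described: each iteration draws a random (rotation, shift, scale) set, transforms the test descriptor centers, and counts pairs matched within β = 20 pixels. The parameter ranges and function names are illustrative:

```python
import math, random

def make_transform(theta, tx, ty, s):
    """Compose scale and rotation plus a translation; the exact matrix
    in equation (7) is not reproduced here, so this form is assumed."""
    c, si = math.cos(theta), math.sin(theta)
    return lambda p: (s * (c * p[0] - si * p[1]) + tx,
                      s * (si * p[0] + c * p[1]) + ty)

def affine_search(test, target, iters=512, beta=20.0, rng=None):
    """Try random (theta, shift, scale) sets and keep the one matching
    the most descriptor centers within beta pixels."""
    rng = rng or random.Random(1)
    best, best_m = None, -1
    for _ in range(iters):
        params = (rng.uniform(-0.2, 0.2),                    # rotation
                  rng.uniform(-10, 10), rng.uniform(-10, 10),  # shift
                  rng.uniform(0.9, 1.1))                     # scale
        f = make_transform(*params)
        m = sum(1 for p in test
                if any(math.hypot(f(p)[0] - q[0], f(p)[1] - q[1]) < beta
                       for q in target))
        if m > best_m:                 # keep the maximum match count m(it)
            best, best_m = params, m
    return best, best_m
```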
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3; ϕ is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of two ϕ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score of the test template and target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and take little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is of limited size. Local memory, global memory, constant memory, and texture memory reside in off-chip device memory, and accessing them is time consuming; global, constant, and texture memory are accessible by all threads.
Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:
If threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this latency, we should preferentially use on-chip memory rather than global memory; when global memory accesses do occur, threads in the same warp should access words in sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into equally sized banks. If two memory requests from different threads within a warp fall in the same bank, the accesses are serialized. For maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computational density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU; the thread and block counts are set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, each thread corresponds to a set of descriptors in that template. This partition lets every block execute independently, with no data exchange required between blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate this sum, as shown at the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results. Then, recursively, the first of every i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
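The pairwise reduction just described can be sketched sequentially; each pass of the loop below stands in for one synchronized step of the CUDA threads, and the total lands at the first address, as in the text:

```python
def block_sum(results):
    """Sequential stand-in for the in-block tree reduction: at stride
    1, 2, 4, ... the "first thread of each group" adds its partner's
    partial result, so the total ends up in the first slot."""
    vals = list(results)
    n = len(vals)
    stride = 1
    while stride < n:
        for i in range(0, n, 2 * stride):   # one "thread" per group
            if i + stride < n:
                vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]   # same address as the first intermediate result
```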
252 MAPPING INSIDE BLOCK
In the shift-argument search there are two schemes for mapping the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to each thread, so that all threads compute independently, except that the final result must be compared across the possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generation iterations are distributed across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can cause long latency. As shown in Figure 11, we load the entire target template set from the database without considering when it will be processed; therefore there is no host-to-device data transfer during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access was still a slow way to get data, so in our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
Histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a larger block and using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG, used to create the cell histograms. Each pixel within the cell contributes a weight to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient-orientation bins are spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image has illumination or contrast changes, the gradient strengths must be locally normalized. For that, cells are grouped together into larger blocks; these blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
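The two steps just described (gradient computation, then magnitude-weighted orientation binning over 0 to 180 degrees) can be sketched for a single cell as follows; central-difference gradients stand in for dx and dy, and the bin count of 9 is an illustrative choice, not taken from the report:

```python
import math

def hog_cell(cell, bins=9):
    """Histogram of oriented gradients for one cell (a 2-D list of
    intensities): central-difference gradients, then magnitude-weighted
    binning of the unsigned orientation over 0..180 degrees."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):          # skip the border pixels
        for x in range(1, w - 1):
            dx = cell[y][x + 1] - cell[y][x - 1]
            dy = cell[y + 1][x] - cell[y - 1][x]
            m = math.hypot(dx, dy)                         # magnitude
            ang = math.degrees(math.atan2(dy, dx)) % 180.0  # fold opposites
            b = min(int(ang / (180.0 / bins)), bins - 1)
            hist[b] += m               # gradient magnitude as the weight
    return hist
```

Concatenating such per-cell histograms over a block, then normalizing, would give the R-HOG descriptor described above.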
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems. It enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages, because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology.
This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
You can use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatterbubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data,
and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting effects,
light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW discrete Fourier
transform library. Because these processor-dependent libraries are
optimized for the different platforms that MATLAB supports, they execute
faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition
method, which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions for other sclera vein recognition methods
and for general pattern recognition methods. We designed the Y-shape
descriptor to narrow the search range and increase matching efficiency;
it is a new feature extraction method that takes advantage of GPU
structures. We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing, which can
dramatically reduce data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, an important
step in achieving parallel computation efficiency on a GPU. A workflow
with high arithmetic intensity, designed to hide memory access latency,
partitions the computation task across the heterogeneous CPU-GPU system,
down to the individual threads on the GPU. The proposed method
dramatically improves matching efficiency without compromising
recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
The top and underside of the sclera region are the limits of the
sclera area. Then the upper eyelid, lower eyelid, and iris boundaries are
refined; altogether, these are the unwanted portions for recognition. To
eliminate their effects, refinement is performed following the detection
of the sclera area. The figure shows the result after Otsu's thresholding
and after iris and eyelid refinement to detect the right sclera area. The
left sclera area is detected in the same way.
FIG
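Otsu's thresholding, used above to binarize the eye image, picks the gray level that maximizes the between-class variance of the histogram. A minimal NumPy sketch (illustrative only, not the report's MATLAB code):

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold that maximizes between-class variance (Otsu)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0.0
    for t in range(256):
        w0 += hist[t]                      # pixels in the background class
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        w1 = total - w0                    # pixels in the foreground class
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Synthetic two-class "image": dark background, bright foreground.
img = np.concatenate([np.full(500, 40), np.full(500, 200)]).astype(np.uint8)
t = otsu_threshold(img)
binary = img > t
```

On a real sclera image the same call separates the bright sclera from the darker iris and skin regions.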
In the segmentation process, not all images are perfectly segmented.
Hence, feature extraction and matching are needed to reduce the impact
of segmentation faults. The vein patterns in the sclera area are not
clearly visible after segmentation, so vein pattern enhancement must be
performed to make them more visible.
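The enhancement step described later in the report uses a bank of directional Gabor filters. A self-contained NumPy sketch of that idea, with hypothetical filter parameters (kernel size, sigma, frequency) chosen only for illustration:

```python
import numpy as np

def gabor_kernel(size, sigma, theta, freq):
    """Real (even-symmetric) Gabor kernel oriented at angle theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * freq * xr)
    k = envelope * carrier
    return k - k.mean()        # zero mean: flat regions give no response

def convolve2d(img, k):
    """Naive 'same' 2-D filtering (plain loops; fine for a small sketch)."""
    h = k.shape[0] // 2
    padded = np.pad(img, h, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

# Toy "sclera" patch with one vertical vessel line.
img = np.zeros((21, 21))
img[:, 10] = 1.0

# Bank of directional filters; keep the maximum response per pixel.
thetas = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
responses = [convolve2d(img, gabor_kernel(9, 2.0, t, 0.25)) for t in thetas]
enhanced = np.max(responses, axis=0)
```

Taking the per-pixel maximum over orientations makes the enhancement respond to vessels running in any direction.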
2.2.4 OCULAR SURFACE VASCULATURE
Human recognition using vascular patterns in the human body has
been studied in the context of the fingers (Miura et al., 2004), palm (Lin and
Fan, 2004), and retina (Hill, 1999). In the case of retinal biometrics, a
special optical device for imaging the back of the eyeball is needed (Hill,
1999). Due to its perceived invasiveness and the required degree of subject
cooperation, the use of retinal biometrics may not be acceptable to some
individuals. The conjunctiva is a thin, transparent, and moist tissue that
covers the outer surface of the eye. The part of the conjunctiva that covers
the inner lining of the eyelids is called the palpebral conjunctiva, and the part
that covers the outer surface of the eye is called the ocular (or bulbar)
conjunctiva, which is the focus of this study. The ocular conjunctiva is very
thin and clear; thus the vasculature (including that of the episclera) is
easily visible through it. The visible microcirculation of the conjunctiva offers a
rich and complex network of veins and fine microcirculation (Fig. 1). The
apparent complexity and specificity of these vascular patterns motivated us
to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it
conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the
conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis.
Its detailed final structure is mostly stochastic and thus unique. Even
though no comprehensive study on the uniqueness of vascular structures
has been conducted, studies of some targeted areas, such as those of the eye
fundus, confirm the uniqueness of such vascular patterns even between
identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or
chemical intervention, spontaneous adult ocular vasculogenesis and
angiogenesis do not easily occur. Thus, the conjunctival vascular
structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial
off-the-shelf digital cameras under normal lighting conditions, making this
modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the
addition of conjunctival biometrics will enhance the performance of current
iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portions of the eye, the sclera
vein patterns will be further exposed. This feature makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g.,
due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
opportunities for simultaneous computation. The figure below demonstrates the
possibility of parallel directional filtering. Since the filter is computed over
different portions of the input image, the computation can be performed in
parallel (denoted by Elements below). In addition, individual parallelization
of each element of the filtering can also be performed. A detailed discussion of
our proposed parallelization is outside the scope of this paper.
FIG
FIG
2.2.5 OVERVIEW OF THE LINE-DESCRIPTOR-BASED SCLERA VEIN
RECOGNITION METHOD
The matching segment of the line-descriptor-based method is a
bottleneck with regard to matching speed. In this section, we briefly
describe the line-descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and to remove the branch
points. The line descriptor is used to describe the segments in the vein
structure. Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle to some
reference angle at the iris center, θ; the segment's distance to the iris center, r;
and the dominant angular orientation of the line segment, ɸ. Thus the
descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor
are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. In order to register the segments of the
vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit
parameters for registration between the two sclera vascular patterns. The
registration algorithm randomly chooses two points, one from the
test template and one from the target template. It also randomly chooses a
scaling factor and a rotation value based on a priori knowledge of the
database. Using these values, it calculates a fitness value for the registration
with these parameters.
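The line descriptor S = (θ, r, ɸ) defined above can be sketched directly from its description: polar position of the segment center relative to the iris center, plus the orientation of a degree-1 polynomial fit. A Python sketch with hypothetical coordinates (not the report's MATLAB code):

```python
import numpy as np

def line_descriptor(xs, ys, iris_center):
    """S = (theta, r, phi): polar position of the segment's center w.r.t. the
    iris center, plus the segment's dominant orientation from a line fit."""
    xl, yl = xs.mean(), ys.mean()           # segment center (xl, yl)
    xi, yi = iris_center
    r = np.hypot(xl - xi, yl - yi)          # distance to the iris center
    theta = np.arctan2(yl - yi, xl - xi)    # angle to the reference axis
    slope = np.polyfit(xs, ys, 1)[0]        # f_line: degree-1 approximation
    phi = np.arctan(slope)                  # dominant angular orientation
    return theta, r, phi

# Hypothetical vessel segment: a short 45-degree line right of the iris center.
xs = np.array([30.0, 32.0, 34.0, 36.0])
ys = np.array([10.0, 12.0, 14.0, 16.0])
theta, r, phi = line_descriptor(xs, ys, iris_center=(20.0, 13.0))
```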
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. In order to reduce the effect of segmentation errors, we created the
weighting image (Figure 3) from the sclera mask by setting interior pixels
in the sclera mask to 1, pixels within some distance of the boundary of the
mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as shown
below, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching
score between segments Si and Sj, d(Si, Sj) is the Euclidean distance
between the segment descriptors' center points (from Eqs. 6-8), Dmatch is
the matching distance threshold, and ɸmatch is the matching angle threshold.
The total matching score M is the sum of the individual matching scores
divided by the maximum matching score for the minimal set between the
test and target templates. That is, one of the test or target templates has fewer
points, and the sum of its descriptors' weights sets the maximum score
that can be attained.
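The scoring rule just described can be sketched as follows. The exact equation is not reproduced in this chunk, so the thresholds and the weight-product form of a matched pair are assumptions for illustration:

```python
import numpy as np

D_MATCH = 5.0     # matching distance threshold (illustrative value)
PHI_MATCH = 0.2   # matching angle threshold in radians (illustrative value)

def segment_match(si, sj):
    """Match score of two descriptors s = (x, y, phi, w); 0 if either
    threshold fails. A matched pair contributes the product of weights
    (assumed form, consistent with the weighting-image description)."""
    d = np.hypot(si[0] - sj[0], si[1] - sj[1])
    if d <= D_MATCH and abs(si[2] - sj[2]) <= PHI_MATCH:
        return si[3] * sj[3]
    return 0.0

def template_score(test, target):
    """Total score M: best-match sum divided by the maximum attainable
    score of the smaller template (the sum of its weights)."""
    total = sum(max(segment_match(si, sj) for sj in target) for si in test)
    max_score = min(sum(s[3] for s in test), sum(s[3] for s in target))
    return total / max_score

test = [(10.0, 10.0, 0.1, 1.0), (40.0, 12.0, 0.5, 0.5)]
target = [(11.0, 10.5, 0.15, 1.0), (80.0, 40.0, 1.2, 1.0)]
score = template_score(test, target)
```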
FIG
FIG
FIG
FIG
movement of the eye. Y-shape branches are observed to be a stable feature and
can be used as a sclera feature descriptor. To detect the Y-shape branches in
the original template, we search the nearest-neighbor set of every line
segment within a regular distance and classify the angles among these
neighbors. If there are two types of angle values in the line segment set, the
set may be inferred to be a Y-shape structure, and the line segment angles
are recorded as a new feature of the sclera.
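A small sketch of forming such a Y-shape feature, using the radial direction from the pupil center as the angle reference (as the method does for rotation invariance); the coordinates below are hypothetical:

```python
import numpy as np

def y_descriptor(center, branch_tips, pupil_center):
    """Y-shape feature y = (phi1, phi2, phi3, x, y): each branch's angle is
    measured against the radial direction from the pupil center, so the
    feature needs no extra rotation to align two templates."""
    cx, cy = center
    radial = np.arctan2(cy - pupil_center[1], cx - pupil_center[0])
    phis = []
    for bx, by in branch_tips:
        branch = np.arctan2(by - cy, bx - cx)
        d = (branch - radial + np.pi) % (2 * np.pi) - np.pi  # wrap to (-pi, pi]
        phis.append(d)
    return (*phis, cx, cy)

# Hypothetical branch point right of the pupil, with three branches.
y = y_descriptor(center=(50.0, 0.0),
                 branch_tips=[(60.0, 0.0), (45.0, 8.0), (45.0, -8.0)],
                 pupil_center=(0.0, 0.0))
```

Because every angle is taken relative to the pupil-center radius, rotating the whole eye image leaves (phi1, phi2, phi3) unchanged.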
There are two ways to measure both the orientation and the relationship of
every branch of the Y-shape vessels: one is to use the angles of every branch to the
x axis; the other is to use the angles between each branch and the iris radial
direction. The first method needs an additional rotation operation to align the
template. In our approach, we employed the second method. As Figure 6
shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius
from the pupil center. Even when the head tilts, the eye moves, or the camera
zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite stable.
To tolerate errors from the pupil center calculation in the segmentation step,
we also record the center position (x, y) of the Y-shape branches as
auxiliary parameters. So our rotation-, shift-, and scale-invariant feature
vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated
with reference to the iris center; therefore, it is automatically aligned to the
iris center. It is a rotation- and scale-invariant descriptor.
V. WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line descriptor is extracted from
the skeleton of the vessel structure in binary images (Figure 7). The skeleton
is then broken into smaller segments. For each segment, a line descriptor is
created to record the center and orientation of the segment. This descriptor
is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is
its orientation. Because of the limited segmentation accuracy, descriptors at
the boundary of the sclera area might not be accurate and may contain spur
edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of
such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel
patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line
segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is challenging,
since the mask files are large and will occupy GPU memory and
slow down data transfer. When matching, a RANSAC-type registration
algorithm was used to randomly select the corresponding descriptors,
and the transform parameters between them were used to generate the
template transform affine matrix. After every template transform, the mask
data would also have to be transformed and a new boundary calculated to
evaluate the weight of the transformed descriptor. This results in too many
convolutions in the processor unit.
To reduce heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the mask
information and can be automatically aligned. We extracted the geometric
relationships of the descriptors and stored them as a new descriptor. We use
a weighted image created by setting various weight values according to
position: the weights of descriptors outside the sclera are set to 0, those
near the sclera boundary to 0.5, and interior descriptors to 1. In our work,
descriptor weights are calculated on their own mask by the CPU, only
once.
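The one-time CPU weighting step can be sketched as follows (a toy mask and a brute-force distance check, purely for illustration):

```python
import numpy as np

def descriptor_weight(mask, x, y, border=2):
    """Weight for a descriptor at (x, y): 0 outside the sclera mask,
    0.5 within `border` pixels of the mask boundary, 1.0 in the interior.
    The border width is an assumed parameter."""
    if not mask[y, x]:
        return 0.0
    ys, xs = np.nonzero(~mask)              # all outside pixels
    d = np.min(np.hypot(xs - x, ys - y))    # distance to nearest outside pixel
    return 0.5 if d <= border else 1.0

# Toy mask: the sclera occupies a 10x10 square inside a 16x16 image.
mask = np.zeros((16, 16), dtype=bool)
mask[3:13, 3:13] = True

w_out = descriptor_weight(mask, 1, 1)    # outside the mask
w_edge = descriptor_weight(mask, 3, 8)   # just inside the boundary
w_in = descriptor_weight(mask, 8, 8)     # interior
```

Storing this weight inside each descriptor is what lets the GPU stage skip the mask file entirely.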
The result is saved as a component of the descriptor, which becomes
s(x, y, ɸ, w), where w denotes the weight of the point and may take the
value 0, 0.5, or 1. To align two templates, when a template is shifted to
another location along the line connecting their centers, all the descriptors
of that template are transformed. This is faster if the two templates have
similar reference points. If we use the center of the iris as the reference
point, then when two templates are compared, the correspondences are
automatically aligned to each other, since they share the same reference
point. Every feature vector of the template is a set of line segment
descriptors composed of three variables (Figure 8): the segment's angle to
the reference line through the iris center, denoted θ; the distance between
the segment's center and the pupil center, denoted r; and the dominant
angular orientation of the segment, denoted ɸ. To minimize GPU
computation, we also convert the descriptor values from polar coordinates
to rectangular coordinates in a CPU preprocessing step.
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters. For
example, as an eyeball moves left, the left-part sclera patterns of the eye may
be compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads, called warps. We
reorganized the descriptors from the same side and saved
FIG
FIG
them at contiguous addresses. This meets the requirement for coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no
longer needed on the GPU. Matching with this feature is very fast because
the templates do not need to be re-registered every time after shifting; thus
the cost of data transfer and computation on the GPU is reduced. With
matching on the new descriptor, the shift parameter generator in Figure 4 is
simplified as shown in Figure 9.
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger
limits on their size and resource consumption, with more fully featured
instruction sets, and with more flexible control-flow operations. After many
years of separate instruction sets for vertex and fragment operations, current
GPUs support the unified Shader Model 4.0 on both vertex and fragment
shaders:
The hardware must support shader programs of at least 65k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and
32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting
fixed-function units.
General-Purpose Computing on the GPU
Mapping general-purpose computation onto the GPU uses the graphics
hardware in much the same way as any standard graphics application.
Because of this similarity, it is both easier and more difficult to explain the
process. On one hand, the actual operations are the same and are easy to
follow; on the other hand, the terminology differs between graphics and
general-purpose use. Harris provides an excellent description of this
mapping process.
We begin by describing GPU programming using graphics terminology,
then show how the same steps are used in a general-purpose way to author
GPGPU applications, and finally use the same steps to show the simpler
and more direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II,
concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as a texture on future passes through
the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation
involves exactly the same steps but different terminology. A motivating
example is a fluid simulation computed over a grid: at each time step, we
compute the next state of the fluid for each grid point from the current state
at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest. The rasterizer generates a fragment at each
pixel location covered by that geometry. (In our example, our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation.)
Each fragment is shaded by an SPMD general-purpose fragment
program. (Each grid point runs the same program to update the state of its
fluid.)
The fragment program computes the value of the fragment by a
combination of math operations and "gather" accesses from global
memory. (Each grid point can access the state of its neighbors from the
previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes. (The current state of the fluid will be used on the next time
step.)
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications
has been that, despite their general-purpose tasks having nothing to do with
graphics, the applications still had to be programmed using graphics APIs.
In addition, the program had to be structured in terms of the graphics
pipeline, with the programmable units only accessible as an intermediate
step in that pipeline, when the programmer would almost certainly prefer to
access the programmable units directly. The programming environments we
describe in detail in Section IV solve this difficulty by providing a
more natural, direct, non-graphics interface to the hardware and,
specifically, to the programmable units. Today, GPU computing applications
are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math
operations and both "gather" (read) accesses from and "scatter" (write)
accesses to global memory. Unlike in the previous two
methods, the same buffer can be used for both reading and writing,
allowing more flexible algorithms (for example, in-place algorithms that
use less memory).
The resulting buffer in global memory can then be used as an input in
future computation.
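The thread-grid model above can be emulated in ordinary Python to make the structure concrete (a sketch only; on a real GPU each grid point is a hardware thread and the loops below disappear):

```python
import numpy as np

def kernel(grid, i, j):
    """Per-thread body: gather the 4 neighbors and return the cell's next
    value (a simple averaging stencil, standing in for the fluid update)."""
    return 0.25 * (grid[i - 1, j] + grid[i + 1, j] +
                   grid[i, j - 1] + grid[i, j + 1])

def launch(grid):
    """Emulate launching one 'thread' per interior grid point (SPMD):
    every thread runs the same kernel over its own (i, j) and scatters
    the result into the output buffer."""
    out = grid.copy()
    for i in range(1, grid.shape[0] - 1):
        for j in range(1, grid.shape[1] - 1):
            out[i, j] = kernel(grid, i, j)
    return out

state = np.zeros((8, 8))
state[4, 4] = 1.0          # a point disturbance
state = launch(state)      # one "time step" of the grid computation
```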
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because
no registration is needed. The matching result in this stage helps filter
out image pairs with low similarity. After this step, some false positive
matches may still remain. In the second stage, we use the WPL descriptor
to register the two images for more detailed descriptor matching, including
scale and translation invariance. This stage includes shift transformation,
affine matrix generation, and final WPL descriptor matching. Overall, we
partitioned the registration and matching processing into four kernels in
CUDA (Figure 10): matching on the Y-shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster
and achieves a more accurate score.
2.4.1 STAGE I: MATCHING WITH THE Y-SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features,
registration is unnecessary before matching on the Y-shape descriptor. The
whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte
and the target template Tta, respectively; dϕ is the Euclidean distance of the
angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean
distance of two descriptor centers, defined in (4); ni and di are the number of
matched descriptor pairs and the distance between their centers,
respectively; tϕ is a distance threshold; and txy is the threshold restricting
the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all
the Y-shape branches. The search area is limited to the corresponding left or
right half of the sclera in order to reduce the search range and time. The
distance between two branches is defined in (3), where ϕij is the angle between
the jth branch and the polar line from the pupil center in descriptor i.
The number of matched pairs ni and the distance between the Y-shape
branch centers di are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch
centers as in (2), where α is a factor to fuse the matching score, set
to 30 in our study, and Ni and Nj are the total numbers of feature vectors in
templates i and j, respectively. The decision is regulated by the threshold t: if
a sclera's matching score is lower than t, the sclera is discarded. Scleras
with high matching scores are passed to the next, more precise,
matching process.
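A sketch of this coarse stage follows. The thresholds tϕ = 30 and txy = 675 and the fusion factor α = 30 come from the text; equation (2) itself is not reproduced here, so the exact fusion form below is an assumption for illustration:

```python
import numpy as np

T_PHI, T_XY, ALPHA = 30.0, 675.0, 30.0   # thresholds and fusion factor

def d_phi(yi, yj):
    """Euclidean distance of the three branch angles (degrees assumed)."""
    return float(np.linalg.norm(np.asarray(yi[:3]) - np.asarray(yj[:3])))

def d_xy(yi, yj):
    """Euclidean distance of the two branch centers."""
    return float(np.hypot(yi[3] - yj[3], yi[4] - yj[4]))

def stage1_score(Tte, Tta):
    """Coarse Y-shape matching: count descriptor pairs within both
    thresholds, then fuse the count with the mean center distance
    (one plausible form of Eq. (2); the exact formula is assumed)."""
    n, dists = 0, []
    for yi in Tte:
        for yj in Tta:
            if d_xy(yi, yj) > T_XY:      # restrict the search area first
                continue
            if d_phi(yi, yj) <= T_PHI:
                n += 1
                dists.append(d_xy(yi, yj))
    if n == 0:
        return 0.0
    return ALPHA * n / (np.sqrt(len(Tte) * len(Tta)) * (1.0 + np.mean(dists)))

# y = (phi1, phi2, phi3, x, y) descriptors; values are hypothetical.
Tte = [(10.0, 120.0, -130.0, 50.0, 20.0)]
Tta = [(12.0, 118.0, -128.0, 52.0, 21.0), (80.0, 30.0, -60.0, 400.0, 300.0)]
score = stage1_score(Tte, Tta)
```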
2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of
the sclera than the Y-shape descriptor. The variation of the sclera vessel
pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure
will appear nonlinearly shrunk or extended, because the eyeball is spherical
in shape.
The sclera is made up of four layers (episclera, stroma, lamina fusca, and
endothelium), and there are slight differences among the movements of these
layers.
Considering these factors, our registration employed both a single
shift transform and a multi-parameter transform that combines shift,
rotation, and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before,
segmentation may not be accurate; as a result, the detected iris center may
not be very accurate either. The shift transform is designed to tolerate
possible errors in pupil center detection from the segmentation step. If there
is no deformation, or only very minor deformation, registration with the
shift transform alone is adequate to achieve an accurate result. We designed
Algorithm 2 to obtain the optimized shift parameter, where Tte is the test
template and stei is the ith WPL descriptor of Tte; Tta is the target template
and staj is the jth WPL descriptor of Tta; and d(stek, staj) is the Euclidean
distance of descriptors stek and staj.
Δsk is the shift value of the two descriptors, defined as the offset
between their center positions.
We first randomly select an equal number of segment descriptors
stek of the test template Tte from each quad and find each one's nearest
neighbor staj in the target template Tta. Their shift offset is recorded as a
candidate registration shift factor Δsk. The final offset registration factor is
Δsoptim, the candidate with the smallest standard deviation among these
candidate offsets.
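An Algorithm-2-style sketch (grid of hypothetical descriptor centers, without the quad partitioning; the least-deviating candidate stands in for the smallest-standard-deviation selection):

```python
import numpy as np

def nearest(p, pts):
    """Return the point of `pts` closest to `p`."""
    d = np.linalg.norm(pts - p, axis=1)
    return pts[np.argmin(d)]

def shift_search(test_pts, target_pts, samples=8, seed=0):
    """Sample test descriptors, pair each with its nearest target
    descriptor, and keep the candidate offset that deviates least from
    the others (a robust estimate of the global shift)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(test_pts), size=min(samples, len(test_pts)),
                     replace=False)
    offsets = np.array([nearest(test_pts[i], target_pts) - test_pts[i]
                        for i in idx])
    dev = np.linalg.norm(offsets - offsets.mean(axis=0), axis=1)
    return offsets[np.argmin(dev)]

# Hypothetical descriptor centers on a grid; the target is a pure shift.
gx, gy = np.meshgrid(np.arange(0, 100, 20), np.arange(0, 100, 20))
test_pts = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(float)
target_pts = test_pts + np.array([3.0, -2.0])
shift = shift_search(test_pts, target_pts)
```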
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te,(it) and calculating the distance to its nearest neighbor s_ta,j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the iteration with the maximum matching number m(it). Here s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithm 2; tr(it)_shift, θ(it), and tr(it)_scale are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)_shift), and S(tr(it)_scale) are the transform matrices defined in (7). To search for the optimal transform parameters we iterate N times; in our experiment we set the iteration count to 512.
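A minimal sketch of this randomized search, under stated assumptions: the parameter ranges are invented for illustration, the shift candidates are drawn uniformly rather than from nearest-neighbour offsets as in the paper, and descriptors are again reduced to 2-D points.

```python
import math
import random

def transform(pts, dx, dy, theta, scale):
    """Apply scale, rotation and shift to 2-D points (the order in Eq. (7) is assumed)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
            for x, y in pts]

def affine_parameter_search(test_pts, target_pts, iters=512, beta=20.0, seed=0):
    """Sketch of Algorithm 3: randomized search for (shift, rotation, scale).

    Each iteration draws a candidate parameter set, transforms the test
    template, and counts points that land within beta pixels of some
    target point; the parameter set with the most matches wins.
    """
    rng = random.Random(seed)
    best, best_matches = None, -1
    for _ in range(iters):
        dx, dy = rng.uniform(-30, 30), rng.uniform(-30, 30)  # assumed range
        theta = rng.uniform(-0.2, 0.2)                       # small rotation (radians)
        scale = rng.uniform(0.9, 1.1)
        moved = transform(test_pts, dx, dy, theta, scale)
        matches = sum(1 for p in moved
                      if min(math.dist(p, t) for t in target_pts) < beta)
        if matches > best_matches:
            best, best_matches = (dx, dy, theta, scale), matches
    return best, best_matches
```

Over 512 iterations the search reliably finds a parameter set under which every test point falls within the β tolerance of a translated copy of the template.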
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)_shift, tr(optm)_scale, and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)_shift), and S(tr(optm)_scale) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and target template.
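The scoring idea can be sketched as follows. This is an interpretation, not the paper's exact Algorithm 4: the descriptor layout (x, y, ɸ, w), the greedy once-only pairing, and the normalization by the smaller template's weight sum are simplifying assumptions consistent with the description above.

```python
import math

def match_templates(test, target, beta=20.0, alpha=5.0):
    """Sketch of the registration-and-matching score (registration already applied).

    Each descriptor is (x, y, phi, w). A test descriptor matches its nearest
    unused target descriptor if it lies within beta pixels and the phi
    values differ by less than alpha; the score sums matched weights and is
    normalized by the maximum attainable score of the smaller template.
    """
    used = [False] * len(target)
    score = 0.0
    for (x, y, phi, w) in test:
        best_j, best_d = -1, float("inf")
        for j, (tx, ty, tphi, tw) in enumerate(target):
            if used[j]:
                continue                     # a segment may only match once
            d = math.dist((x, y), (tx, ty))
            if d < best_d and abs(phi - tphi) < alpha:
                best_j, best_d = j, d
        if best_j >= 0 and best_d < beta:
            used[best_j] = True
            score += w + target[best_j][3]
    max_score = min(sum(d[3] for d in test), sum(d[3] for d in target)) * 2
    return score / max_score if max_score else 0.0
```

Matching a template against itself yields a score of 1.0; a template with no descriptor near the target scores 0.0.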
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program must be partitioned by the programmer into threads and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and it takes little time to access these memories. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them can be very time consuming.
Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming.
If the threads in a warp take different control paths, all the branches are executed serially; to improve performance, branch divergence within a warp should be avoided.
Global memory is slower than on-chip memory in terms of access time. To completely hide the latency of the small instruction set, we should prefer on-chip memory over global memory. When global memory accesses do occur, threads in the same warp should access consecutive words so that the accesses coalesce.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same bank, the accesses are serialized. For maximum performance, memory requests should be scheduled to minimize bank conflicts.
2.5.1 MAPPING THE ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
In this kernel we create a number of threads equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU, with the thread and block counts set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, each thread corresponds to a set of descriptors in that template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm, shown on the right of Figure 11, is used to calculate this sum. First, all odd-numbered threads compute the sums of consecutive pairs of results; then, recursively, every first thread of each group of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
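The pairwise (tree) reduction described above can be sketched sequentially; each pass of the loop corresponds to one parallel round on the GPU, with the indexing convention simplified so that the first slot of each group accumulates.

```python
def tree_sum(values):
    """Pairwise (tree) reduction as sketched in Figure 11.

    Round 1: each pair of consecutive slots is summed into the first slot
    of the pair; subsequent rounds double the stride (4, 8, 16, ...) until
    slot 0 holds the total, mirroring how the first thread of each group
    accumulates the partial sums.
    """
    vals = list(values)
    stride = 2
    while stride // 2 < len(vals):
        for i in range(0, len(vals), stride):
            j = i + stride // 2
            if j < len(vals):
                vals[i] += vals[j]   # "first thread of the group" accumulates
        stride *= 2
    return vals[0]
```

For n values this takes ceil(log2(n)) rounds instead of n - 1 sequential additions, which is the point of doing it on the GPU.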
2.5.2 MAPPING INSIDE A BLOCK
In shift parameter searching there are two schemes we could choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to each thread, so that all the threads compute independently, except that the final result must be compared against the other possible offsets.
Because of the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator we mapped an entire parameter-set search to one thread: every thread randomly generated a set of parameters and tried them independently, and the generated iterations were distributed across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
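The per-thread stream idea can be illustrated with Python's standard library. Note the caveat: Python's Mersenne Twister instances all share the same generator parameters, so differently seeded instances only approximate the paper's approach (which creates distinct MT parameter sets offline with Matsumoto and Nishimura's dynamic-creation tool); the function and seed here are illustrative.

```python
import random

def make_streams(n_threads, base_seed=1234):
    """One independent generator object per simulated thread.

    Seeding each instance differently gives each "thread" its own
    reproducible stream; it does NOT guarantee the cross-stream
    decorrelation that dynamically created MT parameters provide.
    """
    return [random.Random(base_seed + i) for i in range(n_threads)]

streams = make_streams(4)
draws = [s.random() for s in streams]   # one draw per simulated thread
```

Rebuilding the streams with the same base seed reproduces the same draws, which is what makes a randomized GPU search debuggable.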
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can incur long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore no data transfer from host to device takes place during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access was still a slow way to get data, so in our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as a feature for human recognition. In the sclera region the vein patterns are edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then forms the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating an intensity measure over a larger block and then using this value to normalize all cells within the block. This normalization makes the result more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y):
m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2), θ(x, y) = arctan(dy(x, y) / dx(x, y))
Orientation binning is the second step of HOG; this step creates the cell histograms. Each pixel within a cell casts a vote for the orientation bin found in the gradient computation, weighted by its gradient magnitude. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination and contrast changes, the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
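The two steps above (gradient magnitude/orientation, then magnitude-weighted binning over an unsigned 0-180 degree range) can be sketched for a single cell. The function name and the 9-bin default are illustrative choices, not the report's exact configuration.

```python
import math

def cell_histogram(dx, dy, n_bins=9):
    """Orientation binning for one HOG cell (sketch).

    dx, dy are same-sized 2-D lists of x/y gradients. Each pixel votes
    into an unsigned-orientation bin over [0, 180) degrees, weighted by
    its gradient magnitude m = sqrt(dx^2 + dy^2).
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for gx_row, gy_row in zip(dx, dy):
        for gx, gy in zip(gx_row, gy_row):
            mag = math.hypot(gx, gy)                        # gradient magnitude
            ang = math.degrees(math.atan2(gy, gx)) % 180.0  # opposite directions coincide
            hist[int(ang // bin_width) % n_bins] += mag
    return hist
```

Concatenating such histograms over all cells (and normalizing them per block) yields the final HOG descriptor described above.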
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and is popular among scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, sold separately by MathWorks, or through an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, enabling fast development and execution. With the MATLAB language you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate toward the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology.
This technology, available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithms efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
Using the interactive tool GUIDE (Graphical User Interface Development Environment), you can lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions that support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make the method more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads within the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
conjunctiva, which is the focus of this study. The ocular conjunctiva is very thin and clear; thus the vasculature (including that of the episclera) is easily visible through it. The visible microcirculation of the conjunctiva offers a rich and complex network of veins and fine microcirculation (Fig. 1). The apparent complexity and specificity of these vascular patterns motivated us to utilize them for personal identification (Derakhshani and Ross, 2006).
FIG
We have found conjunctival vasculature to be a suitable biometric, as it conforms to the following criteria (Jain et al., 2004):
UNIVERSALITY: All normal living tissues, including those of the conjunctiva and episclera, have vascular structure.
UNIQUENESS: Vasculature is created during embryonic vasculogenesis. Its detailed final structure is mostly stochastic and thus unique. Even though no comprehensive study on the uniqueness of vascular structures has been conducted, studies of some targeted areas, such as those of the eye fundus, confirm the uniqueness of such vascular patterns even between identical twins (Simon and Goldstein, 1935; Tower, 1955).
PERMANENCE: Other than in cases of significant trauma, pathology, or chemical intervention, spontaneous adult ocular vasculogenesis and angiogenesis do not easily occur. Thus the conjunctival vascular structure is expected to have reasonable permanence (Joussen, 2001).
PRACTICALITY: Conjunctival vasculature can be captured with commercial off-the-shelf digital cameras under normal lighting conditions, making this modality highly practical.
ACCEPTABILITY: Since the subject is not required to stare directly into the camera lens, and given the possibility of capturing the conjunctival vasculature from several feet away, this modality is non-intrusive and thus more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular veins makes them hard to reproduce as a physical artifact.
Besides being a stand-alone biometric modality, we anticipate that the addition of conjunctival biometrics will enhance the performance of current iris-based biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris information is relegated to the left or right portions of the eye, the sclera vein patterns will be further exposed. This feature makes sclera vasculature a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g., due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented alongside iris systems, an attacker needs to reproduce not only the iris but also the different surfaces of the sclera, along with the associated microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the availability of simultaneous computation. The figure below demonstrates the possibility of parallel directional filtering: since the filter is computed over different portions of the input image, the computation can be performed in parallel (denoted by Elements below). In addition, individual parallelization of each element of the filtering can also be performed. A detailed discussion of our proposed parallelization is outside the scope of this paper.
FIG
FIG
2.2.5 OVERVIEW OF THE LINE-DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching segment of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, the vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and to remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description of it. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)^T. The individual components of the line descriptor are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
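The weighting image construction can be sketched in a few lines of Python (a minimal version, assuming a binary erosion of `margin` pixels defines "within some distance of the boundary"; the margin value is an assumption, not from the report):

```python
import numpy as np

def weighting_image(mask, margin=3):
    """Interior pixels -> 1.0, pixels within `margin` px of the mask
    boundary -> 0.5, pixels outside the mask -> 0.0."""
    m = mask.astype(bool)
    interior = m.copy()
    for _ in range(margin):
        # erode by one pixel: a pixel stays set only if its 4-neighbourhood is set
        p = np.pad(interior, 1, constant_values=False)
        interior = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
                    & p[1:-1, :-2] & p[1:-1, 2:])
    w = np.zeros(mask.shape)
    w[m] = 0.5          # everything inside the mask starts at the boundary weight
    w[interior] = 1.0   # deep-interior pixels get full weight
    return w
```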
The matching score for two segment descriptors is calculated as follows, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
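The scoring rule just described can be sketched as follows (the threshold values `d_match` and `phi_match` are placeholders, not the report's tuned values; descriptors are simplified to (x, y, ɸ) triples):

```python
import math

def segment_match(si, sj, d_match=5.0, phi_match=0.2):
    """m(Si, Sj): 1 if the centers are within D_match and the orientations
    within phi_match, else 0 (a simplified, un-weighted form)."""
    d = math.dist(si[:2], sj[:2])
    return 1.0 if d <= d_match and abs(si[2] - sj[2]) <= phi_match else 0.0

def total_score(test, target, weights):
    """M = sum of per-segment best matches, weighted, divided by the
    maximum attainable score (the sum of the weights)."""
    s = sum(max(segment_match(si, sj) for sj in target) * w
            for si, w in zip(test, weights))
    return s / sum(weights)
```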
FIG
FIG
FIG
FIG
movement of the eye. Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the set of nearest neighbors of every line segment within a regular distance and classify the angles among these neighbors. If there are two or more types of angle values in the line-segment set, the set may be inferred to be a Y-shape structure, and the line-segment angles are recorded as a new feature of the sclera.
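The neighbor-grouping and angle-classification step can be sketched as follows (the `radius` and `angle_tol` thresholds are illustrative assumptions; segments are (x, y, ɸ) triples):

```python
import math

def find_y_branches(segments, radius=10.0, angle_tol=0.15):
    """Group each segment with its neighbours within `radius`; if the group's
    orientations fall into two or more distinct angle classes, treat the group
    as a Y-shape branch point and record its angles."""
    branches = []
    for i, (x, y, phi) in enumerate(segments):
        group = [(xj, yj, pj) for j, (xj, yj, pj) in enumerate(segments)
                 if j != i and math.hypot(xj - x, yj - y) <= radius]
        # quantize orientations into classes of width angle_tol
        angles = {round(p / angle_tol) for (_, _, p) in group + [(x, y, phi)]}
        if len(angles) >= 2:   # two or more orientation types -> likely Y junction
            branches.append((x, y, sorted(p for (_, _, p) in group + [(x, y, phi)])))
    return branches
```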
There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angle of each branch to the x axis; the other is to use the angle between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 remain quite stable. To tolerate errors from the pupil-center calculation in the segmentation step, we also recorded the center position (x, y) of the Y-shape branch as auxiliary parameters. Our rotation-, shift-, and scale-invariant feature vector is therefore defined as y(ϕ1, ϕ2, ϕ3, x, y). Because the Y-shape descriptor is generated with reference to the iris center, it is automatically aligned to the iris center; it is a rotation- and scale-invariant descriptor.
226 WPL SCLERA DESCRIPTOR
As we discussed in Section 22, the line descriptor is extracted from the skeleton of the vessel structure in binary images
(Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limited segmentation accuracy, descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To tolerate such errors, a mask file is designed to indicate whether a line segment belongs to the edge of the sclera or not.
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.
However, using the mask in a GPU application is challenging, since the mask files are large: they occupy GPU memory and slow down data transfer. When matching, a registration RANSAC-type algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.
To reduce this heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors that lie beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU, only once.
The calculated result was saved as a component of the descriptor, so the sclera descriptor becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the values 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. Alignment is faster if the two templates share a similar reference point: if we use the center of the iris as the reference point, then when two templates are compared, their correspondences are automatically aligned to each other. Every feature vector of the template is a set of line-segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocessing step.
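This CPU-side preprocessing is a simple conversion; a minimal sketch (the tuple layout of the resulting descriptor is an assumption based on the vector s(x, y, r, θ, ɸ, w) named in the text):

```python
import math

def wpl_descriptor(r, theta, phi, w):
    """Precompute rectangular coordinates on the CPU so that GPU kernels can
    use (x, y) directly, avoiding repeated trigonometry on the device."""
    x = r * math.cos(theta)
    y = r * math.sin(theta)
    return (x, y, r, theta, phi, w)
```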
The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. (The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps.) We reorganized the descriptors from the same side and saved them at contiguous addresses, which meets the requirement for coalesced memory access on the GPU.
FIG
FIG
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, no computation on the mask file is needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift-parameter generator of Figure 4 is then simplified as shown in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and as GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
GENERAL-PURPOSE COMPUTING ON THE GPU
Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as a texture on future passes through the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
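The grid-update pattern in this fluid example can be sketched in NumPy (the 5-point averaging rule is only a stand-in for a real fluid update; the point is that every grid cell is computed from its own and its neighbors' previous values, a pure "gather"):

```python
import numpy as np

def step(state):
    """One SPMD-style time step: each grid point reads its own previous value
    and those of its 4 neighbours (edge cells reuse the border value), then
    writes one new value. All cells can be computed independently."""
    p = np.pad(state, 1, mode='edge')
    return (p[1:-1, 1:-1]        # the cell itself
            + p[:-2, 1:-1] + p[2:, 1:-1]    # up / down neighbours
            + p[1:-1, :-2] + p[1:-1, 2:]    # left / right neighbours
            ) / 5.0
```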
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, to the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity; after this step, some false positive matches may still remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the jth branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs ni and the distance between the Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed on to the next, more precise matching process.
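The Stage-I matching just described can be sketched as follows. The thresholds tϕ = 30, txy = 675, and α = 30 are the values quoted in the text; the exact fusion formula of Eq. (2) is not reproduced in this report, so the score combination below is an illustrative stand-in (more matched pairs and a smaller mean center distance give a higher score):

```python
import math

def y_match_score(T_te, T_ta, t_phi=30.0, t_xy=675.0, alpha=30.0):
    """Coarse Y-shape matching: descriptors are (phi1, phi2, phi3, x, y).
    Count pairs whose angle vectors and centers both fall within the
    thresholds, then fuse count and mean distance into one score."""
    n, dist_sum = 0, 0.0
    for yte in T_te:
        for yta in T_ta:
            d_xy = math.dist(yte[3:], yta[3:])   # centre distance
            d_phi = math.dist(yte[:3], yta[:3])  # angle-vector distance
            if d_xy <= t_xy and d_phi <= t_phi:
                n += 1
                dist_sum += d_xy
                break                            # each test branch matches at most once
    if n == 0:
        return 0.0
    # illustrative fusion: normalized match count, damped by mean distance
    return n / min(len(T_te), len(T_ta)) * alpha / (alpha + dist_sum / n)
```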
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel-structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, since the eyeball is spherical in shape.
The sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers.
Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale.
1) SHIFT PARAMETER SEARCH: As discussed before, segmentation may not be accurate, so the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil-center detection from the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template, stei is the ith WPL descriptor of Tte, Tta is the target template, staj is the jth WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance between descriptors stek and staj.
Δsk is the shift value of two descriptors, defined as
We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find each one's nearest neighbor staj in the target template Tta. Their shift offset is recorded as a candidate registration shift factor Δsk. The final offset registration factor is Δsoptim, the candidate with the smallest standard deviation among these candidate offsets.
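A hedged Python sketch of this shift search (Algorithm 2) follows. Descriptors are reduced to (x, y) centers, and "smallest standard deviation among the candidates" is approximated by picking the candidate offset closest to the candidates' mean; the sample size is an assumption:

```python
import math
import random

def shift_search(T_te, T_ta, n_samples=8, rng=None):
    """Sample test descriptors, pair each with its nearest target descriptor,
    record the candidate offsets delta_s_k, and return the candidate that
    deviates least from the others (a proxy for the minimum-deviation rule)."""
    rng = rng or random.Random(0)
    cands = []
    for ste in rng.sample(T_te, min(n_samples, len(T_te))):
        sta = min(T_ta, key=lambda s: math.dist(ste, s))  # nearest neighbour
        cands.append((sta[0] - ste[0], sta[1] - ste[1]))
    mx = sum(c[0] for c in cands) / len(cands)
    my = sum(c[1] for c in cands) / len(cands)
    # the offset closest to the mean is the most "consistent" candidate
    return min(cands, key=lambda c: math.hypot(c[0] - mx, c[1] - my))
```

Outlier offsets caused by wrong nearest-neighbor pairings are naturally discarded, because they sit far from the bulk of the candidates.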
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the itth iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times over generated parameter sets; in our experiment we set the number of iterations to 512.
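A compact sketch of this random parameter search (Algorithm 3) in Python follows. β = 20 pixels is the value quoted in the text, while the parameter ranges and iteration count are illustrative assumptions:

```python
import math
import random

def affine_search(T_te, T_ta, n_iter=200, beta=20.0, rng=None):
    """Randomly generate (shift, rotation, scale) parameter sets, transform
    the test template, and keep the set that matches the most descriptor
    pairs within beta pixels."""
    rng = rng or random.Random(1)

    def transform(p, dx, dy, th, sc):
        # scale, rotate, then shift a 2-D point (the order mirrors S, R, T)
        x, y = p
        return (sc * (x * math.cos(th) - y * math.sin(th)) + dx,
                sc * (x * math.sin(th) + y * math.cos(th)) + dy)

    best, best_m = None, -1
    for _ in range(n_iter):
        dx, dy = rng.uniform(-10, 10), rng.uniform(-10, 10)  # shift range: assumption
        th, sc = rng.uniform(-0.2, 0.2), rng.uniform(0.9, 1.1)  # rotation/scale: assumption
        m = sum(1 for p in T_te
                if min(math.dist(transform(p, dx, dy, th, sc), q)
                       for q in T_ta) <= beta)
        if m > best_m:
            best_m, best = m, (dx, dy, th, sc)
    return best, best_m
```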
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program must be partitioned by the programmer and mapped onto threads. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and fast to access; only shared memory can be accessed by other threads within the same block, but its capacity is limited. Local memory, global memory, constant memory, and texture memory reside off-chip; global, constant, and texture memory are accessible by all threads, but accessing these off-chip memories is very time consuming.
Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this latency, on-chip memory should be preferred over global memory; when global memory access does occur, threads in the same warp should access words in sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computational density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread compares one pair of templates. In our work we use an NVIDIA C2070 as our GPU, and the thread and block counts are each set to 1024; that means we can match our test template against up to 1024 x 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in that template. This partition lets every block execute independently, with no data-exchange requirements between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate this sum, as shown at the right of Figure 11. First, all odd-numbered threads compute the sums of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
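The tree-style reduction just described can be sketched in Python (a CPU stand-in for the CUDA kernel: each round combines independent index pairs concurrently, doubling the stride each time, and the total lands in slot 0, mirroring "saved at the first address"):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_block_sum(values):
    """Tree reduction: round 1 sums consecutive pairs, later rounds combine
    every stride of 4, 8, 16, ... until the sum reaches index 0."""
    vals = list(values)
    n = len(vals)
    stride = 2
    while stride // 2 < n:
        with ThreadPoolExecutor() as ex:  # pairs in one round are independent
            def combine(i):
                if i + stride // 2 < n:
                    vals[i] += vals[i + stride // 2]
            list(ex.map(combine, range(0, n, stride)))
        stride *= 2
    return vals[0]
```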
252 MAPPING INSIDE BLOCK
In the shift-parameter search there are two schemes we can choose to map the task:
Map one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assign a single possible shift offset to each thread, so that all threads compute independently, except that the final result must be compared against the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are distributed across all threads. The challenge in this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is to use a single thread in a block to process the matching.
FIG
FIG
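For comparison, the same goal of uncorrelated per-thread streams can be reached today with NumPy's seed-spawning mechanism. This is a different, more recent technique than the dynamic-creation tool used in this work, shown here only as an illustrative sketch; `MT19937` is NumPy's Mersenne Twister bit generator:

```python
import numpy as np

def per_thread_streams(n_threads, root_seed=1234):
    """Create statistically independent per-thread generator streams by
    spawning child SeedSequences from one root seed, each feeding its own
    Mersenne Twister state."""
    root = np.random.SeedSequence(root_seed)
    return [np.random.Generator(np.random.MT19937(s))
            for s in root.spawn(n_threads)]
```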
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no host-to-device data transfer during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture-memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this work it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG; this method is used to create the cell histograms. Each pixel within a cell casts a vote for the orientation bin found in the gradient computation, weighted by the gradient magnitude. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig 8 depicts the edge orientations of the picture elements. If the images have illumination or contrast changes, the gradient strengths must be locally normalized, so cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is further improved by applying a Gaussian window to each block.
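The gradient computation and orientation binning described above can be sketched as follows (a minimal version without block normalization or the Gaussian window; the cell size and bin count are typical values, not necessarily those used in this work):

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    """Minimal HOG sketch: image gradients, unsigned orientation over
    [0, 180), and magnitude-weighted orientation binning per cell."""
    img = img.astype(float)
    dy, dx = np.gradient(img)                       # y- and x-direction gradients
    mag = np.hypot(dx, dy)                          # gradient magnitude m(x, y)
    ang = np.rad2deg(np.arctan2(dy, dx)) % 180.0    # opposite directions count the same
    h, w = img.shape
    H = np.zeros((h // cell, w // cell, bins))
    for ci in range(h // cell):
        for cj in range(w // cell):
            m = mag[ci * cell:(ci + 1) * cell, cj * cell:(cj + 1) * cell]
            a = ang[ci * cell:(ci + 1) * cell, cj * cell:(cj + 1) * cell]
            idx = np.minimum((a / (180.0 / bins)).astype(int), bins - 1)
            for b in range(bins):                   # magnitude-weighted vote per bin
                H[ci, cj, b] = m[idx == b].sum()
    return H
```

A vertical edge, for example, produces purely horizontal gradients, so all of its weight falls into the first (0-degree) orientation bin.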
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular in the teaching of linear algebra and numerical analysis, and is popular among scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, and financial modeling and analysis. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or FORTRAN. A wrapper function is created,
allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed MEX-
files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called
from MATLAB, and many MATLAB libraries (for
example, XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries. Calling MATLAB from Java is more complicated, but
can be done with a MATLAB extension, which is sold separately by
MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB
Interface), which should not be confused with the unrelated Java Metadata
Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox
available from MathWorks, MATLAB can be connected
to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on
Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems. It enables fast
development and execution. With the MATLAB language, you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables, specifying data types, and allocating memory. In many
cases, MATLAB eliminates the need for 'for' loops. As a result, one line of
MATLAB code can often replace several lines of C or C++ code.
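As a small illustration of the vectorization described above, the sketch below replaces an explicit elementwise loop with a single vectorized expression. The report's code is MATLAB; NumPy is used here only as an equivalent illustration of the same idea, and the data values are invented.

```python
import numpy as np

# Hypothetical signal; in C, scaling it would require an explicit loop:
#   for (i = 0; i < n; i++) out[i] = 2 * signal[i] + 1;
signal = np.array([1.0, 2.0, 3.0, 4.0])

# One vectorized line replaces the whole loop (as in MATLAB: out = 2*signal + 1).
out = 2 * signal + 1

print(out)  # [3. 5. 7. 9.]
```

The same pattern applies to most of the elementwise and matrix operations the report uses later (filtering, thresholding, distance computations).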
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
MATLAB lets you execute commands or groups of commands one
at a time, without compiling and linking, enabling you to quickly iterate to
the optimal solution. For fast execution of heavy matrix and vector
computations, MATLAB uses processor-optimized libraries. For general-
purpose scalar computations, MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology.
This technology, which is available on most platforms, provides
execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
You can lay out, design, and edit user interfaces using the interactive tool
GUIDE (Graphical User Interface Development Environment).
GUIDE lets you include list boxes, pull-down menus, push buttons, radio
buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring
data from external devices and databases, through preprocessing,
visualization, and numerical analysis, to producing presentation-quality
output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
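As a hedged illustration of two of the operations listed above (smoothing and thresholding), the sketch below applies a 3-point moving average followed by a cutoff. The data values and the 2.0 threshold are invented for illustration, and NumPy stands in for the corresponding MATLAB functions.

```python
import numpy as np

# Hypothetical noisy 1-D data.
data = np.array([0.0, 1.2, 0.8, 2.1, 1.9, 3.2, 2.8, 4.1])

# Smoothing: 3-point moving average via convolution with a box kernel.
kernel = np.ones(3) / 3.0
smooth = np.convolve(data, kernel, mode="valid")

# Thresholding: keep smoothed samples above 2.0, zero out the rest.
thresholded = np.where(smooth > 2.0, smooth, 0.0)
```

In MATLAB the same two steps would typically be a `conv` call followed by logical indexing.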
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats such as Microsoft Excel; ASCII text or binary
files; image, sound, and video files; and scientific files such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format. Additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes;
changing line colors and markers; adding annotations, LaTeX equations, and
legends; and drawing shapes.
2-D Plotting
You can visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D
scalar data, and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting effects,
light source locations, and transparency.
3-D plotting functions include
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW Discrete Fourier
Transform library. Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports, they execute
faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
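As a minimal illustration of the first category above (matrix manipulation and linear algebra), the sketch below solves a small linear system. In MATLAB this is the one-liner x = A \ b; the NumPy analogue shown here likewise calls into LAPACK, as described above. The matrix values are invented.

```python
import numpy as np

# Solve A x = b for a hypothetical 2x2 system:
#   3x + y = 9
#    x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)  # LAPACK-backed solver, like MATLAB's backslash
```

Here the solution is x = 2, y = 3, which can be checked by substituting back into both equations.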
CHAPTER 4
IMPLEMENTATION
41 GENERAL
Matlab is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language used is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition
method, which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions to other sclera vein recognition methods
and to general pattern recognition methods. We designed the Y-shape
descriptor to narrow the search range and increase the matching efficiency;
it is a new feature extraction method that takes advantage of the GPU
structures. We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing, which can
dramatically reduce data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, which is an important
step to achieve parallel computation efficiency using a GPU. A work flow
with high arithmetic intensity, to hide the memory access latency, was
designed to partition the computation task across the heterogeneous system of
CPU and GPU, and even across the threads in the GPU. The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland:
Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object
detection for real-time augmented reality applications in a GPGPU," IEEE
Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep,
big, simple neural nets for handwritten digit recognition," Neural Comput.,
vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris
recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput.
Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-
Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE
Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks,"
Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognition,"
Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of
the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner,
"Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4,
no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular
biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–
1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G.
Xiaoguang, "Robust spatial matching for object retrieval and its parallel
implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp.
1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for
real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control,
vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4,
pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2,
pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human
identification method: Sclera recognition," IEEE Trans. Syst. Man
Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
ACCEPTABILITY: Since the subject is not required to stare directly into
the camera lens, and given the possibility of capturing the conjunctival
vasculature from several feet away, this modality is non-intrusive and thus
more acceptable.
SPOOF-PROOFNESS: The fine multi-surface structure of the ocular
veins makes them hard to reproduce as a physical artifact. Besides being a
stand-alone biometric modality, we anticipate that the addition of
conjunctival biometrics will enhance the performance of current iris-based
biometric systems in the following ways:
Improving accuracy by the addition of vascular features.
Facilitating recognition using off-angle iris images. For instance, if the iris
information is relegated to the left or right portions of the eye, the sclera
vein patterns will be further exposed. This feature makes sclera vasculature
a natural complement to the iris biometric.
Addressing the failure-to-enroll issue when iris patterns are not usable (e.g.,
due to surgical procedures).
Reducing vulnerability to spoof attacks. For instance, when implemented
alongside iris systems, an attacker needs to reproduce not only the iris but
also the different surfaces of the sclera, along with the associated
microcirculation, and make them available on commensurate eye surfaces.
The first step in parallelizing an algorithm is to determine the
availability for simultaneous computation. The figure below demonstrates the
possibility for parallel directional filtering. Since the filter is computed over
different portions of the input image, the computation can be performed in
parallel (denoted by Elements below). In addition, individual parallelization
of each element of the filtering can also be performed. A detailed discussion of
our proposed parallelization is outside the scope of this paper.
FIG
FIG
225 OVERVIEW OF THE LINE DESCRIPTOR-BASED SCLERA
VEIN RECOGNITION METHOD
The matching segment of the line-descriptor-based method is a
bottleneck with regard to matching speed. In this section, we briefly
describe the line-descriptor-based sclera vein recognition method. After
segmentation, vein patterns are enhanced by a bank of directional Gabor
filters. Binary morphological operations are used to thin the detected vein
structure down to a single-pixel-wide skeleton and remove the branch
points. The line descriptor is used to describe the segments in the vein
structure. Figure 2 shows a visual description of the line descriptor. Each
segment is described by three quantities: the segment's angle θ to some
reference angle at the iris center, the segment's distance r to the iris center,
and the dominant angular orientation ɸ of the line segment. Thus the
descriptor is S = (θ, r, ɸ)ᵀ. The individual components of the line descriptor
are calculated as
FIG
Here, f_line(x) is the polynomial approximation of the line segment, (xl, yl)
is the center point of the line segment, (xi, yi) is the center of the detected
iris, and S is the line descriptor. In order to register the segments of the
vascular patterns, a RANSAC-based algorithm is used to estimate the best-
fit parameters for registration between the two sclera vascular patterns. The
registration algorithm randomly chooses two points, one from the
test template and one from the target template. It also randomly chooses a
scaling factor and a rotation value, based on a priori knowledge of the
database. Using these values, it calculates a fitness value for the registration
using these parameters.
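The descriptor components can be sketched in code. The report's defining equations appear only as a figure, so the formulas below are an assumed reading of the textual description (θ from the segment center relative to the iris center, r as the distance between them, ɸ from the slope of the polynomial fit); NumPy-free Python stands in for the MATLAB implementation.

```python
import math

def line_descriptor(seg_center, iris_center, slope):
    """Hedged sketch of the S = (theta, r, phi)^T line descriptor.
    seg_center:  (xl, yl), center of the line segment
    iris_center: (xi, yi), center of the detected iris
    slope:       derivative of the polynomial fit f_line at the center
    (exact equations are shown only as a figure in the report)."""
    xl, yl = seg_center
    xi, yi = iris_center
    theta = math.atan2(yl - yi, xl - xi)  # angle relative to the iris center
    r = math.hypot(xl - xi, yl - yi)      # distance to the iris center
    phi = math.atan(slope)                # dominant orientation of the segment
    return (theta, r, phi)
```

For a segment centered at (3, 4) with the iris at the origin, r is 5 and θ is atan2(4, 3), matching the geometric description above.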
After sclera template registration, each line segment in the test
template is compared to the line segments in the target template for
matches. In order to reduce the effect of segmentation errors, we created the
weighting image (Figure 3) from the sclera mask by setting interior pixels
in the sclera mask to 1, pixels within some distance of the boundary of the
mask to 0.5, and pixels outside the mask to 0.
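The weighting-image construction just described can be sketched directly. The boundary distance d is a free parameter here (the report does not state the value it used), and the brute-force neighborhood test below stands in for whatever morphological operation the MATLAB implementation actually used.

```python
import numpy as np

def weighting_image(mask, d=1):
    """Hedged sketch of the weighting image: interior sclera-mask pixels
    get weight 1, pixels within distance d of the mask boundary get 0.5,
    and pixels outside the mask get 0 (d is an assumed parameter)."""
    h, w = mask.shape
    weights = np.where(mask > 0, 1.0, 0.0)
    for y in range(h):
        for x in range(w):
            if mask[y, x]:
                # Near-boundary test: any background pixel (or the image
                # edge) within Chebyshev distance d of this pixel.
                y0, y1 = max(0, y - d), min(h, y + d + 1)
                x0, x1 = max(0, x - d), min(w, x + d + 1)
                near_edge = y < d or x < d or y >= h - d or x >= w - d
                if near_edge or np.any(mask[y0:y1, x0:x1] == 0):
                    weights[y, x] = 0.5
    return weights
```

On a small mask with a one-pixel foreground ring inside a background border, interior pixels receive 1, ring pixels 0.5, and background pixels 0, as in Figure 3.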
The matching score for two segment descriptors is calculated as below,
where Si and Sj are two segment descriptors, m(Si, Sj) is the matching
score between segments Si and Sj, d(Si, Sj) is the Euclidean distance
between the segment descriptors' center points (from Eqs. 6-8), Dmatch is
the matching distance threshold, and match is the matching angle threshold.
The total matching score M is the sum of the individual matching scores
divided by the maximum matching score for the minimal set between the
test and target templates. That is, one of the test or target templates has fewer
points, and thus the sum of its descriptors' weights sets the maximum score
that can be attained.
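The scoring scheme above can be sketched as follows. The report's scoring equation is shown only as a figure, so the per-pair credit (the smaller of the two descriptor weights) and the threshold values are assumptions for illustration; descriptors here use the s = (x, y, ɸ, w) form introduced later for the WPL descriptor.

```python
import math

def segment_match(si, sj, d_match=10.0, a_match=0.5):
    """Hedged per-pair score: two segments match when their centers are
    within d_match and orientations within a_match (assumed thresholds);
    a matched pair contributes the smaller weight (assumed crediting)."""
    (xi, yi, phi_i, wi), (xj, yj, phi_j, wj) = si, sj
    d = math.hypot(xi - xj, yi - yj)
    if d <= d_match and abs(phi_i - phi_j) <= a_match:
        return min(wi, wj)
    return 0.0

def total_score(test, target):
    """Total score M: best match per test segment, normalized by the
    maximum attainable score of the smaller (lower-weight) template."""
    score = sum(max(segment_match(s, t) for t in target) for s in test)
    max_score = min(sum(s[3] for s in test), sum(t[3] for t in target))
    return score / max_score if max_score else 0.0
```

Two identical one-segment templates score 1.0; templates with no segments within the thresholds score 0.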
FIG
FIG
FIG
FIG
movement of the eye, Y-shape branches are observed to be a stable feature and
can be used as a sclera feature descriptor. To detect the Y-shape branches in
the original template, we search the nearest-neighbor set of every line
segment within a regular distance and classify the angles among these neighbors.
If there are two types of angle values in the line segment set, this set may
be inferred as a Y-shape structure, and the line segment angles are
recorded as a new feature of the sclera.
There are two ways to measure both the orientation and the relationship of
every branch of Y-shape vessels: one is to use the angles of every branch to
the x-axis; the other is to use the angles between each branch and the iris radial
direction. The first method needs an additional rotation operation to align the
template. In our approach, we employed the second method. As Figure 6
shows, ϕ1, ϕ2, and ϕ3 denote the angle between each branch and the radius
from the pupil center. Even when the head tilts, the eye moves, or camera
zoom occurs at the image acquisition step, ϕ1, ϕ2, and ϕ3 are quite stable.
To tolerate errors from the pupil center calculation in the segmentation step,
we also recorded the center position (x, y) of the Y-shape branches as
auxiliary parameters. So our rotation-, shift-, and scale-invariant feature
vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated
with reference to the iris center; therefore, it is automatically aligned to the
iris centers. It is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary images
(Figure 7). The skeleton is then broken into smaller segments. For each
segment, a line descriptor is created to record the center and orientation of
the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the
position of the center and ɸ is its orientation. Because of the limitation of
segmentation accuracy, descriptors at the boundary of the sclera area might
not be accurate and may contain spur edges resulting from the iris, eyelid,
and/or eyelashes. To tolerate such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel
patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line
segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is challenging,
since the mask files are large and will occupy GPU memory and
slow down data transfer. In matching, a RANSAC-type registration
algorithm is used to randomly select corresponding descriptors,
and the transform parameters between them are used to generate the
template-transform affine matrix. After every template transform, the mask
data should also be transformed and a new boundary calculated to
evaluate the weight of the transformed descriptor. This results in too many
convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the mask
information and can be automatically aligned. We extracted the
geometric relationships of the descriptors and stored them as a new
descriptor. We use a weighting image created by setting various weight
values according to position: the weights of descriptors outside the sclera
are set to 0, those near the sclera boundary to 0.5, and interior descriptors
to 1. In our work, descriptor weights were calculated on their own mask by
the CPU, only once.
The calculated result is saved as a component of the descriptor. The
sclera descriptor becomes s(x, y, ɸ, w), where w denotes the weight
of the point and may take the value 0, 0.5, or 1. To align two templates, when a
template is shifted to another location along the line connecting their
centers, all the descriptors of that template are transformed. It is
faster if two templates have similar reference points. If we use the center of
the iris as the reference point, then when two templates are compared, the
correspondences are automatically aligned to each other, since they share
a similar reference point. Every feature vector of the template is a set of
line segment descriptors composed of three variables (Figure 8): the
segment's angle θ to the reference line through the iris center,
the distance r between the segment's center and the pupil center,
and the dominant angular orientation ɸ of the segment. To minimize
GPU computing, we also convert the descriptor values from polar
coordinates to rectangular coordinates in a CPU preprocess.
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters. For
example, as an eyeball moves left, the left-part sclera patterns of the eye may be
compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads called warps. We
reorganized the descriptors from the same sides and saved
FIG
FIG
them in continuous addresses. This meets the requirement of coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no
longer needed on the GPU. Matching with this feature is very fast because
the templates do not need to be re-registered every time after shifting. Thus
the cost of data transfer and computation on the GPU is reduced. Matching
on the new descriptor, the shift parameter generator in Figure 4 is then
simplified as in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly more capable, with larger
limits on their size and resource consumption, with more fully featured
instruction sets, and with more flexible control-flow operations. After many
years of separate instruction sets for vertex and fragment operations, current
GPUs support the unified Shader Model 4.0 on both vertex and fragment
shaders:
The hardware must support shader programs of at least 65k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-
bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units.
General-Purpose Computing on the GPU: Mapping general-
purpose computation onto the GPU uses the graphics hardware in much the
same way as any standard graphics application. Because of this similarity, it
is both easier and more difficult to explain the process. On one hand, the
actual operations are the same and are easy to follow; on the other hand, the
terminology is different between graphics and general-purpose use. Harris
provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology,
then show how the same steps are used in a general-purpose way to author
GPGPU applications, and finally use the same steps to show the simpler
and more direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II,
concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as texture on future passes through
the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation
involves the exact same steps, but different terminology. A motivating
example is a fluid simulation computed over a grid: at each time step, we
compute the next state of the fluid for each grid point from the current state
at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest. The rasterizer generates a fragment at each
pixel location covered by that geometry. (In our example, our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation.)
Each fragment is shaded by an SPMD general-purpose fragment
program. (Each grid point runs the same program to update the state of its
fluid.)
The fragment program computes the value of the fragment by a
combination of math operations and "gather" accesses from global
memory. (Each grid point can access the state of its neighbors from the
previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes. (The current state of the fluid will be used on the next time
step.)
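The grid update in the fluid example can be sketched on the CPU. The simple neighbor average below is an invented stand-in for the actual fluid update rule; on the GPU, each grid point's computation would run as one fragment (or, in CUDA, one thread), "gathering" its neighbors' values from the previous time step.

```python
import numpy as np

def step(state):
    """One grid time step: each interior point computes its next value
    from its own value and its four neighbors (a neighbor average here,
    standing in for the real fluid update). Reads come only from the
    previous state, mirroring the 'gather' pattern described above."""
    nxt = state.copy()
    nxt[1:-1, 1:-1] = (state[1:-1, 1:-1]
                       + state[:-2, 1:-1] + state[2:, 1:-1]   # up / down
                       + state[1:-1, :-2] + state[1:-1, 2:]) / 5.0  # left / right
    return nxt
```

Because the output buffer is separate from the input, every grid point can be computed independently and in parallel, which is exactly the property the fragment-program formulation exploits.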
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications
has been that, despite their general-purpose tasks having nothing to do with
graphics, the applications still had to be programmed using graphics APIs.
In addition, the program had to be structured in terms of the graphics
pipeline, with the programmable units only accessible as an intermediate
step in that pipeline, when the programmer would almost certainly prefer to
access the programmable units directly. The programming environments we
describe in detail in Section IV are solving this difficulty by providing a
more natural, direct, non-graphics interface to the hardware and,
specifically, the programmable units. Today, GPU computing applications
are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math
operations and both "gather" (read) accesses from and "scatter" (write)
accesses to global memory. Unlike in the previous two
methods, the same buffer can be used for both reading and writing,
allowing more flexible algorithms (for example, in-place algorithms that
use less memory).
The resulting buffer in global memory can then be used as an input in
future computation.
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because
no registration is needed. The matching result in this stage helps filter
out image pairs with low similarity. After this step, some false positive
matches are still possible. In the second stage, we use the WPL descriptor
to register the two images for more detailed descriptor matching, including
scale and translation invariance. This stage includes shift transform, affine
matrix generation, and final WPL descriptor matching. Overall, we
partitioned the registration and matching processing into four kernels in
CUDA (Figure 10): matching on the Y-shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster
and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features,
registration is unnecessary before matching on the Y-shape descriptor. The
whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and yta j are the Y shape descriptors of test template Tte
and target template Tta respectively dϕ is the Euclidian distance of angle
element of descriptors vector defined as (3) dxy is the Euclidian distance of
two descriptor centers defined as (4) ni and di are the matched descriptor
pairsrsquo number and their centers distance respectively tϕ is a distance
threshold and txy is the threshold to restrict the searching area We set tϕ to
30 and txy to 675 in our experiment Here
To match two sclera templates we searched the areas nearby to all
the Y shape branches The search area is limited to the corresponding left or
right half of the sclera in order to reduce the searching range and time The
distance of two branches is defined in (3) where ϕi j is the angle between
the j th branch and the polar from pupil center in desctiptor i
The number of matched pairs n_i and the distance between Y shape branch centers d_i are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2), where α is a factor to fuse the matching score, set to 30 in our study, and N_i and N_j are the total numbers of feature vectors in templates i and j respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded, while a sclera with a high matching score is passed on to the next, more precise matching process.
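To make the stage-I decision concrete, the following Python sketch fuses the number of matched Y-shape branch pairs with their average center distance. This is an illustrative stand-in, not the report's MATLAB/CUDA code: the descriptor layout (three branch angles plus a center), the function name, and the exact fusion formula standing in for Eq. (2) are assumptions; only the thresholds t_ϕ = 30, t_xy = 675 and the fusion factor α = 30 come from the text.

```python
import numpy as np

def y_shape_match_score(test_desc, target_desc, t_phi=30.0, t_xy=675.0, alpha=30.0):
    """Illustrative stage-I matching. Each descriptor row is
    (phi1, phi2, phi3, x, y): three branch angles plus the branch center.
    t_phi and t_xy are the angle and search-area thresholds; alpha is the
    score-fusion factor (all values from the text). The fusion formula
    itself is a plausible stand-in for Eq. (2)."""
    n_matched, dist_sum = 0, 0.0
    for y_te in test_desc:
        for y_ta in target_desc:
            d_xy = np.hypot(y_te[3] - y_ta[3], y_te[4] - y_ta[4])
            if d_xy > t_xy:                      # restrict the search area
                continue
            d_phi = np.linalg.norm(y_te[:3] - y_ta[:3])
            if d_phi < t_phi:                    # branch-angle distance test
                n_matched += 1
                dist_sum += d_xy
                break
    if n_matched == 0:
        return 0.0
    # fuse matched-pair count and average center distance into one score
    return n_matched / min(len(test_desc), len(target_desc)) \
        + alpha / (1.0 + dist_sum / n_matched)
```

A template whose score falls below the decision threshold t would be discarded, as in the text.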
2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear for two reasons. First, when an eye image is acquired at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly because the eyeball is spherical in shape. Second, the sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers. Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate, so the detected iris center may not be accurate either. The shift transform is designed to tolerate possible errors in pupil center detection made in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where T_te is the test template, s_te^i is the i-th WPL descriptor of T_te, T_ta is the target template, s_ta^i is the i-th WPL descriptor of T_ta, and d(s_te^k, s_ta^j) is the Euclidean distance of descriptors s_te^k and s_ta^j.
Δs_k is the shift value between the two descriptors, defined as the offset between their centers. We first randomly select an equal number of segment descriptors s_te^k of the test template T_te from each quad and find each one's nearest neighbor s_ta^j in the target template T_ta. Their shift offset is recorded as a candidate registration shift factor Δs_k. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.
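A minimal sequential sketch of the shift search above (Algorithm 2) might look as follows. The function name, the sampling count, and the use of the most consistent offset as a stand-in for the smallest-standard-deviation criterion are assumptions for illustration.

```python
import numpy as np

def search_shift_offset(test_centers, target_centers, n_samples=8, seed=0):
    """Illustrative Algorithm-2 sketch: estimate a global shift between two
    templates. For randomly sampled test descriptor centers, find the
    nearest target center and record the offset; the most consistent
    candidate offset wins (a stand-in for the smallest-standard-deviation
    criterion). test_centers / target_centers are (N, 2) arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(test_centers),
                     size=min(n_samples, len(test_centers)), replace=False)
    offsets = []
    for i in idx:
        d = np.linalg.norm(target_centers - test_centers[i], axis=1)
        j = int(np.argmin(d))                 # nearest neighbor in the target
        offsets.append(target_centers[j] - test_centers[i])
    offsets = np.asarray(offsets)
    # pick the candidate offset closest to the mean, i.e. the most consistent
    spread = np.linalg.norm(offsets - offsets.mean(axis=0), axis=1)
    return offsets[int(np.argmin(spread))]
```

In the GPU version each candidate offset would be evaluated by its own thread, as described in Section 2.5.2.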
2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance from its nearest neighbor s_ta^j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here s_te^i, T_te, s_ta^j, and T_ta are defined as in Algorithm 2; tr(it)_shift, θ(it), and tr(it)_scale are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)_shift), and S(tr(it)_scale) are the transform matrices defined as (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment we set the number of iterations to 512.
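The random parameter search of Algorithm 3 can be sketched sequentially as below. The parameter ranges drawn for rotation, scale, and shift are assumptions for illustration; only β = 20 pixels and N = 512 iterations come from the text.

```python
import numpy as np

def search_affine_params(test_pts, target_pts, n_iter=512, beta=20.0, seed=0):
    """Illustrative Algorithm-3 sketch: random search over shift, rotation,
    and scale. Each iteration draws a parameter set, transforms the test
    points, and counts target points matched within beta pixels; the set
    with the most matches wins."""
    rng = np.random.default_rng(seed)
    best = (-1, None)
    for _ in range(n_iter):
        theta = rng.uniform(-0.2, 0.2)               # rotation, radians (assumed range)
        scale = rng.uniform(0.9, 1.1)                # scale (assumed range)
        shift = rng.uniform(-30.0, 30.0, size=2)     # shift (assumed range)
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        moved = scale * test_pts @ R.T + shift       # S(scale) R(theta) x + T(shift)
        # count transformed test points with a target point within beta pixels
        d = np.linalg.norm(moved[:, None, :] - target_pts[None, :, :], axis=2)
        m = int((d.min(axis=1) < beta).sum())
        if m > best[0]:
            best = (m, (theta, scale, shift))
    return best[1]
```

On the GPU, each iteration of this loop is handled by an independent thread, which is why uncorrelated per-thread random streams matter (Section 2.5.2).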
3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching procedure is listed in Algorithm 4. Here s_te^i, T_te, s_ta^j, and T_ta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)_shift, tr(optm)_scale, and Δs_optim are the registration parameters attained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)_shift), and S(tr(optm)_scale) form the descriptor transform matrix defined in Algorithm 3. ϕ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ϕ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.
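An illustrative sequential sketch of this registration-and-matching step: the test template is shifted by the optimized offset, each descriptor is matched at most once to its nearest unused target descriptor subject to the orientation check |ϕ_te − ϕ_ta| < α, and the matched weight is normalized by the smaller template's total weight. The descriptor layout and the distance threshold d_match are assumptions; α = 5 is from the text.

```python
import numpy as np

def register_and_match(test_desc, target_desc, offset, alpha=5.0, d_match=10.0):
    """Illustrative Algorithm-4 sketch. Each descriptor row is
    (x, y, phi, w): center, orientation angle, and edge weight. A target
    segment may match only once; the score cannot exceed 1 because it is
    divided by the smaller template's total weight."""
    used = np.zeros(len(target_desc), dtype=bool)
    matched_w = 0.0
    for x, y, phi, w in test_desc:
        d = np.hypot(target_desc[:, 0] - (x + offset[0]),
                     target_desc[:, 1] - (y + offset[1]))
        d[used] = np.inf                      # a segment matches only once
        j = int(np.argmin(d))
        if d[j] < d_match and abs(phi - target_desc[j, 2]) < alpha:
            used[j] = True
            matched_w += min(w, target_desc[j, 3])
    min_total = min(test_desc[:, 3].sum(), target_desc[:, 3].sum())
    return matched_w / min_total
```

A correct registration offset drives the score toward 1 for genuine pairs, while a wrong offset leaves few or no matches.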
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system that works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), onto which the parallel part of the program must be partitioned into threads by the programmer. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and cost little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories that are accessible by all threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming.
If threads in a warp have different control paths, all the branches are executed serially; to improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To hide its latency, on-chip memory should be used in preference to global memory; when a global memory access does occur, the threads in the same warp should access consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same bank, the accesses are serialized; to get maximum performance, memory requests should be scheduled to minimize bank conflicts.
2.5.1 MAPPING ALGORITHMS TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a separate kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU, with the numbers of threads and blocks each set to 1024; that means we can match our test template with up to 1024 × 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, each thread corresponds to a set of descriptors in that template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate this sum, as shown at the right of Figure 11. First, all odd-numbered threads compute the sums of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
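The pairwise reduction described above can be modeled in a few lines of Python. The loop below simulates the in-place shared-memory scheme: the stride doubles each step, and the total ends up at the first address.

```python
def block_sum(results):
    """Illustrative sketch of the tree reduction used to combine per-thread
    intermediate results inside a block. At each step the stride doubles and
    element i accumulates element i + stride, so after about log2(n) steps
    the total sits in the first slot, mirroring Figure 11."""
    vals = list(results)           # simulate the block's shared-memory array
    stride = 1
    while stride < len(vals):
        # in CUDA, each iteration of this inner loop is one active thread
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]
```

On the GPU the inner loop runs in parallel across threads, with a barrier between stride steps.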
2.5.2 MAPPING INSIDE A BLOCK
In the shift parameter search, there are two schemes we can choose from to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single candidate shift offset to each thread, so that all threads compute independently, except that the final results must be compared across the candidate offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generation iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators that share identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched should not be used again. In our approach, a flag variable denoting whether a line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait for a synchronization operation at every query step; our solution is therefore to use a single thread in each block to process the matching.
FIG
FIG
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, so data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore no data transfer from host to device occurs during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that the successive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory accelerated data access further.
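The separate storage of descriptor components can be illustrated with NumPy: a structure-of-arrays layout keeps each component contiguous, so consecutive threads read consecutive words, whereas an array-of-structures layout strides by the full record size. The 1024-template sizing mirrors the text; the field names are assumptions.

```python
import numpy as np

# Array-of-structures layout: one record per descriptor s(x, y, r, theta, phi, w).
aos = np.zeros(1024, dtype=[('x', 'f4'), ('y', 'f4'), ('r', 'f4'),
                            ('theta', 'f4'), ('phi', 'f4'), ('w', 'f4')])

# Structure-of-arrays layout, as in the text: each component stored
# contiguously, so a kernel reading only 'x' for threads 0..31 touches 32
# consecutive words -- the pattern global-memory coalescing requires.
soa = {name: np.ascontiguousarray(aos[name]) for name in aos.dtype.names}

# In the AoS layout, consecutive descriptors' x values are separated by the
# full 24-byte record, defeating coalescing; in SoA they are 4 bytes apart.
record_stride = aos.strides[0]
component_stride = soa['x'].strides[0]
```

This is the same reasoning that motivates storing y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) component-wise in global memory.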
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as the feature for human recognition. In the sclera region the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity across a block and then using this value to normalize all cells within the block; this normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG and is used to create the cell histograms. Each pixel within a cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination or contrast changes, the gradient strengths must be locally normalized, so cells are grouped together into larger blocks. These blocks overlap, so each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
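The HOG steps described above (gradients, unsigned orientation over 0-180 degrees, magnitude-weighted cell histograms, block normalization) can be sketched as follows. The cell size, bin count, and L2 normalization constant are illustrative assumptions.

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    """Illustrative HOG sketch: x/y gradients, gradient magnitude and
    unsigned orientation, then per-cell orientation histograms over
    0-180 degrees (opposite directions counted together), with the
    magnitude as the vote weight."""
    img = img.astype(float)
    dy, dx = np.gradient(img)                           # y then x gradients
    mag = np.hypot(dx, dy)                              # gradient magnitude
    ang = np.rad2deg(np.arctan2(dy, dx)) % 180.0        # unsigned orientation
    h, w = img.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    for i in range(ch):
        for j in range(cw):
            sl = np.s_[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            np.add.at(hist[i, j], bin_idx[sl].ravel(), mag[sl].ravel())
    return hist

def normalize_block(block, eps=1e-6):
    """L2 contrast normalization of a group of cell histograms (one block)."""
    v = block.ravel()
    return v / np.sqrt((v * v).sum() + eps)
```

A vertical edge, for example, produces purely horizontal gradients, so its votes land in the 0-degree bin.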
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop; when code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external
applications and languages such as C C++ FORTRAN Javatrade COM
and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, and it enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries; for general-purpose scalar computations, it generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithms efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
By using the interactive tool GUIDE (Graphical User Interface
Development Environment) to layout design and edit user interfaces
GUIDE lets you include list boxes pull-down menus push buttons radio
buttons and sliders as well as MATLAB plots and Microsoft
ActiveXreg controls Alternatively you can create GUIs programmatically
using MATLAB functions
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format, and additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y shape descriptor, a new feature extraction method that takes advantage of GPU structures, to narrow the search range and increase the matching efficiency. We developed the WPL descriptor to incorporate mask information and make the method more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
FIG
FIG
2.2.5 OVERVIEW OF THE LINE-DESCRIPTOR-BASED SCLERA VEIN RECOGNITION METHOD
The matching segment of the line-descriptor-based method is a bottleneck with regard to matching speed. In this section we briefly describe the line-descriptor-based sclera vein recognition method. After segmentation, the vein patterns are enhanced by a bank of directional Gabor filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle θ to some reference angle at the iris center, the segment's distance r to the iris center, and the dominant angular orientation ϕ of the line segment. Thus the descriptor is S = (θ, r, ϕ)^T. The individual components of the line descriptor are calculated as
FIG
Here fline (x) is the polynomial approximation of the line segment (xl yl )
is the center point of the line segment (xi yi ) is the center of the detected
iris and S is the line descriptor In order to register the segments of the
vascular patterns a RANSAC-based algorithm is used to estimate the best-
fit parameters for registration between the two sclera vascular patterns For
the registration algorithm it randomly chooses two points ndash one from the
test template and one from the target template It also randomly chooses a
scaling factor and a rotation value based on a priori knowledge of the
database Using these values it calculates a fitness value for the registration
using these parameters
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as follows: Si and Sj are two segment descriptors; m(Si, Sj) is the matching score between segments Si and Sj; d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6–8); Dmatch is the matching distance threshold; and ɸmatch is the matching angle threshold. The total matching score, M, is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and thus the sum of its descriptors' weights sets the maximum score that can be attained.
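The scoring rule above can be sketched as follows. The threshold values and the weight-product form of the pairwise score are assumptions for illustration; the thesis only states that pairs within the distance and angle thresholds contribute, and that M is normalized by the lighter template's total weight.

```python
import math

D_MATCH = 5.0    # assumed matching distance threshold (pixels)
PHI_MATCH = 0.2  # assumed matching angle threshold (radians)

def segment_match(si, sj):
    """Score for one descriptor pair (x, y, phi, w); 0 if thresholds fail."""
    d = math.hypot(si[0] - sj[0], si[1] - sj[1])
    if d <= D_MATCH and abs(si[2] - sj[2]) <= PHI_MATCH:
        return si[3] * sj[3]  # weight product favors interior segments
    return 0.0

def template_score(test, target):
    """Total score M: matched weight over the maximum attainable score,
    set by the template whose descriptor weights sum to less."""
    matched = sum(max(segment_match(si, sj) for sj in target) for si in test)
    max_score = min(sum(s[3] for s in test), sum(s[3] for s in target))
    return matched / max_score if max_score else 0.0
```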
FIG
FIG
FIG
FIG
Y-shape branches are observed to be a stable feature under movement of the eye and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.
There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angle of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms at the image acquisition step, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also recorded the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, the descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, a mask file is used.
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of vessel patterns.
The mask file is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large and will occupy GPU memory and slow down the data transfer. When matching, a registration RANSAC-type algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.
To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighting image created by setting various weight values according to position: the weights of descriptors that lie beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU, only once.
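The 0 / 0.5 / 1 weighting rule can be sketched as a small CPU-side helper. The function name and the boundary distance `border` are assumptions (the thesis does not give the exact distance); only the three weight values come from the text.

```python
def descriptor_weight(x, y, mask, border=3):
    """Weight for a descriptor at (x, y) given a binary sclera mask
    (list of rows, 1 = sclera): 0 outside, 0.5 within `border` pixels
    of the boundary, 1 in the interior. `border` is an assumed value."""
    h, w = len(mask), len(mask[0])
    if not (0 <= y < h and 0 <= x < w) or mask[y][x] == 0:
        return 0.0
    # near the boundary if any pixel within `border` falls outside the mask
    for dy in range(-border, border + 1):
        for dx in range(-border, border + 1):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w) or mask[ny][nx] == 0:
                return 0.5
    return 1.0
```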
The calculated result was saved as a component of the descriptor. The descriptor of the sclera then becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the values 0, 0.5, or 1. To align two templates, when one template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. Matching is faster if the two templates share a similar reference point: if we use the center of the iris as the reference point, then when two templates are compared, the correspondences are automatically aligned to each other. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocessing step.
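The polar-to-rectangular preprocessing step can be sketched as below; the function name is an assumption, and the conversion is the standard one, done once on the CPU so the GPU kernels can shift descriptor centers without trigonometry.

```python
import math

def wpl_descriptor(r, theta, phi, w):
    """Precompute rectangular coordinates (x, y) from the polar pair
    (r, theta), producing the full WPL tuple s(x, y, r, theta, phi, w)."""
    x = r * math.cos(theta)
    y = r * math.sin(theta)
    return (x, y, r, theta, phi, w)
```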
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left part of the sclera patterns of the eye may be compressed while the right part of the sclera patterns is stretched.
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved
FIG
FIG
them in contiguous addresses. This meets the requirement of coalesced memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as shown in Figure 9.
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
General-Purpose Computing on the GPU. Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the more simple and direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline.
The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as texture on future passes through the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves the exact same steps, but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at its grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
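The grid-update pattern above can be illustrated with a minimal sequential sketch: every cell runs the same update (SPMD style) and gathers its four neighbors' previous values. A simple averaging stencil stands in for the actual fluid update, which the text does not specify; boundary cells are held fixed.

```python
def step(grid):
    """One time step over a 2-D grid (list of rows of floats): each interior
    cell gathers its four neighbours from the previous state and averages
    them, writing into a fresh buffer as the old GPGPU model requires."""
    h, w = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # separate output buffer (no in-place write)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            new[y][x] = 0.25 * (grid[y - 1][x] + grid[y + 1][x] +
                                grid[y][x - 1] + grid[y][x + 1])
    return new
```

On a GPU each (y, x) iteration would be one fragment (or thread); the double loop exists only because this sketch is sequential.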
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV are solving this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way.
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity. After this step, some false positive matches are still possible. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
2.4.1 STAGE I: MATCHING WITH THE Y-SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here, ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined as (3); dxy is the Euclidean distance of two descriptor centers, defined as (4); ni and di are the number of matched descriptor pairs and their center distances, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where ϕij is the angle between the jth branch and the polar line from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here, α is a factor used to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded. A sclera with a high matching score is passed to the next, more precise matching process.
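The Stage-I comparison can be sketched as below. The thresholds tϕ = 30, txy = 675, and α = 30 come from the text; the distance functions follow (3) and (4). The exact fusion formula of Eq. (2) is not reproduced in the text, so the final score combination here is an assumption that merely rewards more matched pairs and penalizes larger mean center distance.

```python
import math

T_PHI = 30.0   # branch-angle distance threshold (from the text)
T_XY = 675.0   # center-distance / search-area threshold (from the text)
ALPHA = 30.0   # score-fusion factor (from the text)

def d_phi(yi, yj):
    """Euclidean distance of the branch-angle triples (phi1, phi2, phi3)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(yi[:3], yj[:3])))

def d_xy(yi, yj):
    """Euclidean distance of the two branch centers (x, y)."""
    return math.hypot(yi[3] - yj[3], yi[4] - yj[4])

def coarse_score(test, target):
    """Count matched pairs n and accumulate their center distances, then
    fuse them; the fusion form is an assumed stand-in for Eq. (2)."""
    n, dist = 0, 0.0
    for yi in test:
        for yj in target:
            if d_xy(yi, yj) <= T_XY and d_phi(yi, yj) <= T_PHI:
                n += 1
                dist += d_xy(yi, yj)
                break
    if n == 0:
        return 0.0
    return ALPHA * n / (len(test) + len(target)) - (dist / n) / ALPHA
```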
2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers: the episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.
Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte; Tta is the target template and staj is the jth WPL descriptor of Tta; and d(stek, staj) is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value of two descriptors, defined as
We first randomly select an equal number of segment descriptors stek in the test template Tte from each quadrant and find their nearest neighbors staj in the target template Tta. The shift offsets are recorded as candidate registration shift factors Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
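Algorithm 2 can be sketched as follows. This is an illustrative reimplementation over bare (x, y) centers: sample descriptors from the test template, pair each with its nearest target neighbor, record the offsets, and keep the candidate offset that deviates least from the others (standing in for the smallest-standard-deviation rule). The sampling here ignores the per-quadrant balancing described in the text.

```python
import math
import random

def best_shift(test, target, samples=8, seed=0):
    """Sketch of the shift parameter search: candidate offsets come from
    nearest-neighbour pairs; the most consistent candidate wins."""
    rng = random.Random(seed)
    picks = [rng.choice(test) for _ in range(samples)]
    offsets = []
    for (tx, ty) in picks:
        nx, ny = min(target, key=lambda p: math.hypot(p[0] - tx, p[1] - ty))
        offsets.append((nx - tx, ny - ty))
    # choose the candidate with the smallest total deviation from the rest
    def spread(o):
        return sum(math.hypot(o[0] - p[0], o[1] - p[1]) for p in offsets)
    return min(offsets, key=spread)
```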
2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor stei(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs from the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum number of matches m(it). Here, stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the itth iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters. In our experiment, we set the iteration count to 512.
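The random parameter search of Algorithm 3 can be sketched as below. The iteration count (512) and β = 20 pixels come from the text; the rotation and scale sampling ranges are assumptions, as the thesis only says they are generated randomly.

```python
import math
import random

def affine(params, point):
    """Apply scale, then rotation, then shift to a descriptor center."""
    (sx, sy), ang, scale = params
    x, y = point[0] * scale, point[1] * scale
    xr = x * math.cos(ang) - y * math.sin(ang)
    yr = x * math.sin(ang) + y * math.cos(ang)
    return (xr + sx, yr + sy)

def search_affine(test, target, iters=512, beta=20.0, seed=1):
    """Sketch of Algorithm 3: each trial seeds its shift from a random test
    point and its nearest target neighbour, draws rotation and scale at
    random, and the parameter set matching the most pairs within beta wins."""
    rng = random.Random(seed)
    best, best_n = None, -1
    for _ in range(iters):
        tx, ty = rng.choice(test)
        nx, ny = min(target, key=lambda p: math.hypot(p[0] - tx, p[1] - ty))
        params = ((nx - tx, ny - ty),
                  rng.uniform(-0.1, 0.1),   # assumed rotation range (rad)
                  rng.uniform(0.9, 1.1))    # assumed scale range
        n = 0
        for p in test:
            q = affine(params, p)
            if any(math.hypot(q[0] - t[0], q[1] - t[1]) <= beta for t in target):
                n += 1
        if n > best_n:
            best, best_n = params, n
    return best, best_n
```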
3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here, stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radial direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of two ɸ values; in our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and take very little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task. There are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower than on-chip memory in terms of access time. To completely hide the latency of the small instruction sets, we should preferentially use on-chip memory rather than global memory. When global memory access occurs, threads in the same warp should access words in sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but shared memory is organized into banks that are equal in size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
2.5.1 MAPPING THE ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density; thus we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU–GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU, and the thread and block numbers are set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.
Algorithms 2–4 are partitioned into fine-grained subtasks, in which a section of descriptors is processed in one thread. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block. Inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, and there is no data-exchange requirement between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, …) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
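The pairwise reduction described above can be sketched sequentially: the stride doubles each round (pairs, then groups of 4, 8, …), and the total ends up in the first slot, mirroring how each "thread" adds its neighbor's partial sum in place.

```python
def tree_sum(vals):
    """In-place pairwise (tree) reduction: after log2(n) rounds the total
    of all intermediate results sits at the first address, a[0]."""
    a = list(vals)
    stride = 1
    while stride < len(a):
        # each round, element i absorbs element i + stride
        for i in range(0, len(a) - stride, 2 * stride):
            a[i] += a[i + stride]
        stride *= 2
    return a[0]
```

On the GPU, each inner-loop iteration is one thread and the rounds are separated by a block-level synchronization; the sequential loop here only emulates that schedule.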
2.5.2 MAPPING INSIDE A BLOCK
In shift-argument searching, there are two schemes we can choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to a thread, so that all threads compute independently; only the final results are compared across the possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently. The generated iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation- and scale-registration generation step, we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step. Our solution is to use a single thread in a block to process the matching.
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access was still a slow way to get data. In our kernels, we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in the design of target detection. In this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a larger block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
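The magnitude and orientation computation can be sketched with central differences; the function name and the choice of central differences are illustrative (HOG implementations commonly use the simple [-1, 0, 1] kernel), and the orientation is folded into [0, 180) so that opposite directions count as the same, matching the binning described below.

```python
import math

def gradient(img, x, y):
    """m(x, y) and theta(x, y) at an interior pixel of a grayscale image
    (list of rows), using central differences for dx and dy."""
    dx = img[y][x + 1] - img[y][x - 1]
    dy = img[y + 1][x] - img[y - 1][x]
    m = math.hypot(dx, dy)
    # unsigned orientation in [0, 180): opposite directions are identified
    theta = math.degrees(math.atan2(dy, dx)) % 180.0
    return m, theta
```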
Orientation binning is the second step of HOG. This method is used to create the cell histograms. Each pixel within the cell contributes a weight to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular in form. The binning of gradient orientation should be spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination and contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, which is one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing.
Development environment for managing code, files, and data.
Interactive tools for iterative exploration, design, and problem solving.
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration.
2-D and 3-D graphics functions for visualizing data.
Tools for building custom graphical user interfaces.
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel.
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, and financial modeling and analysis. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, and it enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology.
This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage
Designing Graphical User Interfaces
Use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files, such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh
Image plots
Cone, slice, stream, and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
42 SNAPSHOTS
The snapshots below trace the recognition pipeline step by step:
FIG: Original sclera image converted into a grayscale image
FIG: Grayscale image converted into a binary image
FIG: Edge detection by Otsu's thresholding
FIG: Selecting the region of interest (the sclera part)
FIG: Selected ROI part
FIG: Enhancement of the sclera image
FIG: Feature extraction of the sclera image using Gabor filters
FIG: Matching with images in the database
FIG: Displaying the result (matched or not matched)
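The grayscale-to-binary step above relies on Otsu's thresholding, which picks the threshold maximizing between-class variance of the intensity histogram. The report's implementation uses MATLAB built-ins, so the following Python/NumPy sketch is only an illustrative reconstruction of that one step:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold that maximizes between-class variance
    (Otsu's method). `gray` is a 2-D array of uint8 intensities."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0        # class means
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A toy bimodal "image": dark background, bright sclera-like region
img = np.zeros((10, 10), dtype=np.uint8)
img[:, 5:] = 200
t = otsu_threshold(img)
binary = img >= t   # the binarized image used by the later steps
```

In MATLAB the same step is a call to `graythresh` followed by `imbinarize`; the loop above just makes the underlying criterion explicit.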
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, down to the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
filters. Binary morphological operations are used to thin the detected vein structure down to a single-pixel-wide skeleton and remove the branch points. The line descriptor is used to describe the segments in the vein structure; Figure 2 shows a visual description of the line descriptor. Each segment is described by three quantities: the segment's angle to some reference angle at the iris center, θ; the segment's distance to the iris center, r; and the dominant angular orientation of the line segment, ɸ. Thus the descriptor is S = (θ, r, ɸ)T. The individual components of the line descriptor are calculated as
FIG
Here fline(x) is the polynomial approximation of the line segment, (xl, yl) is the center point of the line segment, (xi, yi) is the center of the detected iris, and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template, and also randomly chooses a scaling factor and a rotation value based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.
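The component equations of S appear only as a figure in this report, so the following Python sketch is a plausible reconstruction from the definitions in the surrounding text; in particular, reading ɸ as the arctangent of the slope of fline at the segment center is an assumption.

```python
import math

def line_descriptor(seg_center, iris_center, poly_coeffs):
    """Sketch of the line descriptor S = (theta, r, phi).
    seg_center  = (xl, yl): center point of the vein segment
    iris_center = (xi, yi): center of the detected iris
    poly_coeffs: coefficients of f_line(x), the polynomial approximation
                 of the segment (highest power first)."""
    xl, yl = seg_center
    xi, yi = iris_center
    theta = math.atan2(yl - yi, xl - xi)   # angle at the iris center
    r = math.hypot(xl - xi, yl - yi)       # distance to the iris center
    # dominant orientation: slope of f_line evaluated at the center
    n = len(poly_coeffs)
    deriv = sum(c * (n - 1 - k) * xl ** (n - 2 - k)
                for k, c in enumerate(poly_coeffs[:-1]))
    phi = math.atan(deriv)
    return theta, r, phi

# Segment centered at (3, 4) approximated by f(x) = 2x + 1, iris at origin
theta, r, phi = line_descriptor((3.0, 4.0), (0.0, 0.0), [2.0, 1.0])
```

With the iris center as the common origin, descriptors from two templates share a reference frame, which is what the RANSAC registration step exploits.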
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated by the equation below, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
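The thresholded matching rule and the normalization by the smaller template's total weight can be sketched as follows. The exact functional form of m(Si, Sj) is in an equation not reproduced in this report, so the weight-limited score below, like the threshold values, is an assumption for illustration.

```python
import math

def pair_score(si, sj, d_match=5.0, phi_match=0.2):
    """Sketch of m(Si, Sj): descriptors match when their centers are
    within D_match AND their orientations differ by at most phi_match.
    Each descriptor is (x, y, phi, w); thresholds here are illustrative."""
    dx, dy = si[0] - sj[0], si[1] - sj[1]
    if math.hypot(dx, dy) <= d_match and abs(si[2] - sj[2]) <= phi_match:
        return min(si[3], sj[3])    # weight-limited score (assumption)
    return 0.0

def total_score(test_tpl, target_tpl, **kw):
    """M = sum of individual scores / maximum attainable score, where
    the template with fewer points sets the maximum via its weights."""
    s = sum(max(pair_score(si, sj, **kw) for sj in target_tpl)
            for si in test_tpl)
    smaller = min(test_tpl, target_tpl, key=len)
    return s / sum(d[3] for d in smaller)

tpl_a = [(0, 0, 0.0, 1.0), (10, 0, 0.1, 1.0)]
tpl_b = [(0.5, 0, 0.05, 1.0), (10, 0.5, 0.1, 0.5)]
M = total_score(tpl_a, tpl_b)
```

Here the boundary descriptors (weight 0.5) can contribute at most half of an interior match, which is how the weighting image damps segmentation errors.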
FIG
movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.
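The invariance argument made below for the Y-shape feature (branch angles measured against the iris-radial direction survive rotation) can be sketched directly; the wrapped-angle computation is an illustrative reading, not the report's code.

```python
import math

def y_descriptor(branch_dirs, center, pupil_center):
    """Sketch of the Y-shape feature y(phi1, phi2, phi3, x, y): each phi
    is the angle between a branch direction and the iris-radial direction
    at the branch center, so no extra rotation alignment is needed."""
    x, y = center
    radial = math.atan2(y - pupil_center[1], x - pupil_center[0])
    phis = tuple(abs(math.atan2(math.sin(b - radial), math.cos(b - radial)))
                 for b in branch_dirs)   # wrapped angle differences
    return (*phis, x, y)

# A branch point right of the pupil, with three branch directions...
feat = y_descriptor((0.5, 2.0, -1.5), (10.0, 0.0), (0.0, 0.0))
# ...and the same eye rotated 90 degrees about the pupil center:
rot = y_descriptor((0.5 + math.pi / 2, 2.0 + math.pi / 2, -1.5 + math.pi / 2),
                   (0.0, 10.0), (0.0, 0.0))
```

Because the radial reference rotates with the eye, the three ϕ values are identical in `feat` and `rot`; only the auxiliary center position (x, y) changes.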
There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angle of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template, so in our approach we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also recorded the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center. It is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 22, the line
descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, the descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file
FIG: The line descriptor of the sclera vessel pattern. (a) An eye image. (b) Vessel patterns in the sclera. (c) Enhanced sclera vessel patterns. (d) Centers of line segments of vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large in size; they occupy GPU memory and slow down data transfer. When matching, a RANSAC-type registration algorithm is used to randomly select the corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data should also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting various weight values according to position: the weights of descriptors beyond the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, descriptor weights were calculated on their own mask by the CPU, only once.
The calculated result was saved as a component of the descriptor, which changes to s(x, y, ɸ, w), where w denotes the weight of the point and may take the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. This is faster if the two templates share similar reference points: if we use the center of the iris as the reference point, then when two templates are compared, the correspondences are automatically aligned to each other. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment's angle to the reference line through the iris center, denoted θ; the distance between the segment's center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in the CPU preprocessing.
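Building one WPL descriptor from these pieces can be sketched as below. The weight rule (0 / 0.5 / 1) and the polar-to-rectangular CPU precomputation come from the text; the near-boundary distance `border` is an illustrative value not given in the report.

```python
import math

def wpl_descriptor(r, theta, phi, dist_to_boundary, inside, border=3.0):
    """Sketch of the WPL descriptor s(x, y, r, theta, phi, w).
    (r, theta) is the segment center in polar form about the iris center;
    the rectangular (x, y) is precomputed on the CPU so the GPU kernels
    never have to touch the mask file.  Weight rule from the text:
    0 outside the sclera, 0.5 near its boundary, 1 in the interior."""
    if not inside:
        w = 0.0
    elif dist_to_boundary < border:
        w = 0.5
    else:
        w = 1.0
    x = r * math.cos(theta)   # polar -> rectangular, done once on the CPU
    y = r * math.sin(theta)
    return (x, y, r, theta, phi, w)

# An interior segment, 10 px from the iris center, straight "up"
d = wpl_descriptor(10.0, math.pi / 2, 0.3, dist_to_boundary=8.0, inside=True)
```

Folding the mask into w is the point: once a template is a flat array of such tuples, the GPU kernels need only the descriptors themselves.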
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved
FIG
FIG
them at continuous addresses. This meets the requirement of coalesced memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. When matching on the new descriptor, the shift parameter generator in Figure 4 is simplified as in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly more capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65 k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
GENERAL-PURPOSE COMPUTING ON THE GPU
Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology is different between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the more simple and direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as a texture on future passes through the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves the exact same steps, but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV are solving this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity, though after this step some false positive matches are still possible. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale- and translation-invariance. This stage includes shift transformation, affine-matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine-matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
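The control flow of the two stages can be sketched as a tiny driver; the two callables below merely stand in for the Stage I and Stage II kernels described above, and the threshold is a placeholder.

```python
def match_pair(y_score, fine_match, coarse_threshold):
    """Sketch of the coarse-to-fine flow: Stage I (Y-shape matching,
    no registration) filters out low-similarity pairs cheaply; only
    survivors pay for Stage II (shift search, affine-matrix generation,
    and WPL descriptor matching).  `y_score` and `fine_match` are
    callables standing in for the two stages."""
    s1 = y_score()
    if s1 < coarse_threshold:
        return s1, False            # discarded without registration
    return fine_match(), True       # registration + WPL matching

# A pair that survives the coarse stage, and one that does not:
score_hi, matched_hi = match_pair(lambda: 0.9, lambda: 0.8, 0.5)
score_lo, matched_lo = match_pair(lambda: 0.1, lambda: 0.8, 0.5)
```

The payoff is that the expensive registration work (Algorithms 2-4) is only launched for the small fraction of pairs Stage I cannot rule out.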
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where ϕij is the angle between the jth branch and the polar line from the pupil center in descriptor i.
The number of matched pairs ni and the distance between the Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded, while a sclera with a high matching score is passed to the next, more precise matching process.
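The counting loop at the heart of Algorithm 1 can be sketched as follows. The fusion of match count and average distance (Eq. 2) is not reproduced in this report, so only the search-and-count part is shown, and the threshold values in the example call are illustrative rather than the paper's.

```python
import math

def stage1_match(test_ys, target_ys, t_phi, t_xy):
    """Sketch of Algorithm 1 (Stage I): for every Y-shape descriptor in
    the test template, search the target template within radius t_xy of
    its center and count a match when the branch-angle distance d_phi is
    below t_phi.  Descriptors are ((phi1, phi2, phi3), (x, y))."""
    n, dist_sum = 0, 0.0
    for (pa, ca) in test_ys:
        for (pb, cb) in target_ys:
            d_xy = math.hypot(ca[0] - cb[0], ca[1] - cb[1])
            if d_xy > t_xy:
                continue                     # restrict the search area
            d_phi = math.sqrt(sum((a - b) ** 2 for a, b in zip(pa, pb)))
            if d_phi < t_phi:
                n += 1
                dist_sum += d_xy
    return n, (dist_sum / n if n else float('inf'))

ys_test = [((0.1, 1.0, 2.0), (5.0, 5.0))]
ys_targ = [((0.1, 1.1, 2.0), (6.0, 5.0)),       # close match
           ((3.0, 3.0, 3.0), (500.0, 500.0))]   # outside the search area
n, avg_d = stage1_match(ys_test, ys_targ, t_phi=0.5, t_xy=50.0)
```

Because no registration is required first, this loop maps naturally onto one GPU thread per test descriptor.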
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel-structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers.
Considering these factors, our registration employed both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate, and as a result the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte; Tta is the target template and staj is the jth WPL descriptor of Tta; and d(stek, staj) is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value of the two descriptors, defined as
We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find each one's nearest neighbor staj in the target template Tta. The shift offset between them is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
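A minimal sketch of this shift search follows. Choosing the candidate offset closest to the consensus (smallest deviation from the mean) is one plausible reading of "smallest standard deviation among these candidate offsets"; the sample size is illustrative.

```python
import math
import random
import statistics

def shift_search(test_descs, target_descs, n_samples=8, seed=0):
    """Sketch of Algorithm 2: randomly sample descriptor centers from the
    test template, pair each with its nearest neighbor in the target
    template, record the offsets, and keep the candidate offset with the
    smallest deviation from the consensus as the registration shift."""
    rng = random.Random(seed)
    offsets = []
    for s in rng.sample(test_descs, min(n_samples, len(test_descs))):
        nn = min(target_descs,
                 key=lambda t: math.hypot(t[0] - s[0], t[1] - s[1]))
        offsets.append((nn[0] - s[0], nn[1] - s[1]))
    mx = statistics.mean(o[0] for o in offsets)
    my = statistics.mean(o[1] for o in offsets)
    return min(offsets, key=lambda o: math.hypot(o[0] - mx, o[1] - my))

# Target template is the test template shifted by (3, 4):
test = [(0, 0), (10, 0), (0, 10), (10, 10)]
target = [(3, 4), (13, 4), (3, 14), (13, 14)]
dx, dy = shift_search(test, target, n_samples=4)
```

When the true deformation really is a pure shift, all sampled offsets agree and the recovered (dx, dy) is exact, which is why this cheap search is tried before the full affine search.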
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined the same as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters. In our experiment, we set the iteration count to 512.
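A minimal serial sketch of this random parameter search follows. The β = 20 threshold and the idea of deriving the shift from a random descriptor's nearest neighbour come from the text; the rotation and scale sampling ranges are assumptions, and on the GPU each iteration would run in its own thread.

```python
import numpy as np

def search_affine(test_pts, target_pts, n_iter=512, beta=20.0, seed=0):
    """Sketch of the random affine-parameter search (Algorithm 3 paraphrased).

    Each iteration draws a candidate rotation and scale, derives a shift from
    a randomly picked descriptor and its nearest target neighbour, and counts
    descriptor pairs that land within beta pixels of each other.
    """
    rng = np.random.default_rng(seed)
    best = (-1, None)
    for _ in range(n_iter):
        theta = rng.uniform(-np.pi / 18, np.pi / 18)  # small rotation (assumed range)
        scale = rng.uniform(0.9, 1.1)                 # small scale change (assumed range)
        R = scale * np.array([[np.cos(theta), -np.sin(theta)],
                              [np.sin(theta),  np.cos(theta)]])
        moved = test_pts @ R.T
        # Shift taken from a random descriptor's nearest target neighbour.
        k = rng.integers(len(test_pts))
        j = np.argmin(np.linalg.norm(target_pts - moved[k], axis=1))
        moved = moved + (target_pts[j] - moved[k])
        # A pair "matches" when within beta pixels (beta = 20 in the text).
        dists = np.linalg.norm(moved[:, None, :] - target_pts[None, :, :], axis=2)
        m = int(np.sum(dists.min(axis=1) < beta))
        if m > best[0]:
            best = (m, (theta, scale))
    return best
```

Selecting the iteration with the maximum match count m(it) mirrors the selection rule stated above.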
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined the same as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters attained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and target template.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program must be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip and take little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To completely hide this latency for small instruction sets, we should preferentially use on-chip memory rather than global memory. When global memory access does occur, threads in the same warp should access consecutive words so that the accesses coalesce.
Shared memory is much faster than the local and global memory spaces, but it is organized into equally sized banks. If two memory requests from different threads within a warp fall into the same bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled so as to minimize bank conflicts.
2.5.1 MAPPING ALGORITHMS TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and each thread performs one pair-wise template comparison. In our work we use an NVIDIA C2070 as our GPU, and the numbers of threads and blocks are both set to 1024. That means we can match our test template against up to 1024 × 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results must be computed or compared. A parallel prefix-sum algorithm, shown at the right of Figure 11, is used to calculate this sum. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first thread of each group of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
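The tree-style reduction just described can be sketched serially in Python. Each pass of the while loop corresponds to one synchronized round across the threads of a block; this is a serial analogue, not the CUDA kernel itself.

```python
def block_sum(values):
    """Serial sketch of the per-block tree reduction described above:
    pairs are summed first, then every stride of 4, 8, 16, ... combines
    partial sums; the final result lands in the first slot, matching the
    text's "saved at the first address"."""
    vals = list(values)
    stride = 1
    while stride < len(vals):
        # The thread owning slot i adds its neighbour's partial sum.
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]
```

For eight inputs this takes log2(8) = 3 rounds instead of 7 serial additions, which is the point of doing it across threads.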
2.5.2 MAPPING INSIDE A BLOCK
In shift parameter searching, there are two schemes we can choose to map the task:
1) Map one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
2) Assign a single possible shift offset to a thread, so that all threads compute independently, except that the final result must be compared against the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize shift searching. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generated a set of parameters and tried them independently, and the generated iterations were assigned to all threads. The challenge in this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto
and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when each template will be processed; therefore, no data transfer from host to device occurs during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to its lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection. In this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y):
m(x, y) = √(dx(x, y)² + dy(x, y)²), θ(x, y) = arctan(dy(x, y) / dx(x, y))
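The magnitude and orientation computation can be sketched in Python with NumPy. The central-difference kernel is one common choice; the report does not fix the gradient kernel, so that part is an assumption.

```python
import numpy as np

def gradient_mag_ori(img):
    """Per-pixel gradient magnitude and orientation for HOG, using simple
    central differences [-1, 0, 1] (an assumed, common choice of kernel)."""
    img = img.astype(float)
    dx = np.zeros_like(img)
    dy = np.zeros_like(img)
    dx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # d/dx via [-1, 0, 1]
    dy[1:-1, :] = img[2:, :] - img[:-2, :]   # d/dy via [-1, 0, 1] transposed
    mag = np.sqrt(dx ** 2 + dy ** 2)
    # Unsigned orientation folded into [0, 180), matching the binning below.
    ori = np.degrees(np.arctan2(dy, dx)) % 180.0
    return mag, ori
```

On a horizontal intensity ramp, every interior pixel has dx = 2, dy = 0, giving magnitude 2 and orientation 0°.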
Orientation binning is the second step of HOG; this method is used to create the cell histograms. Each pixel within a cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination and contrast changes, then the gradient strengths must be locally normalized, for which the cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
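The binning and block normalization steps can be sketched as follows. This uses hard binning for brevity; practical HOG implementations usually interpolate votes between neighbouring bins, and the 9-bin count is an assumed default.

```python
import numpy as np

def cell_histogram(mag, ori, n_bins=9):
    """Magnitude-weighted orientation histogram for one cell.
    ori is in unsigned degrees [0, 180); each pixel votes its gradient
    magnitude into the bin containing its orientation (hard binning)."""
    bin_width = 180.0 / n_bins
    bins = (ori // bin_width).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())
    return hist

def l2_normalize(block, eps=1e-6):
    """Contrast-normalize the concatenated cell histograms of one block,
    giving the local invariance to illumination described above."""
    return block / np.sqrt(np.sum(block ** 2) + eps)
```

Because blocks overlap, each cell histogram appears in several normalized blocks of the final descriptor.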
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, FORTRAN, Java, COM, and Microsoft Excel
MATLAB is used in vast areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, enabling fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks, such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithms efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage
Designing Graphical User Interfaces
Using the interactive tool GUIDE (Graphical User Interface Development Environment), you can lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files, such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh
Image plots
Cone, slice, stream, and isosurface
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase matching efficiency, a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, down to the individual threads in the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
iris and S is the line descriptor. In order to register the segments of the vascular patterns, a RANSAC-based algorithm is used to estimate the best-fit parameters for registration between the two sclera vascular patterns. The registration algorithm randomly chooses two points, one from the test template and one from the target template. It also randomly chooses a scaling factor and a rotation value, based on a priori knowledge of the database. Using these values, it calculates a fitness value for the registration under these parameters.
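This RANSAC-style search can be sketched as follows. The hypothesized correspondence plus random scale and rotation come from the text; the sampling priors and the summed nearest-neighbour distance used as the fitness value are assumptions for illustration (lower fitness = better fit).

```python
import numpy as np

def ransac_register(test_pts, target_pts, n_iter=200, seed=0):
    """Sketch of the RANSAC-based registration described above: hypothesize
    a correspondence (one test point, one target point) plus a scale and
    rotation drawn from assumed priors, derive the implied translation, and
    score the transform by the summed nearest-neighbour distance."""
    rng = np.random.default_rng(seed)
    best_fit, best_params = np.inf, None
    for _ in range(n_iter):
        p = test_pts[rng.integers(len(test_pts))]
        q = target_pts[rng.integers(len(target_pts))]
        theta = rng.uniform(-np.pi / 18, np.pi / 18)  # prior assumed, not from the report
        scale = rng.uniform(0.95, 1.05)               # prior assumed, not from the report
        R = scale * np.array([[np.cos(theta), -np.sin(theta)],
                              [np.sin(theta),  np.cos(theta)]])
        t = q - R @ p                                 # translation implied by the pair
        moved = test_pts @ R.T + t
        d = np.linalg.norm(moved[:, None] - target_pts[None, :], axis=2)
        fitness = d.min(axis=1).sum()
        if fitness < best_fit:
            best_fit, best_params = fitness, (theta, scale, t)
    return best_fit, best_params
```

When the two templates are related by a near-identity transform, iterations that happen to pair corresponding points with small rotation and scale drive the fitness close to zero.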
After sclera template registration, each line segment in the test template is compared to the line segments in the target template for matches. In order to reduce the effect of segmentation errors, we created the weighting image (Figure 3) from the sclera mask by setting interior pixels in the sclera mask to 1, pixels within some distance of the boundary of the mask to 0.5, and pixels outside the mask to 0.
The matching score for two segment descriptors is calculated as follows, where Si and Sj are two segment descriptors, m(Si, Sj) is the matching score between segments Si and Sj, d(Si, Sj) is the Euclidean distance between the segment descriptors' center points (from Eqs. 6-8), Dmatch is the matching distance threshold, and ɸmatch is the matching angle threshold. The total matching score M is the sum of the individual matching scores divided by the maximum matching score for the minimal set between the test and target templates. That is, one of the test or target templates has fewer points, and the sum of its descriptors' weights sets the maximum score that can be attained.
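A minimal sketch of this scoring scheme follows. The two-threshold match test and the weight-based normalization come from the text; treating a matched pair's contribution as the product of the two weights, and the threshold values, are assumptions.

```python
import numpy as np

def match_score(test, target, d_match=10.0, phi_match=10.0):
    """Sketch of the weighted matching score M described above. Each
    descriptor is (x, y, phi_deg, w); a pair matches when the centre
    distance is within d_match and the angle difference within phi_match,
    contributing the product of the two weights (assumed form of m(Si, Sj))."""
    total = 0.0
    for (x1, y1, p1, w1) in test:
        best = 0.0
        for (x2, y2, p2, w2) in target:
            d = np.hypot(x1 - x2, y1 - y2)
            if d <= d_match and abs(p1 - p2) <= phi_match:
                best = max(best, w1 * w2)
        total += best
    # Normalize by the maximum attainable score of the smaller template.
    denom = min(sum(s[3] for s in test), sum(s[3] for s in target))
    return total / denom if denom else 0.0
```

Comparing a template against itself yields a score of 1, and templates with no descriptor pairs inside the thresholds score 0.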
FIG
FIG
FIG
FIG
movement of the eye, Y-shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y-shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y-shape structure, and the line segment angles are recorded as a new feature of the sclera.
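Once a branch point is detected, its feature is the set of branch angles measured against the radial direction from the iris center, as described in this section. A small sketch of building that descriptor (branch endpoints and centers are assumed to be already extracted from the skeleton):

```python
import numpy as np

def y_descriptor(branch_ends, center, iris_center):
    """Sketch of the Y-shape descriptor y(phi1, phi2, phi3, x, y): the angle
    of each of the three branches relative to the radial direction from the
    iris (pupil) centre, plus the branch-point position. Angles measured
    against the radius are unchanged by rotation about the iris centre and
    by zoom, which is what makes the descriptor self-aligning."""
    center = np.asarray(center, float)
    radial = center - np.asarray(iris_center, float)
    radial_ang = np.arctan2(radial[1], radial[0])
    phis = []
    for end in branch_ends:                      # three branch endpoints
        v = np.asarray(end, float) - center
        ang = np.arctan2(v[1], v[0]) - radial_ang
        # Wrap into (-180, 180] degrees.
        phis.append(np.degrees((ang + np.pi) % (2 * np.pi) - np.pi))
    return (*phis, center[0], center[1])
```

Rotating the whole eye image about the iris center leaves the three angles unchanged, which is the invariance claimed for this feature.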
There are two ways to measure both the orientation and the relationship of every branch of a Y-shape vessel: one is to use the angle of every branch to the x-axis; the other is to use the angles between the branches and the iris radial direction. The first method needs an additional rotation operation to align the template; in our approach, we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms during image acquisition, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center; therefore, it is automatically aligned to the iris center, and it is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments; for each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitations of segmentation accuracy, descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of vessel patterns.
is designed to indicate whether or not a line segment belongs to the edge of the sclera. However, in a GPU application, using the mask is challenging, since the mask files are large and would occupy GPU memory and slow down data transfer. In matching, a RANSAC-type registration algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data would also have to be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor; this results in too many convolutions in the processing unit.
To reduce heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships among descriptors and stored them as a new descriptor. We use a weighted image created by setting weight values according to position: descriptors outside the sclera are given weight 0, those near the sclera boundary 0.5, and interior descriptors 1. In our work, descriptor weights were calculated from their own mask by the CPU, only once.
The result was saved as a component of the descriptor, which becomes s(x, y, ɸ, w), where w denotes the weight of the point and takes the value 0, 0.5, or 1. To align two templates, when one template is shifted toward the other along the line connecting their centers, all of its descriptors must be transformed; this is faster if the two templates share a similar reference point. If we use the center of the iris as the reference point, two compared templates are automatically aligned to each other because they share the same reference. Every feature vector of the template is a set of line-segment descriptors composed of three variables (Figure 8): the angle θ of the segment center with respect to a reference line through the iris center; the distance r between the segment center and the pupil center; and the dominant angular orientation ɸ of the segment. To minimize GPU computation, we also convert the descriptor values from polar to rectangular coordinates during CPU preprocessing.
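The weight assignment and the CPU-side polar precomputation described above can be sketched as follows. This is a minimal Python illustration of ours, not the report's CUDA/C code; the function names and the boundary test are hypothetical:

```python
import math

def mask_weight(is_outside_sclera, near_boundary):
    """Weight from the mask: 0 outside the sclera, 0.5 near its boundary, 1 inside."""
    if is_outside_sclera:
        return 0.0
    return 0.5 if near_boundary else 1.0

def make_wpl_descriptor(x, y, phi, iris_cx, iris_cy, weight):
    """Build one WPL descriptor s(x, y, r, theta, phi, w) relative to the iris center.

    The polar coordinates (r, theta) are precomputed once on the CPU so the GPU
    kernels never have to convert them during matching.
    """
    dx, dy = x - iris_cx, y - iris_cy
    r = math.hypot(dx, dy)       # distance from segment center to iris/pupil center
    theta = math.atan2(dy, dx)   # angle of the segment center w.r.t. the reference line
    return (x, y, r, theta, phi, weight)
```

Because the weight is baked into the descriptor, a matcher can simply multiply each segment's contribution by w instead of re-transforming the mask after every template shift.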
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as the eyeball moves left, the left part of the sclera pattern may be compressed while the right part is stretched.

In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved
FIG
FIG
them at contiguous addresses. This meets the requirement of coalesced memory access on the GPU.
After reorganizing the descriptor structure and folding the mask information into the new descriptor, no computation on the mask file is needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered after every shift; thus the cost of data transfer and computation on the GPU is reduced. With the new descriptor, the shift-parameter generator of Figure 4 simplifies to Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations. After many
years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:

The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.

The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.

The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).

Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex- and fragment-program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.

GENERAL-PURPOSE COMPUTING ON THE GPU

Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on its programmable aspects.

The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.

Each fragment is shaded by the fragment program.

The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.

The resulting image can then be used as a texture on future passes through the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves the exact same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid at each grid point from its current state and the states of its neighboring grid points.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)

Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step when computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units accessible only as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, to the programmable units. Today, GPU computing applications are structured in the following way:

The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.

The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
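The thread-grid model above can be illustrated with a minimal NumPy sketch of ours (not code from the report), in the spirit of the fluid-grid example: every grid point plays the role of a thread and gathers its neighbors' values from the previous time step.

```python
import numpy as np

def step(state):
    """One SPMD-style update: every grid point computes its next value by
    gathering the current values of its four neighbors (edge-replicated borders)."""
    padded = np.pad(state, 1, mode="edge")
    return 0.25 * (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                   padded[1:-1, :-2] + padded[1:-1, 2:])
```

On a real GPU each output element would be computed by one thread; here the vectorized slices stand in for the grid of threads, and writing into a fresh array mirrors the gather-only style of the older GPGPU model.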
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity, although some false positive matches may remain after this step. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine-matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine-matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching procedure is listed as Algorithm 1.
FIG
Here y_te,i and y_ta,j are the Y-shape descriptors of the test template T_te and the target template T_ta, respectively; dϕ is the Euclidean distance between the angle elements of the descriptor vectors, defined in (3); d_xy is the Euclidean distance between two descriptor centers, defined in (4); n_i and d_i are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and t_xy is the threshold that restricts the search area. We set tϕ to 30 and t_xy to 675 in our experiment.

To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera to reduce the search range and time. The distance between two branches is defined in (3), where ϕ_i,j is the angle between the j-th branch and the polar line from the pupil center in descriptor i.

The number of matched pairs n_i and the distance d_i between Y-shape branch centers are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, set to 30 in our study, and N_i and N_j are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded; scleras with high matching scores are passed to the next, more precise matching process.
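The coarse Stage-I matching above can be sketched in Python. This is our own minimal illustration: the thresholds t_phi = 30, t_xy = 675, and α = 30 come from the text, but the score-fusion formula is an assumption standing in for Eq. (2), whose exact form is not reproduced in this report.

```python
import math

# Thresholds from the text: t_phi = 30 (angle), t_xy = 675 (search area), alpha = 30.
T_PHI, T_XY, ALPHA = 30.0, 675.0, 30.0

def match_y_descriptors(test, target):
    """Coarse matching of Y-shape branches given as (phi, x, y) tuples: a pair
    matches when both the center distance and the angle difference fall under
    the thresholds; the score fuses match count and mean center distance."""
    n, dist_sum = 0, 0.0
    for phi_i, xi, yi in test:
        for phi_j, xj, yj in target:
            d_xy = math.hypot(xi - xj, yi - yj)
            if d_xy < T_XY and abs(phi_i - phi_j) < T_PHI:
                n += 1
                dist_sum += d_xy
                break
    if n == 0:
        return 0.0
    # More matched branches and a smaller average distance give a higher score.
    return (n / max(len(test), len(target))) * (ALPHA / (ALPHA + dist_sum / n))
```

Identical templates score 1.0, and templates with no branch pairs inside the search area score 0.0, so a single threshold t can gate which pairs proceed to the fine stage.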
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel-structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:

When an eye image is acquired at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, because the eyeball is spherical in shape.

The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.

Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale.
1) SHIFT PARAMETER SEARCH

As discussed before, segmentation may not be accurate, so the detected iris center may also be inaccurate. The shift transform is designed to tolerate possible errors in pupil-center detection from the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where T_te is the test template, s_te,i is the i-th WPL descriptor of T_te, T_ta is the target template, s_ta,i is the i-th WPL descriptor of T_ta, and d(s_te,k, s_ta,j) is the Euclidean distance between descriptors s_te,k and s_ta,j.

Δs_k is the shift value between the two descriptors.

We first randomly select an equal number of segment descriptors s_te,k of the test template T_te from each quad and find each one's nearest neighbor s_ta,j in the target template T_ta. Their shift offset is recorded as a candidate registration shift factor Δs_k. The final registration offset is Δs_optim, the candidate with the smallest standard deviation among these candidate offsets.
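The shift search of Algorithm 2 can be sketched as follows. This is our paraphrase, not the report's CUDA kernel; in particular, "smallest standard deviation among the candidates" is interpreted here as picking the candidate offset that deviates least from the others (a medoid), which is one plausible reading.

```python
import math
import random

def nearest(desc, template):
    """Nearest descriptor in template by Euclidean distance on (x, y)."""
    return min(template, key=lambda t: math.hypot(t[0] - desc[0], t[1] - desc[1]))

def shift_search(test, target, n_samples=8, seed=0):
    """Estimate a global (dx, dy) shift: sample test descriptors, pair each with
    its nearest target descriptor, and keep the candidate offset that deviates
    least from the other candidates."""
    rng = random.Random(seed)
    samples = rng.sample(test, min(n_samples, len(test)))
    offsets = [(t[0] - s[0], t[1] - s[1])
               for s, t in ((s, nearest(s, target)) for s in samples)]
    # Medoid: the candidate with the least total distance to the other candidates.
    return min(offsets, key=lambda o: sum(math.hypot(o[0] - p[0], o[1] - p[1])
                                          for p in offsets))
```

On the GPU, each candidate offset would be evaluated by its own thread (the second mapping scheme discussed later in Section 2.5.2); the medoid selection then plays the role of the final comparison across threads.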
2) AFFINE TRANSFORM PARAMETER SEARCH

The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine-transform procedure is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te,(it) and calculating the distance from its nearest neighbor s_ta,j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithm 2; tr(it)_shift, θ(it), and tr(it)_scale are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)_shift), and S(tr(it)_scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times; in our experiment we set the iteration count to 512.
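The randomized parameter search of Algorithm 3 can be sketched in Python. This is our own simplified illustration: β = 20 pixels and N = 512 come from the text, while the rotation and scale sampling ranges are assumptions, and descriptors are reduced to (x, y) centers.

```python
import math
import random

BETA = 20.0  # match tolerance in pixels, as in the text

def transform(p, angle, scale, shift):
    """Apply scale * R(angle) to point p, then translate by shift."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = p
    return (scale * (c * x - s * y) + shift[0], scale * (s * x + c * y) + shift[1])

def affine_search(test, target, iters=512, seed=0):
    """Try random rotation/scale/shift triples; keep the one matching most pairs."""
    rng = random.Random(seed)
    best, best_count = (0.0, 1.0, (0.0, 0.0)), -1
    for _ in range(iters):
        angle = rng.uniform(-0.1, 0.1)   # small rotations (assumed range)
        scale = rng.uniform(0.9, 1.1)    # small scalings (assumed range)
        s = rng.choice(test)
        t = min(target, key=lambda q: math.hypot(q[0] - s[0], q[1] - s[1]))
        ts = transform(s, angle, scale, (0.0, 0.0))
        shift = (t[0] - ts[0], t[1] - ts[1])   # anchor the sampled pair exactly
        moved = [transform(p, angle, scale, shift) for p in test]
        count = sum(1 for m in moved
                    if min(math.hypot(m[0] - q[0], m[1] - q[1]) for q in target) < BETA)
        if count > best_count:
            best, best_count = (angle, scale, shift), count
    return best, best_count
```

Because each iteration is independent, this loop maps naturally onto one CUDA thread per parameter set, as the report describes in Section 2.5.2.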
3) REGISTRATION AND MATCHING ALGORITHM

Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching procedure is listed in Algorithm 4. Here s_te,i, T_te, s_ta,j, and T_ta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)_shift, tr(optm)_scale, and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)_shift), and S(tr(optm)_scale) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have similar orientations, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimum score of the two transformed results divided by the minimum matching score for the test template and the target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program must be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip and cost little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited quantity. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:

If threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.

Global memory is slower to access than on-chip memory. To hide this latency, on-chip memory should be used preferentially rather than global memory. When global memory access does occur, threads in the same warp should access consecutive words so that the accesses coalesce.

Shared memory is much faster than the local and global memory spaces, but it is organized into equally sized banks. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. For maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computational density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.

We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread compares one pair of templates. In our work we use an NVIDIA C2070 as our GPU, with both the thread and block counts set to 1024. That means we can match our test template against up to 1024 x 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, with each thread processing a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, each thread handles a set of descriptors from that template. This partition lets every block execute independently, with no data-exchange requirements between blocks. When all threads have processed their respective descriptor fractions, the intermediate results must be summed or compared. A parallel prefix-sum algorithm, shown on the right of Figure 11, calculates this sum: first, all odd-numbered threads compute the sums of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
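The tree-style reduction described above can be sketched sequentially as follows. This is a minimal host-side illustration of ours; on the GPU each addition at a given stride would be performed by a separate thread, with a synchronization barrier between strides.

```python
def tree_sum(values):
    """Tree reduction: pair up neighbors, then recursively combine at doubling
    strides, leaving the total in the first slot (as in the per-block GPU sum)."""
    vals = list(values)
    stride = 1
    while stride < len(vals):
        # In CUDA, every addition in this inner loop runs in its own thread.
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]
```

For n intermediate results the reduction takes ceil(log2 n) synchronized stages instead of n - 1 serial additions, which is why it is the standard way to combine per-thread partial results inside a block.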
252 MAPPING INSIDE BLOCK
In shift-argument searching, there are two schemes for mapping the task:

Map one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.

Assign a single candidate shift offset to each thread, so that all threads compute independently except when the final results are compared across candidate offsets.

Because of the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine-matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, with the iterations distributed across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration step, we used the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative, so it is hard to parallelize a single twister state-update step across several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters must run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent generators sharing identical parameters from emitting correlated sequences. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is to use a single thread in each block to process the matching.
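The requirement of uncorrelated per-thread random streams discussed above can be approximated in a host-side sketch. This is our own analogy, not the report's offline parameter-creation tool: NumPy's SeedSequence.spawn derives child seeds that yield statistically independent generators from one root seed, playing a role similar to giving each parallel Mersenne Twister its own parameters.

```python
import numpy as np

def make_streams(n_threads, root_seed=1234):
    """One independent generator per 'thread': child seed sequences spawned
    from a single root give uncorrelated, reproducible streams."""
    root = np.random.SeedSequence(root_seed)
    return [np.random.default_rng(child) for child in root.spawn(n_threads)]
```

Each "thread" then draws from its own generator (for example, its rotation and scale candidates) without any two streams sharing state, and rerunning with the same root seed reproduces the whole experiment.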
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can incur long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, no host-to-device data transfer occurs during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2-4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global-memory access is still slow, so in our kernels we load the test template into shared memory to accelerate access. Because Algorithms 2-4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture-memory space, we set the system cache to its lowest value and bound our target descriptors to texture memory; using this cacheable memory accelerated data access further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a larger block and using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).

Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell contributes a weight, given by its gradient magnitude, to the orientation bin found in the gradient computation. The cells are rectangular, and the gradient-orientation bins are spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination and contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks; these blocks overlap so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
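The two steps above, per-pixel gradients followed by magnitude-weighted orientation binning over cells, can be sketched in NumPy. This is a minimal unsigned-gradient illustration of ours, without the block grouping, normalization, or Gaussian windowing described in the text.

```python
import numpy as np

def hog_cell_histograms(image, cell=8, bins=9):
    """Unsigned-gradient HOG: per-pixel magnitude and orientation, then one
    magnitude-weighted orientation histogram per cell (0-180 degrees, with
    opposite directions counted together)."""
    img = image.astype(float)
    dy, dx = np.gradient(img)                     # y- and x-direction gradients
    mag = np.hypot(dx, dy)                        # m(x, y)
    ang = np.degrees(np.arctan2(dy, dx)) % 180.0  # theta(x, y), folded to [0, 180)
    h, w = img.shape
    ch, cw = h // cell, w // cell
    hists = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            idx = np.minimum((a / (180.0 / bins)).astype(int), bins - 1)
            for k in range(bins):
                hists[i, j, k] = m[idx == k].sum()  # magnitude-weighted voting
    return hists
```

For a vertical step edge, all gradient energy falls in the 0-degree bin, which is the kind of edge-orientation signature the sclera vein patterns produce.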
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulation, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular in the teaching of linear algebra and numerical analysis, and is popular among scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop; when code is entered there, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.

MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX, or .NET can be called directly from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated but can be done with a MATLAB extension, sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.

Libraries also exist to import and export MathML.
Development Environment

Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations

Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data

Readability and navigation improvements to warning and error messages in the MATLAB command window

Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, enabling fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.

At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.

MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to iterate quickly toward the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries; for general-purpose scalar computations, it generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithms efficiently. These include the following:

MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping.

Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability.

MATLAB Profiler
Records the time spent executing each line of code.

Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces

Using the interactive tool GUIDE (Graphical User Interface Development Environment), you can lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
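Most of the operations listed above reduce to one or two library calls in any numeric environment. As an illustrative sketch only (written in Python/NumPy rather than the report's MATLAB, on an invented noisy signal), smoothing, thresholding, peak finding, and basic statistics might look like this:

```python
import numpy as np

# Hypothetical noisy 5 Hz signal, for illustration only
t = np.linspace(0, 1, 200)
x = np.sin(2 * np.pi * 5 * t) + 0.1 * np.random.default_rng(0).normal(size=t.size)

# Smoothing: 5-point moving average
kernel = np.ones(5) / 5
x_smooth = np.convolve(x, kernel, mode="same")

# Thresholding: zero out small values
x_thresh = np.where(np.abs(x_smooth) < 0.2, 0.0, x_smooth)

# 1-D peak finding: samples higher than both neighbors
peaks = np.where((x_smooth[1:-1] > x_smooth[:-2]) &
                 (x_smooth[1:-1] > x_smooth[2:]))[0] + 1

# Basic statistics
mean, std = x.mean(), x.std()
```

The MATLAB equivalents (`conv`, `findpeaks`, `mean`, `std`) follow the same pattern.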
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats such as Microsoft Excel; ASCII text or binary
files; image, sound, and video files; and scientific files such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format. Additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes;
changing line colors and markers; adding annotations, LaTeX equations, and
legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D
scalar data, and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting effects,
light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW discrete Fourier
transform library. Because these processor-dependent libraries are
optimized for the different platforms that MATLAB supports, they execute
faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
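The point of building on LAPACK and FFT libraries is that a linear solve or a spectrum computation becomes a single optimized call. A sketch of the same idea in Python/NumPy (an analogue of MATLAB's backslash operator and `fft`; the data are invented):

```python
import numpy as np

# Linear algebra: solve A x = b (NumPy, like MATLAB's "\", calls LAPACK)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)          # -> [2.0, 3.0]

# Fourier analysis: spectrum of a pure 5 Hz tone sampled for 1 second
t = np.arange(0, 1, 1 / 128)
sig = np.sin(2 * np.pi * 5 * t)
spectrum = np.abs(np.fft.rfft(sig))
dominant_hz = int(spectrum.argmax())   # bin index equals Hz here -> 5
```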
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
4.2 SNAPSHOTS
FIG: Original sclera image converted into a grayscale image
FIG: Grayscale image converted into a binary image
FIG: Edge detection performed by Otsu's thresholding
FIG: Selecting the region of interest (sclera part)
FIG: Selected ROI part
FIG: Enhancement of the sclera image
FIG: Feature extraction of the sclera image using Gabor filters
FIG: Matching with images in the database
FIG: Displaying the result (matched or not matched)
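The first two steps of the pipeline above (grayscale conversion and Otsu binarization) can be sketched as follows. This is an illustrative re-implementation in Python/NumPy on a synthetic image, not the project's MATLAB code; the luminance weights and the toy bright square standing in for the sclera are assumptions made for the example:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: pick the gray level that maximizes the
    between-class variance of the two resulting pixel classes."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Synthetic RGB image: bright square plays the role of the sclera region
rgb = np.zeros((64, 64, 3), dtype=np.uint8)
rgb[16:48, 16:48] = 200

# RGB -> grayscale (standard luminance weights) -> binary via Otsu
gray = (0.299 * rgb[..., 0] + 0.587 * rgb[..., 1]
        + 0.114 * rgb[..., 2]).astype(np.uint8)
t = otsu_threshold(gray)
binary = gray > t
```

In MATLAB the same steps are `rgb2gray`, `graythresh`, and `imbinarize`/`im2bw`.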
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this project, we proposed a new parallel sclera vein recognition
method, which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions for other sclera vein recognition methods
and for general pattern recognition methods. We designed the Y-shape
descriptor to narrow the search range and increase the matching efficiency;
it is a new feature extraction method that takes advantage of GPU
structures. We developed the WPL descriptor to incorporate mask
information and make the matching more suitable for parallel computing, which
dramatically reduces data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, which is an important
step in achieving parallel computation efficiency on a GPU. A workflow
with high arithmetic intensity, designed to hide memory access latency,
partitions the computation task across the heterogeneous CPU-GPU
system, down to the individual threads in the GPU. The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
During movement of the eye, Y-shape branches are observed to be a stable feature and
can be used as a sclera feature descriptor. To detect the Y-shape branches in
the original template, we search for the nearest-neighbor set of every line
segment within a regular distance and classify the angles among these neighbors.
If there are two types of angle values in the line segment set, the set may
be inferred as a Y-shape structure, and the line segment angles are
recorded as a new feature of the sclera.
There are two ways to measure both the orientation and the relationship of
every branch of the Y-shape vessels: one is to use the angle of every branch to the
x-axis; the other is to use the angles between each branch and the iris radial
direction. The first method needs an additional rotation operation to align the
template. In our approach, we employed the second method. As Figure 6
shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius
from the pupil center. Even when the head tilts, the eye moves, or the camera
zooms during the image acquisition step, ϕ1, ϕ2, and ϕ3 remain quite stable.
To tolerate errors from the pupil center calculation in the segmentation step,
we also recorded the center position (x, y) of the Y-shape branches as
auxiliary parameters. Our rotation-, shift-, and scale-invariant feature
vector is therefore defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated
with reference to the iris center; therefore, it is automatically aligned to the
iris centers. It is a rotation- and scale-invariant descriptor.
V. WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line
descriptor is extracted from the skeleton of the vessel structure in binary images
(Figure 7). The skeleton is then broken into smaller segments. For each
segment, a line descriptor is created to record the center and orientation of
the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the
position of the center and ɸ is its orientation. Because of the limitation of
segmentation accuracy, the descriptors at the boundary of the sclera area might
not be accurate and may contain spur edges resulting from the iris, eyelid,
and/or eyelashes. To tolerate such errors, the mask file
FIG: The line descriptor of the sclera vessel pattern. (a) An eye image. (b) Vessel
patterns in sclera. (c) Enhanced sclera vessel patterns. (d) Centers of line
segments of vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the
sclera or not. However, in a GPU application, using the mask is challenging,
since the mask files are large and will occupy GPU memory and
slow down data transfer. When matching, a RANSAC-type registration
algorithm is used to randomly select corresponding descriptors,
and the transform parameters between them are used to generate the
template transform affine matrix. After every template transform, the mask
data must also be transformed and a new boundary calculated to
evaluate the weight of the transformed descriptor. This results in too many
convolutions in the processor unit.
To reduce heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the mask
information and can be automatically aligned. We extracted the
geometric relationships of the descriptors and stored them as a new
descriptor. We use a weighted image created by setting various weight
values according to position: the weights of descriptors that lie
beyond the sclera are set to 0, those near the sclera
boundary are set to 0.5, and interior descriptors are set to 1. In our work,
descriptor weights were calculated on their own mask by the CPU, only
once.
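A minimal sketch of this weighting rule, in pure Python with an invented toy mask and descriptor centers (the real system computes this once per template on the CPU; the 1-pixel "near boundary" band is an assumption for the example):

```python
def descriptor_weight(mask, x, y, border=1):
    """Weight of a descriptor centered at (x, y):
    0 outside the sclera mask, 0.5 near its boundary, 1 in the interior."""
    h, w = len(mask), len(mask[0])
    if mask[y][x] == 0:
        return 0.0
    # near the boundary if any neighborhood pixel falls outside the mask
    for dy in range(-border, border + 1):
        for dx in range(-border, border + 1):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w) or mask[ny][nx] == 0:
                return 0.5
    return 1.0

# 5x5 toy mask: "sclera" is the 3x3 center block
mask = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
weights = [descriptor_weight(mask, x, y) for x, y in [(0, 0), (1, 1), (2, 2)]]
# outside -> 0.0, boundary -> 0.5, interior -> 1.0
```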
The calculation result was saved as a component of the descriptor. The
sclera descriptor then becomes s(x, y, ɸ, w), where w denotes the weight
of the point and takes the value 0, 0.5, or 1. To align two templates, when a
template is shifted to another location along the line connecting their
centers, all the descriptors of that template must be transformed. This is
faster if the two templates have similar reference points. If we use the center of
the iris as the reference point, then when two templates are compared, the
correspondences are automatically aligned to each other, since they have
the same reference point. Every feature vector of the template is a set of
line segment descriptors composed of three variables (Figure 8): the
segment's angle to the reference line through the iris center,
denoted θ; the distance between the segment center and the pupil center,
denoted r; and the dominant angular orientation of the segment,
denoted ɸ. To minimize the GPU computation, we also convert the
descriptor values from polar coordinates to rectangular coordinates in a CPU
preprocessing step.
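The polar-to-rectangular precomputation is the standard coordinate conversion; a sketch in pure Python with hypothetical descriptor values (the angle convention relative to the iris center is an assumption for the example):

```python
import math

def wpl_to_rect(r, theta_deg):
    """Convert a WPL descriptor's polar center (r, θ), measured from the
    iris center, into rectangular coordinates so the GPU kernels can
    compare centers without repeated trigonometry."""
    theta = math.radians(theta_deg)
    return r * math.cos(theta), r * math.sin(theta)

# descriptor s(x, y, r, θ, ɸ, w): (x, y) filled in from (r, θ)
r, theta, phi, wgt = 10.0, 90.0, 45.0, 1.0
x, y = wpl_to_rect(r, theta)
s = (x, y, r, theta, phi, wgt)   # x ≈ 0, y ≈ 10
```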
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters. For
example, as an eyeball moves left, the left-part sclera patterns of the eye may be
compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads called warps. We
reorganized the descriptors from the same sides and saved
them in contiguous addresses. This meets the requirement of coalesced
memory access on the GPU.
After reorganizing the structure of the descriptors and adding mask information
into the new descriptor, computation on the mask file is no longer needed on
the GPU. Matching with this feature is very fast because the templates do not
need to be re-registered every time after shifting. Thus, the cost of
data transfer and computation on the GPU is reduced. For matching on the
new descriptor, the shift parameter generator in Figure 4 is then simplified
as in Figure 9.
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
the more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger
limits on their size and resource consumption, with more fully featured
instruction sets, and with more flexible control-flow operations. After many
years of separate instruction sets for vertex and fragment operations, current
GPUs support the unified Shader Model 4.0 on both vertex and fragment
shaders:
The hardware must support shader programs of at least 65 k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-
bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units.
General-Purpose Computing on the GPU: Mapping general-
purpose computation onto the GPU uses the graphics hardware in much the
same way as any standard graphics application. Because of this similarity, it
is both easier and more difficult to explain the process. On one hand, the
actual operations are the same and are easy to follow; on the other hand, the
terminology differs between graphics and general-purpose use. Harris
provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology,
then show how the same steps are used in a general-purpose way to author
GPGPU applications, and finally use the same steps to show the simpler
and more direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II,
concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as a texture on future passes through
the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation
involves the exact same steps, but different terminology. A motivating
example is a fluid simulation computed over a grid: at each time step, we
compute the next state of the fluid for each grid point from the current state
at its grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest. The rasterizer generates a fragment at each
pixel location covered by that geometry. (In our example, our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation.)
Each fragment is shaded by an SPMD general-purpose fragment
program. (Each grid point runs the same program to update the state of its
fluid.)
The fragment program computes the value of the fragment by a
combination of math operations and "gather" accesses from global
memory. (Each grid point can access the state of its neighbors from the
previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes. (The current state of the fluid will be used on the next time
step.)
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications
has been that, despite their general-purpose tasks having nothing to do with
graphics, the applications still had to be programmed using graphics APIs.
In addition, the program had to be structured in terms of the graphics
pipeline, with the programmable units only accessible as an intermediate
step in that pipeline, when the programmer would almost certainly prefer to
access the programmable units directly. The programming environments we
describe in detail in Section IV are solving this difficulty by providing a
more natural, direct, non-graphics interface to the hardware and,
specifically, the programmable units. Today, GPU computing applications
are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math
operations and both "gather" (read) accesses from and "scatter" (write)
accesses to global memory. Unlike in the previous two
methods, the same buffer can be used for both reading and writing,
allowing more flexible algorithms (for example, in-place algorithms that
use less memory).
The resulting buffer in global memory can then be used as an input in
future computation.
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because
no registration is needed. The matching result in this stage helps filter
out image pairs with low similarity. After this step, some false positive
matches are still possible. In the second stage, we use the WPL descriptor
to register the two images for more detailed descriptor matching, including
scale and translation invariance. This stage includes shift transform, affine
matrix generation, and final WPL descriptor matching. Overall, we
partitioned the registration and matching processing into four kernels in
CUDA (Figure 10): matching on the Y-shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster
and achieves a more accurate score.
2.4.1 STAGE I: MATCHING WITH THE Y-SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features,
registration is unnecessary before matching on the Y-shape descriptor. The
whole matching algorithm is listed as Algorithm 1.
FIG
Here, ytei and ytaj are the Y-shape descriptors of the test template Tte
and the target template Tta, respectively; dϕ is the Euclidean distance of the angle
elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of
two descriptor centers, defined in (4); ni and di are the number of matched descriptor
pairs and the distance between their centers, respectively; tϕ is a distance
threshold; and txy is the threshold that restricts the search area. We set tϕ to
30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all
the Y-shape branches. The search area is limited to the corresponding left or
right half of the sclera in order to reduce the search range and time. The
distance between two branches is defined in (3), where ϕij is the angle between
the jth branch and the polar line from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y-shape
branch centers di are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch
centers as in (2). Here, α is a factor to fuse the matching score, which was set
to 30 in our study, and Ni and Nj are the total numbers of feature vectors in
templates i and j, respectively. The decision is regulated by the threshold t: if
the sclera's matching score is lower than t, the sclera is discarded. Scleras
with high matching scores are passed to the next, more precise
matching process.
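Since equations (2)-(4) are not reproduced here, the following pure-Python sketch only illustrates the shape of the stage-I comparison under stated assumptions: dϕ taken as the Euclidean distance over the three branch angles, dxy over the branch centers, and a fused score that rewards many matched pairs with small center distances (the score formula is invented for illustration; tϕ, txy, and α use the values from the text):

```python
import math

T_PHI, T_XY, ALPHA = 30.0, 675.0, 30.0

def d_phi(a, b):   # distance between angle triplets (ϕ1, ϕ2, ϕ3)
    return math.dist(a[:3], b[:3])

def d_xy(a, b):    # distance between branch centers (x, y)
    return math.dist(a[3:], b[3:])

def coarse_score(tte, tta):
    """Coarsely match Y-shape descriptors y = (ϕ1, ϕ2, ϕ3, x, y)."""
    n, dist_sum = 0, 0.0
    for y_te in tte:
        for y_ta in tta:
            if d_xy(y_te, y_ta) < T_XY and d_phi(y_te, y_ta) < T_PHI:
                n += 1
                dist_sum += d_xy(y_te, y_ta)
                break
    if n == 0:
        return 0.0
    # more matched branches and smaller average distance -> higher score
    avg_dist = dist_sum / n
    return ALPHA * n / (min(len(tte), len(tta)) * (1.0 + avg_dist))

tte = [(10.0, 120.0, 240.0, 50.0, 60.0)]
tta = [(12.0, 118.0, 242.0, 55.0, 58.0)]   # same eye, slight noise
score = coarse_score(tte, tta)             # > 0: passes on to stage II
```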
2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of the
sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is
nonlinear because:
When acquiring an eye image at different gaze angles, the vessel structure
appears to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and
endothelium. There are slight differences among the movements of these
layers.
Considering these factors, our registration employs both a single
shift transform and a multi-parameter transform that combines shift,
rotation, and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before,
segmentation may not be accurate; as a result, the detected iris center may
not be very accurate. The shift transform is designed to tolerate possible errors
in pupil center detection in the segmentation step. If there is no deformation,
or only very minor deformation, registration with the shift transform alone
is adequate to achieve an accurate result. We designed Algorithm 2
to obtain the optimized shift parameter, where Tte is the test template and stei is
the ith WPL descriptor of Tte; Tta is the target template and staj is the jth
WPL descriptor of Tta; and d(stek, staj) is the Euclidean distance of descriptors stek
and staj.
Δsk is the shift value of two descriptors, defined as the offset between their centers.
We first randomly select an equal number of segment descriptors
stek in the test template Tte from each quadrant and find the nearest neighbor staj of each
in the target template Tta. Their shift offset is recorded as a candidate
registration shift factor Δsk. The final registration offset is Δsoptim,
which has the smallest standard deviation among these candidate offsets.
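Under stated assumptions (Δsk read as the center offset from each sampled test descriptor to its nearest target neighbor, and "smallest standard deviation" read as the candidate closest to the consensus of all candidates), a pure-Python sketch of this search on invented descriptor centers:

```python
import math

def nearest(desc, template):
    return min(template, key=lambda d: math.dist(desc[:2], d[:2]))

def shift_search(tte, tta, samples=None):
    """Candidate shift = offset from each test descriptor (x, y, ...) to its
    nearest neighbor in the target; keep the most consensual candidate."""
    samples = samples or tte
    candidates = []
    for s_te in samples:
        s_ta = nearest(s_te, tta)
        candidates.append((s_ta[0] - s_te[0], s_ta[1] - s_te[1]))
    mx = sum(c[0] for c in candidates) / len(candidates)
    my = sum(c[1] for c in candidates) / len(candidates)
    # candidate offset deviating least from the mean offset wins
    return min(candidates, key=lambda c: math.dist(c, (mx, my)))

tte = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
tta = [(3.0, 4.0), (13.0, 4.0), (3.0, 14.0)]   # tte shifted by (3, 4)
offset = shift_search(tte, tta)                # recovers (3.0, 4.0)
```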
2) AFFINE TRANSFORM PARAMETER SEARCH:
The affine transform is designed to tolerate some deformation of the sclera
patterns in the matching step. The affine transform algorithm is shown in
Algorithm 3. The shift value in the parameter set is obtained by randomly
selecting a descriptor stei(it) and calculating the distance from its nearest
neighbor staj in Tta. We transform the test template by the matrix in (7).
At the end of each iteration, we count the number of matched descriptor pairs
between the transformed template and the target template. The factor β
determines whether a pair of descriptors is matched; we set it to
20 pixels in our experiment. After N iterations, the optimized transform
parameter set is determined by selecting the maximum matching number
m(it). Here, stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift,
θ(it), and tr(it)scale are the shift, rotation, and scale parameters
generated in the itth iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale)
are the transform matrices defined in (7). To search for the optimal
transform parameters, we iterate N times; in
our experiment, we set the iteration count to 512.
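A RANSAC-like sketch of this search in pure Python (the parameter ranges and toy templates are invented for illustration; the report's actual Algorithm 3 and matrix (7) are not reproduced here):

```python
import math, random

BETA = 20.0   # pixels: max center distance for a "matched" pair

def transform(pts, dx, dy, theta, scale):
    """Apply scale, rotation, then shift to 2-D descriptor centers."""
    c, s = math.cos(theta), math.sin(theta)
    return [(scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
            for x, y in pts]

def affine_search(tte, tta, iterations=512, seed=0):
    """Randomly generate (shift, rotation, scale) parameter sets and keep
    the one yielding the most matched descriptor pairs."""
    rng = random.Random(seed)
    best, best_m = (0.0, 0.0, 0.0, 1.0), -1
    for _ in range(iterations):
        params = (rng.uniform(-10, 10), rng.uniform(-10, 10),
                  rng.uniform(-0.2, 0.2), rng.uniform(0.9, 1.1))
        moved = transform(tte, *params)
        m = sum(1 for p in moved
                if min(math.dist(p, q) for q in tta) < BETA)
        if m > best_m:
            best_m, best = m, params
    return best, best_m

tte = [(0.0, 0.0), (50.0, 0.0), (0.0, 50.0)]
tta = transform(tte, 5.0, -3.0, 0.1, 1.0)    # slightly deformed copy
params, matched = affine_search(tte, tta)
```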
3) REGISTRATION AND MATCHING ALGORITHM:
Using the optimized parameter sets determined from Algorithms 2
and 3, the test template is registered and matched simultaneously. The
registration and matching algorithm is listed in Algorithm 4. Here, stei, Tte,
staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift,
tr(optm)scale, and Δsoptim are the registration parameters obtained from
Algorithms 2 and 3; R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale)
form the descriptor transform matrix defined in Algorithm 3; ɸ is the angle
between the segment descriptor and the radius direction; and w is the weight of the
descriptor, which indicates whether the descriptor is at the edge of the sclera or
not. To ensure that the nearest descriptors have a similar orientation, we
use a constant factor α to check the absolute difference of two ɸ values; in our
experiment, we set α to 5. The total matching score is the minimum score of the two
transformed results divided by the minimum matching score of the test template
and the target template.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and
works as a coprocessor with a CPU. A CUDA device consists of many streaming
multiprocessors (SMs); the parallel part of the program should be
partitioned into threads by the programmer and mapped onto those threads.
There are multiple memory spaces in the CUDA memory hierarchy:
registers, local memory, shared memory, global memory, constant memory,
and texture memory. Registers, local memory, and shared memory are on-
chip, and it costs little time to access these memories. Only
shared memory can be accessed by other threads within the same block;
however, shared memory is of limited size. Global
memory, constant memory, and texture memory are off-chip memories,
accessible by all threads, and accessing them is very time-consuming.
Constant memory and texture memory are read-only and cacheable.
Mapping algorithms to CUDA to achieve efficient processing is
not a trivial task; there are several challenges in CUDA programming:
If threads in a warp take different control paths, all the branches are
executed serially. To improve performance, branch divergence within a
warp should be avoided.
Global memory is slower to access than on-chip memory. To
hide this latency completely, we should preferentially use on-
chip memory rather than global memory. When global
memory access occurs, threads in the same warp should access words in
sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces,
but shared memory is organized into banks that are equal in size. If two
memory requests from different threads within a warp fall in the
same memory bank, the accesses are serialized. To get maximum
performance, memory requests should be scheduled to minimize bank
conflicts.
2.5.1 MAPPING THE ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four
independent modules, all the modules are converted to different kernels
on the GPU. These kernels differ in computation density; thus, we
map them to the GPU with various mapping strategies to fully utilize the
computing power of CUDA. Figure 11 shows our scheme of CPU-GPU
task distribution and the partition among blocks and threads. Algorithm 1 is
partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of
templates in the database. As the upper middle
column of Figure 11 shows, each target template is assigned to one
thread, and one thread performs one pair-of-templates comparison. In our work, we
use an NVIDIA C2070 as our GPU. The thread and block numbers are set to
1024, which means we can match our test template with up to 1024 × 1024
target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which
one thread processes a section of descriptors. As the lower portion of the
middle column of Figure 11 shows, we assign a target template to one
block; inside a block, one thread corresponds to a set of descriptors in this
template. This partition lets every block execute independently, with
no data exchange required between different blocks. When all
threads complete their corresponding descriptor fractions, the sum of the
intermediate results needs to be computed or compared. A parallel prefix
sum algorithm is used to calculate the sum of the intermediate results, as
shown on the right of Figure 11. First, all odd-numbered threads compute the sum
of consecutive pairs of results. Then, recursively, every first of i (= 4, 8,
16, 32, 64, ...) threads
computes the prefix sum of the new results. The final result is saved at
the first address, which has the same variable name as the first intermediate
result.
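The reduction described above can be sketched sequentially: each pass of the loop below corresponds to one round of the parallel tree, with the stride doubling until only slot 0 holds the total (illustrative Python, not the CUDA kernel; on the GPU each inner iteration runs as a concurrent thread):

```python
def tree_reduce_sum(results):
    """Sum per-thread partial results the way the parallel reduction
    rounds do: pair neighbors at stride 1, then 2, 4, ... The total ends
    up in slot 0, mirroring 'saved at the first address'."""
    vals = list(results)
    stride = 1
    while stride < len(vals):
        # on the GPU, all i with i % (2 * stride) == 0 run concurrently
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]

partials = [3, 1, 4, 1, 5, 9, 2, 6]
total = tree_reduce_sum(partials)   # 31, same as sum(partials)
```

The advantage over a serial accumulation is that each round halves the number of active elements, so n partial results are reduced in O(log n) rounds.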
2.5.2 MAPPING INSIDE A BLOCK
In shift parameter searching, there are two schemes we can choose to
map the task:
Map one pair of templates to all the threads in a block; every
thread then takes charge of a fraction of the descriptors and cooperates with
the other threads.
Assign a single possible shift offset to a thread; all the threads then
compute independently, except that the final result must be compared with
the other possible offsets.
Due to the great number of sum and synchronization operations in every
nearest-neighbor search step, we chose the second method to parallelize
the shift search. In the affine matrix generator, we mapped an entire parameter-
set search to a thread: every thread randomly generates a set of
parameters and tries them independently. The iterations are
distributed across all threads. The challenge of this step is that the randomly generated
numbers might be correlated among threads. In the rotation and
scale registration generation step, we used the Mersenne Twister pseudorandom
number generator, because it uses bitwise arithmetic and has a long
period.
The Mersenne Twister, like most pseudorandom generators, is iterative;
therefore, it is hard to parallelize a single twister state update step among
several execution threads. To make sure that the thousands of threads in the
launch grid generate uncorrelated random sequences, many simultaneous
Mersenne Twisters need to run with different initial states in parallel.
But even "very different" (by any definition) initial state values do not
prevent the emission of correlated sequences by generators sharing
identical parameters. To solve this problem, and to enable an efficient
implementation of the Mersenne Twister on parallel architectures, we used a
special offline tool for the dynamic creation of Mersenne Twister
parameters, modified from the algorithm developed by Makoto Matsumoto
and Takuji Nishimura. In the registration and matching step, when
searching for the nearest neighbor, a line segment that has already been matched
with another should not be used again. In our approach, a flag
FIG
FIG
Variable denoting whether the line has been matched is stored in
shared memory To share the flags all the threads in a block should wait
synchronic operation at every query step Our solution is to use a single
thread in a block to process the matching
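One common way to obtain per-thread streams that do not start from related states is sketched below with Python's `random.Random` (itself a Mersenne Twister): each "thread" generator is seeded from a hash of the master seed and the thread id. This is only a lightweight stand-in for the dynamic-creation tool named above, which goes further by giving each twister its own parameters rather than just its own seed:

```python
import hashlib
import random

def make_thread_rngs(master_seed, n_threads):
    """One generator per 'thread', seeded from a hash of (master, id)
    so the streams begin from unrelated internal states."""
    rngs = []
    for tid in range(n_threads):
        digest = hashlib.sha256(f"{master_seed}:{tid}".encode()).digest()
        # use the first 8 bytes of the digest as the per-thread seed
        rngs.append(random.Random(int.from_bytes(digest[:8], "big")))
    return rngs

rngs = make_thread_rngs(42, 4)
draws = [r.random() for r in rngs]  # one independent draw per "thread"
```

Hash-derived seeds avoid the correlated sequences that naive consecutive seeds (0, 1, 2, ...) can produce, while keeping the whole scheme reproducible from a single master seed.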
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can cause long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed. Therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) were stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access was still a slow way to get data. In our kernel, we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
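Storing each descriptor component in its own contiguous array, rather than as an array of records, is what makes the coalesced access above possible: consecutive threads read successive addresses of one component. A minimal structure-of-arrays sketch (plain Python; field names follow the s(x, y, r, θ, ɸ, w) descriptor):

```python
# Structure-of-arrays (SoA) layout for WPL descriptors s(x, y, r, theta, phi, w):
# each component is stored contiguously, so thread i of a warp reading
# component r touches address r[i] -- the coalescing-friendly pattern.
class DescriptorSoA:
    def __init__(self, descriptors):
        # descriptors: iterable of (x, y, r, theta, phi, w) tuples
        self.x = [d[0] for d in descriptors]
        self.y = [d[1] for d in descriptors]
        self.r = [d[2] for d in descriptors]
        self.theta = [d[3] for d in descriptors]
        self.phi = [d[4] for d in descriptors]
        self.w = [d[5] for d in descriptors]

soa = DescriptorSoA([(1, 2, 3, 4, 5, 1.0), (6, 7, 8, 9, 10, 0.5)])
```

An array-of-structures layout would interleave the six fields, so neighboring threads reading the same field would hit addresses six elements apart, defeating coalescing.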
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor. It was primarily developed for target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To apply this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y):

m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2), θ(x, y) = arctan(dy(x, y) / dx(x, y))

Orientation binning is the second step of HOG. This method is used to create the cell histograms. Each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image has illumination and contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
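The gradient computation and orientation binning for one cell can be sketched as follows (a generic unsigned-HOG sketch using central differences, not the project's MATLAB code; the 9-bin count is the usual default, assumed here):

```python
import math

def hog_cell_histogram(cell, n_bins=9):
    """Gradient-magnitude-weighted orientation histogram for one cell.
    `cell` is a 2-D list of intensities; orientations are folded into
    [0, 180) so opposite directions count as the same (unsigned HOG)."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    for yy in range(1, h - 1):          # skip the 1-pixel border
        for xx in range(1, w - 1):
            dx = cell[yy][xx + 1] - cell[yy][xx - 1]   # horizontal gradient
            dy = cell[yy + 1][xx] - cell[yy - 1][xx]   # vertical gradient
            mag = math.hypot(dx, dy)                   # m(x, y)
            ang = math.degrees(math.atan2(dy, dx)) % 180.0  # theta(x, y)
            hist[int(ang / 180.0 * n_bins) % n_bins] += mag
    return hist
```

A vertical edge (intensity changing along x) votes entirely into the 0-degree bin, while a horizontal edge votes into the bin containing 90 degrees, matching the binning rule described above.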
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing.
The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in vast areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems It enables fast
development and execution With the MATLAB language you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
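As an analogy in plain Python (not MATLAB), the same elementwise computation written loop-style versus as a single vectorized-style expression, in the spirit of MATLAB's B = 2*A + 1:

```python
# The same elementwise operation, loop style vs. "one-line" style.
A = [[1, 2], [3, 4]]

# Loop style: explicit preallocation and indexing, as in C.
B_loop = [[0, 0], [0, 0]]
for i in range(2):
    for j in range(2):
        B_loop[i][j] = 2 * A[i][j] + 1

# One-line style: the whole operation as a single expression.
B_vec = [[2 * v + 1 for v in row] for row in A]
```

Both produce the same result; the one-line form removes the bookkeeping (sizes, indices, preallocation) that the loop form forces the programmer to manage.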
At the same time MATLAB provides all the features of a traditional
programming language including arithmetic operators flow control data
structures data types object-oriented programming (OOP) and debugging
features
MATLAB lets you execute commands or groups of commands one
at a time without compiling and linking enabling you to quickly iterate to
the optimal solution For fast execution of heavy matrix and vector
computations MATLAB uses processor-optimized libraries For general-
purpose scalar computations MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology
This technology which is available on most platforms provides
execution speeds that rival those of traditional programming languages
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
Use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically
using MATLAB functions
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process from acquiring
data from external devices and databases through preprocessing
visualization and numerical analysis to producing presentation-quality
output
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from
files other applications databases and external devices You can read data
from popular file formats such as Microsoft Excel ASCII text or binary
files image sound and video files and scientific files such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format Additional functions let you read data from Web pages and
XML
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB These include 2-D and 3-D
plotting functions 3-D volume visualization functions tools for
interactively creating plots and the ability to export results to all popular
graphics formats You can customize plots by adding multiple axes
changing line colors and markers adding annotation Latex equations and
legends and drawing shapes
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter and bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effect, light source locations, and transparency.
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical statistical and engineering
functions to support all common engineering and science operations These
functions developed by experts in mathematics are the foundation of the
MATLAB language The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW Discrete Fourier
Transform library Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports they execute
faster than the equivalent C or C++ code
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types
including doubles singles and integers
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
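The Otsu thresholding step shown in the snapshots above can be sketched as follows (a generic implementation of Otsu's method on a flat list of 8-bit gray values, not the project's MATLAB code):

```python
def otsu_threshold(pixels):
    """Otsu's method: pick the threshold t maximizing the between-class
    variance w_b * w_f * (m_b - m_f)^2 of background/foreground."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_b = sum_b = 0                      # background weight and sum
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b                # foreground weight
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b                # background mean
        m_f = (total_sum - sum_b) / w_f  # foreground mean
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

Binarization then keeps pixels with value greater than the returned threshold, which is how the gray-scale sclera image becomes the binary image shown above.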
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and general pattern recognition methods. We designed the Y shape descriptor, a new feature extraction method that takes advantage of the GPU structures, to narrow the search range and increase the matching efficiency. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU-GPU system, down to the individual threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
FIG
FIG
movement of the eye, Y shape branches are observed to be a stable feature and can be used as a sclera feature descriptor. To detect the Y shape branches in the original template, we search for the nearest-neighbor set of every line segment within a regular distance and classify the angles among these neighbors. If there are two types of angle values in the line segment set, the set may be inferred to be a Y shape structure, and the line segment angles are recorded as a new feature of the sclera.
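The "two types of angle values" test can be sketched by clustering the neighboring segments' orientations and counting the clusters (a hypothetical sketch; the tolerance value is an assumption, not the paper's):

```python
def angle_clusters(angles_deg, tol=10.0):
    """Group segment orientations into clusters of similar angle
    (greedy pass over the sorted angles; `tol` is an assumed tolerance)."""
    clusters = []
    for a in sorted(angles_deg):
        if clusters and a - clusters[-1][-1] <= tol:
            clusters[-1].append(a)
        else:
            clusters.append([a])
    return clusters

def looks_like_y_branch(angles_deg, tol=10.0):
    """Per the text: if the neighborhood contains two or more distinct
    angle values, infer a Y-shape branch point."""
    return len(angle_clusters(angles_deg, tol)) >= 2
```

A neighborhood whose segments all share one orientation is just a continuing vessel; two or more distinct orientations indicate a branching point.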
There are two ways to measure both the orientation and the relationship of every branch of the Y shape vessels: one is to use the angle of every branch to the x-axis; the other is to use the angles between each branch and the iris radial direction. The first method needs an additional rotation operation to align the template. In our approach, we employed the second method. As Figure 6 shows, ϕ1, ϕ2, and ϕ3 denote the angles between each branch and the radius from the pupil center. Even when the head tilts, the eye moves, or the camera zooms at the image acquisition step, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also recorded the center position (x, y) of the Y shape branches as auxiliary parameters. So our rotation-, shift-, and scale-invariant feature vector is defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center. Therefore, it is automatically aligned to the iris centers; it is a rotation- and scale-invariant descriptor.
WPL SCLERA DESCRIPTOR
As we discussed in Section 2.2, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, the descriptors at the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To be tolerant of such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern (a) An eye image (b) Vessel
patterns in sclera (c) Enhanced sclera vessel patterns (d) Centers of line
segments of vessel patterns
The mask file is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large and will occupy GPU memory and slow down data transfer. During matching registration, a RANSAC-type algorithm was used to randomly select the corresponding descriptors, and the transform parameters between them were used to generate the template transform affine matrix. After every template transform, the mask data should also be transformed, and a new boundary should be calculated to evaluate the weight of the transformed descriptors. This results in too many operations in the processing unit.
To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image, created by setting various weight values according to position: the weights of descriptors that lie outside the sclera are set to 0, those near the sclera boundary to 0.5, and interior descriptors to 1. In our work, the descriptors' weights were calculated on their own mask by the CPU, only once.
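The one-time CPU weighting rule can be sketched as follows (a hypothetical sketch: the near-boundary radius `border` and the boundary test are assumptions, since the paper does not spell them out):

```python
def descriptor_weight(mask, x, y, border=3):
    """WPL weight rule: 0 outside the sclera mask, 0.5 near the
    boundary, 1.0 in the interior. `mask` is a 2-D list of 0/1 values;
    `border` is an assumed near-boundary radius in pixels."""
    h, w = len(mask), len(mask[0])
    if not (0 <= y < h and 0 <= x < w) or mask[y][x] == 0:
        return 0.0  # descriptor lies outside the sclera
    # near-boundary: some pixel within `border` falls outside the mask
    for dy in range(-border, border + 1):
        for dx in range(-border, border + 1):
            yy, xx = y + dy, x + dx
            if not (0 <= yy < h and 0 <= xx < w) or mask[yy][xx] == 0:
                return 0.5
    return 1.0  # fully interior descriptor
```

Because this runs once on the CPU and the result travels with the descriptor as the w component, the GPU kernels never need to touch the mask at all.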
The calculation result was saved as a component of the descriptor. The descriptor of the sclera then changes to s(x, y, ɸ, w), where w denotes the weight of the point and may take the value 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template must be transformed. It is faster if the two templates share a similar reference point: if we use the center of the iris as the reference point, then when two templates are compared the correspondences are automatically aligned to each other, since they have the same reference point. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the segment angle to the reference line through the iris center, denoted θ; the distance between the segment center and the pupil center, denoted r; and the dominant angular orientation of the segment, denoted ɸ. To minimize the GPU computation, we also convert the descriptor values from polar coordinates to rectangular coordinates in a CPU preprocessing step.
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.
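The CPU-side polar-to-rectangular preprocessing mentioned above amounts to the standard coordinate conversion, sketched here so the per-thread trigonometry it removes from the GPU is explicit:

```python
import math

def to_rectangular(r, theta_deg):
    """Convert a descriptor center from polar (r, theta) about the iris
    center to rectangular (x, y), done once on the CPU so GPU kernels
    avoid per-thread sin/cos calls."""
    t = math.radians(theta_deg)
    return (r * math.cos(t), r * math.sin(t))
```

After this step both the polar components (r, θ) and the precomputed (x, y) live in the descriptor, which is why the final vector carries all of x, y, r, and θ.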
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved them at contiguous addresses, which meets the requirement of coalesced memory access in the GPU.
FIG
FIG
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting. Thus, the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameters generator in Figure 4 is then simplified as in Figure 9.
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
General-Purpose Computing on the GPU: Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process. On one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process. We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as texture on future passes through the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
Coopting this pipeline to perform general-purpose computation
involves the exact same steps but different terminology A motivating
example is a fluid simulation computed over a grid at each time step we
compute the next state of the fluid for each grid point from the current state
at its grid point and at the grid points of its neighbors
The programmer specifies a geometric primitive that covers a
computation domain of interest The rasterizer generates a fragment at each
pixel location covered by that geometry (In our example our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation)
Each fragment is shaded by an SPMD general-purpose fragment
program (Each grid point runs the same program to update the state of its
fluid)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
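The fluid-grid example above can be sketched as follows, with each cell "gathering" from its neighbors in the previous buffer (plain Python standing in for a fragment program; simple neighbor averaging is an assumed update rule, not a real fluid solver):

```python
def step_grid(grid):
    """One SPMD-style time step: every cell computes its next value by
    gathering from itself and its 4 neighbors in the previous state,
    writing into a separate output buffer as in the old GPGPU model."""
    h, w = len(grid), len(grid[0])
    nxt = [[0.0] * w for _ in range(h)]
    for y in range(h):                       # each (y, x) plays the role
        for x in range(w):                   # of one fragment/thread
            acc, n = grid[y][x], 1
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    acc += grid[yy][xx]      # "gather" from a neighbor
                    n += 1
            nxt[y][x] = acc / n
    return nxt  # output buffer becomes next step's input
```

Note that every cell reads only the previous buffer and writes only the new one; there is no scatter, exactly the restriction the old graphics-pipeline model imposed.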
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications
has been that despite their general-purpose tasksrsquo having nothing to do with
graphics the applications still had to be programmed using graphics APIs
In addition the program had to be structured in terms of the graphics
pipeline with the programmable units only accessible as an intermediate
step in that pipeline when the programmer would almost certainly prefer to
access the programmable units directly The programming environments we
describe in detail in Section IV are solving this difficulty by providing a
more natural direct non-graphics interface to the hardware and
specifically the programmable units Today GPU computing applications
are structured in the following way
The programmer directly defines the computation domain of interest as a
structured grid of threads
An SPMD general-purpose program computes the value of each thread
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in
future computation
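To contrast the two access patterns, here is a minimal Python sketch (illustrative functions of our own, not code from any system described here): a scatter writes to data-dependent addresses, and a single read/write buffer permits in-place algorithms:

```python
def histogram_scatter(values, num_bins):
    """Scatter-style writes: each 'thread' (loop iteration) writes to a
    data-dependent address in a shared output buffer."""
    bins = [0] * num_bins
    for v in values:              # one thread per input element
        bins[v % num_bins] += 1   # scatter: the write address depends on data
    return bins

def gather_smooth_inplace(buf):
    """In-place update on a single buffer: each element gathers its
    already-updated left neighbor, something a gather-only pass over
    separate input/output buffers could not express."""
    for i in range(1, len(buf)):
        buf[i] = (buf[i - 1] + buf[i]) / 2.0
    return buf
```

On real GPU hardware the scatter case additionally needs atomic updates when threads collide on one address; the sequential sketch sidesteps that.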
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine
two-stage matching process. In the first stage, we match two images
coarsely using the Y-shape descriptors, which is very fast because no
registration is needed. The matching result in this stage helps filter
out image pairs with low similarity. After this step, some false positive
matches may still remain. In the second stage, we use the WPL descriptor
to register the two images for more detailed descriptor matching,
including scale and translation invariance. This stage includes shift
transform, affine matrix generation, and final WPL descriptor matching.
Overall, we partitioned the registration and matching processing into
four kernels in CUDA (Figure 10): matching on the Y shape descriptor,
shift transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster
and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features,
registration is unnecessary before matching on the Y shape descriptor.
The whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y shape descriptors of the test template Tte
and the target template Tta, respectively. dϕ is the Euclidean distance
of the angle elements of the descriptor vectors, defined as (3), and dxy
is the Euclidean distance of two descriptor centers, defined as (4). ni
and di are the number of matched descriptor pairs and their center
distances, respectively. tϕ is a distance threshold, and txy is the
threshold that restricts the search area. We set tϕ to 30 and txy to 675
in our experiment.
To match two sclera templates, we search the areas near all the Y shape
branches. The search area is limited to the corresponding left or right
half of the sclera in order to reduce the search range and time. The
distance of two branches is defined in (3), where ϕij is the angle
between the j-th branch and the polar axis from the pupil center in
descriptor i.
The number of matched pairs ni and the distance between Y shape branch
centers di are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch centers
as in (2). Here α is a factor to fuse the matching score, which was set
to 30 in our study. Ni and Nj are the total numbers of feature vectors in
templates i and j, respectively. The decision is regulated by the
threshold t: if the sclera's matching score is lower than t, the sclera
is discarded. A sclera with a high matching score is passed to the next,
more precise matching process.
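The Stage-I comparison loop might be sketched as below. This is a simplified reading of Algorithm 1: the descriptor layout, the nearest-match selection, and the fusion rule standing in for Eq. (2) are our assumptions, since the equations themselves are not reproduced here:

```python
import math

def match_y(test, target, t_phi=30.0, t_xy=675.0, alpha=30.0):
    """Coarse Y-shape matching sketch: count descriptor pairs whose
    angle distance stays below t_phi within search radius t_xy, then
    fuse pair count and mean center distance into one score."""
    n, dist_sum = 0, 0.0
    for (phi_i, ci) in test:               # descriptor: (angle triple, center)
        best = None
        for (phi_j, cj) in target:
            d_xy = math.dist(ci, cj)
            if d_xy > t_xy:                # restrict the search area
                continue
            d_phi = math.dist(phi_i, phi_j)  # angle-element distance, Eq. (3)
            if d_phi < t_phi and (best is None or d_phi < best[0]):
                best = (d_phi, d_xy)
        if best is not None:
            n += 1
            dist_sum += best[1]
    if n == 0:
        return 0.0
    avg_d = dist_sum / n
    # illustrative fusion (stand-in for Eq. (2)): more matched branches
    # and a smaller average center distance give a higher score
    return n / (len(test) + len(target)) * alpha / (alpha + avg_d)
```

A template pair whose score falls below the threshold t would be discarded here, before the expensive fine-matching stage runs.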
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of
the sclera than the Y shape descriptor. The variation of the sclera
vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel
structure appears to shrink or extend nonlinearly, because the eyeball is
spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca,
and endothelium. There are slight differences among the movements of
these layers.
Considering these factors, our registration employs both a single shift
transform and a multi-parameter transform that combines shift, rotation,
and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not
be accurate; as a result, the detected iris center may not be very
accurate. The shift transform is designed to tolerate possible errors in
pupil center detection in the segmentation step. If there is no
deformation, or only very minor deformation, registration with the shift
transform alone would be adequate to achieve an accurate result. We
designed Algorithm 2 to obtain the optimized shift parameter, where Tte
is the test template and stei is the i-th WPL descriptor of Tte; Tta is
the target template and staj is the j-th WPL descriptor of Tta; and
d(stek, staj) is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value of two descriptors, defined as
We first randomly select an equal number of segment descriptors stek in
the test template Tte from each quad and find their nearest neighbors
staj in the target template Tta. The shift offset between them is
recorded as a possible registration shift factor Δsk. The final offset
registration factor is Δsoptim, which has the smallest standard deviation
among these candidate offsets.
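Algorithm 2's shift search can be sketched as follows. The quad-based sampling is collapsed into a plain descriptor list, and picking the candidate nearest the mean offset stands in for the smallest-standard-deviation rule, so this is an approximation of the procedure rather than the algorithm itself:

```python
import statistics

def search_shift(test_desc, target_desc, samples=None):
    """Sketch of the shift-parameter search: for sampled test descriptor
    centers, record the offset to the nearest target descriptor, then
    keep the candidate offset that deviates least from the consensus."""
    samples = samples if samples is not None else test_desc
    offsets = []
    for (x, y) in samples:
        nearest = min(target_desc,
                      key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2)
        offsets.append((nearest[0] - x, nearest[1] - y))  # candidate shift
    # choose the candidate closest to the mean offset (our stand-in for
    # the smallest-standard-deviation selection in the text)
    mx = statistics.mean(dx for dx, _ in offsets)
    my = statistics.mean(dy for _, dy in offsets)
    return min(offsets, key=lambda o: (o[0] - mx) ** 2 + (o[1] - my) ** 2)
```

When the two templates differ only by a translation, the candidate offsets all agree and the consensus offset is recovered exactly.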
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the
sclera patterns in the matching step. The affine transform algorithm is
shown in Algorithm 3. The shift value in the parameter set is obtained by
randomly selecting a descriptor ste(it) and calculating the distance from
its nearest neighbor staj in Tta. We transform the test template by the
matrix in (7). At the end of each iteration, we count the number of
matched descriptor pairs between the transformed template and the target
template. The factor β determines whether a pair of descriptors is
matched; we set it to 20 pixels in our experiment. After N iterations,
the optimized transform parameter set is determined by selecting the
maximum matching number m(it). Here stei, Tte, staj, and Tta are defined
the same as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the
shift, rotation, and scale parameters generated in the it-th iteration;
and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform
matrices defined in (7). To search for the optimized transform
parameters, we iterate N times to generate these parameters. In our
experiment, we set the iteration count to 512.
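A sketch of this random parameter search follows. The parameter ranges are illustrative assumptions of ours, and a direct rotation-plus-scale-plus-shift of each point stands in for the matrix form in (7):

```python
import math
import random

def search_affine(test_desc, target_desc, iters=512, beta=20.0, seed=0):
    """Sketch of the affine parameter search: randomly generate
    (rotation, scale, shift) sets, transform the test descriptor
    centers, count pairs landing within beta pixels of a target
    descriptor, and keep the parameter set with the most matches."""
    rng = random.Random(seed)
    best = (-1, None)
    for _ in range(iters):
        theta = rng.uniform(-0.2, 0.2)            # rotation, radians
        scale = rng.uniform(0.9, 1.1)
        tx, ty = rng.uniform(-10, 10), rng.uniform(-10, 10)
        c, s = math.cos(theta), math.sin(theta)
        matched = 0
        for (x, y) in test_desc:
            # apply S(scale) and R(theta), then the shift T(tx, ty)
            xt = scale * (c * x - s * y) + tx
            yt = scale * (s * x + c * y) + ty
            if any(math.dist((xt, yt), p) < beta for p in target_desc):
                matched += 1                      # beta decides a match
        if matched > best[0]:
            best = (matched, (theta, scale, tx, ty))
    return best
```

Because each candidate parameter set is evaluated independently, this loop is exactly the part that maps one parameter set per GPU thread in the CUDA implementation described later.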
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined from Algorithms 2 and 3,
the test template is registered and matched simultaneously. The
registration and matching algorithm is listed in Algorithm 4. Here stei,
Tte, staj, and Tta are defined the same as in Algorithms 2 and 3;
θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration
parameters obtained from Algorithms 2 and 3; and R(θ(optm)),
T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform
matrix defined in Algorithm 3. ɸ is the angle between the segment
descriptor and the radius direction, and w is the weight of the
descriptor, which indicates whether the descriptor is at the edge of the
sclera or not. To ensure that the nearest descriptors have a similar
orientation, we use a constant factor α to check the absolute difference
of the two ɸ values. In our experiment, we set α to 5. The total matching
score is the minimal score of the two transformed results divided by the
minimal matching score for the test template and target template.
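Algorithm 4 might be sketched as below, simplified to a shift-only registration with the orientation check α and the one-match-per-segment rule described above. The descriptor field layout and the score normalization are our assumptions, not the paper's exact formulation:

```python
import math

def register_and_match(test_desc, target_desc, shift, alpha=5.0, beta=20.0):
    """Sketch of fine matching: move each test segment by the optimized
    offset, find the nearest unused target segment, and accept the pair
    only if centers are within beta pixels and the orientations differ
    by less than alpha degrees. Descriptors are (x, y, phi, w), with w
    an edge weight in [0, 1]."""
    used = set()
    score = 0.0
    for (x, y, phi, w) in test_desc:
        xt, yt = x + shift[0], y + shift[1]
        best = None
        for k, (xa, ya, pa, wa) in enumerate(target_desc):
            if k in used:                          # a segment matches once
                continue
            d = math.dist((xt, yt), (xa, ya))
            if d < beta and abs(phi - pa) < alpha:
                if best is None or d < best[1]:
                    best = (k, d, wa)
        if best is not None:
            used.add(best[0])
            score += w * best[2]                   # down-weight edge segments
    return score / max(len(test_desc), len(target_desc))
```

The `used` set plays the role of the matched-line flags discussed in the CUDA mapping below: a line segment that has already matched must not be reused.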
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and works as a
coprocessor with a CPU. A CUDA device consists of many streaming
multiprocessors (SMs), and the parallel part of the program should be
partitioned by the programmer into threads and mapped onto them. There
are multiple memory spaces in the CUDA memory hierarchy: registers, local
memory, shared memory, global memory, constant memory, and texture
memory. Registers, local memory, and shared memory are on-chip, and it
takes little time to access these memories. Only shared memory can be
accessed by other threads within the same block; however, shared memory
is available only in limited amounts. Global memory, constant memory, and
texture memory are off-chip memories accessible by all threads, and
accessing them is very time consuming.
Constant memory and texture memory are read-only and cacheable.
Mapping algorithms to CUDA to achieve efficient processing is not a
trivial task; there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches are
executed serially. To improve performance, branch divergence within a
warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this
latency, on-chip memory should be used preferentially over global
memory. When global memory access does occur, threads in the same warp
should access consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces,
but shared memory is organized into banks of equal size. If two memory
requests from different threads within a warp fall in the same memory
bank, the accesses are serialized. To get maximum performance, memory
requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four
independent modules, all the modules are converted to different kernels
on the GPU. These kernels differ in computation density, so we map them
to the GPU with various mapping strategies to fully utilize the computing
power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution
and the partition among blocks and threads. Algorithm 1 is partitioned
into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of
templates in the database. As the upper middle column of Figure 11 shows,
each target template is assigned to one thread, and one thread performs
one pair-of-templates comparison. In our work we use an NVIDIA C2070 as
our GPU; the thread and block counts are both set to 1024, which means we
can match our test template against up to 1024×1024 target templates at
the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one
thread processes a section of descriptors. As the lower portion of the
middle column of Figure 11 shows, we assign a target template to one
block; inside a block, one thread corresponds to a set of descriptors in
this template. This partition makes every block execute independently,
with no data exchange required between different blocks. When all threads
complete their corresponding descriptor fractions, the sum of the
intermediate results needs to be computed or compared. A parallel prefix
sum algorithm is used to calculate this sum, as shown at the right of
Figure 11. First, all odd-numbered threads compute the sums of
consecutive pairs of results; then, recursively, every first of i (= 4,
8, 16, 32, 64, ...) threads computes the prefix sum on the new results.
The final result is saved at the first address, which has the same
variable name as the first intermediate result.
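The reduction just described (pairwise sums, then strides of 4, 8, 16, ...) can be simulated sequentially; each inner loop below corresponds to one round of threads that would run in parallel on the GPU:

```python
def tree_reduce_sum(values):
    """Simulate the in-place tree reduction: at stride 2, each 'first
    thread of a pair' adds its right neighbor's value; the stride then
    doubles (4, 8, 16, ...) until index 0 holds the total sum."""
    buf = list(values)
    n = len(buf)
    stride = 2
    while stride // 2 < n:
        for i in range(0, n, stride):   # one round; these additions are
            j = i + stride // 2         # independent and run as threads
            if j < n:
                buf[i] += buf[j]
        stride *= 2
    return buf[0]   # the final result lands at the first address
```

For n values this takes ceil(log2 n) rounds instead of n-1 sequential additions, which is why it suits the per-block summation step here.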
252 MAPPING INSIDE BLOCK
In shift-parameter searching, there are two schemes we can choose for
mapping the task:
Mapping one pair of templates to all the threads in a block, so that
every thread takes charge of a fraction of the descriptors and cooperates
with the other threads.
Assigning a single possible shift offset to each thread, so that all
threads compute independently, except that the final results must be
compared across the possible offsets.
Due to the great number of sum and synchronization operations in every
nearest-neighbor searching step, we chose the second method to
parallelize the shift search. In the affine matrix generator, we mapped
an entire parameter-set search to a thread: every thread randomly
generates a set of parameters and tries them independently, and the
generation iterations are assigned across all threads. The challenge of
this step is that the randomly generated numbers might be correlated
among threads. In the rotation and scale registration generation step, we
used the Mersenne Twister pseudorandom number generator because it uses
bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative;
therefore it is hard to parallelize a single twister state-update step
among several execution threads. To make sure that the thousands of
threads in the launch grid generate uncorrelated random sequences, many
simultaneous Mersenne Twisters need to run with different initial states
in parallel. But even "very different" (by any definition) initial state
values do not prevent the emission of correlated sequences by generators
sharing identical parameters. To solve this problem, and to enable an
efficient implementation of the Mersenne Twister on parallel
architectures, we used a special offline tool for the dynamic creation of
Mersenne Twister parameters, modified from the algorithm developed by
Makoto Matsumoto and Takuji Nishimura. In the registration and matching
step, when searching for the nearest neighbor, a line segment that has
already been matched with others should not be used again. In our
approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in shared
memory. To share the flags, all the threads in a block would have to wait
at a synchronization operation at every query step; our solution is to
use a single thread in a block to process the matching.
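The key idea above is that each thread gets a generator with its own parameters, not merely its own seed. As a toy illustration only, the sketch below parameterizes a simple 64-bit LCG per thread; this is a stand-in of ours for the dynamically created Mersenne Twister parameter sets, and all constants are illustrative:

```python
def make_thread_rng(thread_id, seed=12345):
    """Toy analogue of per-thread generator parameterization: each
    'thread' receives a generator with a distinct multiplier (distinct
    algorithm parameters), not just a distinct seed."""
    mult = 6364136223846793005 + 2 * thread_id   # odd, distinct per thread
    state = (seed ^ (thread_id * 0x9E3779B97F4A7C15)) & (2**64 - 1)

    def next_u01():
        nonlocal state
        state = (mult * state + 1442695040888963407) & (2**64 - 1)
        return state / 2**64            # uniform draw in [0, 1)

    return next_u01

rngs = [make_thread_rng(t) for t in range(4)]
draws = [rng() for rng in rngs]
```

A toy LCG like this has none of the statistical quality of a properly parameterized Mersenne Twister; it only illustrates why varying the parameters, rather than the seed alone, separates the threads' streams.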
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth
between host memory and device memory, and data transfer between host and
device can lead to long latency. As shown in Figure 11, we load the
entire target template set from the database without considering when the
templates will be processed; therefore there is no data transfer from
host to device during the matching procedure. In global memory, the
components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w)
are stored separately. This guarantees that consecutive kernels of
Algorithms 2 to 4 can access their data at successive addresses. Although
such coalesced access reduces latency, frequent global memory access is
still a slow way to get data, so in our kernels we load the test template
into shared memory to accelerate memory access. Because Algorithms 2 to 4
execute different numbers of iterations on the same data, bank conflicts
do not occur. To maximize our texture memory space, we set the system
cache to the lowest value and bound our target descriptors to texture
memory; using this cacheable memory, our data access was accelerated
further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor
primarily applied to target detection; in this paper it is applied as the
feature for human recognition. In the sclera region, the vein patterns
are the edges of the image, so HOG is used to determine the gradient
orientations and edge orientations of the vein pattern in the sclera
region of an eye image. To carry out this technique, first divide the
image into small connected regions called cells. For each cell, compute
the histogram of gradient directions or edge orientations of the pixels;
the combination of the histograms of the different cells then represents
the descriptor. To improve accuracy, the histograms can be
contrast-normalized by calculating the intensity over a block and then
using this value to normalize all cells within the block. This
normalization makes the result more invariant to geometric and
photometric changes. The gradient magnitude m(x, y) and orientation
θ(x, y) are calculated using the x- and y-direction gradients dx(x, y)
and dy(x, y).
Orientation binning is the second step of HOG. This method is used to
create the cell histograms: each pixel within the cell contributes a
weight to the orientation bin found in the gradient computation, with the
gradient magnitude used as the weight. The cells are rectangular. The
binning of gradient orientation is spread over 0 to 180 degrees, with
opposite directions counting as the same. Fig. 8 depicts the edge
orientations of the picture elements. If the images have illumination and
contrast changes, then the gradient strengths must be locally normalized;
for that, cells are grouped together into larger blocks. These blocks
overlap, so that each cell contributes more than once to the final
descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are
mainly square grids. The performance of HOG is improved by applying a
Gaussian window to each block.
FIG
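The steps above can be condensed into a minimal HOG sketch: central-difference gradients, unsigned 0-180 degree binning with the magnitude as the vote weight, and one histogram per cell. Per-cell L2 normalization stands in for the overlapping-block normalization, and the Gaussian window is omitted, so this is an assumption-laden simplification rather than the full R-HOG pipeline:

```python
import math

def hog(image, cell=4, bins=9):
    """Minimal HOG sketch over a 2-D list of intensities."""
    h, w = len(image), len(image[0])
    ch, cw = h // cell, w // cell
    hists = [[[0.0] * bins for _ in range(cw)] for _ in range(ch)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = image[y][x + 1] - image[y][x - 1]   # x-direction gradient
            dy = image[y + 1][x] - image[y - 1][x]   # y-direction gradient
            m = math.hypot(dx, dy)                   # gradient magnitude
            ang = math.degrees(math.atan2(dy, dx)) % 180.0  # unsigned
            cy, cx = y // cell, x // cell
            if cy < ch and cx < cw:
                b = min(int(ang / (180.0 / bins)), bins - 1)
                hists[cy][cx][b] += m                # magnitude-weighted vote
    # L2 contrast normalization (done per block in full R-HOG)
    out = []
    for row in hists:
        for hst in row:
            norm = math.sqrt(sum(v * v for v in hst)) or 1.0
            out.append([v / norm for v in hst])
    return out

# vertical edge image: left half dark, right half bright
img = [[0.0] * 4 + [10.0] * 4 for _ in range(8)]
desc = hog(img)
```

For this vertical-edge input, every cell touching the edge votes entirely into the 0 degree bin, matching the intuition that HOG captures the dominant edge orientation of the vessel pattern.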
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and
fourth-generation programming language. Developed by MathWorks, MATLAB
allows matrix manipulations, plotting of functions and data,
implementation of algorithms, creation of user interfaces, and
interfacing with programs written in other languages, including C, C++,
Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an
optional toolbox uses the MuPAD symbolic engine, allowing access to
symbolic computing capabilities. An additional package, Simulink, adds
graphical multi-domain simulation and Model-Based Design for dynamic and
embedded systems.
In 2004, MATLAB had around one million users across industry and
academia. MATLAB users come from various backgrounds in engineering,
science, and economics, and MATLAB is widely used in academic and
research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners in control
engineering, Little's specialty, but quickly spread to many other
domains. It is now also used in education, in particular for the teaching
of linear algebra and numerical analysis, and is popular among scientists
involved in image processing. The MATLAB application is built around the
MATLAB language. The simplest way to execute MATLAB code is to type it in
the Command Window, one of the elements of the MATLAB Desktop. When code
is entered in the Command Window, MATLAB can be used as an interactive
mathematical shell. Sequences of commands can be saved in a text file,
typically using the MATLAB Editor, as a script, or encapsulated into a
function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your
work. You can integrate your MATLAB code with other languages and
applications and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external
applications and languages, such as C, C++, Fortran, Java, COM, and
Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image
processing, communications, control design, test and measurement,
financial modeling and analysis, and computational biology. Add-on
toolboxes (collections of special-purpose MATLAB functions) extend the
MATLAB environment to solve particular classes of problems in these
application areas.
MATLAB can be used on personal computers and powerful server systems,
including the Cheaha compute cluster. With the addition of the Parallel
Computing Toolbox, the language can be extended with parallel
implementations of common computational functions, including for-loop
unrolling. Additionally, this toolbox supports offloading computationally
intensive workloads to Cheaha, the campus compute cluster. MATLAB is one
of a few languages in which each variable is a matrix (broadly construed)
that knows how big it is. Moreover, the fundamental operators (e.g.,
addition, multiplication) are programmed to deal with matrices when
required, and the MATLAB environment handles much of the bothersome
housekeeping that makes all this possible. Since so many of the
procedures required for macro-investment analysis involve matrices,
MATLAB proves to be an extremely efficient language for both
communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming
language or Fortran. A wrapper function is created, allowing MATLAB data
types to be passed and returned. The dynamically loadable object files
created by compiling such functions are termed MEX-files (for MATLAB
executable).
Libraries written in Java, ActiveX, or .NET can be directly called from
MATLAB, and many MATLAB libraries (for example, XML or SQL support) are
implemented as wrappers around Java or ActiveX libraries. Calling MATLAB
from Java is more complicated, but can be done with a MATLAB extension,
sold separately by MathWorks, or using an undocumented mechanism called
JMI (Java-to-MATLAB Interface), which should not be confused with the
unrelated Java Metadata Interface, also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from
MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are
fundamental to engineering and scientific problems, and it enables fast
development and execution. With the MATLAB language, you can program and
develop algorithms faster than with traditional languages because you do
not need to perform low-level administrative tasks such as declaring
variables, specifying data types, and allocating memory. In many cases,
MATLAB eliminates the need for 'for' loops. As a result, one line of
MATLAB code can often replace several lines of C or C++ code.
At the same time MATLAB provides all the features of a traditional
programming language including arithmetic operators flow control data
structures data types object-oriented programming (OOP) and debugging
features
MATLAB lets you execute commands or groups of commands one
at a time without compiling and linking enabling you to quickly iterate to
the optimal solution For fast execution of heavy matrix and vector
computations MATLAB uses processor-optimized libraries For general-
purpose scalar computations MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology
This technology which is available on most platforms provides
execution speeds that rival those of traditional programming languages
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
GUIDE (Graphical User Interface Development Environment) is an
interactive tool for laying out, designing, and editing user interfaces.
GUIDE lets you include list boxes, pull-down menus, push buttons, radio
buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX
controls. Alternatively, you can create GUIs programmatically using
MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process from acquiring
data from external devices and databases through preprocessing
visualization and numerical analysis to producing presentation-quality
output
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other
applications, databases, and external devices. You can read data from
popular file formats such as Microsoft Excel, ASCII text or binary files,
image, sound, and video files, and scientific formats such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format, and additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features required to visualize engineering and
scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all
popular graphics formats. You can customize plots by adding multiple
axes; changing line colors and markers; adding annotations, LaTeX
equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter and bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data,
and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting
effects, light source locations, and transparency.
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical statistical and engineering
functions to support all common engineering and science operations These
functions developed by experts in mathematics are the foundation of the
MATLAB language The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW Discrete Fourier
Transform library Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports they execute
faster than the equivalent C or C++ code
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types
including doubles singles and integers
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown
into something much bigger, and it is used to implement numerical
algorithms for a wide range of applications. The basic language is very
similar to standard linear algebra notation, but there are a few
extensions that will likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition method,
which employs a two-stage parallel approach for registration and
matching. Even though this research focused on developing a parallel
sclera matching solution for the sequential line-descriptor method using
the CUDA GPU architecture, the parallel strategies developed here can be
applied to design parallel solutions for other sclera vein recognition
methods and for general pattern recognition methods. We designed the Y
shape descriptor to narrow the search range and increase matching
efficiency; it is a new feature extraction method that takes advantage of
GPU structures. We developed the WPL descriptor to incorporate mask
information and make the method more suitable for parallel computing,
which can dramatically reduce data transfer and computation. We then
carefully mapped our algorithms to GPU threads and blocks, an important
step in achieving parallel computation efficiency on a GPU. A workflow
with high arithmetic intensity, designed to hide the memory access
latency, partitions the computation task across the heterogeneous system
of CPU and GPU, and even among the threads in the GPU. The proposed
method dramatically improves matching efficiency without compromising
recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
zooms occur at the image acquisition step, ϕ1, ϕ2, and ϕ3 are quite stable. To tolerate errors from the pupil center calculation in the segmentation step, we also record the center position (x, y) of the Y-shape branches as auxiliary parameters. Our rotation-, shift-, and scale-invariant feature vector is therefore defined as y(ϕ1, ϕ2, ϕ3, x, y). The Y-shape descriptor is generated with reference to the iris center, so it is automatically aligned to the iris center; it is a rotation- and scale-invariant descriptor.
V WPL SCLERA DESCRIPTOR
As we discussed in Section 22, the line descriptor is extracted from the skeleton of the vessel structure in binary images (Figure 7). The skeleton is then broken into smaller segments. For each segment, a line descriptor is created to record the center and orientation of the segment. This descriptor is expressed as s(x, y, ɸ), where (x, y) is the position of the center and ɸ is its orientation. Because of the limitation of segmentation accuracy, the descriptors near the boundary of the sclera area might not be accurate and may contain spur edges resulting from the iris, eyelid, and/or eyelashes. To tolerate such errors, the mask file
FIG
The line descriptor of the sclera vessel pattern: (a) an eye image; (b) vessel patterns in the sclera; (c) enhanced sclera vessel patterns; (d) centers of line segments of the vessel patterns.
is designed to indicate whether a line segment belongs to the edge of the sclera or not. However, in a GPU application, using the mask is challenging, since the mask files are large and would occupy GPU memory and slow down data transfer. During matching and registration, a RANSAC-type algorithm is used to randomly select corresponding descriptors, and the transform parameters between them are used to generate the template-transform affine matrix. After every template transform, the mask data must also be transformed and a new boundary calculated to evaluate the weight of the transformed descriptor. This results in too many convolutions in the processing unit.
To reduce the heavy data transfer and computation, we designed the weighted polar line (WPL) descriptor structure, which includes the mask information and can be automatically aligned. We extracted the geometric relationships of the descriptors and stored them as a new descriptor. We use a weighted image created by setting weight values according to position: descriptors beyond the sclera are given weight 0, those near the sclera boundary 0.5, and interior descriptors 1. In our work, descriptor weights were calculated on their own mask by the CPU only once.
The result was saved as a component of the descriptor, which becomes s(x, y, ɸ, w), where w denotes the weight of the point and may take the values 0, 0.5, or 1. To align two templates, when a template is shifted to another location along the line connecting their centers, all the descriptors of that template are transformed. Alignment is faster if the two templates share a similar reference point: if we use the center of the iris as the reference point, then when two templates are compared their correspondences are automatically aligned to each other. Every feature vector of the template is a set of line segment descriptors composed of three variables (Figure 8): the angle θ between the segment center and a reference line through the iris center; the distance r between the segment center and the pupil center; and the dominant orientation ɸ of the segment. To minimize GPU computation, we also convert the descriptor values from polar to rectangular coordinates in a CPU preprocessing step.
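The descriptor construction just described can be sketched in Python (a stand-in language for the CPU preprocessing code; the function and parameter names below are ours, not from the report):

```python
import math

def wpl_descriptor(x, y, phi, cx, cy, weight):
    # Build one WPL descriptor s = (x, y, r, theta, phi, w).
    #   (x, y)   : center of the line segment
    #   phi      : dominant orientation of the segment
    #   (cx, cy) : iris/pupil center used as the shared reference point
    #   weight   : 0 outside the sclera, 0.5 near its boundary, 1 inside
    #              (precomputed once from the mask on the CPU)
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)        # distance from segment center to pupil center
    theta = math.atan2(dy, dx)    # angle to the reference line through the iris center
    return (x, y, r, theta, phi, weight)
```

Because r and theta are derived from the shared iris-center reference, two templates built this way need no per-comparison re-registration for rotation about that center.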
The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters. For example, as an eyeball moves left, the left-part sclera patterns may be compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved
FIG
FIG
them at contiguous addresses. This meets the requirement of coalesced memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
GENERAL-PURPOSE COMPUTING ON THE GPU
Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process. We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as a texture on future passes through the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves exactly the same steps but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units accessible only as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
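The grid-of-threads pattern above can be sketched in Python/NumPy terms; this is an illustrative stand-in for a real GPU kernel, with a 4-neighbor average standing in for the per-point fluid update mentioned earlier:

```python
import numpy as np

def step(grid):
    # One SPMD-style update: every "thread" computes its grid point from its
    # four neighbors (gather) and writes the result out (scatter).  The
    # 4-neighbor average stands in for the per-point program; boundary points
    # are left unchanged for simplicity.
    out = grid.copy()
    out[1:-1, 1:-1] = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1] +
                              grid[1:-1, :-2] + grid[1:-1, 2:])
    return out
```

Each output element depends only on the previous state, so all "threads" could run concurrently, which is exactly the property the structured thread grid exploits.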
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity, although some false positive matches may remain after this step. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and
achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here y_te_i and y_ta_j are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕij is the angle between the jth branch and the polar line from the pupil center in descriptor i.
The number of matched pairs ni and the distance di between Y-shape branch centers are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2), where α is a factor to fuse the matching score, set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise matching process.
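A hedged sketch of the Stage-I matching just described, in Python: the thresholds tϕ = 30 and txy = 675 and the fusion factor α = 30 come from the text, but the exact fusion formula of Eq. (2) is not reproduced there, so the score form below is our assumption (match ratio, penalized by the mean center distance):

```python
import math

T_PHI, T_XY, ALPHA = 30.0, 675.0, 30.0   # thresholds and fusion factor from the text

def coarse_match(test_descs, target_descs):
    # Greedily pair each test descriptor y = (phi1, phi2, phi3, x, y) with a
    # target descriptor whose angle vector and center both fall within the
    # thresholds, then fuse the pair count with the mean center distance.
    matches, dists = 0, []
    for p in test_descs:
        for q in target_descs:
            d_phi = math.dist(p[:3], q[:3])   # angle-element distance, cf. Eq. (3)
            d_xy = math.dist(p[3:], q[3:])    # center distance, cf. Eq. (4)
            if d_phi < T_PHI and d_xy < T_XY:
                matches += 1
                dists.append(d_xy)
                break                         # one match per test descriptor
    if matches == 0:
        return 0.0
    n = min(len(test_descs), len(target_descs))
    return matches / (n * (1.0 + sum(dists) / (ALPHA * matches)))
```

Because no registration is performed, the cost is a plain nearest-threshold search, which is what makes this stage cheap enough to act as a filter.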
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at different gaze angles, the vessel structure appears to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.
Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template, s_te_i is the ith WPL descriptor of Tte, Tta is the target template, s_ta_i is the ith WPL descriptor of Tta, and d(s_te_k, s_ta_j) is the Euclidean distance of descriptors s_te_k and s_ta_j. Δs_k is the shift value of two descriptors, defined as the offset between their center positions.
We first randomly select an equal number of segment descriptors s_te_k in the test template Tte from each quad and find each one's nearest neighbor s_ta_j in the target template Tta. Their shift offset is recorded as a candidate registration shift factor Δs_k. The final registration offset is Δs_optim, the candidate with the smallest standard deviation among these offsets.
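The shift search of Algorithm 2 might be sketched as follows; descriptor centers are plain (x, y) pairs here, the function name is ours, and picking the candidate nearest the mean offset is our reading of "smallest standard deviation":

```python
import math, random, statistics

def shift_search(test_descs, target_descs, samples=16):
    # Sample descriptors from the test template, pair each with its nearest
    # neighbor in the target template, and record the center offset as a
    # candidate shift.  The returned shift is the candidate closest to the
    # mean offset, i.e. the one deviating least from the consensus.
    random.seed(0)                                   # fixed seed for repeatability
    candidates = []
    for s in random.sample(test_descs, min(samples, len(test_descs))):
        nn = min(target_descs, key=lambda t: math.dist(s, t))
        candidates.append((nn[0] - s[0], nn[1] - s[1]))
    mx = statistics.mean(c[0] for c in candidates)
    my = statistics.mean(c[1] for c in candidates)
    return min(candidates, key=lambda c: math.dist(c, (mx, my)))
```

In the actual system each candidate offset is evaluated by a separate GPU thread, as described in Section 252.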
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te_(it) and calculating the distance from its nearest neighbor s_ta_j in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here s_te_i, Tte, s_ta_j, and Tta are defined as in Algorithm 2; tr(it)_shift, θ(it), and tr(it)_scale are the shift, rotation, and scale parameters generated in the itth iteration; and R(θ(it)), T(tr(it)_shift), and S(tr(it)_scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters; in our experiment we set the iteration count to 512.
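A sketch of the random parameter search in Algorithm 3; β = 20 pixels and the match-count criterion come from the text, while the parameter ranges and all names are assumptions for illustration:

```python
import math, random

def affine_search(test_descs, target_descs, iterations=512, beta=20.0):
    # Each iteration draws a candidate (rotation, scale, shift), transforms the
    # test template, and counts descriptor centers that land within beta pixels
    # of some target descriptor; the draw with the most matches wins.
    random.seed(0)                                   # fixed seed for repeatability
    best, best_matches = None, -1
    for _ in range(iterations):
        theta = random.uniform(-0.1, 0.1)            # rotation (rad); assumed range
        scale = random.uniform(0.95, 1.05)           # scale; assumed range
        tx, ty = random.uniform(-5, 5), random.uniform(-5, 5)
        c, s = math.cos(theta), math.sin(theta)
        matched = 0
        for (x, y) in test_descs:
            # S(scale) and R(theta) applied to the point, then T(shift), cf. Eq. (7)
            xp = scale * (c * x - s * y) + tx
            yp = scale * (s * x + c * y) + ty
            if any(math.dist((xp, yp), t) < beta for t in target_descs):
                matched += 1
        if matched > best_matches:
            best, best_matches = (theta, scale, tx, ty), matched
    return best, best_matches
```

In the GPU version, each of the N = 512 iterations is an independent thread, which is why uncorrelated per-thread random numbers matter (Section 252).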
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te_i, Tte, s_ta_j, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)_shift, tr(optm)_scale, and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)_shift), and S(tr(optm)_scale) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimum score of the two transformed results divided by the minimum matching score of the test template and target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single instruction multiple data (SIMD) system and
works as a coprocessor with a CPU A CUDA consists of many streaming
multiprocessors (SM) where the parallel part of the program should be
partitioned into threads by the programmer and mapped into those threads
There are multiple memory spaces in the CUDA memory hierarchy
register local memory shared memory global memory constant memory
and texture memory Register local memory and shared memory are on-
chip and could be a little time consuming to access these memories Only
shared memory can be accessed by other threads within the same block
However there is only limited availability of shared memory Global
memory constant memory and texture memory are off-chip memory and
accessible by all threads which would be very time consuming to access
these memories
Constant memory and texture memory are read-only and cacheable
memory Mapping algorithms to CUDA to achieve efficient processing is
not a trivial task There are several challenges in CUDA programming
If threads in a warp have different control path all the branches will be
executed serially To improve performance branch divergence within a
warp should be avoided
Global memory is slower than on-chip memory in term of access To
completely hide the latency of the small instructions set we should use on-
chip memory preferentially rather than global memory When global
memory access occurs threads in same warp should access the words in
sequence to achieve coalescence
Shared memory is much faster than the local and global memory space
But shared memory is organized into banks which are equal in size If two
addresses of memory request from different thread within a warp fall in the
same memory bank the access will be serialized To get maximum
performance memory requests should be scheduled to minimize bank
conflicts
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread compares one pair of templates. In our work we use an NVIDIA C2070 as our GPU, and the thread and block numbers are set to 1024; this means we can match our test template with up to 1024 x 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate this sum, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
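The pairwise tree reduction described above can be sketched sequentially in Python; each inner-loop iteration plays the role of one thread in a round, and the function name is ours:

```python
def tree_sum(values):
    # Stride-doubling pairwise reduction: in round 1 each "thread" adds its
    # right neighbor, then the stride doubles each round until the total sum
    # sits at index 0, mirroring the in-block reduction in Figure 11.
    vals = list(values)
    stride = 1
    while stride < len(vals):
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]   # one "thread" per pair in this round
        stride *= 2
    return vals[0]
```

On the GPU the additions within each round run concurrently, so n intermediate results reduce in O(log n) rounds instead of n sequential additions.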
252 MAPPING INSIDE BLOCK
In the shift-argument search, there are two schemes we could choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to a thread, so that all threads compute independently and only the final results need to be compared across offsets.
Because of the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the iterations are distributed across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. For generating the rotation and scale registration parameters, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.
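A hedged sketch of giving each thread its own uncorrelated generator: NumPy's SeedSequence/MT19937 machinery stands in here for the offline dynamic-creation tool used in the actual CUDA implementation, and the function name is ours:

```python
import numpy as np

def make_thread_generators(n_threads, root_seed=1234):
    # SeedSequence.spawn derives statistically independent child seeds, so each
    # "thread" gets its own MT19937 stream -- uncorrelated by construction,
    # which is the same goal the per-thread Mersenne Twister parameterization
    # serves on the GPU.
    children = np.random.SeedSequence(root_seed).spawn(n_threads)
    return [np.random.Generator(np.random.MT19937(c)) for c in children]
```

Naively seeding every thread's twister with "different" integers does not give this guarantee, which is the problem discussed next.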
The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is instead to use a single thread in a block to process the matching.
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore no data transfer from host to device occurs during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that contiguous kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access was still a slow way to get data, so in our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG; this method is used to create the cell histograms. Each pixel within the cell contributes a weight to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination and contrast changes, then the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
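The cell-histogram construction described above might be sketched as follows; the cell size and bin count are conventional choices rather than values from the report, and block normalization and the Gaussian window are omitted for brevity:

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    # Gradients via finite differences, then magnitude-weighted orientation
    # binning over 0-180 degrees (opposite directions count the same),
    # producing one histogram per cell.
    img = img.astype(float)
    dy, dx = np.gradient(img)                        # y- and x-direction gradients
    mag = np.hypot(dx, dy)                           # m(x, y)
    ang = np.rad2deg(np.arctan2(dy, dx)) % 180.0     # unsigned orientation
    h, w = img.shape
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    hists = np.zeros((h // cell, w // cell, bins))
    for i in range(h // cell):
        for j in range(w // cell):
            sl = np.s_[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            hists[i, j] = np.bincount(bin_idx[sl].ravel(),
                                      weights=mag[sl].ravel(),
                                      minlength=bins)[:bins]
    return hists
```

Concatenating and block-normalizing these per-cell histograms would yield the full HOG descriptor described in the text.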
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing.
The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing.
Development environment for managing code, files, and data.
Interactive tools for iterative exploration, design, and problem solving.
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration.
2-D and 3-D graphics functions for visualizing data.
Tools for building custom graphical user interfaces.
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel.
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on
Windows XP, and for network installations.
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data.
Readability and navigation improvements to warning and error messages in
the MATLAB command window.
Automatic variable and function renaming in the MATLAB Editor.
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems, and it enables
fast development and execution. With the MATLAB language, you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables, specifying data types, and allocating memory. In many
cases, MATLAB eliminates the need for 'for' loops. As a result, one line of
MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
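The loop-elimination point above can be illustrated with a small sketch (in Python rather than MATLAB, so the analogy is approximate; both function names are hypothetical):

```python
def scale_and_shift_loop(x, a, b):
    # Element-by-element loop, as one might write in C.
    y = []
    for xi in x:
        y.append(a * xi + b)
    return y

def scale_and_shift_vectorized(x, a, b):
    # One-line equivalent, in the spirit of MATLAB's y = a*x + b.
    return [a * xi + b for xi in x]
```

Both produce the same result; the one-line form mirrors how a single MATLAB expression replaces an explicit loop.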
MATLAB lets you execute commands or groups of commands one
at a time, without compiling and linking, enabling you to quickly iterate to
the optimal solution. For fast execution of heavy matrix and vector
computations, MATLAB uses processor-optimized libraries. For general-
purpose scalar computations, MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology.
This technology, which is available on most platforms, provides
execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithms efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting
breakpoints and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file
differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
You can use the interactive tool GUIDE (Graphical User Interface
Development Environment) to lay out, design, and edit user interfaces.
GUIDE lets you include list boxes, pull-down menus, push buttons, radio
buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring
data from external devices and databases, through preprocessing,
visualization, and numerical analysis, to producing presentation-quality
output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
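Two of the operations listed above, thresholding and smoothing, can be sketched in a few lines (a pure-Python stand-in for the corresponding MATLAB operations; the function names and the window choice are illustrative):

```python
def threshold(signal, t):
    # Zero out samples below the threshold t.
    return [s if s >= t else 0 for s in signal]

def smooth(signal, window=3):
    # Simple centered moving average with edge clamping.
    n = len(signal)
    out = []
    for i in range(n):
        lo = max(0, i - window // 2)
        hi = min(n, i + window // 2 + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out
```

In MATLAB both steps would typically be single vectorized expressions.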
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats such as Microsoft Excel; ASCII text or binary
files; image, sound, and video files; and scientific files such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format. Additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes;
changing line colors and markers; adding annotations, LaTeX equations, and
legends; and drawing shapes.
2-D Plotting
You can visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D
scalar data, and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting effects,
light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW discrete Fourier
transform library. Because these processor-dependent libraries are
optimized for the different platforms that MATLAB supports, they execute
faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language used is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE CONVERTED INTO A GREYSCALE IMAGE
FIG
GREYSCALE IMAGE CONVERTED INTO A BINARY IMAGE
FIG
EDGE DETECTION USING OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
ENHANCEMENT OF THE SCLERA IMAGE
FIG
FEATURE EXTRACTION OF THE SCLERA IMAGE USING GABOR FILTERS
FIG
MATCHING WITH IMAGES IN THE DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition
method which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions to other sclera vein recognition methods
and to general pattern recognition methods. We designed the Y-shape
descriptor to narrow the search range and increase the matching efficiency;
it is a new feature extraction method that takes advantage of the GPU
structures. We developed the WPL descriptor to incorporate mask
information and make the matching more suitable for parallel computing,
which can dramatically reduce data transfer and computation. We then
carefully mapped our algorithms to GPU threads and blocks, which is an
important step toward achieving parallel computation efficiency on a GPU. A
workflow with high arithmetic intensity, designed to hide the memory
access latency, partitions the computation task across the heterogeneous
system of CPU and GPU, down to the individual threads in the GPU. The
proposed method dramatically improves the matching efficiency without
compromising recognition accuracy.
since the mask files are large in size and would occupy GPU memory and
slow down the data transfer. During matching, a RANSAC-type
algorithm was used for registration: corresponding descriptors were randomly
selected, and the transform parameters between them were used to generate the
template-transform affine matrix. After every template transform, the mask
data should also be transformed and a new boundary calculated to
evaluate the weight of the transformed descriptor. This results in too many
convolutions in the processor unit.
To reduce the heavy data transfer and computation, we designed the
weighted polar line (WPL) descriptor structure, which includes the
mask information and can be automatically aligned. We extracted the
geometric relationships of the descriptors and stored them as a new
descriptor. We use a weighted image created by setting various weight
values according to position: the weights of descriptors beyond the sclera
are set to 0, those near the sclera boundary to 0.5, and interior descriptors
to 1. In our work, descriptor weights were calculated on their own mask by
the CPU, only once.
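The weighting rule just described can be sketched as follows (the inside/boundary test shown here is a hypothetical stand-in for the real mask lookup performed once on the CPU, and the "near" radius is an assumption):

```python
# Sketch of the WPL weighting rule: weight 0 for descriptors beyond the
# sclera, 0.5 near the sclera boundary, and 1 for interior descriptors.

def descriptor_weight(inside_sclera, dist_to_boundary, near=5.0):
    # near: assumed pixel radius that counts as "near the boundary".
    if not inside_sclera:
        return 0.0
    if dist_to_boundary < near:
        return 0.5
    return 1.0
```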
The calculated result was saved as a component of the descriptor. The
sclera descriptor then becomes s(x, y, ɸ, w), where w denotes the weight
of the point and may take the value 0, 0.5, or 1. To align two templates, when a
template is shifted to another location along the line connecting their
centers, all the descriptors of that template must be transformed. This is
faster if the two templates have similar reference points. If we use the center of
the iris as the reference point, then when two templates are compared, the
correspondences are automatically aligned to each other, since the templates
share a similar reference point. Every feature vector of the template is a set of
line-segment descriptors composed of three variables (Figure 8): the
segment's angle to the reference line through the iris center,
denoted as θ; the distance between the segment center and the pupil center,
denoted as r; and the dominant angular orientation of the segment,
denoted as ɸ. To minimize the GPU computation, we also convert the
descriptor values from polar coordinates to rectangular coordinates in the CPU
preprocessing step.
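The CPU-side polar-to-rectangular conversion mentioned above is the standard coordinate conversion; a minimal sketch (the function name is illustrative):

```python
import math

# Convert a descriptor center from polar form (r, theta, relative to the
# iris center) to rectangular coordinates (x, y).

def polar_to_rect(r, theta_deg, center=(0.0, 0.0)):
    theta = math.radians(theta_deg)
    return (center[0] + r * math.cos(theta),
            center[1] + r * math.sin(theta))
```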
The descriptor vector thus becomes s(x, y, r, θ, ɸ, w). The left and right
parts of the sclera in an eye may have different registration parameters. For
example, as an eyeball moves left, the sclera patterns in the left part of the eye
may be compressed while the sclera patterns in the right part are stretched.
In parallel matching, these two parts are assigned to threads in
different warps to allow different deformations. The multiprocessor in
CUDA manages threads in groups of 32 parallel threads called warps. We
reorganized the descriptors from the same side and saved them in continuous
addresses, which meets the requirement of coalesced memory access on the GPU.
FIG
FIG
After reorganizing the structure of the descriptors and adding the mask
information into the new descriptor, computation on the mask file is no longer
needed on the GPU. Matching with this feature is very fast because the
templates do not need to be re-registered every time after shifting. Thus the
cost of data transfer and computation on the GPU is reduced. When matching
on the new descriptor, the shift parameter generator in Figure 4 is simplified
as shown in Figure 9.
2.3 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
more complicated shading and lighting operations that are essential for
complex effects. The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment. Over the past six years, these vertex programs and
fragment programs have become increasingly capable, with larger
limits on their size and resource consumption, with more fully featured
instruction sets, and with more flexible control-flow operations. After many
years of separate instruction sets for vertex and fragment operations, current
GPUs support the unified Shader Model 4.0 on both vertex and fragment
shaders:
The hardware must support shader programs of at least 65k static
instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-
bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect
reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be
supported.
As the shader model has evolved and become more powerful, and GPU
applications of all types have increased vertex and fragment program
complexity, GPU architectures have increasingly focused on the
programmable parts of the graphics pipeline. Indeed, while previous
generations of GPUs could best be described as additions of
programmability to a fixed-function pipeline, today's GPUs are better
characterized as a programmable engine surrounded by supporting fixed-
function units.
General-Purpose Computing on the GPU
Mapping general-purpose computation onto the GPU uses the graphics
hardware in much the same way as any standard graphics application. Because
of this similarity, it is both easier and more difficult to explain the process. On
one hand, the actual operations are the same and are easy to follow; on the
other hand, the terminology is different between graphics and general-purpose
use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology,
then show how the same steps are used in a general-purpose way to author
GPGPU applications, and finally use the same steps to show the more
simple and direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II,
concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that
geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a
combination of math operations and global memory reads from a global
"texture" memory.
The resulting image can then be used as a texture on future passes through
the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation
involves the exact same steps but different terminology. A motivating
example is a fluid simulation computed over a grid: at each time step, we
compute the next state of the fluid for each grid point from the current state
at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a
computation domain of interest. The rasterizer generates a fragment at each
pixel location covered by that geometry. (In our example, our primitive
must cover a grid of fragments equal to the domain size of our fluid
simulation.)
Each fragment is shaded by an SPMD general-purpose fragment
program. (Each grid point runs the same program to update the state of its
fluid.)
The fragment program computes the value of the fragment by a
combination of math operations and "gather" accesses from global
memory. (Each grid point can access the state of its neighbors from the
previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on
future passes. (The current state of the fluid will be used on the next time
step.)
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications
has been that, despite their general-purpose tasks having nothing to do with
graphics, the applications still had to be programmed using graphics APIs.
In addition, the program had to be structured in terms of the graphics
pipeline, with the programmable units only accessible as an intermediate
step in that pipeline, when the programmer would almost certainly prefer to
access the programmable units directly. The programming environments we
describe in detail in Section IV are solving this difficulty by providing a
more natural, direct, non-graphics interface to the hardware and,
specifically, the programmable units. Today, GPU computing applications
are structured in the following way:
The programmer directly defines the computation domain of interest as a
structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math
operations and both "gather" (read) accesses from and "scatter" (write)
accesses to global memory. Unlike in the previous two
methods, the same buffer can be used for both reading and writing,
allowing more flexible algorithms (for example, in-place algorithms that
use less memory).
The resulting buffer in global memory can then be used as an input in
future computation.
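The thread-grid structure above can be mimicked sequentially (illustrative Python; each kernel call stands in for one GPU thread, and the neighbor-averaging kernel is a hypothetical example, not from the source):

```python
def run_grid(state, kernel):
    # Launch one logical thread per grid point, writing a fresh buffer.
    return [kernel(state, i) for i in range(len(state))]

def average_neighbors(state, i):
    # Gather: read clamped left/right neighbors from "global memory".
    left = state[max(i - 1, 0)]
    right = state[min(i + 1, len(state) - 1)]
    return (left + state[i] + right) / 3.0
```

Writing the result to a separate buffer mirrors the gather-only GPGPU model; reading and writing the same buffer would give the in-place style described above.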
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine
two-stage matching process. In the first stage, we matched two images
coarsely using the Y-shape descriptors, which is very fast because
no registration is needed. The matching result in this stage helps filter
out image pairs with low similarities; after this step, some false positive
matches may still remain. In the second stage, we used the WPL descriptor
to register the two images for more detailed descriptor matching, including
scale and translation invariance. This stage includes shift transform, affine
matrix generation, and final WPL descriptor matching. Overall, we
partitioned the registration and matching processing into four kernels in
CUDA (Figure 10): matching on the Y-shape descriptor, shift
transformation, affine matrix generation, and final WPL descriptor
matching. Combining these two stages, the matching program runs faster
and achieves a more accurate score.
2.4.1 STAGE I: MATCHING WITH Y-SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features,
registration is unnecessary before matching on the Y-shape descriptor. The
whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte
and the target template Tta, respectively; dϕ is the Euclidean distance of the
angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean
distance of two descriptor centers, defined in (4); ni and di are the number of
matched descriptor pairs and the distance between their centers, respectively;
tϕ is a distance threshold; and txy is the threshold that restricts the search
area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we searched the areas near all
the Y-shape branches. The search area is limited to the corresponding left or
right half of the sclera in order to reduce the search range and time. The
distance of two branches is defined in (3), where ϕij is the angle between
the j-th branch and the polar line from the pupil center in descriptor i.
The number of matched pairs ni and the distance between the Y-shape
branch centers di are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch
centers as in (2). Here α is a factor to fuse the matching score, which was set
to 30 in our study, and Ni and Nj are the total numbers of feature vectors in
templates i and j, respectively. The decision is regulated by the threshold t: if
a sclera's matching score is lower than t, the sclera is discarded. A
sclera with a high matching score is passed to the next, more precise
matching process.
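A hedged sketch of this Stage-I branch matching (thresholds tϕ = 30 and txy = 675 as stated above; the (phi, cx, cy) tuple layout and the one-to-one pairing rule are assumptions, and the score fusion of (2) is not reproduced):

```python
import math

# Count matched Y-shape branch pairs: a pair matches when the angle
# difference is below t_phi and the center distance is below t_xy.

def coarse_match(test_desc, target_desc, t_phi=30.0, t_xy=675.0):
    matched, dist_sum = 0, 0.0
    for (phi_i, cx_i, cy_i) in test_desc:
        for (phi_j, cx_j, cy_j) in target_desc:
            d_xy = math.hypot(cx_i - cx_j, cy_i - cy_j)
            if abs(phi_i - phi_j) < t_phi and d_xy < t_xy:
                matched += 1
                dist_sum += d_xy
                break   # each test branch pairs with at most one target
    avg = dist_sum / matched if matched else float("inf")
    return matched, avg
```

The pair count and average center distance returned here are the two quantities the text fuses into the Stage-I score.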
2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of the
sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is
nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure
will appear nonlinearly shrunk or stretched, because the eyeball is spherical in
shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and
endothelium. There are slight differences among the movements of these
layers.
Considering these factors, our registration employed both a single
shift transform and a multi-parameter transform that combines shift,
rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before,
segmentation may not be accurate; as a result, the detected iris center may
not be very accurate. The shift transform is designed to tolerate possible errors
in pupil center detection in the segmentation step. If there is no deformation,
or only very minor deformation, registration with the shift transform alone
would be adequate to achieve an accurate result. We designed Algorithm 2
to obtain the optimized shift parameter, where Tte is the test template and stei is
the i-th WPL descriptor of Tte; Tta is the target template and staj is the j-th
WPL descriptor of Tta; and d(stek, staj) is the Euclidean distance of descriptors
stek and staj.
Δsk is the shift value of two descriptors, defined as follows:
We first randomly select an equal number of segment descriptors
stek in the test template Tte from each quad and find each one's nearest
neighbor staj in the target template Tta. Their shift offset is recorded as a possible
registration shift factor Δsk. The final offset registration factor is Δsoptim,
which has the smallest standard deviation among these candidate offsets.
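Algorithm 2's offset search can be sketched as follows (illustrative Python; the "smallest standard deviation" selection is approximated here by picking the candidate closest to the mean offset, and the quad-based sampling is simplified to plain random sampling):

```python
import math
import random
import statistics

# Sample test descriptor centers, pair each with its nearest neighbor in
# the target template, collect candidate (dx, dy) offsets, and keep the
# candidate closest to the mean offset.

def shift_search(test_pts, target_pts, samples=8, seed=0):
    rng = random.Random(seed)
    offsets = []
    for (tx, ty) in rng.sample(test_pts, min(samples, len(test_pts))):
        nx, ny = min(target_pts,
                     key=lambda p: math.hypot(p[0] - tx, p[1] - ty))
        offsets.append((nx - tx, ny - ty))
    mx = statistics.mean(o[0] for o in offsets)
    my = statistics.mean(o[1] for o in offsets)
    return min(offsets, key=lambda o: math.hypot(o[0] - mx, o[1] - my))
```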
2) AFFINE TRANSFORM PARAMETER SEARCH: The affine
transform is designed to tolerate some deformation of the sclera patterns in
the matching step. The affine transform algorithm is shown in Algorithm 3.
The shift value in the parameter set is obtained by randomly selecting a
descriptor stei(it) and calculating the distance from its nearest neighbor staj
in Tta. We transform the test template by the matrix in (7). At the end of each
iteration, we count the number of matched descriptor pairs between the
transformed template and the target template. The factor β determines
whether a pair of descriptors is matched; we set it to 20 pixels in our
experiment. After N iterations, the optimized transform parameter set is
determined by selecting the maximum matching number m(it). Here stei,
Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and
tr(it)scale are the shift, rotation, and scale parameters generated in the it-th
iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform
matrices defined in (7). To search for the optimized transform parameters,
we iterate N times to generate these parameter sets. In our experiment, we set
the iteration count to 512.
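A simplified sketch of Algorithm 3 (the parameter ranges and the iteration count here are illustrative assumptions; β = 20 pixels follows the text, and the rotate-scale-shift composition stands in for the matrices of equation (7)):

```python
import math
import random

# Draw random (shift, rotation, scale) parameter sets, transform the
# test points, count pairs landing within beta pixels of some target
# point, and keep the best-scoring parameter set.

def affine_search(test_pts, target_pts, n_iter=200, beta=20.0, seed=1):
    rng = random.Random(seed)
    best, best_matches = None, -1
    for _ in range(n_iter):
        dx, dy = rng.uniform(-10, 10), rng.uniform(-10, 10)
        theta = rng.uniform(-0.2, 0.2)         # rotation, radians
        scale = rng.uniform(0.9, 1.1)
        c, s = math.cos(theta), math.sin(theta)
        matches = 0
        for (x, y) in test_pts:
            tx = scale * (c * x - s * y) + dx  # rotate, scale, then shift
            ty = scale * (s * x + c * y) + dy
            if any(math.hypot(tx - px, ty - py) < beta
                   for (px, py) in target_pts):
                matches += 1
        if matches > best_matches:
            best, best_matches = (dx, dy, theta, scale), matches
    return best, best_matches
```

On the GPU, each iteration of the outer loop is what the text assigns to a separate thread.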
3) REGISTRATION AND MATCHING ALGORITHM: Using the
optimized parameter sets determined by Algorithms 2 and 3, the test
template is registered and matched simultaneously. The registration and
matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are
defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and
Δsoptim are the registration parameters obtained from Algorithms 2 and 3;
and R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor
transform matrix defined in Algorithm 3. ɸ is the angle between the segment
descriptor and the radius direction, and w is the weight of the descriptor,
which indicates whether the descriptor is at the edge of the sclera or not. To
ensure that the nearest descriptors have a similar orientation, we used a
constant factor α to check the absolute difference of the two ɸ values; in our
experiment we set α to 5. The total matching score is the minimal score of the
two transformed results divided by the minimal matching score of the test
template and the target template.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system that
works as a coprocessor with a CPU. A CUDA device consists of many
streaming multiprocessors (SMs); the parallel part of the program should be
partitioned into threads by the programmer and mapped onto those threads.
There are multiple memory spaces in the CUDA memory hierarchy:
registers, local memory, shared memory, global memory, constant memory,
and texture memory. Registers, local memory, and shared memory are on-
chip, and accessing these memories takes little time. Only
shared memory can be accessed by other threads within the same block;
however, shared memory is available only in limited amounts. Global
memory, constant memory, and texture memory are off-chip memories,
accessible by all threads, and accessing these memories is very time
consuming.
Constant memory and texture memory are read-only, cacheable
memories. Mapping algorithms to CUDA to achieve efficient processing is
not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches are
executed serially. To improve performance, branch divergence within a
warp should be avoided.
Global memory is slower to access than on-chip memory. To
completely hide the latency of the small instruction set, we should
preferentially use on-chip memory rather than global memory. When global
memory accesses occur, threads in the same warp should access words in
sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces,
but shared memory is organized into banks of equal size. If two
memory requests from different threads within a warp fall in the
same memory bank, the accesses are serialized. To get maximum
performance, memory requests should be scheduled to minimize bank
conflicts.
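The bank-conflict rule can be illustrated with a toy model (assuming 32 banks with one 32-bit word per bank, the common CUDA configuration; the helper names are ours):

```python
# Word at index i maps to bank i % NUM_BANKS. Two threads in a warp
# conflict when their accessed words map to the same bank.

NUM_BANKS = 32

def bank_of(word_index):
    return word_index % NUM_BANKS

def has_conflict(indices):
    banks = [bank_of(i) for i in indices]
    return len(banks) != len(set(banks))
```

A warp accessing indices 0..31 (stride 1) touches every bank once and is conflict-free, while a stride-32 access pattern drives all 32 threads into bank 0 and is fully serialized.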
2.5.1 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four
independent modules, each module is converted to a different kernel
on the GPU. These kernels differ in computation density, so we
map them to the GPU with various mapping strategies to fully utilize the
computing power of CUDA. Figure 11 shows our scheme of CPU-GPU
task distribution and the partition among blocks and threads. Algorithm 1 is
partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of
templates in the database. As the upper middle column of Figure 11 shows,
each target template is assigned to one thread, and one thread performs one
pair-of-templates comparison. In our work, we use an NVIDIA C2070 as our
GPU. The thread and block counts are set to 1024, which means we can match
our test template with up to 1024 x 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which each
thread processes a section of descriptors. As the lower portion of the
middle column of Figure 11 shows, we assign a target template to one
block; inside a block, each thread corresponds to a set of descriptors in this
template. This partition lets every block execute independently, with no
data exchange required between different blocks. When all threads complete
their respective descriptor fractions, the sum of the intermediate results must
be computed or compared. A parallel prefix-sum algorithm is used to
calculate the sum of the intermediate results, as shown on the right of
Figure 11. First, all odd-numbered threads compute the sums of consecutive
pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...)
threads computes the prefix sum on the new results. The final result is saved
at the first address, which has the same variable name as the first intermediate
result.
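The pairwise reduction described above can be sketched sequentially (each inner-loop body stands in for one thread's work in a round; this sums the partials into the first slot, as the text describes):

```python
# Tree-style reduction: at each round, stride-spaced "threads" add in
# their neighbor's partial result; after about log2(n) rounds the total
# sits in the first slot.

def tree_reduce_sum(vals):
    vals = list(vals)        # work on a copy, like a shared-memory buffer
    n, stride = len(vals), 1
    while stride < n:
        for i in range(0, n, 2 * stride):   # one "thread" per pair
            if i + stride < n:
                vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]
```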
2.5.2 MAPPING INSIDE BLOCK
In shift argument searching there are two schemes we can choose to
map task
Mapping one pair of templates to all the threads in a block and then every
thread would take charge of a fraction of descriptors and cooperation with
other threads
Assigning a single possible shift offset to a thread and all the threads will
compute independently unless the final result should be compared with
other possible offset
Because of the great number of sum and synchronization operations in every nearest neighbor searching step, we chose the second method to parallelize shift searching. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generated a set of parameters and tried them independently, and the generated iterations were assigned to all threads. The challenge in this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generating step, we used the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched should not be used again. In our approach, a flag variable denoting whether a line has been matched is stored in shared memory.
FIG
FIG
To share these flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is instead to use a single thread in a block to process the matching.
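As a host-side illustration of the uncorrelated per-thread random streams discussed above, NumPy's SeedSequence can spawn statistically independent child seeds, one per simulated thread. This is an analogy for the idea, not the dynamic-creation tool used in the project:

```python
import numpy as np

def make_thread_generators(base_seed, n_threads):
    """Derive one independent Mersenne Twister stream per simulated
    GPU thread. SeedSequence.spawn produces child seeds designed to
    give statistically independent streams, so the per-thread
    generators do not emit correlated sequences even though they
    share one base seed."""
    children = np.random.SeedSequence(base_seed).spawn(n_threads)
    return [np.random.Generator(np.random.MT19937(s)) for s in children]

gens = make_thread_generators(base_seed=42, n_threads=4)
draws = [g.random() for g in gens]   # one draw per simulated thread
```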
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can cause long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no data transfer from host to device during the matching procedure. In global memory, the components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernel we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
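The separate storage of descriptor components can be pictured as a structure-of-arrays layout. A small illustrative sketch (field names taken from the descriptor definition; the sample values are arbitrary):

```python
import numpy as np

# Structure-of-arrays layout: instead of one record per descriptor
# s(x, y, r, theta, phi, w), each component lives in its own contiguous
# array, so consecutive threads reading the same field touch successive
# addresses -- the coalescing pattern described above.
descriptors = [(1.0, 2.0, 3.0, 0.1, 0.2, 0.5),
               (4.0, 5.0, 6.0, 0.3, 0.4, 0.6)]

aos = np.array(descriptors)                  # array-of-structs view
soa = {name: aos[:, i].copy()                # one array per component
       for i, name in enumerate(("x", "y", "r", "theta", "phi", "w"))}

# simulated thread k reads soa["x"][k]: neighboring threads access
# neighboring addresses
```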
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of its pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell gives a weighted vote to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have illumination and contrast changes, then the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
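The cell-histogram computation described above can be sketched as follows. This is a minimal illustration: the bin count of 9 and the L2 normalization are common HOG choices assumed here, not values taken from the report.

```python
import numpy as np

def cell_histogram(cell, n_bins=9):
    """Orientation histogram for one HOG cell: gradients via finite
    differences, angles folded into 0-180 degrees (opposite directions
    count as the same), votes weighted by gradient magnitude."""
    dy, dx = np.gradient(cell.astype(float))
    mag = np.hypot(dx, dy)
    ang = np.degrees(np.arctan2(dy, dx)) % 180.0    # fold to [0, 180)
    bins = (ang // (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())      # magnitude-weighted vote
    return hist / (np.linalg.norm(hist) + 1e-6)     # block-style L2 normalize

cell = np.tile(np.arange(8.0), (8, 1))   # horizontal ramp: pure x-gradient
h = cell_histogram(cell)                 # energy concentrates in bin 0
```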
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, either as a script or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems It enables fast
development and execution With the MATLAB language you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time MATLAB provides all the features of a traditional
programming language including arithmetic operators flow control data
structures data types object-oriented programming (OOP) and debugging
features
MATLAB lets you execute commands or groups of commands one
at a time without compiling and linking enabling you to quickly iterate to
the optimal solution For fast execution of heavy matrix and vector
computations MATLAB uses processor-optimized libraries For general-
purpose scalar computations MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology
This technology which is available on most platforms provides
execution speeds that rival those of traditional programming languages
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
The interactive tool GUIDE (Graphical User Interface Development Environment) is used to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process from acquiring
data from external devices and databases through preprocessing
visualization and numerical analysis to producing presentation-quality
output
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB These include 2-D and 3-D
plotting functions 3-D volume visualization functions tools for
interactively creating plots and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical statistical and engineering
functions to support all common engineering and science operations These
functions developed by experts in mathematics are the foundation of the
MATLAB language The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW Discrete Fourier
Transform library Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports they execute
faster than the equivalent C or C++ code
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types
including doubles singles and integers
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications: the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
descriptor values from polar coordinates to rectangular coordinates in the CPU preprocess.
The descriptor vector becomes s(x, y, r, θ, ɸ, w). The left and right parts of the sclera in an eye may have different registration parameters; for example, as an eyeball moves left, the left-part sclera patterns of the eye may be compressed while the right-part sclera patterns are stretched.
In parallel matching, these two parts are assigned to threads in different warps to allow different deformations. The multiprocessor in CUDA manages threads in groups of 32 parallel threads called warps. We reorganized the descriptors from the same side and saved them at continuous addresses, which meets the requirement of coalesced memory access on the GPU.
FIG
FIG
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator in Figure 4 is then simplified as in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express
more complicated shading and lighting operations that are essential for
complex effects The key step was replacing the fixed-function per-vertex
and per-fragment operations with user-specified programs run on each
vertex and fragment Over the past six years these vertex programs and
fragment programs have become increasingly more capable with larger
limits on their size and resource consumption with more fully featured
instruction sets and with more flexible control-flow operations After many
years of separate instruction sets for vertex and fragment operations current
GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
1) The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
2) The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
3) The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
4) Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
General-Purpose Computing on the GPU
Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process.
We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the more simple and direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II
concentrating on the programmable aspects of this pipeline
1) The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
2) Each fragment is shaded by the fragment program.
3) The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
4) The resulting image can then be used as a texture on future passes through the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves exactly the same steps, but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
1) The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
2) Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
3) The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
4) The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
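The fluid-grid example can be mimicked sequentially. In this sketch each grid cell plays the role of one fragment, gathering its neighbors from the previous time step and writing into a separate output buffer; the averaging rule is an arbitrary stand-in for a real fluid update.

```python
import numpy as np

def step(grid):
    """One SPMD-style time step: every cell runs the same update,
    reading ("gathering") its four neighbors from the previous state
    and writing a separate output buffer, as in the pipeline above."""
    out = np.empty_like(grid)
    padded = np.pad(grid, 1, mode="edge")   # simple boundary rule
    h, w = grid.shape
    for i in range(h):                      # "one fragment per cell"
        for j in range(w):
            out[i, j] = 0.25 * (padded[i, j + 1] + padded[i + 2, j + 1] +
                                padded[i + 1, j] + padded[i + 1, j + 2])
    return out

state = np.zeros((4, 4)); state[1, 1] = 1.0
state = step(state)                         # the spike diffuses outward
```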
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV are solving this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
1) The programmer directly defines the computation domain of interest as a structured grid of threads.
2) An SPMD general-purpose program computes the value of each thread.
3) The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
4) The resulting buffer in global memory can then be used as an input in future computation.
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed; the matching result in this stage helps filter out image pairs with low similarities, although some false positive matches can remain after this step. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold; and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise matching process.
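Stage I's matched-pair counting can be sketched as below. The descriptor layout (three branch angles plus a center) and the first-match policy are assumptions for illustration, and the score fusion of equation (2) is left out since its exact form is not reproduced here; only the thresholds tϕ = 30 and txy = 675 come from the text.

```python
import math

T_PHI, T_XY = 30.0, 675.0      # thresholds from the text

def coarse_match(test_desc, target_desc):
    """Count matched Y-shape branch pairs. Each descriptor is assumed
    to be ((phi1, phi2, phi3), (x, y)); a pair matches when the
    branch-angle distance is below T_PHI and the centers lie within
    T_XY (the search-area restriction)."""
    matched, dist_sum = 0, 0.0
    for (pte, cte) in test_desc:
        for (pta, cta) in target_desc:
            d_xy = math.dist(cte, cta)
            if d_xy > T_XY:
                continue                    # outside the search area
            d_phi = math.sqrt(sum((a - b) ** 2 for a, b in zip(pte, pta)))
            if d_phi < T_PHI:
                matched += 1
                dist_sum += d_xy
                break                       # take the first match per branch
    avg_d = dist_sum / matched if matched else float("inf")
    return matched, avg_d
```

The returned pair (ni, average center distance) is what the fusion rule of equation (2) would then combine into a score.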
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line segment WSL descriptor reveals more vessel structure detail of
sclera than the Y shape descriptor The variation of sclera vessel pattern is
nonlinear because
When acquiring an eye image in different gaze angle the vessel structure
will appear nonlinear shrink or extend because eyeball is spherical in shape
sclera is made up of four layers episclera stroma lamina fusca and
endothelium There are slightly differences among movement of these
layers Considering these factors our registration employed both single
shift transform and multi-parameter transform which combines shift
rotation and scale together
1) SHIFT PARAMETER SEARCH
As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter. Here Tte is the test template and ste,i is the ith WPL descriptor of Tte; Tta is the target template and sta,i is the ith WPL descriptor of Tta; d(ste,k, sta,j) is the Euclidean distance of descriptors ste,k and sta,j; and Δsk is the shift value of the two descriptors.
We first randomly select an equal number of segment descriptors ste,k of the test template Tte from each quad and find each one's nearest neighbor sta,j in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, the candidate with the smallest standard deviation among these candidate offsets.
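Algorithm 2 itself is not reproduced in this chunk, so the following Python sketch is only a plausible reading of the shift search described above: `shift_search` is an invented name, descriptors are reduced to their (x, y) centers, the per-quad stratified sampling is simplified to plain random sampling, and the "smallest standard deviation" rule is interpreted as picking the candidate offset closest to the consensus of all candidates.

```python
import math
import random

def shift_search(test_desc, target_desc, n_samples=8, seed=0):
    """Estimate a global (dx, dy) shift between two descriptor sets.

    For each sampled test descriptor we find its nearest target
    descriptor and record the offset as a candidate registration
    shift; the candidate with the smallest total deviation from the
    other candidates wins.
    """
    rng = random.Random(seed)
    samples = rng.sample(test_desc, min(n_samples, len(test_desc)))
    candidates = []
    for (xs, ys) in samples:
        xt, yt = min(target_desc,
                     key=lambda p: math.hypot(p[0] - xs, p[1] - ys))
        candidates.append((xt - xs, yt - ys))

    def deviation(c):
        # total distance from this candidate to all other candidates
        return sum(math.hypot(c[0] - o[0], c[1] - o[1]) for o in candidates)

    return min(candidates, key=deviation)
```

On a target set that is an exact translated copy of the test set, every candidate offset agrees, so the recovered shift equals the true translation.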
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the offset from its nearest neighbor sta,j in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here ste,i, Tte, sta,j, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the itth iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterated N times to generate these parameters; in our experiment we set the iteration count to 512.
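A sequential Python sketch of the randomized search Algorithm 3 describes, with each loop iteration standing in for one GPU thread. The rotation and scale sampling ranges are assumptions (the report does not state them), `affine_search` is an invented name, and β defaults to the report's 20 pixels.

```python
import math
import random

def affine_search(test_desc, target_desc, n_iter=512, beta=20.0, seed=1):
    """Randomized search for shift/rotation/scale parameters.

    Each iteration draws a random rotation and scale, derives a shift
    from a random descriptor and its nearest target neighbor, applies
    the combined transform to the test set, and counts descriptor
    pairs closer than beta pixels.  The parameter set with the most
    matches wins, mirroring the max-m(it) selection in the text.
    """
    rng = random.Random(seed)
    best = (-1, None)
    for _ in range(n_iter):
        theta = rng.uniform(-0.1, 0.1)   # assumed small-angle range (radians)
        scale = rng.uniform(0.9, 1.1)    # assumed scale range
        xs, ys = rng.choice(test_desc)
        xt, yt = min(target_desc,
                     key=lambda p: math.hypot(p[0] - xs, p[1] - ys))
        dx, dy = xt - xs, yt - ys
        c, s = math.cos(theta), math.sin(theta)

        def tf(p):
            x, y = p
            return (scale * (c * x - s * y) + dx,
                    scale * (s * x + c * y) + dy)

        m = sum(1 for p in map(tf, test_desc)
                if min(math.hypot(p[0] - q[0], p[1] - q[1])
                       for q in target_desc) < beta)
        if m > best[0]:
            best = (m, (theta, scale, dx, dy))
    return best
```

With identical test and target sets and these small parameter ranges, every iteration matches all descriptors, so the best match count equals the template size.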
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here ste,i, Tte, sta,j, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters attained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ɸ values; in our experiment we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and target template.
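A hedged sketch of this fine-matching step, assuming a descriptor layout of (x, y, ɸ, w); the weighted scoring and normalization below are our simplification, not Algorithm 4 verbatim, and `register_and_match` is an invented name.

```python
import math

def register_and_match(test_desc, target_desc, theta, scale, shift, alpha=5.0):
    """Apply the optimized transform, then match descriptors.

    Each descriptor is (x, y, phi, w): center, orientation angle in
    degrees, and an edge weight.  A transformed test descriptor
    matches its nearest target neighbor only if the absolute phi
    difference is below alpha (the orientation check from the text).
    The matched weight is normalized by the smaller template's total
    weight so the score is comparable across template sizes.
    """
    c, s = math.cos(theta), math.sin(theta)
    dx, dy = shift

    def tf(d):
        x, y, phi, w = d
        # rotation also shifts the descriptor's orientation angle
        return (scale * (c * x - s * y) + dx,
                scale * (s * x + c * y) + dy,
                phi + math.degrees(theta), w)

    matched_w = 0.0
    for (x, y, phi, w) in (tf(d) for d in test_desc):
        q = min(target_desc, key=lambda t: math.hypot(t[0] - x, t[1] - y))
        if abs(q[2] - phi) < alpha:
            matched_w += min(w, q[3])
    total = min(sum(d[3] for d in test_desc),
                sum(d[3] for d in target_desc))
    return matched_w / total if total else 0.0
```

Matching a template against itself under the identity transform should yield a score of 1.0, which is a useful unit test for any variant of this stage.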
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those multiprocessors. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip and cost little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, which makes accessing them very time consuming.
Constant memory and texture memory are read-only, cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming.
If threads in a warp have different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower than on-chip memory in terms of access time. To hide this latency, we should preferentially use on-chip memory rather than global memory. When global memory access occurs, threads in the same warp should access words in sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but shared memory is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and each thread performs the comparison of one pair of templates. In our work we use an NVIDIA C2070 as our GPU; the numbers of threads and blocks are both set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assigned a target template to one block. Inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, with no data-exchange requirements between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate the sum of the intermediate results, as shown at the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
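The pairwise reduction just described can be sketched sequentially in Python, where each pass of the outer loop models one synchronized step of the block's threads; `block_reduce_sum` is an illustrative name, not code from the report.

```python
def block_reduce_sum(vals):
    """Tree reduction over per-thread intermediate results.

    vals models the partial results held in shared memory, one per
    thread.  At each step the stride doubles and the surviving
    "threads" add in their partner's partial sum; the total ends up
    at index 0, mirroring how the final result keeps the first
    intermediate result's address.
    """
    buf = list(vals)
    stride = 1
    while stride < len(buf):
        # every (2*stride)-th slot absorbs its neighbor `stride` away
        for i in range(0, len(buf) - stride, 2 * stride):
            buf[i] += buf[i + stride]
        stride *= 2
    return buf[0]
```

The loop runs log2(n) passes, which is exactly why this pattern beats a serial accumulation when each pass executes in parallel on the GPU.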
252 MAPPING INSIDE BLOCK
In shift-argument searching, there are two schemes we can choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to a thread, so that all threads compute independently; only the final results need be compared across the possible offsets.
Because of the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generated a set of parameters and tried them independently, and the generated iterations were assigned across all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration step, we used the Mersenne Twister pseudorandom number generator because it uses bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when
searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is instead to use a single thread in a block to process the matching.
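Returning to the per-thread random-number issue above: CPython's `random.Random` is itself an MT19937 Mersenne Twister, so the seeding problem can be illustrated in Python. Hashing a master seed together with the thread id gives decorrelated wide seeds; this is a common but weaker stand-in for the dynamic-creation tool the report used, which varies the Mersenne Twister parameters themselves rather than only the seeds. `make_thread_rngs` is an invented name.

```python
import hashlib
import random

def make_thread_rngs(master_seed, n_threads):
    """One Mersenne Twister instance per simulated thread.

    Each "thread" gets its own random.Random (MT19937) seeded from a
    SHA-256 hash of (master_seed, thread_id), so naive seeds like
    0, 1, 2, ... never reach the generators directly.  True
    independence requires Matsumoto and Nishimura's dynamic-creation
    approach; this sketch only decorrelates the seeds.
    """
    rngs = []
    for tid in range(n_threads):
        h = hashlib.sha256(f"{master_seed}:{tid}".encode()).digest()
        rngs.append(random.Random(int.from_bytes(h, "big")))
    return rngs
```

Because the hash is deterministic, the whole launch grid is reproducible from the single master seed, which matters when debugging a randomized search like Algorithm 3.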
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and the data transfer between host and device can cause long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore there is no host-to-device data transfer during the matching procedure. In global memory, the components in descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ɸ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in object detection. In this paper it is applied as a feature for human recognition: in the sclera region the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To implement this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a larger block and then using this value to normalize all cells within the block. This normalization makes the descriptor more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y) as m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y)/dx(x, y)).
Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within a cell casts a vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image has illumination or contrast changes, then the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
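The steps above can be condensed into a minimal Python/NumPy sketch; it omits block normalization and the Gaussian window for brevity, and `hog_cell_histograms` is an illustrative name, not the report's code.

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    """Minimal HOG cell-histogram computation.

    Gradients dx, dy give magnitude m = sqrt(dx^2 + dy^2) and an
    unsigned orientation folded into [0, 180), so opposite directions
    count as the same, as described in the text.  Each pixel votes
    its magnitude into its cell's orientation histogram.
    """
    img = img.astype(float)
    dy, dx = np.gradient(img)                  # row gradient first, then column
    mag = np.hypot(dx, dy)
    ang = np.degrees(np.arctan2(dy, dx)) % 180.0
    h, w = img.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    for i in range(ch * cell):
        for j in range(cw * cell):
            hist[i // cell, j // cell, bin_idx[i, j]] += mag[i, j]
    return hist
```

A single vertical edge produces gradients pointing along the x axis, so all the voting mass lands in the 0-degree bin, which makes a convenient correctness check.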
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available. MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required. And the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems. It enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks, such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
Using the interactive tool GUIDE (Graphical User Interface Development Environment), you can lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files, such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotation, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
SELECTING THE REGION OF INTEREST (SCLERA PART)
SELECTED ROI PART
ENHANCEMENT OF SCLERA IMAGE
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR FILTERS
MATCHING WITH IMAGES IN DATABASE
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
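The Otsu thresholding used in the edge-detection snapshot above is a standard algorithm; the following NumPy sketch is independent of the report's MATLAB code and simply searches all 256 candidate thresholds for the one maximizing between-class variance of the grayscale histogram.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method on an 8-bit grayscale image.

    Sweeps the threshold t over 0..255, maintaining the zeroth and
    first moments of the lower class incrementally, and returns the t
    that maximizes the between-class variance w0 * w1 * (m0 - m1)^2.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0, sum0 = 0.0, 0.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue                 # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break                    # upper class empty; done
        sum0 += t * hist[t]
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

On a cleanly bimodal image, any threshold between the two modes maximizes the criterion; this sweep returns the first such value.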
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications: the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make the matching more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
them in continuous addresses. This meets the requirement of coalesced memory access on the GPU.
After reorganizing the structure of the descriptors and adding the mask information into the new descriptor, computation on the mask file is no longer needed on the GPU. Matching with this feature is very fast because the templates do not need to be re-registered every time after shifting; thus the cost of data transfer and computation on the GPU is reduced. Matching on the new descriptor, the shift parameter generator of Figure 4 is then simplified as in Figure 9.
23 EVOLUTION OF GPU ARCHITECTURE
The fixed-function pipeline lacked the generality to efficiently express more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past six years, these vertex programs and fragment programs have become increasingly more capable, with larger limits on their size and resource consumption, with more fully featured instruction sets, and with more flexible control-flow operations. After many years of separate instruction sets for vertex and fragment operations, current GPUs support the unified Shader Model 4.0 on both vertex and fragment shaders:
The hardware must support shader programs of at least 65k static instructions and unlimited dynamic instructions.
The instruction set, for the first time, supports both 32-bit integers and 32-bit floating-point numbers.
The hardware must allow an arbitrary number of both direct and indirect reads from global memory (texture).
Finally, dynamic flow control in the form of loops and branches must be supported.
As the shader model has evolved and become more powerful, and as GPU applications of all types have increased vertex and fragment program complexity, GPU architectures have increasingly focused on the programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
General-Purpose Computing on the GPU: Mapping general-purpose computation onto the GPU uses the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process. We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the more simple and direct way that today's GPU computing applications are written.
231 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline:
The programmer specifies geometry that covers a region on the screen.
The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as texture on future passes through the graphics pipeline.
232 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (OLD)
Coopting this pipeline to perform general-purpose computation involves the exact same steps, but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, the primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
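The steps above can be sketched in ordinary code: each grid point plays the role of a fragment that gathers its neighbors' previous state and writes into a fresh output buffer. This is an illustrative stand-in only (a simple averaging update, not the report's actual fluid solver):

```python
def step(state):
    # One "render pass": every grid point (fragment) reads ("gathers")
    # its 4 neighbours and itself from the previous buffer and writes a
    # fresh output buffer -- a diffusion-style stand-in for the fluid update.
    n = len(state)
    clamp = lambda i: max(0, min(n - 1, i))  # edge-clamped addressing
    return [[(state[clamp(i - 1)][j] + state[clamp(i + 1)][j] +
              state[i][clamp(j - 1)] + state[i][clamp(j + 1)] +
              state[i][j]) / 5.0
             for j in range(n)] for i in range(n)]

grid = [[0.0] * 8 for _ in range(8)]
grid[4][4] = 1.0
grid = step(grid)   # the output buffer feeds the next pass, like a texture
```

Note that, as in the old GPGPU model, the kernel only gathers from the previous buffer and writes a separate output; it never writes into the buffer it reads.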
233 PROGRAMMING A GPU FOR GENERAL-PURPOSE
PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV are solving this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
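A minimal sketch of this thread-grid structure, with gather reads and scatter writes against shared global buffers (the kernel and data below are hypothetical, chosen only to illustrate the pattern):

```python
def spmd_run(kernel, n_threads, *buffers):
    # Emulate an SPMD launch: the same kernel body runs once per thread id.
    # On a real GPU these invocations would execute in parallel.
    for tid in range(n_threads):
        kernel(tid, *buffers)

def hist_kernel(tid, data, hist):
    # Gather (read) the thread's own element, then scatter (write) a vote
    # into a shared output buffer -- reads and writes may target the same
    # global memory, unlike in the old fragment-program model.
    hist[data[tid] % len(hist)] += 1

data = [3, 1, 4, 1, 5, 9, 2, 6]
hist = [0] * 4
spmd_run(hist_kernel, len(data), data, hist)  # hist -> [1, 4, 2, 1]
```

On real hardware the scatter writes would need atomic additions to avoid races between threads; the sequential emulation sidesteps that detail.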
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we matched two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarities; after this step, some false positive matches may still remain. In the second stage, we used the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program can run faster and achieve a more accurate score.
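The coarse-to-fine structure can be sketched as a simple filter-then-rank pipeline. The two matchers below are toy numeric stand-ins (hypothetical), not the report's actual Y-shape and WPL scores:

```python
def two_stage_match(test, gallery, coarse_score, fine_score, t):
    # Stage I: the cheap, registration-free coarse matcher filters the
    # gallery; anything scoring below threshold t is discarded at once.
    survivors = [g for g in gallery if coarse_score(test, g) >= t]
    # Stage II: the expensive fine matcher (registration plus detailed
    # descriptor matching) runs only on the surviving candidates.
    return max(survivors, key=lambda g: fine_score(test, g), default=None)

# Toy numeric stand-ins for the two matchers (hypothetical):
coarse = lambda a, b: 1.0 - abs(a - b) / 10.0
fine = lambda a, b: -abs(a - b)
best = two_stage_match(5.0, [1.0, 4.5, 9.0], coarse, fine, t=0.7)  # -> 4.5
```

The speedup comes from Stage II touching only the survivors of Stage I, which is what lets the full pipeline run faster without hurting the final score.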
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance of two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and the distance of their centers, respectively; tϕ is a distance threshold, and txy is the threshold that restricts the searching area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we searched the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the searching range and time. The distance of two branches is defined in (3), where ϕij is the angle between the jth branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs ni and the distance between the Y shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as (2). Here α is a factor to fuse the matching score, which was set to 30 in our study; Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera will be discarded. A sclera with a high matching score will be passed to the next, more precise matching process.
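The coarse Y-shape score can be sketched as follows. Each descriptor is simplified here to a single branch angle plus a center (the real descriptor carries three branch angles), and since the exact fusion rule of Eq. (2) is not reproduced in the text, a plausible form is assumed and labeled as such:

```python
import math

def y_shape_score(test_desc, target_desc, t_phi=30.0, t_xy=675.0, alpha=30.0):
    # Each descriptor is (branch_angle, (x, y)) -- simplified from the real
    # three-branch Y-shape descriptor.  A pair matches when the angle
    # distance is under t_phi and the centre distance is under t_xy (the
    # thresholds used in the report).  The fusion of matched count and mean
    # centre distance is an ASSUMED form standing in for Eq. (2): more
    # matches raise the score, larger mean distances lower it, with alpha
    # balancing the two terms.
    n, dist_sum = 0, 0.0
    for phi_i, (xi, yi) in test_desc:
        for phi_j, (xj, yj) in target_desc:
            d_phi = abs(phi_i - phi_j)
            d_xy = math.hypot(xi - xj, yi - yj)
            if d_phi < t_phi and d_xy < t_xy:
                n += 1
                dist_sum += d_xy
    return 0.0 if n == 0 else n / (1.0 + (dist_sum / n) / alpha)

score = y_shape_score([(10.0, (0.0, 0.0))], [(12.0, (3.0, 4.0))])
```

Because no registration is performed, each pair costs only an angle comparison and one distance, which is what makes this stage cheap enough to run over the whole gallery.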
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at different gaze angles, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers.
Considering these factors, our registration employed both a single shift transform and a multi-parameter transform which combines shift, rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center might not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection from the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the ith WPL descriptor of Tte; Tta is the target template and staj is the jth WPL descriptor of Tta; d(stek, staj) is the Euclidean distance of descriptors stek and staj; and Δsk is the shift value of the two descriptors.
We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find each one's nearest neighbor staj in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
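A sketch of this search follows. The "smallest standard deviation among candidates" criterion is interpreted here as keeping the candidate offset that deviates least from the others (a robust medoid-style choice); the sampling and pairing details are otherwise illustrative:

```python
import math, random

def shift_search(test_desc, target_desc, samples=8, seed=0):
    # Sketch of Algorithm 2: sample test descriptor centres, pair each
    # with its nearest target centre, record the candidate (dx, dy)
    # offsets, and keep the candidate that deviates least from the rest
    # (a stand-in for "smallest standard deviation" in the report).
    rng = random.Random(seed)
    picks = rng.sample(test_desc, min(samples, len(test_desc)))
    candidates = []
    for (xt, yt) in picks:
        xn, yn = min(target_desc,
                     key=lambda p: math.hypot(p[0] - xt, p[1] - yt))
        candidates.append((xn - xt, yn - yt))
    def spread(c):  # total deviation of candidate c from all candidates
        return sum(math.hypot(c[0] - d[0], c[1] - d[1]) for d in candidates)
    return min(candidates, key=spread)

test_pts = [(0, 0), (1, 3), (4, 1), (2, 2)]
target_pts = [(5, -2), (6, 1), (9, -1), (7, 0)]  # test_pts shifted by (5, -2)
offset = shift_search(test_pts, target_pts)
```

Even when some nearest-neighbor pairings are wrong, the correct offset dominates the candidate set, which is why the deviation-based selection recovers it.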
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the itth iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterated N times to generate these parameters; in our experiment, we set the iteration count to 512.
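This random parameter search can be sketched as a RANSAC-style loop. The search ranges, point sets, and matching geometry below are assumptions for illustration; each loop iteration corresponds to one independent GPU thread in the report's mapping:

```python
import math, random

def affine_search(test, target, n_iter=64, beta=3.0, seed=1):
    # Sketch of Algorithm 3: each iteration draws a random rotation and
    # scale (ASSUMED search ranges), derives a shift from a randomly
    # chosen test point and its nearest target point, and counts how many
    # transformed test points land within beta of some target point.
    rng = random.Random(seed)
    best, best_m = None, -1
    for _ in range(n_iter):
        theta = rng.uniform(-0.05, 0.05)
        scale = rng.uniform(0.98, 1.02)
        c, s = math.cos(theta) * scale, math.sin(theta) * scale
        xa, ya = rng.choice(test)                        # anchor point
        xr, yr = c * xa - s * ya, s * xa + c * ya
        xn, yn = min(target, key=lambda p: math.hypot(p[0] - xr, p[1] - yr))
        dx, dy = xn - xr, yn - yr                        # derived shift
        m = sum(1 for (x, y) in test
                if min(math.hypot(p[0] - (c * x - s * y + dx),
                                  p[1] - (s * x + c * y + dy))
                       for p in target) < beta)
        if m > best_m:                                   # keep best count
            best_m, best = m, (theta, scale, (dx, dy))
    return best, best_m

test_pts = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
target_pts = [(x + 5.0, y + 5.0) for (x, y) in test_pts]
params, m = affine_search(test_pts, target_pts)
```

Because every iteration is independent, the loop parallelizes trivially: on the GPU each thread simply evaluates its own randomly generated parameter set.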
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters attained from Algorithms 2 and 3; R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3; ϕ is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of two ϕ values; in our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single instruction, multiple data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those multiprocessors. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, so it takes little time to access them. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this latency, we should preferentially use on-chip memory rather than global memory. When a global memory access does occur, threads in the same warp should access consecutive words so that the accesses coalesce.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses will be serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU, with both the thread and block counts set to 1024; that means we can match our test template against up to 1024 × 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assigned a target template to one block; inside a block, one thread corresponds to a set of descriptors in that template. This partition makes every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate this sum, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
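The stride-doubling reduction described above can be sketched sequentially; the inner loop models the additions that all participating threads would perform in parallel in one round, and after log2(n) rounds the total sits in the first element, as in the report:

```python
def parallel_tree_sum(values):
    # Stride-doubling tree reduction: in each round the "first" thread of
    # every pair adds in its partner's partial result.  The inner loop
    # stands in for one parallel round of thread additions on the GPU.
    a = list(values)
    stride = 1
    while stride < len(a):
        for i in range(0, len(a) - stride, 2 * stride):
            a[i] += a[i + stride]
        stride *= 2
    return a[0]  # total ends up at the first address

total = parallel_tree_sum([1, 2, 3, 4, 5, 6, 7, 8])  # -> 36
```

The pattern takes log2(n) rounds instead of n sequential additions, which is why it suits the per-block combination of intermediate results.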
252 MAPPING INSIDE BLOCK
In shift argument searching, there are two schemes we can choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to a thread, so that all the threads compute independently, except that the final result must be compared with the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize shift searching. In the affine matrix generator, we mapped an entire parameter set search to a thread, so every thread randomly generated a set of parameters and tried them independently; the generated iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration step, we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to process different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is to use a single thread in a block to process the matching.
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, no data transfer from host to device occurs during the matching procedure. In global memory, the components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access was still a slow way to get data, so in our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated further.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper, it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To implement this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of the gradient directions or edge orientations of its pixels; the combination of the different cells' histograms then forms the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a larger block and then using this value to normalize all cells within the block. This normalization makes the result more invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG; this method is used to create the cell histograms. Each pixel within the cell contributes a weighted vote to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular in form. The binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination or contrast changes, then the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
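The gradient and binning steps above can be sketched compactly. This is a minimal illustration (central-difference gradients, unsigned 0-180 degree bins, magnitude-weighted votes); block normalization and the Gaussian window are omitted for brevity:

```python
import math

def hog_cell_histograms(img, cell=4, bins=9):
    # Minimal HOG sketch: central-difference gradients, unsigned
    # orientation (0-180 degrees, opposite directions merged), and
    # per-cell histograms weighted by gradient magnitude.
    h, w = len(img), len(img[0])
    hists = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = img[y][x + 1] - img[y][x - 1]
            dy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(dx, dy)
            ang = math.degrees(math.atan2(dy, dx)) % 180.0  # unsigned
            key = (y // cell, x // cell)                    # owning cell
            hist = hists.setdefault(key, [0.0] * bins)
            hist[min(int(ang / (180.0 / bins)), bins - 1)] += mag
    return hists

# A vertical edge: the gradient points horizontally, so votes land in bin 0.
img = [[0, 0, 1, 1]] * 4
hists = hog_cell_histograms(img)
```

Weighting votes by magnitude means strong vein edges dominate each cell's histogram, which is exactly why HOG captures the sclera's vessel structure.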
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular among scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop; when code is entered there, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, either as a script or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing.
Development environment for managing code, files, and data.
Interactive tools for iterative exploration, design, and problem solving.
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration.
2-D and 3-D graphics functions for visualizing data.
Tools for building custom graphical user interfaces.
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel.
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be called directly from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations.
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data.
Readability and navigation improvements to warning and error messages in the MATLAB command window.
Automatic variable and function renaming in the MATLAB Editor.
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, enabling fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages, because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to iterate quickly to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
Using the interactive tool GUIDE (Graphical User Interface Development Environment), you can lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific formats such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; and adding annotations, LaTeX equations, legends, and drawn shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions that support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency, a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate mask information and make the matching more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity to hide the memory access latency was designed to partition the computation task across the heterogeneous system of CPU and GPU, down to the individual threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
programmable parts of the graphics pipeline. Indeed, while previous generations of GPUs could best be described as additions of programmability to a fixed-function pipeline, today's GPUs are better characterized as a programmable engine surrounded by supporting fixed-function units.
General-purpose computing on the GPU maps general-purpose computation onto the graphics hardware in much the same way as any standard graphics application. Because of this similarity, it is both easier and more difficult to explain the process: on one hand, the actual operations are the same and are easy to follow; on the other hand, the terminology differs between graphics and general-purpose use. Harris provides an excellent description of this mapping process. We begin by describing GPU programming using graphics terminology, then show how the same steps are used in a general-purpose way to author GPGPU applications, and finally use the same steps to show the simpler and more direct way that today's GPU computing applications are written.
2.3.1 PROGRAMMING A GPU FOR GRAPHICS
We begin with the same GPU pipeline that we described in Section II, concentrating on the programmable aspects of this pipeline.
The programmer specifies geometry that covers a region on the screen. The rasterizer generates a fragment at each pixel location covered by that geometry.
Each fragment is shaded by the fragment program.
The fragment program computes the value of the fragment by a combination of math operations and global memory reads from a global "texture" memory.
The resulting image can then be used as a texture on future passes through the graphics pipeline.
2.3.2 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (OLD)
Co-opting this pipeline to perform general-purpose computation involves the exact same steps, but different terminology. A motivating example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
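The grid update above can be sketched in ordinary code. The following is an illustrative gather-only update in the spirit of the old fragment-program model (the function name and the mixing weights are our own assumptions, not a physical fluid model): each grid point reads its neighbours from the previous buffer and writes into a fresh one.

```python
def diffuse_step(grid):
    # One "render pass" over the computation domain: every grid point runs
    # the same gather-only program, reading its four neighbours from the
    # previous buffer and writing the result into a separate output buffer.
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):              # stand-in for the rasterized fragments
        for x in range(w):
            nbrs = (grid[(y - 1) % h][x] + grid[(y + 1) % h][x] +
                    grid[y][(x - 1) % w] + grid[y][(x + 1) % w])
            # mix the old value with the neighbour average (illustrative weights)
            out[y][x] = 0.5 * grid[y][x] + 0.5 * (nbrs / 4.0)
    return out
```

Note that the output buffer is separate from the input, mirroring the fragment model's inability to read and write the same buffer in one pass.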
2.3.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV solve this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
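The key new capability is scatter. A histogram is a natural scatter computation: each logical thread writes to a data-dependent address, which the old gather-only fragment model could not express. A minimal serial stand-in (illustrative, not CUDA; the function name is ours):

```python
def histogram_scatter(data, nbins):
    # Each input element is handled by one logical thread; the thread
    # *scatters* a write to a data-dependent output address.
    bins = [0] * nbins
    for x in data:                  # stand-in for the SPMD thread grid
        bins[x % nbins] += 1        # on a real GPU this would be an atomic add
    return bins
```

On a real GPU the increments from different threads race, so the scatter would use an atomic add.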
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity. After this step, some false positive matches may remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y-shape descriptor, shift transformation, affine matrix generation and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
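The two-stage flow can be summarized as: a cheap, registration-free coarse pass prunes the database, and the expensive registered pass runs only on the survivors. A hedged sketch of that control flow (the scoring functions are placeholders passed in by the caller, not the report's equations):

```python
def two_stage_match(test_tpl, targets, coarse_score, fine_score, t):
    # Stage I: cheap, registration-free coarse filter (Y-shape descriptors);
    # pairs scoring below the threshold t are discarded outright.
    survivors = [g for g in targets if coarse_score(test_tpl, g) >= t]
    if not survivors:
        return None, 0.0
    # Stage II: expensive registered matching (WPL descriptors), run only
    # on the survivors of stage I.
    best = max(survivors, key=lambda g: fine_score(test_tpl, g))
    return best, fine_score(test_tpl, best)
```

With toy numeric "templates" and a similarity placeholder, the coarse pass prunes distant targets before the fine pass picks the best survivor.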
2.4.1 STAGE I: MATCHING WITH THE Y-SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here ytei and ytaj are the Y-shape descriptors of the test template Tte and the target template Tta, respectively; dφ is the Euclidean distance between the angle elements of the descriptor vectors, defined in (3); dxy is the Euclidean distance between two descriptor centers, defined in (4); ni and di are the number of matched descriptor pairs and the distance between their centers, respectively; tφ is a distance threshold, and txy is the threshold that restricts the search area. We set tφ to 30 and txy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where φij is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs ni and the distance between the Y-shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed to the next, more precise, matching stage.
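A sketch of the Stage-I matching loop under our reading of Algorithm 1. Descriptors are (φ1, φ2, φ3, x, y) tuples, and the thresholds follow the text (tφ = 30, txy = 675, α = 30); however, the fusion formula (2) is not reproduced in the text, so the fusion line below is an illustrative stand-in, not the report's equation:

```python
import math

def y_shape_score(test_desc, target_desc, t_phi=30.0, t_xy=675.0, alpha=30.0):
    # Each descriptor: (phi1, phi2, phi3, x, y) as in the text.
    matched, dists = 0, []
    for te in test_desc:
        for ta in target_desc:
            d_phi = math.dist(te[:3], ta[:3])    # angle-vector distance, eq. (3)
            d_xy = math.dist(te[3:], ta[3:])     # centre distance, eq. (4)
            if d_phi < t_phi and d_xy < t_xy:
                matched += 1
                dists.append(d_xy)
                break
    if not matched:
        return 0.0
    avg_d = sum(dists) / len(dists)
    # Illustrative fusion of match count and average centre distance;
    # the report's exact eq. (2) is not reproduced in the text.
    return (matched + alpha / (1.0 + avg_d)) / min(len(test_desc), len(target_desc))
```

Identical descriptor sets score high; descriptor sets whose angle vectors differ by more than tφ contribute nothing.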
2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers (episclera, stroma, lamina fusca and endothelium), and there are slight differences among the movements of these layers.
Considering these factors, our registration employs both a single shift transform and a multi-parameter transform, which combines shift, rotation and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection from the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template and stei is the i-th WPL descriptor of Tte; Tta is the target template and staj is the j-th WPL descriptor of Tta; and d(stek, staj) is the Euclidean distance of descriptors stek and staj. Δsk is the shift value of the two descriptors.
We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find each one's nearest neighbor staj in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, the candidate with the smallest standard deviation among these candidate offsets.
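Our reading of Algorithm 2 as a sketch: sample test descriptor centres, record the offset to each one's nearest target neighbour, and keep the candidate closest to the consensus of all candidates (our stand-in for "smallest standard deviation"; the function name and this selection rule are assumptions):

```python
import math

def shift_search(test_pts, target_pts):
    # For each sampled test descriptor centre, find the nearest target centre
    # and record the (dx, dy) offset as a candidate registration shift.
    offsets = []
    for tx, ty in test_pts:
        nx, ny = min(target_pts, key=lambda p: math.hypot(p[0] - tx, p[1] - ty))
        offsets.append((nx - tx, ny - ty))
    # Keep the candidate closest to the consensus of all candidates -- a
    # stand-in for "smallest standard deviation among the candidate offsets".
    mx = sum(o[0] for o in offsets) / len(offsets)
    my = sum(o[1] for o in offsets) / len(offsets)
    return min(offsets, key=lambda o: math.hypot(o[0] - mx, o[1] - my))
```

When the target is a purely shifted copy of the test, every candidate agrees and the true shift is recovered.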
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor ste(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj and Tta are defined as in Algorithm 2; tr(it)shift, θ(it) and tr(it)scale are the shift, rotation and scale parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)shift) and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters. In our experiment, we set the iteration count to 512.
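A sketch of the Algorithm 3 search loop: random shift/rotation/scale proposals, scored by how many transformed test points land within β = 20 pixels of a target point, keeping the best over N = 512 iterations. The proposal ranges below are our own assumptions, not the report's:

```python
import math, random

def affine_search(test_pts, target_pts, n_iter=512, beta=20.0, seed=0):
    rng = random.Random(seed)
    best = (0, (0.0, 0.0, 0.0, 1.0))      # (matches, (dx, dy, theta, scale))
    for _ in range(n_iter):
        # Randomly propose a shift/rotation/scale parameter set (ranges are
        # illustrative assumptions).
        dx, dy = rng.uniform(-30, 30), rng.uniform(-30, 30)
        th, sc = rng.uniform(-0.2, 0.2), rng.uniform(0.9, 1.1)
        m = 0
        for x, y in test_pts:
            xr = sc * (x * math.cos(th) - y * math.sin(th)) + dx
            yr = sc * (x * math.sin(th) + y * math.cos(th)) + dy
            # A pair counts as matched when the transformed test point lands
            # within beta pixels of some target point.
            if any(math.hypot(xr - u, yr - v) < beta for u, v in target_pts):
                m += 1
        if m > best[0]:
            best = (m, (dx, dy, th, sc))
    return best
```

Because every proposal is independent, each iteration maps naturally onto its own GPU thread, which is exactly how the report distributes this search.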
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift) and S(tr(optm)scale) form the descriptor transform matrix defined in Algorithm 3. φ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of two φ values; in our experiment we set α to 5. The total matching score is the minimum score of the two transformed results divided by the minimum matching score of the test template and the target template.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory and texture memory. Registers and shared memory are on-chip and take little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited quantity. Global memory, constant memory and texture memory are off-chip but accessible by all threads; accessing these memories is very time consuming. Constant memory and texture memory are read-only and cacheable.
Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower than on-chip memory in terms of access time. To hide this latency, we should use on-chip memory preferentially rather than global memory; when global memory access does occur, threads in the same warp should access consecutive words so that the accesses coalesce.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
2.5.1 MAPPING THE ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work we use an NVIDIA C2070 as our GPU, and the thread and block counts are each set to 1024; this means we can match our test template with up to 1024 x 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate this sum, as shown at the right of Figure 11. First, all odd-numbered threads compute the sums of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
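The pairwise tree reduction described above can be sketched serially: each inner-loop iteration stands in for one thread's pair-sum at a given stride, and the total lands at the first address (the function name is ours):

```python
def tree_sum(vals):
    # Tree-style reduction of the per-thread partial results: pass 1 sums
    # consecutive pairs, and each later pass combines partials at twice the
    # previous stride, so the total accumulates at index 0.
    vals = list(vals)
    stride = 1
    while stride < len(vals):
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]   # each "thread" handles one pair
        stride *= 2
    return vals[0]
```

On a GPU all pair-sums at one stride run concurrently, so the reduction finishes in log2(n) synchronized passes instead of n-1 serial additions.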
2.5.2 MAPPING INSIDE A BLOCK
In the shift-argument search there are two schemes we can choose to map the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to each thread, so that all the threads compute independently, except that the final result must be compared with the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generation iterations are distributed over all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is therefore to use a single thread in a block to process the matching.
FIG
FIG
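The problem of giving thousands of threads uncorrelated streams has a modern library analogue: NumPy's SeedSequence.spawn derives statistically independent child streams from one master seed, playing a role similar to the dynamically created per-twister parameter sets (illustrative only; the report itself uses the Matsumoto-Nishimura dynamic-creation tool, not NumPy):

```python
import numpy as np

def make_thread_streams(master_seed, n_threads):
    # Derive one independent generator per logical GPU thread. SeedSequence
    # spawning yields statistically independent child streams, avoiding the
    # correlations that identically parameterized generators with merely
    # different initial states can exhibit.
    root = np.random.SeedSequence(master_seed)
    return [np.random.default_rng(child) for child in root.spawn(n_threads)]
```

Each "thread" then draws from its own stream without coordinating with the others.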
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can cause long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(φ1, φ2, φ3, x, y) and s(x, y, r, θ, φ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in the design of target detection; in this paper it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y): m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y) / dx(x, y)).
Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination and contrast changes, then the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
FIG
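The per-cell voting described above can be sketched directly: unsigned orientations folded into 0-180 degrees, binned with the gradient magnitude as the vote weight. This is a simplified sketch (9 bins, central differences, no block normalization or Gaussian window; the function name is ours):

```python
import math

def hog_cell_histogram(cell, nbins=9):
    # cell: 2-D list of grey values. Unsigned gradient orientations (0-180
    # degrees, opposite directions counted as the same) are binned, with the
    # gradient magnitude as the vote weight.
    h, w = len(cell), len(cell[0])
    hist = [0.0] * nbins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = cell[y][x + 1] - cell[y][x - 1]    # central differences
            dy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(dx, dy)
            ang = math.degrees(math.atan2(dy, dx)) % 180.0
            hist[min(int(ang / (180.0 / nbins)), nbins - 1)] += mag
    return hist
```

A vertical vein edge produces purely horizontal gradients, so all its weight falls into the 0-degree bin; the full descriptor concatenates such histograms over all cells.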
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004 MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing.
The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files and data
Interactive tools for iterative exploration, design and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, FORTRAN, Java, COM and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, enabling fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types and allocating memory. In many cases MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP) and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithms efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies and code coverage
Designing Graphical User Interfaces
Using the interactive tool GUIDE (Graphical User Interface Development Environment), you can lay out, design and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling and averaging
Thresholding and smoothing
Correlation, Fourier analysis and filtering
1-D peak, valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific formats such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language is very similar to standard linear algebra notation, but there are a few extensions that may cause you some problems at first.
4.2 SNAPSHOTS
FIG: ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG: GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG: EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG: SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG: SELECTED ROI PART
FIG: ENHANCEMENT OF SCLERA IMAGE
FIG: FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR FILTERS
FIG: MATCHING WITH IMAGES IN DATABASE
FIG: DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
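The first steps of the pipeline shown in the snapshots (grayscale image binarized by Otsu's thresholding) can be sketched outside MATLAB. The following Python/NumPy fragment is an illustrative implementation of the standard Otsu method, chosen here to maximize between-class variance; it is not the report's own MATLAB code, and the toy image is hypothetical:

```python
import numpy as np

def otsu_threshold(gray):
    """Return an Otsu threshold for a uint8 grayscale image.

    Tests every threshold t and keeps the one that maximizes the
    between-class variance of the two resulting pixel classes."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0 or w1 == 0:
            continue  # one class is empty; skip this threshold
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

# Toy "sclera" image: dark background with a bright region.
img = np.zeros((10, 10), dtype=np.uint8)
img[3:7, 3:7] = 200
t = otsu_threshold(img)
binary = img > t   # grayscale -> binary, as in the report's pipeline
```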
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase matching efficiency; it is a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, down to the individual threads in the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.
example is a fluid simulation computed over a grid: at each time step, we compute the next state of the fluid for each grid point from the current state at that grid point and at the grid points of its neighbors.
The programmer specifies a geometric primitive that covers a computation domain of interest. The rasterizer generates a fragment at each pixel location covered by that geometry. (In our example, our primitive must cover a grid of fragments equal to the domain size of our fluid simulation.)
Each fragment is shaded by an SPMD general-purpose fragment program. (Each grid point runs the same program to update the state of its fluid.)
The fragment program computes the value of the fragment by a combination of math operations and "gather" accesses from global memory. (Each grid point can access the state of its neighbors from the previous time step in computing its current value.)
The resulting buffer in global memory can then be used as an input on future passes. (The current state of the fluid will be used on the next time step.)
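The gather-only update pattern described above can be simulated on the CPU. The sketch below uses a simple neighbor-averaging rule as a stand-in for a real fluid update (an assumption for illustration; the text does not specify the fluid equations): each grid point reads only the previous buffer and writes a new one, exactly like a fragment program.

```python
import numpy as np

def step(state):
    """One simulated time step: each grid point 'gathers' its four
    neighbors from the previous state and averages them -- the same
    read-old-buffer / write-new-buffer pattern as a fragment program."""
    new = np.zeros_like(state)
    # Interior points only; the boundary stays fixed.
    new[1:-1, 1:-1] = 0.25 * (state[:-2, 1:-1] + state[2:, 1:-1] +
                              state[1:-1, :-2] + state[1:-1, 2:])
    return new  # the previous buffer is untouched: gather-only, no scatter

grid = np.zeros((5, 5))
grid[2, 2] = 1.0          # a single "disturbance" in the fluid
nxt = step(grid)          # value spreads to the four neighbors
```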
2.3 PROGRAMMING A GPU FOR GENERAL-PURPOSE PROGRAMS (NEW)
One of the historical difficulties in programming GPGPU applications has been that, despite their general-purpose tasks having nothing to do with graphics, the applications still had to be programmed using graphics APIs. In addition, the program had to be structured in terms of the graphics pipeline, with the programmable units only accessible as an intermediate step in that pipeline, when the programmer would almost certainly prefer to access the programmable units directly. The programming environments we describe in detail in Section IV are solving this difficulty by providing a more natural, direct, non-graphics interface to the hardware and, specifically, the programmable units. Today, GPU computing applications are structured in the following way:
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
2.4 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose a coarse-to-fine two-stage matching process. In the first stage, we match two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage helps filter out image pairs with low similarity. After this step, some false positive matches may still remain. In the second stage, we use the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
2.4.1 STAGE I: MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching procedure is listed as Algorithm 1.
Here y_te_i and y_ta_j are the Y shape descriptors of the test template T_te and the target template T_ta, respectively; d_ϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); d_xy is the Euclidean distance of two descriptor centers, defined in (4); n_i and d_i are the number of matched descriptor pairs and the distance between their centers, respectively; t_ϕ is a distance threshold; and t_xy is the threshold that restricts the search area. We set t_ϕ to 30 and t_xy to 675 in our experiment.
To match two sclera templates, we search the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where ϕ_ij is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs n_i and the distance between Y shape branch centers d_i are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor to fuse the matching score, which was set to 30 in our study, and N_i and N_j are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded. Scleras with high matching scores are passed on to the next, more precise matching process.
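The coarse stage above can be sketched as follows. This Python fragment uses the report's thresholds (t_ϕ = 30, t_xy = 675) but an illustrative score fusion, since equation (2) is not reproduced in the text; the descriptor layout (angle, center x, center y) and the fusion formula are assumptions:

```python
import numpy as np

# Hypothetical Y-shape branch descriptors: (angle, cx, cy) per branch.
T_PHI, T_XY, ALPHA = 30.0, 675.0, 30.0   # thresholds/factor from the report

def coarse_score(test, target):
    """Count branch pairs whose angle and center distances fall under the
    thresholds, then fuse the match count and the mean center distance
    into one score (illustrative fusion, not the report's formula (2))."""
    matches, dists = 0, []
    for a1, x1, y1 in test:
        for a2, x2, y2 in target:
            if abs(a1 - a2) < T_PHI and np.hypot(x1 - x2, y1 - y2) < T_XY:
                matches += 1
                dists.append(np.hypot(x1 - x2, y1 - y2))
                break  # each test branch matches at most once
    if matches == 0:
        return 0.0
    n_norm = matches / min(len(test), len(target))
    return ALPHA * n_norm / (1.0 + np.mean(dists))

te = [(10.0, 0.0, 0.0), (50.0, 5.0, 5.0)]
ta = [(12.0, 1.0, 1.0), (52.0, 6.0, 6.0)]
score = coarse_score(te, ta)   # higher means more similar
```

Pairs that fall below the decision threshold t would be discarded here, before the expensive WPL stage.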
2.4.2 STAGE II: FINE MATCHING USING WPL DESCRIPTOR
The line segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure will appear nonlinearly shrunk or extended, because the eyeball is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.
Considering these factors, our registration employs both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone is adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where T_te is the test template and s_te_i is the i-th WPL descriptor of T_te; T_ta is the target template and s_ta_i is the i-th WPL descriptor of T_ta; and d(s_te_k, s_ta_j) is the Euclidean distance of descriptors s_te_k and s_ta_j. Δs_k is the shift value of the two descriptors.
We first randomly select an equal number of segment descriptors s_te_k in the test template T_te from each quad and find each one's nearest neighbor s_ta_j in the target template T_ta. The shift offset between them is recorded as a candidate registration shift factor Δs_k. The final offset registration factor is Δs_optim, the candidate with the smallest standard deviation among these candidate offsets.
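The shift search can be sketched in a few lines. This is an illustrative sequential version (the report runs it in parallel on the GPU); descriptors are reduced to 2-D center points, and "smallest standard deviation" is interpreted as the candidate offset closest to the consensus of all candidates:

```python
import numpy as np

def shift_search(test_pts, target_pts, n_samples=8, seed=0):
    """Estimate a global (dx, dy) shift: sample test descriptors, pair
    each with its nearest target descriptor, record the offset, and keep
    the candidate offset that deviates least from the consensus."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(test_pts), size=min(n_samples, len(test_pts)),
                     replace=False)
    offsets = []
    for i in idx:
        d = np.linalg.norm(target_pts - test_pts[i], axis=1)
        offsets.append(target_pts[np.argmin(d)] - test_pts[i])
    offsets = np.array(offsets)
    dev = np.linalg.norm(offsets - offsets.mean(axis=0), axis=1)
    return offsets[np.argmin(dev)]

pts = np.array([[0., 0.], [10., 0.], [0., 10.], [7., 3.]])
shifted = pts + np.array([2.0, -1.0])   # target = test shifted by (2, -1)
est = shift_search(pts, shifted)        # recovers the (2, -1) offset
```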
2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance from its nearest neighbor s_ta_j in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here s_te_i, T_te, s_ta_j, and T_ta are defined as in Algorithm 2; tr_shift^(it), θ^(it), and tr_scale^(it) are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ^(it)), T(tr_shift^(it)), and S(tr_scale^(it)) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterate N times to generate these parameters; in our experiment, we set the iteration count to 512.
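The randomized parameter search can be sketched as follows, using the report's β = 20 pixels and N = 512 iterations. The parameter ranges and the 2-D point representation of descriptors are assumptions for illustration; the report's Algorithm 3 operates on full WPL descriptors:

```python
import numpy as np

BETA = 20.0  # match tolerance in pixels, per the report

def affine_search(test_pts, target_pts, n_iter=512, seed=1):
    """Randomized search over (rotation, scale, shift): apply a candidate
    transform to the test points and count target points within BETA
    pixels; keep the best-scoring parameter set (stand-in for Algorithm 3)."""
    rng = np.random.default_rng(seed)
    best = (-1, None)
    for _ in range(n_iter):
        theta = rng.uniform(-0.2, 0.2)           # small rotation (radians)
        scale = rng.uniform(0.9, 1.1)
        shift = rng.uniform(-5.0, 5.0, size=2)
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        moved = scale * test_pts @ R.T + shift
        # A moved point "matches" if some target point lies within BETA.
        d = np.linalg.norm(moved[:, None, :] - target_pts[None, :, :], axis=2)
        inliers = int((d.min(axis=1) < BETA).sum())
        if inliers > best[0]:
            best = (inliers, (theta, scale, shift))
    return best

pts = np.array([[0., 0.], [30., 0.], [0., 30.], [25., 25.]])
inliers, params = affine_search(pts, pts + 3.0)
```

Because each iteration is independent, this loop maps naturally to one CUDA thread per parameter set, which is exactly how the report distributes it.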
3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here s_te_i, T_te, s_ta_j, and T_ta are defined as in Algorithms 2 and 3; θ(optm), tr_shift(optm), tr_scale(optm), and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm)), T(tr_shift(optm)), and S(tr_scale(optm)) form the descriptor transform matrix defined in Algorithm 3. ϕ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ϕ values; in our experiment, we set α to 5. The total matching score is the minimum score of the two transformed results divided by the minimum matching score for the test template and the target template.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single instruction, multiple data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those SMs. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers and shared memory are on-chip and take little time to access. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited quantity. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming. Constant memory and texture memory are read-only and cacheable.
Mapping algorithms to CUDA to achieve efficient processing is not a trivial task. There are several challenges in CUDA programming:
If threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this latency, we should use on-chip memory preferentially rather than global memory; when global memory access does occur, threads in the same warp should access consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
2.5.1 MAPPING THE ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computational density; thus, we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work, we use an NVIDIA C2070 as our GPU, with the thread and block counts both set to 1024. That means we can match our test template with up to 1024 x 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their respective descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate this sum, as shown on the right of Figure 11. First, all odd-numbered threads compute the sums of consecutive pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
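The pairwise reduction described above can be simulated sequentially. In the sketch below, the inner loop stands in for the threads that would act simultaneously on the GPU; after log2(n) rounds, the total sits in slot 0, matching the text's "saved in the first address":

```python
import numpy as np

def tree_reduce_sum(values):
    """Simulate the block-level reduction: in each round, the active
    threads add the partial result at distance `stride`, halving the
    number of active threads until the sum sits in slot 0."""
    buf = np.array(values, dtype=float)
    n = len(buf)          # assumed to be a power of two for simplicity
    stride = 1
    while stride < n:
        # Threads at indices 0, 2*stride, 4*stride, ... act in parallel
        # on the GPU; here we simply loop over them.
        for i in range(0, n - stride, 2 * stride):
            buf[i] += buf[i + stride]
        stride *= 2
    return buf[0]

partials = [3, 1, 4, 1, 5, 9, 2, 6]   # per-thread intermediate results
total = tree_reduce_sum(partials)      # same value as sum(partials)
```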
2.5.2 MAPPING INSIDE A BLOCK
In the shift argument search, there are two schemes we can choose for mapping the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to each thread, so that all threads compute independently, except that the final result must be compared with the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter set search to a thread; every thread randomly generates a set of parameters and tries them independently. The generated iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registration parameters, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore, it is hard to parallelize a single twister state update step among several execution threads. To make sure that thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem and enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
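The goal of giving every thread a statistically independent stream can be illustrated with NumPy's seed-spawning facility. Note this is an analogy only: NumPy's default generator is PCG64, not a dynamically parameterized Mersenne Twister, but the design problem (one master seed, many provably decorrelated child streams) is the same one the offline tool solves:

```python
import numpy as np

# One master seed is split into independent child seed sequences,
# one per simulated "thread".
n_threads = 4
children = np.random.SeedSequence(42).spawn(n_threads)
streams = [np.random.default_rng(s) for s in children]

# Each simulated thread draws its own parameter candidates; the
# streams do not share state, so the draws are uncorrelated.
draws = [rng.uniform(0.0, 1.0, size=3) for rng in streams]
```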
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor. It was primarily designed for target detection; in this work, it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG; it creates the cell histograms. Each pixel within the cell contributes a weighted vote for an orientation bin based on the values found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientation bins are spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image has illumination or contrast changes, the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so each cell contributes more than once to the final descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
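The two HOG steps described above (gradient computation, then magnitude-weighted orientation binning over 0-180 degrees) can be sketched for a single cell. The bin count of 9 and the L2 normalization are common choices assumed here, not values stated in the report:

```python
import numpy as np

def cell_hog(gray, n_bins=9):
    """Orientation histogram for one cell: each pixel's gradient
    magnitude weights a vote into one of n_bins bins spread over
    0-180 degrees (opposite directions count as the same)."""
    gray = gray.astype(float)
    dy, dx = np.gradient(gray)                     # y- and x-direction gradients
    mag = np.hypot(dx, dy)                         # m(x, y)
    ang = np.degrees(np.arctan2(dy, dx)) % 180.0   # theta(x, y), folded to [0, 180)
    bins = np.minimum((ang / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    # Contrast-normalize so the descriptor resists illumination changes.
    return hist / (np.linalg.norm(hist) + 1e-6)

# Vertical edge: the gradient points horizontally, so all the weight
# lands in the 0-degree bin.
cell = np.tile(np.array([0, 0, 0, 255, 255, 255], dtype=float), (6, 1))
h = cell_hog(cell)
```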
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing.
The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, either as a script or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, FORTRAN, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be called directly from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems. It enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
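As a rough analogue of the loop-elimination point (illustrative Python only, with hypothetical function names; in MATLAB itself the one-liner is simply `a * xs` on a matrix), compare the explicit C-style loop with the single-expression form:

```python
# Rough Python analogue of MATLAB-style vectorization: one expression
# replaces an explicit element-by-element loop.

def scale_loop(xs, a):
    """Explicit loop, C-style: allocate, iterate, assign."""
    out = []
    for x in xs:
        out.append(a * x)
    return out

def scale_vectorized(xs, a):
    """Single expression, closer to MATLAB's `a * xs`."""
    return [a * x for x in xs]

print(scale_loop([1, 2, 3], 2))        # [2, 4, 6]
print(scale_vectorized([1, 2, 3], 2))  # [2, 4, 6]
```

Both forms compute the same result; the second states the intent in one line and leaves the iteration to the language.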
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology.
This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
Use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effect, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g. PINs and passwords), government applications have used token-based systems (e.g. ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y shape descriptor to narrow the search range and increase the matching efficiency, which is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step to achieve parallel computation efficiency using a GPU. A work flow with high arithmetic intensity to hide the memory access latency was designed to partition the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
The programmer directly defines the computation domain of interest as a structured grid of threads.
An SPMD general-purpose program computes the value of each thread.
The value for each thread is computed by a combination of math operations and both "gather" (read) accesses from and "scatter" (write) accesses to global memory. Unlike in the previous two methods, the same buffer can be used for both reading and writing, allowing more flexible algorithms (for example, in-place algorithms that use less memory).
The resulting buffer in global memory can then be used as an input in future computation.
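A minimal serial emulation (plain Python, hypothetical names) of the SPMD model just described: one kernel invocation per grid index, gathering from and scattering to a single global buffer in place.

```python
# Sketch of the SPMD grid model: every "thread" runs the same kernel on
# its own grid index, reading from and writing to one global buffer.

def kernel(tid, buf):
    """Each thread gathers a neighborhood and produces the value for
    its own cell of the same buffer - an in-place stencil."""
    left = buf[tid - 1] if tid > 0 else 0
    return left + buf[tid]

def launch(buf):
    # A real GPU runs all threads concurrently; here we emulate the
    # gather phase first so the in-place scatter does not race.
    results = [kernel(t, buf) for t in range(len(buf))]
    for t, v in enumerate(results):
        buf[t] = v  # scatter back into the same global buffer
    return buf

print(launch([1, 1, 1, 1]))  # [1, 2, 2, 2]
```

The single read-write buffer is what distinguishes this model from the two earlier methods in the text, which separate input and output streams.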
24 COARSE-TO-FINE TWO-STAGE MATCHING PROCESS
To further improve the matching process, we propose the coarse-to-fine two-stage matching process. In the first stage, we matched two images coarsely using the Y-shape descriptors, which is very fast because no registration is needed. The matching result in this stage can help filter out image pairs with low similarities. After this step, some false positive matches may still remain. In the second stage, we used the WPL descriptor to register the two images for more detailed descriptor matching, including scale and translation invariance. This stage includes the shift transform, affine matrix generation, and final WPL descriptor matching. Overall, we partitioned the registration and matching processing into four kernels in CUDA (Figure 10): matching on the Y shape descriptor, shift transformation, affine matrix generation, and final WPL descriptor matching. Combining these two stages, the matching program runs faster and achieves a more accurate score.
241 STAGE I MATCHING WITH Y SHAPE DESCRIPTOR
Due to the scale and rotation invariance of the Y-shape features, registration is unnecessary before matching on the Y shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here, ytei and ytaj are the Y shape descriptors of the test template Tte and the target template Tta, respectively. dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined as (3), and dxy is the Euclidean distance of two descriptor centers, defined as (4). ni and di are the number of matched descriptor pairs and the distance between their centers, respectively. tϕ is a distance threshold, and txy is the threshold that restricts the search area. We set tϕ to 30 and txy to 675 in our experiment.
To match two sclera templates, we searched the areas near all the Y shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance of two branches is defined in (3), where ϕij is the angle between the j-th branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y shape branch centers di are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here, α is a factor to fuse the matching score, which was set to 30 in our study, and Ni and Nj are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if the sclera's matching score is lower than t, the sclera is discarded. A sclera with a high matching score is passed on to the next, more precise matching process.
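Equations (2)-(4) are referenced but not reproduced in this section, so the fusion formula in the sketch below is only an assumed form; the thresholds tϕ = 30 and txy = 675 follow the text as printed. A rough Python sketch of the Stage I coarse match:

```python
import math

# Hedged sketch of the Stage I coarse match on Y-shape descriptors.
# A descriptor is (phi_angles, x, y). The fusion formula is an
# assumption, since equation (2) itself is not shown in this section.

T_PHI, T_XY, ALPHA = 30.0, 675.0, 30.0

def d_phi(a, b):
    return math.dist(a[0], b[0])      # distance between angle vectors, eq. (3)

def d_xy(a, b):
    return math.dist(a[1:], b[1:])    # distance between centers, eq. (4)

def coarse_score(tte, tta):
    n, dist_sum = 0, 0.0
    for yi in tte:
        for yj in tta:
            d = d_xy(yi, yj)
            if d < T_XY and d_phi(yi, yj) < T_PHI:   # matched branch pair
                n += 1
                dist_sum += d
    if n == 0:
        return 0.0
    avg_d = dist_sum / n
    # Assumed fusion: reward many matches, penalize large center offsets.
    return ALPHA * n / (len(tte) + len(tta)) / (1.0 + avg_d)

a = [((10.0, 20.0, 30.0), 5.0, 5.0)]
b = [((12.0, 21.0, 29.0), 6.0, 5.0)]
print(coarse_score(a, b) > 0.0)  # True: one matched branch pair
```

Templates scoring below the decision threshold t would be discarded here; only the rest proceed to the fine WPL stage.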
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of the sclera than the Y shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure will appear to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and endothelium. There are slight differences among the movements of these layers.
Considering these factors, our registration employed both a single shift transform and a multi-parameter transform, which combines shift, rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to get the optimized shift parameter, where Tte is the test template and ssei is the i-th WPL descriptor of Tte; Tta is the target template and ssai is the i-th WPL descriptor of Tta; and d(stek, staj) is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value of two descriptors, defined as
We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find the nearest neighbor staj of each in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
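Algorithm 2 itself is only summarized in the text, so the sketch below fills in details with assumptions: descriptors are reduced to center points, sampling is uniform rather than per-quad, and "smallest standard deviation" is read as the candidate offset closest to the mean of all candidates.

```python
import math, random

# Hedged sketch of the shift-parameter search: sample test descriptors,
# pair each with its nearest target descriptor, collect the candidate
# offsets, and keep the one that deviates least from the others.

def nearest(p, pts):
    return min(pts, key=lambda q: math.dist(p, q))

def shift_search(tte, tta, samples=8, seed=0):
    rng = random.Random(seed)
    picks = rng.sample(tte, min(samples, len(tte)))
    offsets = []
    for p in picks:
        q = nearest(p, tta)
        offsets.append((q[0] - p[0], q[1] - p[1]))   # candidate shift
    mx = sum(o[0] for o in offsets) / len(offsets)
    my = sum(o[1] for o in offsets) / len(offsets)
    # "Smallest standard deviation" read here as: closest to the mean.
    return min(offsets, key=lambda o: math.dist(o, (mx, my)))

tte = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
tta = [(2.0, 3.0), (12.0, 3.0), (2.0, 13.0)]   # tte shifted by (2, 3)
print(shift_search(tte, tta))  # (2.0, 3.0)
```

When the true deformation really is a pure shift, every sampled pair votes for the same offset, which is why this cheap search is adequate on its own in that case.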
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor stei(it) and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched, and we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here, stei, Tte, staj, and Tta are defined as in Algorithm 2; trshift(it), θ(it), and trscale(it) are the shift, rotation, and scale parameters generated in the it-th iteration; and R(θ(it)), T(trshift(it)), and S(trscale(it)) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterated N times to generate these parameters. In our experiment, we set the iteration count to 512.
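Since Algorithm 3 is summarized rather than listed, the following sketch makes assumptions: descriptors are center points, and the sampling ranges for shift, rotation, and scale are invented for illustration. Only β = 20 pixels and N = 512 come from the text.

```python
import math, random

# Hedged sketch of the random affine-parameter search: each of N
# iterations draws a shift/rotation/scale triple, transforms the test
# points, and counts target points matched within beta = 20 pixels;
# the best-scoring parameter set wins.

BETA, N_ITER = 20.0, 512

def transform(p, sh, th, sc):
    x, y = p
    xr = math.cos(th) * x - math.sin(th) * y      # rotate
    yr = math.sin(th) * x + math.cos(th) * y
    return (sc * xr + sh[0], sc * yr + sh[1])     # scale, then shift

def count_matches(pts, tta):
    return sum(1 for p in pts
               if min(math.dist(p, q) for q in tta) < BETA)

def affine_search(tte, tta, seed=0):
    rng = random.Random(seed)
    best, best_m = ((0.0, 0.0), 0.0, 1.0), -1
    for _ in range(N_ITER):
        params = ((rng.uniform(-30, 30), rng.uniform(-30, 30)),  # shift
                  rng.uniform(-0.2, 0.2),                        # rotation
                  rng.uniform(0.8, 1.2))                         # scale
        m = count_matches([transform(p, *params) for p in tte], tta)
        if m > best_m:
            best, best_m = params, m
    return best, best_m
```

Because every iteration is independent, this random search is exactly the kind of work the report later assigns one-parameter-set-per-thread on the GPU.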
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here, stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), trshift(optm), trscale(optm), and Δsoptim are the registration parameters attained from Algorithms 2 and 3; and R(θ(optm)), T(trshift(optm)), and S(trscale(optm)) form the descriptor transform matrix defined in Algorithm 3. ϕ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of two ϕ values; in our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single instruction, multiple data (SIMD) system that works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those threads. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and it takes very little time to access these memories. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip memories, accessible by all threads, and accessing them is very time consuming.
Constant memory and texture memory are read-only and cacheable memories. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower than on-chip memory in terms of access time. To completely hide the latency of the small instruction set, we should preferentially use on-chip memory rather than global memory. When global memory access occurs, threads in the same warp should access words in sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but shared memory is organized into banks that are equal in size. If two addresses of memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with various mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one pair-of-templates comparison. In our work, we use an NVIDIA C2070 as our GPU. The thread and block numbers are set to 1024, which means we can match our test template with up to 1024 x 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assigned a target template to one block. Inside a block, one thread corresponds to a set of descriptors in this template. This partition makes every block execute independently, and there is no data exchange requirement between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm is used to calculate the sum of the intermediate results, as shown at the right of Figure 11. First, all odd-numbered threads compute the sums of consecutive pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
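The pairwise-then-recursive summation described above is a classic parallel prefix sum (scan). Serially it can be sketched as follows, where each `step` corresponds to one round in which all additions could run on separate threads (a sketch of the idea, not the report's CUDA kernel):

```python
# Parallel prefix sum (Hillis-Steele style), emulated serially: at step
# d, element i adds in the element 2**d positions to its left. On a GPU
# each addition within a step runs on its own thread.

def prefix_sum(xs):
    out = list(xs)
    step = 1
    while step < len(out):
        # One parallel step: all additions below are independent.
        prev = list(out)                  # snapshot = thread barrier
        for i in range(step, len(out)):
            out[i] = prev[i] + prev[i - step]
        step *= 2
    return out

print(prefix_sum([1, 2, 3, 4]))  # [1, 3, 6, 10]
```

The last element of the scan is the block-wide total, which is the quantity the kernels actually need when reducing per-thread intermediate results.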
252 MAPPING INSIDE BLOCK
In the shift argument search, there are two schemes we can choose for mapping the task:
Map one pair of templates to all the threads in a block; every thread then takes charge of a fraction of the descriptors and cooperates with the other threads.
Assign a single possible shift offset to a thread; all the threads then compute independently, except that the final result must be compared with the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter set search to a thread, and every thread randomly generated a set of parameters and tried them independently. The generated iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the step of generating the rotation and scale registrations, we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative. Therefore, it is hard to parallelize a single twister state update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to process with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with others should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory.
FIG
FIG
To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access was still a slow way to get data, so in our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not happen. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, our data access was accelerated even more.
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in the design of target detection. In this paper, it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of an image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image. To apply this technique, first divide the image into small connected regions, called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the different histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used to create the cell histograms. Each pixel within the cell gives a weighted vote to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular in form. The binning of gradient orientation is spread over 0 to 180 degrees, and opposite directions count as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination and contrast changes, then the gradient strengths must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
FIG
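The pipeline just described (gradients, magnitude-weighted orientation votes over 0-180 degrees, block normalization) can be sketched in plain Python. The cell size and bin count below are illustrative choices, not necessarily the report's exact configuration:

```python
import math

# Compact sketch of the HOG pipeline: central-difference gradients,
# unsigned orientation binned over 0-180 degrees with magnitude
# weighting per cell, then L2 normalization over a block of cells.

def gradients(img):
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = img[y][x + 1] - img[y][x - 1]
            dy = img[y + 1][x] - img[y - 1][x]
            mag[y][x] = math.hypot(dx, dy)
            # Unsigned orientation: opposite directions count the same.
            ang[y][x] = math.degrees(math.atan2(dy, dx)) % 180.0
    return mag, ang

def cell_histogram(mag, ang, y0, x0, cell=4, bins=9):
    hist = [0.0] * bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            b = int(ang[y][x] / (180.0 / bins)) % bins
            hist[b] += mag[y][x]          # magnitude-weighted vote
    return hist

def l2_normalize(block, eps=1e-6):
    """Contrast-normalize a concatenated block of cell histograms."""
    n = math.sqrt(sum(v * v for v in block) + eps)
    return [v / n for v in block]
```

For a vertical edge (a sharp left-to-right intensity step), the votes land in the 0-degree bin, which is the behavior the text describes for vein edges in the sclera region.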
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, which is one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful
server systems including the Cheaha compute cluster With the addition of
the Parallel Computing Toolbox the language can be extended with parallel
implementations for common computational functions including for-loop
unrolling Additionally this toolbox supports offloading computationally
intensive workloads to Cheaha the campus compute cluster MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) and knows how big it is Moreover the fundamental operators
(eg addition multiplication) are programmed to deal with matrices when
required And the MATLAB environment handles much of the bothersome
housekeeping that makes all this possible Since so many of the procedures
required for Macro-Investment Analysis involves matrices MATLAB
proves to be an extremely efficient language for both communication and
implementation
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems. It enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithms efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage
Designing Graphical User Interfaces
Using the interactive tool GUIDE (Graphical User Interface Development Environment), you can lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW Discrete Fourier Transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
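The first snapshot steps (grey-scale conversion and Otsu-threshold binarization) can be sketched as follows. This is a minimal Python illustration, not the project's MATLAB code; the BT.601 luma weights and the 256-level histogram are conventional assumptions.

```python
# Sketch of the first pipeline stages: grey-scale conversion and
# Otsu's threshold. Pixels are plain Python lists for illustration.

def to_grey(rgb_pixels):
    # ITU-R BT.601 luma weights (a common convention, assumed here)
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in rgb_pixels]

def otsu_threshold(grey, levels=256):
    # Build the grey-level histogram
    hist = [0] * levels
    for v in grey:
        hist[min(int(v), levels - 1)] += 1
    total = len(grey)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_b = sum_b = 0.0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_b += hist[t]                  # background pixel count
        if w_b == 0 or w_b == total:
            continue
        sum_b += t * hist[t]
        m_b = sum_b / w_b               # background mean
        m_f = (sum_all - sum_b) / (total - w_b)  # foreground mean
        # Otsu: maximize the between-class variance
        var_between = w_b * (total - w_b) * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(grey, t):
    return [1 if v > t else 0 for v in grey]
```

Otsu's method picks the threshold that maximizes the between-class variance of the grey-level histogram, which is why it separates the bright sclera region cleanly from the darker structures before the ROI is selected.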
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make the method more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," in Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
Due to the scale- and rotation-invariance of the Y-shape features, registration is unnecessary before matching with the Y-shape descriptor. The whole matching algorithm is listed as Algorithm 1.
FIG
Here y_te^i and y_ta^j are the Y-shape descriptors of the test template T_te and the target template T_ta, respectively; dϕ is the Euclidean distance of the angle elements of the descriptor vectors, defined in (3); d_xy is the Euclidean distance of two descriptor centers, defined in (4); n_i and d_i are the number of matched descriptor pairs and the distance between their centers, respectively; tϕ is a distance threshold, and t_xy is the threshold that restricts the search area. We set tϕ to 30 and t_xy to 675 in our experiment.
To match two sclera templates, we searched the areas near all the Y-shape branches. The search area is limited to the corresponding left or right half of the sclera in order to reduce the search range and time. The distance between two branches is defined in (3), where ϕ_ij is the angle between the jth branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs n_i and the distance between Y-shape branch centers d_i are stored as the matching result. We fuse the number of matched branches and the average distance between matched branch centers as in (2). Here α is a factor used to fuse the matching score, which was set to 30 in our study, and N_i and N_j are the total numbers of feature vectors in templates i and j, respectively. The decision is regulated by the threshold t: if a sclera's matching score is lower than t, the sclera is discarded; a sclera with a high matching score is passed on to the next, more precise matching process.
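The coarse matching stage described above can be sketched as follows. The thresholds tϕ = 30, t_xy = 675, and α = 30 come from the text, but since equations (2)-(4) are not reproduced in this section, the distance functions below are plain Euclidean distances and the final score fusion is a hypothetical stand-in for (2), not the report's exact formula.

```python
import math

# A Y-shape descriptor is modeled as (phi1, phi2, phi3, x, y):
# three branch angles plus the descriptor center, following the text.
T_PHI = 30.0    # angle-distance threshold t_phi (from the experiment)
T_XY = 675.0    # center-distance threshold t_xy (from the experiment)
ALPHA = 30.0    # score-fusion factor alpha (from the experiment)

def d_phi(a, b):
    # Euclidean distance of the angle elements (cf. Eq. 3)
    return math.dist(a[:3], b[:3])

def d_xy(a, b):
    # Euclidean distance of the two descriptor centers (cf. Eq. 4)
    return math.dist(a[3:5], b[3:5])

def match_score(test, target):
    # Count matched descriptor pairs and accumulate their center
    # distances, then fuse count and average distance (a stand-in
    # for Eq. 2): more matches and smaller distances score higher.
    n, dist_sum = 0, 0.0
    for yt in test:
        for yg in target:
            if d_phi(yt, yg) < T_PHI and d_xy(yt, yg) < T_XY:
                n += 1
                dist_sum += d_xy(yt, yg)
                break
    if n == 0:
        return 0.0
    avg_d = dist_sum / n
    return ALPHA * n / (0.5 * (len(test) + len(target)) * (1.0 + avg_d))
```

Templates scoring below the threshold t would be discarded at this stage; the rest proceed to the fine WPL matching stage.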
2.4.2 STAGE II: FINE MATCHING USING THE WPL DESCRIPTOR
The line-segment WPL descriptor reveals more of the vessel structure detail of the sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure appears to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers (episclera, stroma, lamina fusca, and endothelium), and there are slight differences among the movements of these layers.
Considering these factors, our registration employed both a single shift transform and a multi-parameter transform that combines shift, rotation, and scale.
1) SHIFT PARAMETER SEARCH: As we discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate either. The shift transform is designed to tolerate possible errors in pupil center detection made in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where T_te is the test template and s_te^i is the ith WPL descriptor of T_te; T_ta is the target template and s_ta^i is the ith WPL descriptor of T_ta; and d(s_te^k, s_ta^j) is the Euclidean distance of the descriptors s_te^k and s_ta^j.
Δs_k is the shift value of two descriptors, defined as
We first randomly select an equal number of segment descriptors s_te^k of the test template T_te from each quad and find their nearest neighbors s_ta^j* in the target template T_ta. Their shift offsets are recorded as the possible registration shift factors Δs_k. The final offset registration factor is Δs_optim, which has the smallest standard deviation among these candidate offsets.
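The idea of Algorithm 2 can be sketched as follows. Descriptors are reduced to their (x, y) centers, the quad-based sampling is simplified to plain random sampling, and the "smallest standard deviation" rule is approximated by picking the candidate offset closest to the candidates' mean, so the function names and these simplifications are illustrative assumptions, not the report's exact algorithm.

```python
import math
import random
import statistics

def nearest(p, points):
    # Nearest neighbour by Euclidean distance
    return min(points, key=lambda q: math.dist(p, q))

def shift_search(test, target, n_samples=8, seed=0):
    # Sample descriptors from the test template (stand-in for
    # the per-quad sampling in Algorithm 2)
    rng = random.Random(seed)
    picks = rng.sample(test, min(n_samples, len(test)))
    # Each sampled descriptor and its nearest neighbour in the
    # target template vote for a candidate shift offset delta_s_k
    cands = []
    for s_te in picks:
        s_ta = nearest(s_te, target)
        cands.append((s_ta[0] - s_te[0], s_ta[1] - s_te[1]))
    # The winning offset is the candidate most consistent with
    # the rest (approximating the smallest-deviation criterion)
    mx = statistics.mean(c[0] for c in cands)
    my = statistics.mean(c[1] for c in cands)
    return min(cands, key=lambda c: math.dist(c, (mx, my)))
```

In the GPU version each candidate offset is evaluated by its own thread, which is why the per-offset mapping scheme described later in the report fits this search well.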
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor s_te^(it) and calculating the distance to its nearest neighbor s_ta^j* in T_ta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β is used to determine whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m^(it). Here s_te^i, T_te, s_ta^j, and T_ta are defined as in Algorithm 2; tr^(it)_shift, θ^(it), and tr^(it)_scale are the shift, rotation, and scale parameters generated in the itth iteration; and R(θ^(it)), T(tr^(it)_shift), and S(tr^(it)_scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters; in our experiment, we set the iteration count to 512.
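The random parameter search of Algorithm 3 can be sketched as follows. Descriptors are again reduced to 2-D centers; β = 20 pixels and N = 512 iterations follow the text, while the sampling ranges for shift, rotation, and scale are illustrative assumptions (the report draws its shift candidates from nearest-neighbor distances instead).

```python
import math
import random

BETA = 20.0  # match tolerance in pixels (from the experiment)

def transform(p, shift, theta, scale):
    # Scale and rotate about the origin, then shift (cf. the matrices
    # S, R, T of Eq. 7; composition order is an assumption here)
    c, s = math.cos(theta), math.sin(theta)
    x, y = p
    return (scale * (c * x - s * y) + shift[0],
            scale * (s * x + c * y) + shift[1])

def matched_count(test, target, params):
    # A transformed test descriptor matches if some target descriptor
    # lies within beta pixels of it
    n = 0
    for p in test:
        q = transform(p, *params)
        if any(math.dist(q, t) < BETA for t in target):
            n += 1
    return n

def affine_search(test, target, iters=512, seed=1):
    # Try random (shift, rotation, scale) sets and keep the one with
    # the maximum number of matched pairs m^(it)
    rng = random.Random(seed)
    best, best_n = ((0.0, 0.0), 0.0, 1.0), -1
    for _ in range(iters):
        params = ((rng.uniform(-30, 30), rng.uniform(-30, 30)),
                  rng.uniform(-0.2, 0.2),   # rotation in radians (assumed range)
                  rng.uniform(0.9, 1.1))    # scale (assumed range)
        n = matched_count(test, target, params)
        if n > best_n:
            best, best_n = params, n
    return best, best_n
```

On the GPU, each iteration's parameter set is generated and tried by a separate thread, and only the matched counts are compared at the end.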
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed as Algorithm 4. Here s_te^i, T_te, s_ta^j, and T_ta are defined as in Algorithms 2 and 3; θ^(optm), tr^(optm)_shift, tr^(optm)_scale, and Δs_optim are the registration parameters obtained from Algorithms 2 and 3; and R(θ^(optm)), T(tr^(optm)_shift), and S(tr^(optm)_scale) form the descriptor transform matrix defined in Algorithm 3. ϕ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether or not the descriptor is at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ϕ values; in our experiment, we set α to 5. The total matching score is the minimum score of the two transformed results divided by the minimal matching score of the test template and the target template.
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system that works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs), and the parallel part of the program should be partitioned into threads by the programmer and mapped onto them. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and accessing these memories costs comparatively little time. Only shared memory can be accessed by other threads within the same block; however, only a limited amount of shared memory is available. Global memory, constant memory, and texture memory are off-chip memories accessible by all threads, and accessing them is very time consuming. Constant memory and texture memory are read-only and cacheable.
Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower than on-chip memory in terms of access time. To hide this latency as completely as possible, we should preferentially use on-chip memory rather than global memory. When global memory access does occur, threads in the same warp should access words in sequence to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but shared memory is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
2.5.1 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted into different kernels on the GPU. These kernels differ in computation density; thus we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and one thread performs one template-pair comparison. In our work we use an NVIDIA C2070 as our GPU, and the thread and block counts are set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of the descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in that template. This partition lets every block execute independently, with no data-exchange requirements between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix-sum algorithm is used to calculate the sum of the intermediate results, as shown on the right of Figure 11. First, all odd-numbered threads compute the sums of consecutive pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum of the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
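The pairwise reduction described above can be simulated serially as follows. Each pass over a given stride corresponds to one parallel step in which all the participating threads add simultaneously; after about log2(n) steps the total sits in the first slot, matching the text's "saved in the first address".

```python
# Serial simulation of the tree-style parallel reduction: at stride 1
# the odd-numbered neighbours are folded in; the stride then doubles
# each step until slot 0 holds the total.

def tree_reduce(results):
    vals = list(results)          # one slot per "thread"
    n = len(vals)
    stride = 1
    while stride < n:
        # All additions at a given stride are independent of each
        # other, so on the GPU they form a single parallel step.
        for i in range(0, n, 2 * stride):
            if i + stride < n:
                vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]
```

A full prefix-sum (scan) keeps every partial sum rather than just the total, but the stride-doubling structure, and the reason only ceil(log2(n)) synchronizations are needed, is the same.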
2.5.2 MAPPING INSIDE BLOCK
In the shift-parameter search, there are two schemes we can choose from to map the task:
Map one pair of templates to all the threads in a block; every thread then takes charge of a fraction of the descriptors and cooperates with the other threads.
Assign a single possible shift offset to each thread; all the threads compute independently, and only the final results need to be compared across the possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we chose the second method to parallelize the shift search. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generation iterations are distributed over all the threads. The challenge in this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration generation step, we used the Mersenne Twister pseudorandom number generator, because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; therefore it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run with different initial states in parallel. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators that share identical parameters. To solve this problem and enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag
FIG
FIG
variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait for a synchronization operation at every query step. Our solution is to use a single thread in a block to process the matching.
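Plain Python exposes only the one standard Mersenne Twister parameter set, so the dynamic-creation tool itself cannot be reproduced here. The following CPU-side sketch shows the analogous practical goal (one reproducible, well-separated stream per thread id) by hashing the thread id into a wide seed; the helper name is hypothetical, and this is a common substitute for truly independent generator parameters, not the report's method.

```python
import hashlib
import random

def stream_for(thread_id, base_seed=0):
    # Derive a 128-bit seed from (base_seed, thread_id) so that each
    # "thread" gets its own reproducible generator state. Hashing keeps
    # nearby ids from producing nearby seeds.
    digest = hashlib.sha256(f"{base_seed}:{thread_id}".encode()).digest()
    return random.Random(int.from_bytes(digest[:16], "big"))

# One independent stream per simulated thread
streams = [stream_for(t) for t in range(4)]
draws = [s.random() for s in streams]
```

On the GPU the stronger guarantee of the dynamic-creation approach is that each thread's twister uses different recurrence parameters, so even identical seeding cannot produce correlated sequences.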
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, so data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, no data transfer from host to device occurs during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that the consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory. Using this cacheable memory, data access was accelerated further.
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in target detection; in this paper it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of the gradient directions or edge orientations of its pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a larger block and then using this value to normalize all the cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG; this method is used to create the cell histograms. Each pixel within the cell contributes a weight to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular in form, and the binning of the gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination and contrast changes, then the gradient strength must be locally normalized. For that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
FIG
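The HOG steps above can be sketched as follows. The 9-bin unsigned-orientation layout and the L2 block normalization are conventional Dalal-Triggs-style choices, assumed here rather than taken from the report, and the Gaussian block window is omitted for brevity.

```python
import math

NBINS = 9  # 20-degree bins over the unsigned range 0..180 (assumed)

def cell_histogram(img, x0, y0, size):
    # Magnitude-weighted orientation histogram for one square cell
    hist = [0.0] * NBINS
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            # Central-difference gradients, clamped at the image border
            dx = img[y][min(x + 1, len(img[0]) - 1)] - img[y][max(x - 1, 0)]
            dy = img[min(y + 1, len(img) - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(dx, dy)
            # Unsigned orientation: opposite directions count as the same
            ang = math.degrees(math.atan2(dy, dx)) % 180.0
            hist[int(ang // (180.0 / NBINS)) % NBINS] += mag
    return hist

def normalize_block(hists, eps=1e-6):
    # L2-normalize the concatenated cell histograms of one block,
    # giving invariance to local illumination and contrast changes
    flat = [v for h in hists for v in h]
    norm = math.sqrt(sum(v * v for v in flat)) + eps
    return [v / norm for v in flat]
```

Because blocks overlap, each cell histogram appears in several normalized blocks, which is exactly the redundancy the text describes.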
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB(matrix laboratory) is a numerical
computing environment and fourth-generation programming language
Developed by Math Works MATLAB allows matrix manipulations
plotting of functions and data implementation of algorithms creation
of user interfaces and interfacing with programs written in other languages
including C C++ Java and Fortran
Although MATLAB is intended primarily for numerical computing an
optional toolbox uses the MuPAD symbolic engine allowing access
to symbolic computing capabilities An additional package Simulink adds
graphicalmulti-domainsimulationandModel-Based
Design for dynamic and embedded systems
In 2004 MATLAB had around one million users across industry
and academia MATLAB users come from various backgrounds
of engineering science and economics MATLAB is widely used in
academic and research institutions as well as industrial enterprises
MATLAB was first adopted by researchers and practitioners
in control engineering Littles specialty but quickly spread to many other
domains It is now also used in education in particular the teaching
of linear algebra and numerical analysis and is popular amongst scientists
involved in image processing The MATLAB application is built around the
MATLAB language The simplest way to execute MATLAB code is to type
it in the Command Window which is one of the elements of the MATLAB
Desktop When code is entered in the Command Window MATLAB can
be used as an interactive mathematical shell Sequences of commands can
be saved in a text file typically using the MATLAB Editor as a script or
encapsulated into a function extending the commands available
MATLAB provides a number of features for documenting and
sharing your work You can integrate your MATLAB code with other
languages and applications and distribute your MATLAB algorithms and
applications
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external
applications and languages such as C C++ FORTRAN Javatrade COM
and Microsoft Excel
MATLAB is used in vast area including signal and image
processing communications control design test and measurement
financial modeling and analysis and computational Add-on toolboxes
(collections of special-purpose MATLAB functions) extend the MATLAB
environment to solve particular classes of problems in these application
areas
MATLAB can be used on personal computers and powerful
server systems including the Cheaha compute cluster With the addition of
the Parallel Computing Toolbox the language can be extended with parallel
implementations for common computational functions including for-loop
unrolling Additionally this toolbox supports offloading computationally
intensive workloads to Cheaha the campus compute cluster MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) and knows how big it is Moreover the fundamental operators
(eg addition multiplication) are programmed to deal with matrices when
required And the MATLAB environment handles much of the bothersome
housekeeping that makes all this possible Since so many of the procedures
required for Macro-Investment Analysis involves matrices MATLAB
proves to be an extremely efficient language for both communication and
implementation
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or FORTRAN A wrapper function is created
allowing MATLAB data types to be passed and returned The dynamically
loadable object files created by compiling such functions are termed MEX-
files (for MATLAB executable)
Libraries written in Java, ActiveX, or .NET can be directly called
from MATLAB, and many MATLAB libraries (for
example, XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries. Calling MATLAB from Java is more complicated, but
can be done with a MATLAB extension, which is sold separately by MathWorks,
or using an undocumented mechanism called JMI (Java-to-MATLAB
Interface), which should not be confused with the unrelated Java Metadata
Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox
available from MathWorks, MATLAB can be connected
to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems. It enables fast
development and execution. With the MATLAB language, you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables, specifying data types, and allocating memory. In many
cases, MATLAB eliminates the need for 'for' loops. As a result, one line of
MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional
programming language, including arithmetic operators, flow control, data
structures, data types, object-oriented programming (OOP), and debugging
features.
MATLAB lets you execute commands or groups of commands one
at a time, without compiling and linking, enabling you to quickly iterate to
the optimal solution. For fast execution of heavy matrix and vector
computations, MATLAB uses processor-optimized libraries. For general-
purpose scalar computations, MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology.
This technology, which is available on most platforms, provides
execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
Use the interactive tool GUIDE (Graphical User Interface
Development Environment) to lay out, design, and edit user interfaces.
GUIDE lets you include list boxes, pull-down menus, push buttons, radio
buttons, and sliders, as well as MATLAB plots and Microsoft
ActiveX® controls. Alternatively, you can create GUIs programmatically
using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring
data from external devices and databases, through preprocessing,
visualization, and numerical analysis, to producing presentation-quality
output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats such as Microsoft Excel, ASCII text or binary
files; image, sound, and video files; and scientific files such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format. Additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes;
changing line colors and markers; adding annotations, LaTeX equations, and
legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D
scalar data, and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting effects,
light source locations, and transparency.
3-D plotting functions include
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW discrete Fourier
transform library. Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports, they execute
faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language used is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
SELECTING THE REGION OF INTEREST (SCLERA PART)
SELECTED ROI PART
ENHANCEMENT OF SCLERA IMAGE
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR FILTERS
MATCHING WITH IMAGES IN DATABASE
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card transactions, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition
method which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions to other sclera vein recognition methods
and to general pattern recognition methods. We designed the Y-shape
descriptor to narrow the search range and increase the matching efficiency;
it is a new feature extraction method that takes advantage of the GPU
structures. We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing, which can
dramatically reduce data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, which is an important
step in achieving parallel computation efficiency on a GPU. A work flow
with high arithmetic intensity, designed to hide memory access latency,
partitions the computation task across the heterogeneous system of
CPU and GPU, down to the individual threads in the GPU. The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland:
Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object
detection for real-time augmented reality applications in a GPGPU," IEEE
Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep,
big, simple neural nets for handwritten digit recognition," Neural Comput.,
vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris
recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput.
Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-
Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE
Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks,"
Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognition,"
Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," in Adler's Physiology of
the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner,
"Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4,
no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular
biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–
1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G.
Xiaoguang, "Robust spatial matching for object retrieval and its parallel
implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp.
1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for
real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control,
vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4,
pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive
approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2,
pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human
identification method: Sclera recognition," IEEE Trans. Syst., Man,
Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
To match two sclera templates, we search the areas near all
the Y-shape branches. The search area is limited to the corresponding left or
right half of the sclera in order to reduce the search range and time. The
distance of two branches is defined in (3), where ϕij is the angle between
the jth branch and the polar axis from the pupil center in descriptor i.
The number of matched pairs ni and the distance between Y-shape
branch centers di are stored as the matching result. We fuse the number of
matched branches and the average distance between matched branch
centers as in (2). Here α is a factor to fuse the matching score, which was set
to 30 in our study, and Ni and Nj are the total numbers of feature vectors in
templates i and j, respectively. The decision is regulated by the threshold t: if
the sclera's matching score is lower than t, the sclera is discarded. A
sclera with a high matching score is passed to the next, more precise
matching process.
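Equations (2) and (3) are not reproduced in this excerpt, so the fusion step can only be sketched. The Python sketch below assumes a common fusion form (match-count ratio discounted by the average branch-center distance, weighted by α = 30); the function names and the exact formula are illustrative assumptions, not the report's equation (2).

```python
def fuse_matching_score(n_matched, avg_center_dist, n_i, n_j, alpha=30.0):
    """Fuse branch-match count and average center distance into one score.

    Assumed fusion form: the fraction of matched Y-shape branches,
    discounted by the average distance between matched branch centers,
    weighted by alpha (set to 30 in the study). Illustrative only.
    """
    if n_matched == 0:
        return 0.0
    match_ratio = n_matched / min(n_i, n_j)   # fraction of branches matched
    return alpha * match_ratio / (1.0 + avg_center_dist)


def coarse_filter(score, t):
    """Stage-I decision: keep only templates whose score reaches threshold t."""
    return score >= t
```

More matched branches raise the score, larger average distances lower it, and only templates passing `coarse_filter` reach the Stage-II fine matching.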
242 STAGE II FINE MATCHING USING WPL DESCRIPTOR
The line-segment WPL descriptor reveals more vessel structure detail of the
sclera than the Y-shape descriptor. The variation of the sclera vessel pattern is
nonlinear because:
When acquiring an eye image at a different gaze angle, the vessel structure
appears to shrink or extend nonlinearly, because the eyeball is spherical in shape.
The sclera is made up of four layers: episclera, stroma, lamina fusca, and
endothelium. There are slight differences among the movements of these
layers.
Considering these factors, our registration employs both a single
shift transform and a multi-parameter transform, which combines shift,
rotation, and scale together.
1) SHIFT PARAMETER SEARCH: As we discussed before,
segmentation may not be accurate; as a result, the detected iris center may
not be very accurate. The shift transform is designed to tolerate possible errors
in pupil center detection in the segmentation step. If there is no deformation,
or only very minor deformation, registration with the shift transform alone
would be adequate to achieve an accurate result. We designed Algorithm 2
to get the optimized shift parameter, where Tte is the test template and stei is
the ith WPL descriptor of Tte; Tta is the target template and staj is the jth
WPL descriptor of Tta; and d(stek, staj) is the Euclidean distance of descriptors stek
and staj.
Δsk is the shift value of the two descriptors.
We first randomly select an equal number of segment descriptors
stek in the test template Tte from each quad and find each one's nearest neighbor staj
in the target template Tta. Their shift offset is recorded as a candidate
registration shift factor Δsk. The final registration offset is Δsoptim,
which has the smallest standard deviation among these candidate offsets.
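Algorithm 2 itself is not reproduced here. A minimal Python sketch of the described procedure, with descriptors reduced to (x, y) centers and a median-based stand-in for the smallest-standard-deviation selection, might look like this; the sampling count and helper names are illustrative assumptions.

```python
import random
import statistics

def shift_parameter_search(test_desc, target_desc, n_samples=32, seed=0):
    """Sketch of the shift-parameter search described above (Algorithm 2).

    Sample descriptors from the test template, pair each with its nearest
    neighbour in the target template, record the offsets, and keep the
    candidate offset that agrees best with the others (a stand-in for
    "smallest standard deviation among candidates").
    """
    rng = random.Random(seed)
    samples = rng.sample(test_desc, min(n_samples, len(test_desc)))
    offsets = []
    for (x, y) in samples:
        # nearest neighbour in the target template (Euclidean distance)
        nx, ny = min(target_desc, key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2)
        offsets.append((nx - x, ny - y))
    # candidate closest to the per-axis medians wins
    mx = statistics.median(dx for dx, _ in offsets)
    my = statistics.median(dy for _, dy in offsets)
    return min(offsets, key=lambda o: (o[0] - mx) ** 2 + (o[1] - my) ** 2)
```

For a target template that is a pure translation of the test template, the recovered offset is exactly that translation.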
2) AFFINE TRANSFORM PARAMETER SEARCH:
The affine transform is designed to tolerate some deformation of the sclera
patterns in the matching step. The affine transform algorithm is shown in
Algorithm 3. The shift value in the parameter set is obtained by randomly
selecting a descriptor stei(it) and calculating the distance to its nearest
neighbor staj in Tta. We transform the test template by the matrix in (7).
At the end of each iteration, we count the number of matched descriptor pairs
between the transformed template and the target template. The factor β is
used to determine whether a pair of descriptors is matched; we set it to
20 pixels in our experiment. After N iterations, the optimized transform
parameter set is determined by selecting the one with the maximum number of matches
m(it). Here stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift,
θ(it), and tr(it)scale are the shift, rotation, and scale parameters
generated in the itth iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale)
are the transform matrices defined in (7). To search for the optimal
transform parameters, we iterate N times to generate these parameter sets; in
our experiment we set the iteration count to 512.
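The random parameter search just described can be sketched as follows, again with descriptors reduced to (x, y) points. Each iteration draws a candidate (shift, rotation, scale), applies it to the test template, and counts how many transformed points land within β pixels of some target point; the set with the most matches wins. The parameter ranges are illustrative assumptions, not taken from the report.

```python
import math
import random

def affine_parameter_search(test_desc, target_desc, n_iter=512, beta=20.0, seed=0):
    """Sketch of the random affine-parameter search (Algorithm 3)."""
    rng = random.Random(seed)
    best = (None, -1)
    for _ in range(n_iter):
        dx, dy = rng.uniform(-30, 30), rng.uniform(-30, 30)   # shift candidate
        theta = rng.uniform(-0.2, 0.2)                        # rotation, radians
        scale = rng.uniform(0.9, 1.1)                         # scale candidate
        c, s = math.cos(theta), math.sin(theta)
        matched = 0
        for (x, y) in test_desc:
            tx = scale * (c * x - s * y) + dx   # rotate, scale, then shift
            ty = scale * (s * x + c * y) + dy
            if any((tx - u) ** 2 + (ty - v) ** 2 <= beta ** 2
                   for (u, v) in target_desc):
                matched += 1
        if matched > best[1]:
            best = ((dx, dy, theta, scale), matched)
    return best   # ((shift_x, shift_y, rotation, scale), match_count)
```

This is a simple random search; the report additionally ties the candidate shifts to nearest-neighbour distances, which is omitted here for brevity.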
3) REGISTRATION AND MATCHING ALGORITHM:
Using the optimized parameter set determined by Algorithms 2
and 3, the test template is registered and matched simultaneously. The
registration and matching algorithm is listed in Algorithm 4. Here stei, Tte,
staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift,
tr(optm)scale, and Δsoptim are the registration parameters obtained from
Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)shift), and S(tr(optm)scale)
form the descriptor transform matrix defined in Algorithm 3. ϕ is the angle
between the segment descriptor and the radial direction, and w is the weight of the
descriptor, which indicates whether the descriptor is at the edge of the sclera or
not. To ensure that the nearest descriptors have a similar orientation, we
use a constant factor α to check the absolute difference of the two ϕ values; in our
experiment we set α to 5. The total matching score is the minimal score of the two
transformed results divided by the minimal matching score of the test template
and the target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and
works as a coprocessor with a CPU. A CUDA device consists of many streaming
multiprocessors (SMs), and the parallel part of the program should be
partitioned into threads by the programmer and mapped onto those threads.
There are multiple memory spaces in the CUDA memory hierarchy:
registers, local memory, shared memory, global memory, constant memory,
and texture memory. Registers, local memory, and shared memory are on-
chip, and it takes little time to access these memories. Only
shared memory can be accessed by other threads within the same block;
however, shared memory is available only in limited amounts. Global
memory, constant memory, and texture memory are off-chip memory,
accessible by all threads, and it is very time consuming to access
these memories.
Constant memory and texture memory are read-only and cacheable.
Mapping algorithms to CUDA to achieve efficient processing is
not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be
executed serially. To improve performance, branch divergence within a
warp should be avoided.
Global memory is slower than on-chip memory in terms of access time. To
completely hide the latency of the small instruction set, we should preferentially
use on-chip memory rather than global memory. When global
memory access occurs, threads in the same warp should access consecutive words
to achieve coalescing.
Shared memory is much faster than the local and global memory spaces,
but shared memory is organized into banks that are equal in size. If two
memory requests from different threads within a warp fall in the
same memory bank, the accesses are serialized. To get maximum
performance, memory requests should be scheduled to minimize bank
conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four
independent modules, all the modules are converted to different kernels
on the GPU. These kernels differ in computation density, so we
map them to the GPU with various mapping strategies to fully utilize the
computing power of CUDA. Figure 11 shows our scheme of CPU-GPU
task distribution and the partition among blocks and threads. Algorithm 1 is
partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of
templates in the database. As the upper middle
column of Figure 11 shows, each target template is assigned to one
thread, and one thread performs one pair-of-templates comparison. In our work, we
use an NVIDIA C2070 as our GPU. The thread and block numbers are set to
1024, which means we can match our test template with up to 1024×1024
target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which
one thread processes a section of descriptors. As the lower portion of the
middle column of Figure 11 shows, we assign a target template to one
block. Inside a block, one thread corresponds to a set of descriptors in this
template. This partition makes every block execute independently, with
no data exchange required between different blocks. When all
threads complete their corresponding descriptor fractions, the sum of the
intermediate results needs to be computed or compared. A parallel prefix
sum algorithm is used to calculate the sum of intermediate results, as
shown on the right of Figure 11. First, all odd-numbered threads compute the sum
of consecutive pairs of the results. Then, recursively, every first of i (= 4, 8,
16, 32, 64, ...) threads
computes the prefix sum on the new results. The final result is saved at
the first address, which has the same variable name as the first intermediate
result.
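The pairwise reduction described above can be simulated sequentially. This Python sketch mirrors the stride-doubling pattern (strides 2, 4, 8, ...) with the total accumulating in the first slot; on the GPU, each inner loop's additions would run in parallel, one per active thread.

```python
def tree_reduce_sum(values):
    """Sequential simulation of the block-level parallel reduction above.

    At stride 2, 4, 8, ..., each "first thread" of a group adds in its
    partner's partial result; the total ends up in slot 0, the same
    address as the first intermediate result.
    """
    vals = list(values)
    n = len(vals)
    stride = 2
    while stride // 2 < n:
        # on the GPU these additions happen concurrently across threads
        for i in range(0, n, stride):
            partner = i + stride // 2
            if partner < n:
                vals[i] += vals[partner]
        stride *= 2
    return vals[0]
```

The tree shape needs only log2(n) synchronized steps instead of n sequential additions, which is why it suits the per-block summation of thread-local results.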
252 MAPPING INSIDE BLOCK
In the shift-argument search, there are two schemes we can choose to
map the task:
Mapping one pair of templates to all the threads in a block, so that every
thread takes charge of a fraction of the descriptors and cooperates with the
other threads.
Assigning a single possible shift offset to a thread, so that all the threads
compute independently, except that the final result must be compared with the
other possible offsets.
Due to the great number of sum and synchronization operations in every
nearest-neighbor searching step, we choose the second method to parallelize
the shift search. In the affine matrix generator, we map an entire parameter-
set search to a thread: every thread randomly generates a set of
parameters and tries them independently. The generation iterations are
assigned to all threads. The challenge of this step is that the randomly generated
numbers might be correlated among threads. In the rotation and
scale registration step, we use the Mersenne Twister pseudorandom
number generator because it can use bitwise arithmetic and has a long
period.
The Mersenne Twister, like most pseudorandom generators, is iterative;
therefore, it is hard to parallelize a single twister state update step among
several execution threads. To make sure that the thousands of threads in the
launch grid generate uncorrelated random sequences, many simultaneous
Mersenne Twisters need to run with different initial states in parallel.
But even "very different" (by any definition) initial state values do not
prevent the emission of correlated sequences by generators sharing
identical parameters. To solve this problem, and to enable efficient
implementation of the Mersenne Twister on parallel architectures, we used a
special offline tool for the dynamic creation of Mersenne Twister
parameters, modified from the algorithm developed by Makoto Matsumoto
and Takuji Nishimura. In the registration and matching step, when
searching for the nearest neighbor, a line segment that has already been matched
with another should not be used again. In our approach, a flag
variable denoting whether the line has been matched is stored in
shared memory. To share the flags, all the threads in a block would have to wait on
a synchronization operation at every query step; our solution is to use a single
thread in a block to process the matching.
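As a rough illustration, Python's `random.Random` is itself an MT19937 Mersenne Twister, so independently seeded per-"thread" instances can stand in for the parallel twisters described above. The simple integer seeding below is an illustrative assumption; it is not the offline dynamic-creation tool the report uses, which gives each generator distinct MT parameters rather than just distinct seeds.

```python
import random

def make_thread_generators(n_threads, base_seed=12345):
    """One Mersenne Twister instance per simulated 'thread'.

    Distinct integer seeds give distinct streams. This only illustrates the
    idea of per-thread generators; seeds alone do not guarantee the
    decorrelation that dynamically created MT parameters provide.
    """
    return [random.Random(base_seed + tid) for tid in range(n_threads)]

gens = make_thread_generators(4)
draws = [g.random() for g in gens]   # each 'thread' draws independently
```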
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the
bandwidth between host memory and device memory, and data transfer
between host and device can lead to long latency. As shown in Figure 11,
we load the entire target template set from the database without considering
when the templates will be processed; therefore, there is no data transfer from
host to device during the matching procedure. In global memory, the
components in the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored
separately. This guarantees that consecutive kernels of Algorithms 2 to 4
can access their data at successive addresses. Although such coalesced
access reduces the latency, frequent global memory access is still a
slow way to get data. In our kernel, we load the test template into shared
memory to accelerate memory access. Because Algorithms 2 to 4
execute different numbers of iterations on the same data, bank conflicts do
not happen. To maximize our texture memory space, we set the system
cache to the lowest value and bound our target descriptors to texture
memory. Using this cacheable memory, our data access was accelerated
further.
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily
applied in the design of target detection; in this paper, it is applied as the
feature for human recognition. In the sclera region, the vein patterns are the
edges of the image, so HOG is used to determine the gradient orientations
and edge orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small
connected regions called cells. For each cell, compute the histogram of
gradient directions or edge orientations of the pixels. The combination
of the histograms of the different cells then represents the descriptor. To improve
accuracy, the histograms can be contrast-normalized by calculating the intensity
over a larger block and then using this value to normalize all cells within the
block. This normalization makes the result more invariant to geometric and
photometric changes. The gradient magnitude m(x, y) and orientation θ(x,
y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used
to create the cell histograms. Each pixel within the cell contributes a weight to
the orientation bin found in the gradient computation, with the gradient
magnitude used as the weight. The cells are rectangular, and the
binning of gradient orientation is spread over 0 to 180 degrees, with
opposite directions counting as the same. Fig. 8 depicts the edge
orientations of the picture elements. If the image has illumination and
contrast changes, then the gradient strength must be locally normalized. For
that, cells are grouped together into larger blocks. These blocks
overlap, so that each cell contributes more than once to the final
descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are
mainly in square grids. The performance of HOG is improved by applying
a Gaussian window to each block.
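The two steps just described (gradient computation and orientation binning over 0 to 180 degrees with magnitude-weighted votes) can be sketched as follows. The cell size, bin count, and plain-Python image representation are illustrative choices, and block normalization is omitted for brevity.

```python
import math

def hog_cell_histograms(img, cell=8, bins=9):
    """Minimal sketch of the HOG steps above, on a grayscale image given as
    a list of rows of floats: central-difference gradients, then per-cell
    histograms over 0-180 degrees with gradient magnitude as the vote weight.
    """
    h, w = len(img), len(img[0])
    cells = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = img[y][x + 1] - img[y][x - 1]      # x-direction gradient
            dy = img[y + 1][x] - img[y - 1][x]      # y-direction gradient
            mag = math.hypot(dx, dy)                 # gradient magnitude m(x, y)
            # unsigned orientation: opposite directions count as the same
            ang = math.degrees(math.atan2(dy, dx)) % 180.0
            hist = cells.setdefault((y // cell, x // cell), [0.0] * bins)
            hist[min(int(ang / (180.0 / bins)), bins - 1)] += mag
    return cells
```

For a vertical step edge, all gradient energy is horizontal, so every vote falls in the 0-degree bin; the per-cell histograms concatenated (and, in full HOG, block-normalized) form the descriptor.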
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external
applications and languages such as C C++ FORTRAN Javatrade COM
and Microsoft Excel
MATLAB is used in vast area including signal and image
processing communications control design test and measurement
financial modeling and analysis and computational Add-on toolboxes
(collections of special-purpose MATLAB functions) extend the MATLAB
environment to solve particular classes of problems in these application
areas
MATLAB can be used on personal computers and powerful
server systems including the Cheaha compute cluster With the addition of
the Parallel Computing Toolbox the language can be extended with parallel
implementations for common computational functions including for-loop
unrolling Additionally this toolbox supports offloading computationally
intensive workloads to Cheaha the campus compute cluster MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) and knows how big it is Moreover the fundamental operators
(eg addition multiplication) are programmed to deal with matrices when
required And the MATLAB environment handles much of the bothersome
housekeeping that makes all this possible Since so many of the procedures
required for Macro-Investment Analysis involves matrices MATLAB
proves to be an extremely efficient language for both communication and
implementation
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or FORTRAN A wrapper function is created
allowing MATLAB data types to be passed and returned The dynamically
loadable object files created by compiling such functions are termed MEX-
files (for MATLAB executable)
Libraries written in Java ActiveX or NET can be directly called
from MATLAB and many MATLAB libraries (for
example XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries Calling MATLAB from Java is more complicated but
can be done with MATLAB extension which is sold separately by Math
Works or using an undocumented mechanism called JMI (Java-to-Mat lab
Interface) which should not be confused with the unrelated Java that is also
called JMI
As alternatives to the MuPAD based Symbolic Math Toolbox
available from Math Works MATLAB can be connected
to Maple or Mathematical
Libraries also exist to import and export MathML
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems. It enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
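The loop-elimination point above can be sketched concretely. The comparison is illustrated here in NumPy, whose array semantics mirror MATLAB's (in MATLAB itself the vectorized line would simply be y = 2*x + 1):

```python
import numpy as np

x = np.arange(1.0, 6.0)          # the vector [1, 2, 3, 4, 5]

# Loop version, element by element, as one might write in C:
y_loop = np.empty_like(x)
for i in range(x.size):
    y_loop[i] = 2.0 * x[i] + 1.0

# Vectorized version: one expression over the whole array,
# replacing the entire loop body above.
y_vec = 2.0 * x + 1.0
```

Both produce [3, 5, 7, 9, 11]; the vectorized form is both shorter and faster because the iteration happens inside an optimized library routine.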
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
GUIDE (Graphical User Interface Development Environment) is an interactive tool for laying out, designing, and editing user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light-source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
42 SNAPSHOTS
Fig.: Original sclera image converted into a grey-scale image
Fig.: Grey-scale image converted into a binary image
Fig.: Edge detection by Otsu's thresholding
Fig.: Selecting the region of interest (sclera part)
Fig.: Selected ROI part
Fig.: Enhancement of the sclera image
Fig.: Feature extraction of the sclera image using Gabor filters
Fig.: Matching with images in the database
Fig.: Displaying the result (matched or not matched)
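The grey-scale-to-binary step in the snapshots above uses Otsu's thresholding, which picks the grey level that maximizes the between-class variance of foreground and background. A minimal NumPy sketch of that computation (an illustration of the standard method, not the project's own MATLAB code):

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold maximizing between-class variance (Otsu's method).

    `gray` is a 2-D array of integer grey levels in [0, 255].
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                  # grey-level probabilities
    omega = np.cumsum(p)                   # class-0 probability up to each t
    mu = np.cumsum(p * np.arange(256))     # cumulative mean up to each t
    mu_t = mu[-1]                          # global mean
    # Between-class variance for every candidate threshold t.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)
    return int(np.argmax(sigma_b))

# A toy "sclera" image: dark background (20) with a few bright pixels (200).
img = np.full((8, 8), 20, dtype=np.uint8)
img[2:4, 2:6] = 200
t = otsu_threshold(img)
binary = img > t          # grey-scale image -> binary image
```

On this toy image the threshold separates the two grey levels, so only the eight bright pixels survive in the binary image.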
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM or credit card use, physical access control, cellular phones, PDAs, medical records management, and distance learning.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, and passport control.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, and missing-children cases.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications; the Schiphol Premium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase matching efficiency; it is a new feature extraction method that takes advantage of GPU structures. We developed the WPL descriptor to incorporate mask information and make the method more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU-GPU system, down to the individual threads in the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
1) SHIFT PARAMETER SEARCH: As discussed before, segmentation may not be accurate; as a result, the detected iris center may not be very accurate. The shift transform is designed to tolerate possible errors in pupil center detection in the segmentation step. If there is no deformation, or only very minor deformation, registration with the shift transform alone would be adequate to achieve an accurate result. We designed Algorithm 2 to obtain the optimized shift parameter, where Tte is the test template, ssei is the ith WPL descriptor of Tte, Tta is the target template, ssai is the ith WPL descriptor of Tta, and d(stek, staj) is the Euclidean distance of descriptors stek and staj.
Δsk is the shift value of the two descriptors.
We first randomly select an equal number of segment descriptors stek in the test template Tte from each quad and find each one's nearest neighbor staj in the target template Tta. Their shift offset is recorded as a possible registration shift factor Δsk. The final offset registration factor is Δsoptim, which has the smallest standard deviation among these candidate offsets.
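The shift search above can be sketched serially as follows. This is an illustrative NumPy stand-in (helper and variable names are our own, and the GPU runs these distance computations in parallel rather than in a loop):

```python
import numpy as np

def shift_offset_search(test_pts, target_pts, n_samples=16, seed=0):
    """Serial sketch of the shift-parameter search in Algorithm 2.

    test_pts, target_pts: (N, 2) arrays of descriptor coordinates.
    Randomly sample descriptors from the test template, find each one's
    nearest neighbour in the target template, and record the offsets as
    candidate registration shift factors.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(test_pts), size=min(n_samples, len(test_pts)),
                     replace=False)
    offsets = []
    for i in idx:
        d = np.linalg.norm(target_pts - test_pts[i], axis=1)
        j = int(np.argmin(d))                 # nearest neighbour in target
        offsets.append(target_pts[j] - test_pts[i])
    offsets = np.array(offsets)
    # The paper keeps the candidate with the smallest standard deviation;
    # with a single tight cluster of offsets, its mean is a serial stand-in.
    return offsets.mean(axis=0)

# Target = test shifted by (5, -3): the search should recover that offset.
test = np.array([[10.0, 10.0], [20.0, 15.0], [30.0, 25.0], [40.0, 5.0]])
target = test + np.array([5.0, -3.0])
shift = shift_offset_search(test, target, n_samples=4)
```

With a pure shift between the templates, every sampled offset agrees, so the recovered factor is exactly (5, -3).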
2) AFFINE TRANSFORM PARAMETER SEARCH: The affine transform is designed to tolerate some deformation of the sclera patterns in the matching step. The affine transform algorithm is shown in Algorithm 3. The shift value in the parameter set is obtained by randomly selecting a descriptor stei and calculating the distance from its nearest neighbor staj in Tta. We transform the test template by the matrix in (7). At the end of each iteration, we count the number of matched descriptor pairs between the transformed template and the target template. The factor β determines whether a pair of descriptors is matched; we set it to 20 pixels in our experiment. After N iterations, the optimized transform parameter set is determined by selecting the maximum matching number m(it). Here stei, Tte, staj, and Tta are defined as in Algorithm 2; tr(it)shift, θ(it), and tr(it)scale are the shift, rotation, and scale parameters generated in the itth iteration; and R(θ(it)), T(tr(it)shift), and S(tr(it)scale) are the transform matrices defined in (7). To search for the optimized transform parameters, we iterate N times to generate these parameters; in our experiment, we set the iteration count to 512.
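The randomized parameter search can be sketched as below. This is a hedged serial illustration (the parameter ranges are our own assumptions; in the paper each of the 512 iterations is assigned to its own GPU thread):

```python
import numpy as np

def random_affine_search(test_pts, target_pts, n_iter=512, beta=20.0, seed=1):
    """Serial sketch of Algorithm 3: randomized shift/rotation/scale search.

    Each iteration draws a candidate parameter set, transforms the test
    descriptors, and counts descriptors landing within beta pixels of a
    target descriptor; the parameter set with the most matches wins.
    """
    rng = np.random.default_rng(seed)
    best = (-1, None)
    for _ in range(n_iter):
        theta = rng.uniform(-0.1, 0.1)              # rotation (radians)
        scale = rng.uniform(0.95, 1.05)
        shift = rng.uniform(-10.0, 10.0, size=2)
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        moved = scale * test_pts @ R.T + shift      # apply S, R, then T
        # Count test descriptors whose nearest target lies within beta.
        d = np.linalg.norm(moved[:, None, :] - target_pts[None, :, :], axis=2)
        matches = int((d.min(axis=1) < beta).sum())
        if matches > best[0]:
            best = (matches, (theta, scale, shift))
    return best

pts = np.array([[0.0, 0.0], [50.0, 10.0], [30.0, 40.0], [80.0, 60.0]])
target = pts + np.array([6.0, -4.0])                # a small pure shift
n_matched, params = random_affine_search(pts, target)
```

Because the iterations are independent, this loop parallelizes naturally: one thread per candidate parameter set, with only the final match counts compared.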
3) REGISTRATION AND MATCHING ALGORITHM: Using the optimized parameter sets determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching algorithm is listed in Algorithm 4. Here stei, Tte, staj, and Tta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)shift, tr(optm)scale, and Δsoptim are the registration parameters obtained from Algorithms 2 and 3; and R(θ(optm))T(tr(optm)shift)S(tr(optm)scale) is the descriptor transform matrix defined in Algorithm 3. ϕ is the angle between the segment descriptor and the radius direction, and w is the weight of the descriptor, which indicates whether the descriptor is at the edge of the sclera or not. To ensure that the nearest descriptors have a similar orientation, we use a constant factor α to check the absolute difference of the two ϕ values; in our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those threads. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip; only shared memory can be accessed by other threads within the same block, but only a limited amount of shared memory is available. Global memory, constant memory, and texture memory are off-chip and accessible by all threads, which makes accessing these memories very time consuming. Constant memory and texture memory are read-only, cacheable memories.
Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches will be executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To hide this latency, we should use on-chip memory preferentially rather than global memory; when global memory access does occur, threads in the same warp should access consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and each thread compares one pair of templates. In our work, we use an NVIDIA C2070 as our GPU, with the thread and block counts both set to 1024; that means we can match our test template with up to 1024 × 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm, shown at the right of Figure 11, is used to calculate this sum: first, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum of the new partial results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
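The pairwise summation described above can be sketched serially, with the loop indices mimicking the thread pattern (the stride doubling on each round):

```python
def tree_reduce_sum(values):
    """Serial sketch of the parallel reduction described in Figure 11.

    Round 1: pairs of consecutive results are added; then every first of
    4, 8, 16, ... positions accumulates the new partial sums. The total
    ends up at index 0, mirroring how the GPU leaves the result at the
    first address.
    """
    a = list(values)
    stride = 1
    while stride < len(a):
        # Each owning position i adds its right-hand partner.  On the GPU
        # all iterations of this inner loop run concurrently in one round.
        for i in range(0, len(a), 2 * stride):
            if i + stride < len(a):
                a[i] += a[i + stride]
        stride *= 2
    return a[0]

scores = [3, 1, 4, 1, 5, 9, 2, 6]
total = tree_reduce_sum(scores)        # 31
```

The serial loop performs the same additions as the thread rounds; only the scheduling differs, which is why the GPU version finishes in log2(n) synchronization steps.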
252 MAPPING INSIDE BLOCK
In the shift-argument search, there are two schemes we can choose to map the task:
Map one pair of templates to all the threads in a block; every thread then takes charge of a fraction of the descriptors and cooperates with the other threads.
Assign a single possible shift offset to a thread; all the threads then compute independently, except that the final result must be compared with the other possible offsets.
Due to the great number of sum and synchronization operations in every nearest-neighbor search step, we choose the second method to parallelize the shift search. In the affine matrix generator, we map an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.
In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to synchronize at every query step; our solution is to use a single thread in a block to process the matching.
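The per-thread uncorrelated-stream requirement discussed above can be illustrated with NumPy's seed-sequence spawning, which derives statistically independent child streams from one root seed. This is an analogy to the dynamically created Twister parameters, not the paper's CUDA code:

```python
import numpy as np

# Derive independent child seeds for many simulated "threads" from one
# root seed.  SeedSequence spawning is designed so that the child streams
# are uncorrelated, the same property the dynamically created Mersenne
# Twister parameters provide for the GPU threads.
root = np.random.SeedSequence(12345)
children = root.spawn(8)                       # one per simulated thread
streams = [np.random.Generator(np.random.MT19937(s)) for s in children]

# Each "thread" draws its own shift candidates independently.
draws = np.array([g.uniform(-10.0, 10.0, size=2) for g in streams])
```

Simply reseeding identical generators with nearby seeds would not give this guarantee, which is exactly the correlation problem the offline parameter-creation tool addresses.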
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that the successive kernels of Algorithms 2 to 4 can access their data at consecutive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernels we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, our data access was accelerated further.
26 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in the design of target detectors; in this work, it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of the gradient directions or edge orientations of its pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG, used to create the cell histograms. Each pixel within a cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientations are binned over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image has illumination or contrast changes, the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks, which are mainly square grids, are applied. The performance of HOG is improved by applying a Gaussian window to each block.
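The gradient and binning steps described above can be sketched as follows. This is a minimal NumPy illustration of the standard HOG construction (cell size and bin count are our own choices; block normalization and the Gaussian window are omitted for brevity):

```python
import numpy as np

def hog_cell_histograms(gray, cell=8, nbins=9):
    """Minimal HOG sketch: gradients, then per-cell orientation histograms.

    Orientations are folded onto 0-180 degrees (opposite directions count
    as the same), and each pixel votes with its gradient magnitude as the
    weight, as described in the text.
    """
    g = gray.astype(float)
    dy, dx = np.gradient(g)                      # y- and x-direction gradients
    mag = np.hypot(dx, dy)                       # m(x, y)
    ang = np.rad2deg(np.arctan2(dy, dx)) % 180.0 # orientation in [0, 180)
    h, w = g.shape
    hists = np.zeros((h // cell, w // cell, nbins))
    bin_w = 180.0 / nbins
    for cy in range(h // cell):
        for cx in range(w // cell):
            m = mag[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell]
            a = ang[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell]
            b = np.minimum((a // bin_w).astype(int), nbins - 1)
            # Magnitude-weighted vote into the orientation bins.
            hists[cy, cx] = np.bincount(b.ravel(), weights=m.ravel(),
                                        minlength=nbins)
    return hists

# A vertical step edge: all gradient energy points horizontally,
# so the votes land in the 0-degree orientation bin.
img = np.zeros((16, 16))
img[:, 8:] = 255.0
H = hog_cell_histograms(img)
```

Concatenating (and, in the full method, block-normalizing) these per-cell histograms yields the HOG feature vector used for the sclera vein pattern.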
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB(matrix laboratory) is a numerical
computing environment and fourth-generation programming language
Developed by Math Works MATLAB allows matrix manipulations
plotting of functions and data implementation of algorithms creation
of user interfaces and interfacing with programs written in other languages
including C C++ Java and Fortran
Although MATLAB is intended primarily for numerical computing an
optional toolbox uses the MuPAD symbolic engine allowing access
to symbolic computing capabilities An additional package Simulink adds
graphicalmulti-domainsimulationandModel-Based
Design for dynamic and embedded systems
In 2004 MATLAB had around one million users across industry
and academia MATLAB users come from various backgrounds
of engineering science and economics MATLAB is widely used in
academic and research institutions as well as industrial enterprises
MATLAB was first adopted by researchers and practitioners
in control engineering Littles specialty but quickly spread to many other
domains It is now also used in education in particular the teaching
of linear algebra and numerical analysis and is popular amongst scientists
involved in image processing The MATLAB application is built around the
MATLAB language The simplest way to execute MATLAB code is to type
it in the Command Window which is one of the elements of the MATLAB
Desktop When code is entered in the Command Window MATLAB can
be used as an interactive mathematical shell Sequences of commands can
be saved in a text file typically using the MATLAB Editor as a script or
encapsulated into a function extending the commands available
MATLAB provides a number of features for documenting and
sharing your work You can integrate your MATLAB code with other
languages and applications and distribute your MATLAB algorithms and
applications
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external
applications and languages such as C C++ FORTRAN Javatrade COM
and Microsoft Excel
MATLAB is used in vast area including signal and image
processing communications control design test and measurement
financial modeling and analysis and computational Add-on toolboxes
(collections of special-purpose MATLAB functions) extend the MATLAB
environment to solve particular classes of problems in these application
areas
MATLAB can be used on personal computers and powerful
server systems including the Cheaha compute cluster With the addition of
the Parallel Computing Toolbox the language can be extended with parallel
implementations for common computational functions including for-loop
unrolling Additionally this toolbox supports offloading computationally
intensive workloads to Cheaha the campus compute cluster MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) and knows how big it is Moreover the fundamental operators
(eg addition multiplication) are programmed to deal with matrices when
required And the MATLAB environment handles much of the bothersome
housekeeping that makes all this possible Since so many of the procedures
required for Macro-Investment Analysis involves matrices MATLAB
proves to be an extremely efficient language for both communication and
implementation
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or FORTRAN A wrapper function is created
allowing MATLAB data types to be passed and returned The dynamically
loadable object files created by compiling such functions are termed MEX-
files (for MATLAB executable)
Libraries written in Java ActiveX or NET can be directly called
from MATLAB and many MATLAB libraries (for
example XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries Calling MATLAB from Java is more complicated but
can be done with MATLAB extension which is sold separately by Math
Works or using an undocumented mechanism called JMI (Java-to-Mat lab
Interface) which should not be confused with the unrelated Java that is also
called JMI
As alternatives to the MuPAD based Symbolic Math Toolbox
available from Math Works MATLAB can be connected
to Maple or Mathematical
Libraries also exist to import and export MathML
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems It enables fast
development and execution With the MATLAB language you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables specifying data types and allocating memory In many
cases MATLAB eliminates the need for lsquoforrsquo loops As a result one line of
MATLAB code can often replace several lines of C or C++ code
At the same time MATLAB provides all the features of a traditional
programming language including arithmetic operators flow control data
structures data types object-oriented programming (OOP) and debugging
features
MATLAB lets you execute commands or groups of commands one
at a time without compiling and linking enabling you to quickly iterate to
the optimal solution For fast execution of heavy matrix and vector
computations MATLAB uses processor-optimized libraries For general-
purpose scalar computations MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology
This technology which is available on most platforms provides
execution speeds that rival those of traditional programming languages
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
By using the interactive tool GUIDE (Graphical User Interface
Development Environment) to layout design and edit user interfaces
GUIDE lets you include list boxes pull-down menus push buttons radio
buttons and sliders as well as MATLAB plots and Microsoft
ActiveXreg controls Alternatively you can create GUIs programmatically
using MATLAB functions
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process from acquiring
data from external devices and databases through preprocessing
visualization and numerical analysis to producing presentation-quality
output
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from
files, other applications, databases, and external devices. You can read data
from popular file formats, such as Microsoft Excel; ASCII text or binary
files; image, sound, and video files; and scientific files, such as HDF and
HDF5. Low-level binary file I/O functions let you work with data files in
any format. Additional functions let you read data from Web pages and
XML.
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB. These include 2-D and 3-D
plotting functions, 3-D volume visualization functions, tools for
interactively creating plots, and the ability to export results to all popular
graphics formats. You can customize plots by adding multiple axes;
changing line colors and markers; adding annotations, LaTeX equations, and
legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D
scalar data, and 3-D vector data. You can use these functions to visualize and
understand large, often complex, multidimensional data, specifying plot
characteristics such as camera viewing angle, perspective, lighting effects,
light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering
functions to support all common engineering and science operations. These
functions, developed by experts in mathematics, are the foundation of the
MATLAB language. The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW discrete Fourier
transform library. Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports, they execute
faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types,
including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the
implementation of numerical linear algebra routines. It has since grown into
something much bigger, and it is used to implement numerical algorithms
for a wide range of applications. The basic language used is very similar to
standard linear algebra notation, but there are a few extensions that will
likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
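The snapshot sequence above (grey-scale conversion, Otsu binarisation, Gabor feature extraction) can be sketched with NumPy. The report's experiments use MATLAB; this Python version is only an illustrative approximation of the same steps, and the random image, kernel size, and filter parameters are made-up stand-ins:

```python
import numpy as np

def otsu_threshold(gray):
    """Exhaustive Otsu: choose the threshold maximizing between-class variance."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = float(np.dot(np.arange(256), hist))
    w0, sum0 = 0, 0.0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        w1 = total - w0
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def gabor_kernel(ksize=9, sigma=2.0, theta=0.0, lam=4.0):
    """Real part of a Gabor filter oriented at angle theta (radians)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * xr / lam)

# toy stand-in for a captured eye image; a real run would load the RGB frame
np.random.seed(0)
rgb = np.random.randint(0, 256, (32, 32, 3)).astype(float)
gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
t = otsu_threshold(gray)
binary = gray > t          # binarisation; the sclera ROI would be masked from this
kernel = gabor_kernel()    # one orientation of a Gabor bank for vein features
```

A full pipeline would apply a bank of such kernels at several orientations to the enhanced ROI and collect the responses as the feature vector.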
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc. Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Privium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition
method, which employs a two-stage parallel approach for registration and
matching. Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the CUDA
GPU architecture, the parallel strategies developed in this research can be
applied to design parallel solutions for other sclera vein recognition methods
and general pattern recognition methods. We designed the Y-shape
descriptor to narrow the search range and increase the matching efficiency;
it is a new feature extraction method that takes advantage of the GPU
structures. We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing, which can
dramatically reduce data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, which is an important
step in achieving parallel computation efficiency with a GPU. A workflow
with high arithmetic intensity to hide the memory access latency was
designed to partition the computation task across the heterogeneous system of
CPU and GPU, and even across the threads in the GPU. The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
We first randomly select an equal number of segment descriptors
s_te^k in the test template T_te from each quad and find the nearest
neighbor s_ta^j of each in the target template T_ta. The shift offset between
them is recorded as a possible registration shift factor Δs_k. The final offset
registration factor is Δs_optim, which has the smallest standard deviation
among these candidate offsets.
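A sequential sketch of this shift search, using hypothetical 2-D point descriptors: candidate shifts come from sampled descriptors and their nearest neighbours, and the candidate whose residual offsets spread least is kept. The quad-based sampling is simplified to uniform random sampling here, and the grid templates are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)

def nearest(p, pts):
    """Nearest neighbour of point p among pts."""
    return pts[np.argmin(np.linalg.norm(pts - p, axis=1))]

def estimate_shift(test_pts, target_pts, n_samples=8):
    """Candidate shifts from sampled test descriptors and their nearest
    target neighbours; keep the candidate whose residual offsets, after
    applying it, have the smallest standard deviation."""
    idx = rng.choice(len(test_pts), size=n_samples, replace=False)
    candidates = [nearest(p, target_pts) - p for p in test_pts[idx]]
    def spread(s):
        residuals = np.array([nearest(p + s, target_pts) - (p + s)
                              for p in test_pts])
        return float(residuals.std())
    return min(candidates, key=spread)

# synthetic templates: the target is the test template shifted by (5, -3)
xs, ys = np.meshgrid(np.arange(0.0, 100.0, 20.0), np.arange(0.0, 100.0, 20.0))
test = np.stack([xs.ravel(), ys.ravel()], axis=1)
target = test + np.array([5.0, -3.0])
shift = estimate_shift(test, target)
```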
2) AFFINE TRANSFORM PARAMETER SEARCH
The affine transform is designed to tolerate some deformation of the sclera
patterns in the matching step. The affine transform algorithm is shown in
Algorithm 3. The shift value in the parameter set is obtained by randomly
selecting a descriptor s_te^(it) and calculating the distance from its nearest
neighbor s_ta^j in T_ta. We transform the test template by the matrix in (7).
At the end of each iteration, we count the number of matched descriptor pairs
between the transformed template and the target template. The factor β
determines whether a pair of descriptors is matched; we set it to
20 pixels in our experiment. After N iterations, the optimized transform
parameter set is determined by selecting the maximum matching number
m(it). Here s_te^i, T_te, s_ta^j, and T_ta are defined as in Algorithm 2;
tr(it)_shift, θ(it), and tr(it)_scale are the shift, rotation, and scale
parameters generated in the it-th iteration; and R(θ(it)), T(tr(it)_shift), and
S(tr(it)_scale) are the transform matrices defined in (7). To search for the
optimal transform parameters, we iterate N times to generate these parameters.
In our experiment, we set the iteration count to 512.
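The random parameter search can be sketched sequentially as follows. The 20-pixel tolerance and the 512 iterations follow the text; the parameter ranges, point counts, and synthetic templates are made-up illustrations, and descriptors are reduced to bare 2-D points:

```python
import numpy as np

rng = np.random.default_rng(0)
BETA = 20.0   # pixel tolerance for calling a descriptor pair "matched"

def transform(pts, shift, theta, scale):
    """Scale, rotate by theta, then shift: the S, R, T of the affine model."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return scale * pts @ R.T + shift

def count_matches(pts, target, beta=BETA):
    d = np.linalg.norm(pts[:, None, :] - target[None, :, :], axis=2)
    return int((d.min(axis=1) < beta).sum())

def search_affine(test, target, iters=512):
    """Randomly sample parameter sets; keep the one with most matched pairs."""
    best_params, best_m = None, -1
    for _ in range(iters):
        shift = rng.uniform(-30.0, 30.0, size=2)
        theta = rng.uniform(-0.2, 0.2)
        scale = rng.uniform(0.9, 1.1)
        m = count_matches(transform(test, shift, theta, scale), target)
        if m > best_m:
            best_params, best_m = (shift, theta, scale), m
    return best_params, best_m

test = rng.uniform(0.0, 200.0, size=(40, 2))
target = transform(test, np.array([12.0, -7.0]), 0.05, 1.02)
params, m = search_affine(test, target)
```

On the GPU, each iteration of the outer loop would instead be one independent thread, as Section 25 describes.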
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined by Algorithms 2
and 3, the test template is registered and matched simultaneously. The
registration and matching algorithm is listed in Algorithm 4. Here s_te^i, T_te,
s_ta^j, and T_ta are defined as in Algorithms 2 and 3; θ(optm), tr(optm)_shift,
tr(optm)_scale, and Δs_optim are the registration parameters obtained from
Algorithms 2 and 3; and R(θ(optm)), T(tr(optm)_shift), and S(tr(optm)_scale)
form the descriptor transform matrix defined in Algorithm 3. ɸ is the angle
between the segment descriptor and the radius direction, and w is the weight of
the descriptor, which indicates whether the descriptor is at the edge of the sclera.
To ensure that the nearest descriptors have a similar orientation, we
use a constant factor α to check the absolute difference of the two ɸ values; in our
experiment, we set α to 5. The total matching score is the minimal score of the two
transformed results divided by the minimal matching score for the test template
and target template.
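A toy sequential version of this α-checked, one-to-one matching step, with descriptors reduced to (x, y, ɸ, w) rows. The α = 5 orientation bound follows the text and the 20-pixel distance tolerance follows Algorithm 3; the greedy matching order and the omission of the final score normalisation are simplifications:

```python
import numpy as np

ALPHA = 5.0    # max |phi| difference for matched descriptors (text's alpha)
BETA = 20.0    # max distance in pixels

def match_score(test, target):
    """Greedy one-to-one matching; rows are (x, y, phi, w) descriptors.
    A matched line segment is flagged so it cannot be reused."""
    used = np.zeros(len(target), dtype=bool)
    score = 0.0
    for x, y, phi, w in test:
        d = np.hypot(target[:, 0] - x, target[:, 1] - y)
        ok = (~used) & (d < BETA) & (np.abs(target[:, 2] - phi) < ALPHA)
        if ok.any():
            j = int(np.argmin(np.where(ok, d, np.inf)))
            used[j] = True
            score += w * target[j, 3]
    return score

tmpl = np.array([[10.0, 10.0,  30.0, 1.0],
                 [50.0, 40.0,  90.0, 1.0],
                 [80.0, 20.0, 120.0, 1.0]])
self_score = match_score(tmpl, tmpl)   # each descriptor matches itself
```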
25 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction multiple-data (SIMD) system and
works as a coprocessor with a CPU. A CUDA device consists of many streaming
multiprocessors (SMs), and the parallel part of the program should be
partitioned into threads by the programmer and mapped onto those SMs.
There are multiple memory spaces in the CUDA memory hierarchy:
registers, local memory, shared memory, global memory, constant memory,
and texture memory. Registers, local memory, and shared memory are on-
chip, so accessing these memories takes little time. Only
shared memory can be accessed by other threads within the same block;
however, shared memory is available only in limited amounts. Global
memory, constant memory, and texture memory are off-chip memories
accessible by all threads, and accessing them is very time-consuming.
Constant memory and texture memory are read-only, cacheable
memories. Mapping algorithms to CUDA to achieve efficient processing is
not a trivial task. There are several challenges in CUDA programming:
If threads in a warp have different control paths, all the branches are
executed serially. To improve performance, branch divergence within a
warp should be avoided.
Global memory is slower to access than on-chip memory. To
completely hide the latency of the small instruction set, we should use on-
chip memory preferentially rather than global memory. When global
memory access occurs, threads in the same warp should access consecutive
words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces,
but it is organized into banks of equal size. If two memory requests from
different threads within a warp fall in the same memory bank, the accesses
are serialized. To get maximum performance, memory requests should be
scheduled to minimize bank conflicts.
251 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four
independent modules, all the modules are converted to different kernels
on the GPU. These kernels differ in computation density; thus, we
map them to the GPU with various mapping strategies to fully utilize the
computing power of CUDA. Figure 11 shows our scheme of CPU-GPU
task distribution and the partition among blocks and threads. Algorithm 1 is
partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of
templates in the database. As the upper middle column of Figure 11 shows,
each target template is assigned to one thread, and one thread performs one
pair-of-templates comparison. In our work, we use an NVIDIA C2070 as our
GPU. The thread and block numbers are set to 1024, which means we can
match our test template with up to 1024 × 1024 target templates at the same
time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one
thread processes a section of descriptors. As the lower portion of the
middle column of Figure 11 shows, we assign a target template to one
block. Inside a block, one thread corresponds to a set of descriptors in this
template. This partition makes every block execute independently, with no
data exchange required between different blocks. When all
threads complete their corresponding descriptor fractions, the sum of the
intermediate results needs to be computed or compared. A parallel prefix
sum algorithm is used to calculate the sum of the intermediate results, as
shown on the right of Figure 11. First, all odd-numbered threads compute the
sum of consecutive pairs of results. Then, recursively, every first of i (= 4, 8,
16, 32, 64, ...) threads computes the prefix sum on the new results. The final
result is saved at the first address, which has the same variable name as the
first intermediate result.
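The pairwise reduction described above can be modelled sequentially; each inner-loop iteration stands for one thread's addition, and all additions in a round run in parallel on the GPU:

```python
import numpy as np

def tree_reduce(vals):
    """Pairwise (tree) reduction mirroring the thread pattern in the text:
    round 1 sums adjacent pairs, then the stride doubles each round
    (i = 4, 8, 16, ...), and slot 0 ends up holding the total."""
    a = np.array(vals, dtype=float)
    stride = 1
    while stride < len(a):
        # on the GPU, every iteration of this loop is a separate thread
        for i in range(0, len(a) - stride, 2 * stride):
            a[i] += a[i + stride]
        stride *= 2
    return a[0]
```

For n intermediate results this takes ⌈log2 n⌉ parallel rounds instead of n − 1 serial additions.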
252 MAPPING INSIDE BLOCK
In shift-argument searching, there are two schemes we can choose to
map the task:
Mapping one pair of templates to all the threads in a block, so that every
thread takes charge of a fraction of the descriptors and cooperates with the
other threads.
Assigning a single possible shift offset to a thread, so that all threads
compute independently, except that the final result must be compared with the
other possible offsets.
Due to the great number of sum and synchronization operations in every
nearest-neighbor searching step, we chose the second method to parallelize
shift searching. In the affine matrix generator, we mapped an entire parameter-
set search to a thread: every thread randomly generated a set of
parameters and tried them independently. The generated iterations were
assigned to all threads. The challenge of this step is that the randomly generated
numbers might be correlated among threads. In the rotation and
scale registration generation step, we used the Mersenne Twister pseudorandom
number generator because it can use bitwise arithmetic and has a long
period.
The Mersenne Twister, like most pseudorandom generators, is iterative.
Therefore, it is hard to parallelize a single twister state-update step among
several execution threads. To make sure that the thousands of threads in the
launch grid generate uncorrelated random sequences, many simultaneous
Mersenne Twisters need to run with different initial states in parallel.
But even "very different" (by any definition) initial state values do not
prevent the emission of correlated sequences by generators sharing
identical parameters. To solve this problem, and to enable an efficient
implementation of the Mersenne Twister on parallel architectures, we used a
special offline tool for the dynamic creation of Mersenne Twister
parameters, modified from the algorithm developed by Makoto Matsumoto
and Takuji Nishimura.
FIG
FIG
In the registration and matching step, when searching for the nearest
neighbor, a line segment that has already been matched with another should
not be used again. In our approach, a flag variable denoting whether the line
has been matched is stored in shared memory. To share the flags, all the
threads in a block would have to wait for a synchronization operation at every
query step; our solution is to use a single thread in a block to process the
matching.
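NumPy offers a comparable way to obtain uncorrelated Mersenne Twister substreams: jump-ahead on the `MT19937` bit generator, rather than the offline dynamic-creation tool used here. Each "thread" gets a substream guaranteed not to overlap with the others:

```python
import numpy as np
from numpy.random import Generator, MT19937

# One base Mersenne Twister; .jumped(k) returns a generator whose state is
# advanced by k * 2**128 draws, so the four substreams below cannot overlap.
# This mirrors giving each GPU thread its own uncorrelated MT stream.
base = MT19937(2024)
streams = [Generator(base.jumped(i + 1)) for i in range(4)]   # four "threads"
draws = [g.random(3) for g in streams]
```

Jump-ahead guarantees non-overlap of one generator's sequence, while the Matsumoto-Nishimura dynamic-creation approach gives each thread a structurally different twister; both avoid the correlated-sequences problem described above.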
253 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the
bandwidth between host memory and device memory, and the data transfer
between host and device can lead to long latency. As shown in Figure 11,
we load the entire target template set from the database without considering
when the templates will be processed. Therefore, there is no data transfer from
host to device during the matching procedure. In global memory, the
components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored
separately. This guarantees that the contiguous kernels of Algorithms 2 to 4
can access their data at successive addresses. Although such coalesced
access reduces the latency, frequent global memory access was still a
slow way to get data. In our kernel, we loaded the test template into shared
memory to accelerate memory access. Because Algorithms 2 to 4
execute different numbers of iterations on the same data, bank conflicts do
not occur. To maximize our texture memory space, we set the system
cache to the lowest value and bound our target descriptors to texture
memory. Using this cacheable memory, our data access was accelerated
further.
FIG
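The "components stored separately" layout is the classic structure-of-arrays (SoA) arrangement, as opposed to array-of-structures (AoS). A small NumPy sketch contrasts the two; the field layout is illustrative, but the strides show why SoA gives consecutive threads consecutive (coalescable) addresses:

```python
import numpy as np

n = 1024
# array-of-structures: one record per descriptor s(x, y, r, theta, phi, w);
# field values of consecutive descriptors are interleaved in memory
aos = np.zeros(n, dtype=[("x", np.float32), ("y", np.float32),
                         ("r", np.float32), ("theta", np.float32),
                         ("phi", np.float32), ("w", np.float32)])
# structure-of-arrays: each component stored contiguously, as the text
# describes, so consecutive "threads" read consecutive addresses
soa = {name: np.ascontiguousarray(aos[name]) for name in aos.dtype.names}
# soa["x"] has unit (4-byte) stride; aos["x"] strides over the whole
# 24-byte record, i.e. a strided, non-coalesced access pattern
```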
26 HISTOGRAM OF ORIENTED GRADIENTS
Histogram of oriented gradients (HOG) is a feature descriptor primarily
applied in the design of target detection; in this paper, it is applied as the
feature for human recognition. In the sclera region, the vein patterns are the
edges of an image, so HOG is used to determine the gradient orientations
and edge orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small
connected regions called cells. For each cell, compute the histogram of
gradient directions or edge orientations of the pixels. The combination
of the different histograms of the different cells then represents the descriptor.
To improve accuracy, the histograms can be contrast-normalized by calculating
the intensity over a block and then using this value to normalize all cells within
the block. This normalization makes the result invariant to geometric and
photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y)
are calculated using the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is utilized
to create the cell histograms. Each pixel within the cell contributes a weight to
the orientation bin found in the gradient computation, and the gradient
magnitude is used as the weight. The cells are rectangular. The
binning of gradient orientation is spread over 0 to 180 degrees, with
opposite directions counting as the same. Fig. 8 depicts the edge
orientations of the picture elements. If the images have any illumination and
contrast changes, then the gradient strength must be locally normalized. For
that, cells are grouped together into larger blocks. These blocks
overlap, so that each cell contributes more than once to the final
descriptor. Here, rectangular HOG (R-HOG) blocks are applied, which are
mainly square grids. The performance of HOG is improved by applying
a Gaussian window to each block.
FIG
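The HOG steps above can be sketched compactly in NumPy. The cell size, bin count, 2×2 block grouping, and toy input are arbitrary illustrative choices (and the optional Gaussian window is omitted); only the overall recipe, magnitude-weighted 0-180° orientation histograms per cell followed by per-block L2 normalisation, follows the text:

```python
import numpy as np

def hog(gray, cell=8, bins=9):
    """Minimal HOG: per-cell histograms of gradient orientation (0-180 deg)
    weighted by gradient magnitude, then L2-normalised per 2x2 block."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)                          # m(x, y)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # opposite directions merge
    ch, cw = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            a = ang[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            idx = np.minimum((a / (180.0 / bins)).astype(int), bins - 1)
            np.add.at(hist[i, j], idx, m)           # magnitude-weighted voting
    # overlapping 2x2 blocks with L2 normalisation (R-HOG)
    blocks = []
    for i in range(ch - 1):
        for j in range(cw - 1):
            v = hist[i:i + 2, j:j + 2].ravel()
            blocks.append(v / (np.linalg.norm(v) + 1e-6))
    return np.concatenate(blocks)

feat = hog(np.random.default_rng(0).random((32, 32)) * 255)
```

For a 32×32 input with 8-pixel cells and 9 bins this yields 3×3 blocks of 36 values each, a 324-dimensional descriptor.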
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB(matrix laboratory) is a numerical
computing environment and fourth-generation programming language
Developed by Math Works MATLAB allows matrix manipulations
plotting of functions and data implementation of algorithms creation
of user interfaces and interfacing with programs written in other languages
including C C++ Java and Fortran
Although MATLAB is intended primarily for numerical computing an
optional toolbox uses the MuPAD symbolic engine allowing access
to symbolic computing capabilities An additional package Simulink adds
graphicalmulti-domainsimulationandModel-Based
Design for dynamic and embedded systems
In 2004 MATLAB had around one million users across industry
and academia MATLAB users come from various backgrounds
of engineering science and economics MATLAB is widely used in
academic and research institutions as well as industrial enterprises
MATLAB was first adopted by researchers and practitioners
in control engineering Littles specialty but quickly spread to many other
domains It is now also used in education in particular the teaching
of linear algebra and numerical analysis and is popular amongst scientists
involved in image processing The MATLAB application is built around the
MATLAB language The simplest way to execute MATLAB code is to type
it in the Command Window which is one of the elements of the MATLAB
Desktop When code is entered in the Command Window MATLAB can
be used as an interactive mathematical shell Sequences of commands can
be saved in a text file typically using the MATLAB Editor as a script or
encapsulated into a function extending the commands available
MATLAB provides a number of features for documenting and
sharing your work You can integrate your MATLAB code with other
languages and applications and distribute your MATLAB algorithms and
applications
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external
applications and languages such as C C++ FORTRAN Javatrade COM
and Microsoft Excel
MATLAB is used in vast area including signal and image
processing communications control design test and measurement
financial modeling and analysis and computational Add-on toolboxes
(collections of special-purpose MATLAB functions) extend the MATLAB
environment to solve particular classes of problems in these application
areas
MATLAB can be used on personal computers and powerful
server systems including the Cheaha compute cluster With the addition of
the Parallel Computing Toolbox the language can be extended with parallel
implementations for common computational functions including for-loop
unrolling Additionally this toolbox supports offloading computationally
intensive workloads to Cheaha the campus compute cluster MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) and knows how big it is Moreover the fundamental operators
(eg addition multiplication) are programmed to deal with matrices when
required And the MATLAB environment handles much of the bothersome
housekeeping that makes all this possible Since so many of the procedures
required for Macro-Investment Analysis involves matrices MATLAB
proves to be an extremely efficient language for both communication and
implementation
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or FORTRAN A wrapper function is created
allowing MATLAB data types to be passed and returned The dynamically
loadable object files created by compiling such functions are termed MEX-
files (for MATLAB executable)
Libraries written in Java ActiveX or NET can be directly called
from MATLAB and many MATLAB libraries (for
example XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries Calling MATLAB from Java is more complicated but
can be done with MATLAB extension which is sold separately by Math
Works or using an undocumented mechanism called JMI (Java-to-Mat lab
Interface) which should not be confused with the unrelated Java that is also
called JMI
As alternatives to the MuPAD based Symbolic Math Toolbox
available from Math Works MATLAB can be connected
to Maple or Mathematical
Libraries also exist to import and export MathML
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems It enables fast
development and execution With the MATLAB language you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables specifying data types and allocating memory In many
cases MATLAB eliminates the need for lsquoforrsquo loops As a result one line of
MATLAB code can often replace several lines of C or C++ code
At the same time MATLAB provides all the features of a traditional
programming language including arithmetic operators flow control data
structures data types object-oriented programming (OOP) and debugging
features
MATLAB lets you execute commands or groups of commands one
at a time without compiling and linking enabling you to quickly iterate to
the optimal solution For fast execution of heavy matrix and vector
computations MATLAB uses processor-optimized libraries For general-
purpose scalar computations MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology
This technology which is available on most platforms provides
execution speeds that rival those of traditional programming languages
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
By using the interactive tool GUIDE (Graphical User Interface
Development Environment) to layout design and edit user interfaces
GUIDE lets you include list boxes pull-down menus push buttons radio
buttons and sliders as well as MATLAB plots and Microsoft
ActiveXreg controls Alternatively you can create GUIs programmatically
using MATLAB functions
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process from acquiring
data from external devices and databases through preprocessing
visualization and numerical analysis to producing presentation-quality
output
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from
files other applications databases and external devices You can read data
from popular file formats such as Microsoft Excel ASCII text or binary
files image sound and video files and scientific files such as HDF and
HDF5 Low-level binary file IO functions let you work with data files in
any format Additional functions let you read data from Web pages and
XML
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB These include 2-D and 3-D
plotting functions 3-D volume visualization functions tools for
interactively creating plots and the ability to export results to all popular
graphics formats You can customize plots by adding multiple axes
changing line colors and markers adding annotation Latex equations and
legends and drawing shapes
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatterbubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices 3-
D scalar and 3-D vector data You can use these functions to visualize and
understand large often complex multidimensional data Specifying plot
characteristics such as camera viewing angle perspective lighting effect
light source locations and transparency
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical statistical and engineering
functions to support all common engineering and science operations These
functions developed by experts in mathematics are the foundation of the
MATLAB language The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW Discrete Fourier
Transform library Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports they execute
faster than the equivalent C or C++ code
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types
including doubles singles and integers
CHAPTER 4
IMPLEMENTATION
41 GENERAL
Matlab is a program that was originally designed to simplify the
implementation of numerical linear algebra routines It has since grown into
something much bigger and it is used to implement numerical algorithms
for a wide range of applications The basic language used is very similar to
standard linear algebra notation but there are a few extensions that will
likely cause you some problems at first
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Privium scheme at Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method that employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method designed to take advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make the method more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, down to the individual threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," in Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
) scale) are the transform matrices defined in (7). To search for the optimal transform parameters, we iterated N times to generate candidate parameter sets; in our experiment, we set the number of iterations to 512.
3) REGISTRATION AND MATCHING ALGORITHM
Using the optimized parameter set determined by Algorithms 2 and 3, the test template is registered and matched simultaneously. The registration and matching procedure is listed in Algorithm 4. Here s_te_i, T_te, s_ta_j, and T_ta are defined as in Algorithms 2 and 3; θ_optm, tr(optm)_shift, and tr(optm)_scale are the registration parameters obtained from Algorithms 2 and 3; R(θ_optm), T(tr(optm)_shift), and S(tr(optm)_scale) form the descriptor transform matrix defined in Algorithm 3; ϕ is the angle between the segment descriptor and the radius direction; and w is the weight of the descriptor, which indicates whether or not the descriptor lies at the edge of the sclera. To ensure that the nearest descriptors have a similar orientation, we used a constant factor α to check the absolute difference of the two ϕ values; in our experiment, we set α to 5. The total matching score is the minimal score of the two transformed results divided by the minimal matching score for the test template and the target template.
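The randomized parameter search described above (N = 512 iterations over rotation, shift, and scale) can be sketched as follows. This is an illustrative Python/NumPy version, not the report's MATLAB/CUDA implementation; the function name, parameter ranges, and the nearest-neighbor scoring are assumptions for demonstration.

```python
import numpy as np

def random_affine_search(test_pts, target_pts, iters=512, rng=None):
    """Randomly search rotation/shift/scale parameters, keeping the set
    that minimizes the summed nearest-neighbor distance between the
    transformed test points and the target points (both shape (N, 2))."""
    rng = np.random.default_rng(rng)
    best_score, best_params = np.inf, None
    for _ in range(iters):
        theta = rng.uniform(-np.pi / 8, np.pi / 8)   # assumed search ranges
        shift = rng.uniform(-10.0, 10.0, size=2)
        scale = rng.uniform(0.9, 1.1)
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        moved = scale * test_pts @ R.T + shift
        # score: for each transformed point, distance to its closest target
        d = np.linalg.norm(moved[:, None, :] - target_pts[None, :, :], axis=2)
        score = d.min(axis=1).sum()
        if score < best_score:
            best_score, best_params = score, (theta, shift, scale)
    return best_score, best_params
```

Each iteration is independent, which is exactly why the report can later assign one candidate parameter set per GPU thread.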
2.5 MAPPING THE SUBTASKS TO CUDA
CUDA is a single-instruction, multiple-data (SIMD) system and works as a coprocessor with a CPU. A CUDA device consists of many streaming multiprocessors (SMs); the parallel part of the program should be partitioned into threads by the programmer and mapped onto those SMs. There are multiple memory spaces in the CUDA memory hierarchy: registers, local memory, shared memory, global memory, constant memory, and texture memory. Registers, local memory, and shared memory are on-chip, and it takes little time to access these memories. Only shared memory can be accessed by other threads within the same block; however, shared memory is available only in limited amounts. Global memory, constant memory, and texture memory are off-chip and accessible by all threads, and it is very time consuming to access these memories.
Constant memory and texture memory are read-only and cacheable. Mapping algorithms to CUDA to achieve efficient processing is not a trivial task, and there are several challenges in CUDA programming:
If threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To hide the latency of the small instruction set, on-chip memory should be used in preference to global memory. When global memory access does occur, threads in the same warp should access consecutive words to achieve coalescing.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. For maximum performance, memory requests should be scheduled to minimize bank conflicts.
2.5.1 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, all the modules are converted to different kernels on the GPU. These kernels differ in computation density, so we map them to the GPU with different mapping strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and each thread performs one template-pair comparison. In our work we use an NVIDIA C2070 as our GPU, and both the number of threads per block and the number of blocks are set to 1024. That means we can match our test template with up to 1024 × 1024 target templates at the same time.
Algorithms 2-4 are partitioned into fine-grained subtasks, in which each thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data-exchange requirements between different blocks. When all threads complete their corresponding descriptor fractions, the sum of the intermediate results must be computed or compared. A parallel prefix sum algorithm is used to calculate this sum, as shown on the right of Figure 11. First, all odd-numbered threads compute the sum of consecutive pairs of results; then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
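The pairwise summation scheme just described can be modeled sequentially to make the access pattern explicit. This is a minimal Python sketch of the in-block tree reduction, not CUDA code; each loop iteration at a given stride corresponds to the work one group of threads does in parallel.

```python
import numpy as np

def block_reduce_sum(values):
    """Simulate the in-place pairwise (tree) reduction used to sum
    per-thread partial results inside a block: at stride s, element i
    adds element i + s for every i that is a multiple of 2s. The total
    ends up in element 0, the 'first address' mentioned in the text."""
    buf = np.array(values, dtype=float)
    n = len(buf)
    stride = 1
    while stride < n:
        # in CUDA, all (i, i + stride) pairs at this level run in parallel
        for i in range(0, n - stride, 2 * stride):
            buf[i] += buf[i + stride]
        stride *= 2
    return buf[0]
```

The number of sequential levels is log2(n), which is why a block of 1024 threads can combine its partial results in about ten synchronization steps.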
2.5.2 MAPPING INSIDE A BLOCK
In shift-argument searching, there are two schemes we can choose for mapping the task:
Mapping one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
Assigning a single possible shift offset to a thread, so that all threads compute independently except that the final result must be compared across the possible offsets.
Because of the great number of sum and synchronization operations in every nearest-neighbor searching step, we chose the second method to parallelize shift searching. In the affine matrix generator, we mapped an entire parameter-set search to a thread: every thread randomly generated a set of parameters and tried them independently, and the generated iterations were assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative. Therefore, it is hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura. In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is to use a single thread in a block to process the matching.
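The correlated-streams problem has the same shape in any parallel setting. As a minimal illustration (in Python/NumPy, standing in for the CUDA dynamic-creation tool the text describes), spawning child seed sequences from one root seed yields statistically independent per-thread generators rather than naively seeding each thread with nearby values:

```python
import numpy as np

def make_thread_streams(n_threads, root_seed=1234):
    """Create one independent generator per simulated thread.
    SeedSequence.spawn derives child entropy states that avoid the
    correlated-sequences pitfall of seeding every stream with
    'very different' but structurally related values."""
    root = np.random.SeedSequence(root_seed)
    return [np.random.default_rng(child) for child in root.spawn(n_threads)]

# usage: each "thread" draws from its own stream
streams = make_thread_streams(4)
draws = [g.random(3) for g in streams]
```

This is only an analogue: NumPy's approach derives distinct states for one generator family, whereas the dynamic-creation tool generates distinct Mersenne Twister parameter sets per thread, but both target the same goal of uncorrelated parallel streams.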
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore, there is no data transfer from host to device during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency considerably, global memory access is still a slow way to get data, so in our kernels we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
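Storing the descriptor components separately is a structure-of-arrays (SoA) layout. A small NumPy sketch (illustrative only; the field names follow the s(x, y, r, θ, ϕ, w) descriptor above) shows why SoA gives the contiguous, coalescable addresses that an array-of-structures (AoS) record layout does not:

```python
import numpy as np

n = 1024  # number of descriptors

# array-of-structures: one interleaved record per descriptor
aos = np.zeros(n, dtype=[('x', 'f4'), ('y', 'f4'), ('r', 'f4'),
                         ('theta', 'f4'), ('phi', 'f4'), ('w', 'f4')])

# structure-of-arrays: each component stored in its own contiguous array,
# mirroring the separate component storage described in the text
soa = {name: np.zeros(n, dtype='f4') for name in aos.dtype.names}

# "thread" i reading component x touches soa['x'][i]; neighboring threads
# touch neighboring 4-byte addresses, which coalesce into one transaction.
assert soa['x'].strides == (4,)    # contiguous: consecutive floats
# in the AoS layout, consecutive x values are 24 bytes apart (6 x f4),
# so consecutive threads would issue strided, non-coalesced reads.
assert aos['x'].strides == (24,)
```

The same reasoning explains the report's choice: with components split out, successive GPU threads in Algorithms 2 to 4 read successive addresses.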
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in the design of target detection; in this paper it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientations and edge orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of the pixels. The combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the result invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG. This method is used to create the cell histograms: each pixel within the cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counting as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images have any illumination or contrast changes, then the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
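The two steps above (gradient computation, then magnitude-weighted orientation binning per cell over 0-180°) can be sketched in Python/NumPy. This is a deliberately minimal HOG for illustration: the cell size and bin count are assumed values, and a single global L2 normalization stands in for the overlapping R-HOG block normalization described in the text.

```python
import numpy as np

def hog_cell_histograms(img, cell=8, bins=9):
    """Minimal HOG sketch: x/y gradients, then per-cell histograms of
    gradient orientation weighted by gradient magnitude, with unsigned
    orientation folded into [0, 180) degrees."""
    img = img.astype(float)
    dy, dx = np.gradient(img)                 # y- and x-direction gradients
    mag = np.hypot(dx, dy)                    # gradient magnitude m(x, y)
    ang = np.degrees(np.arctan2(dy, dx)) % 180.0   # opposite directions merge
    h, w = img.shape
    hy, wx = h // cell, w // cell
    hist = np.zeros((hy, wx, bins))
    for i in range(hy):
        for j in range(wx):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            idx = np.minimum((a / (180.0 / bins)).astype(int), bins - 1)
            np.add.at(hist[i, j], idx.ravel(), m.ravel())   # weighted votes
    # simplified contrast normalization (one L2 norm over all cells)
    return hist / (np.linalg.norm(hist) + 1e-12)
```

For a sclera patch, strong vein edges dominate a few orientation bins per cell, which is what makes the resulting descriptor discriminative.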
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular among scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script, or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, Fortran, Java™, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be called directly from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, and it enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.
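The loop-elimination claim can be illustrated with an analogous vectorized example. This sketch is in Python/NumPy (standing in for MATLAB's vectorized style, since the two share the whole-array idiom): a single array expression replaces the explicit element-wise loop.

```python
import numpy as np

# explicit loop, as one would write it in C or C++
def scale_loop(x, a):
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + 1.0
    return out

# vectorized one-liner: the whole-array expression replaces the loop,
# just as a single MATLAB statement replaces a 'for' loop
def scale_vec(x, a):
    return a * np.asarray(x, dtype=float) + 1.0
```

Both compute the same result; the vectorized form also dispatches to optimized library code, which is the same mechanism behind MATLAB's speed on matrix and vector operations.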
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to iterate quickly to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithms efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage
Designing Graphical User Interfaces
GUIDE (Graphical User Interface Development Environment) is an interactive tool for laying out, designing, and editing user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh
Image plots
Cone, slice, stream, and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical statistical and engineering
functions to support all common engineering and science operations These
functions developed by experts in mathematics are the foundation of the
MATLAB language The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW Discrete Fourier
Transform library Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports they execute
faster than the equivalent C or C++ code
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types
including doubles singles and integers
CHAPTER 4
IMPLEMENTATION
41 GENERAL
Matlab is a program that was originally designed to simplify the
implementation of numerical linear algebra routines It has since grown into
something much bigger and it is used to implement numerical algorithms
for a wide range of applications The basic language used is very similar to
standard linear algebra notation but there are a few extensions that will
likely cause you some problems at first
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc
Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc
Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition
method which employees a two stage parallel approach for registration and
matching Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using CUDA
GPU architecture the parallel strategies developed in this research can be
applied to design parallel solutions to other sclera vein recognition methods
and general pattern recognition methods We designed the Y shape
descriptor to narrow the search range to increase the matching efficiency
which is a new feature extraction method to take advantage of the GPU
structures We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing which can
dramatically reduce data transferring and computation We then carefully
mapped our algorithms to GPU threads and blocks which is an important
step to achieve parallel computation efficiency using a GPU A work flow
which has high arithmetic intensity to hide the memory access latency was
designed to partition the computation task to the heterogeneous system of
CPU and GPU even to the threads in GPU The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy
62 REFERENCES
[1] C W Oyster The Human Eye Structure and Function Sunderland
Sinauer Associates 1999
[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object
detection for real-time augmented reality applications in a GPGPUrdquo IEEE
Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012
[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep
big simple neural nets for handwritten digit recognitionrdquo Neural Comput
vol 22 no 12 pp 3207ndash3220 2010
accessible by all threads, but accessing these memories is very time-consuming.
Constant memory and texture memory are read-only, cacheable memories.
Mapping algorithms to CUDA to achieve efficient processing is not a trivial task; there are several challenges in CUDA programming.
If the threads in a warp take different control paths, all the branches are executed serially. To improve performance, branch divergence within a warp should be avoided.
Global memory is slower to access than on-chip memory. To completely hide the latency of the small instruction set, we should preferentially use on-chip memory rather than global memory. When global memory access does occur, the threads in the same warp should access consecutive words so that the accesses coalesce.
Shared memory is much faster than the local and global memory spaces, but it is organized into banks of equal size. If two memory requests from different threads within a warp fall in the same memory bank, the accesses are serialized. To get maximum performance, memory requests should be scheduled so as to minimize bank conflicts.
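The bank rule above can be illustrated with a small sketch (a hypothetical Python model, not the report's code; it assumes the usual CUDA layout of 32 four-byte-wide banks):

```python
# Hypothetical model of CUDA shared-memory banks: successive 4-byte words
# map to successive banks, cycling every 32 words. Threads of a warp whose
# addresses land in the same bank are serialized.
from collections import Counter

NUM_BANKS = 32
WORD_BYTES = 4

def bank_of(byte_addr):
    """Bank that a byte address falls into."""
    return (byte_addr // WORD_BYTES) % NUM_BANKS

def serialization_factor(addrs):
    """Worst-case number of serialized passes for one warp's addresses."""
    counts = Counter(bank_of(a) for a in addrs)
    return max(counts.values())

# Stride-1 access: each of 32 threads hits a different bank -> no conflict.
assert serialization_factor([4 * t for t in range(32)]) == 1
# Stride-32 (in words) access: all 32 threads hit bank 0 -> fully serialized.
assert serialization_factor([4 * 32 * t for t in range(32)]) == 32
```

The two asserts show why padding or re-scheduling requests so threads touch distinct banks matters for throughput.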
2.5.1 MAPPING ALGORITHM TO BLOCKS
Because the proposed registration and matching algorithm has four independent modules, each module is converted to a different kernel on the GPU. These kernels differ in computation density, so we map them to the GPU with different strategies to fully utilize the computing power of CUDA. Figure 11 shows our scheme of CPU-GPU task distribution and the partition among blocks and threads. Algorithm 1 is partitioned into coarse-grained parallel subtasks.
We create a number of threads in this kernel equal to the number of templates in the database. As the upper middle column of Figure 11 shows, each target template is assigned to one thread, and each thread compares one pair of templates. In our work we use an NVIDIA C2070 as our GPU, with the thread and block counts set to 1024. That means we can match our test template with up to 1024×1024 target templates at the same time.
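The coarse-grained partition can be sketched on a CPU as follows (a hypothetical illustration with made-up names and a stand-in similarity function, not the report's matcher): each worker plays the role of one GPU thread and compares the test template against exactly one target template.

```python
# CPU sketch of the one-thread-per-target-template partition: each worker
# performs one pair-wise template comparison, analogous to one CUDA thread.
from concurrent.futures import ThreadPoolExecutor

def compare(test_template, target_template):
    # Stand-in for the real descriptor matcher: count shared descriptors.
    return len(set(test_template) & set(target_template))

def match_all(test_template, targets, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda t: compare(test_template, t), targets))
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

targets = [["a", "b"], ["a", "b", "c"], ["x"]]
best, score = match_all(["a", "b", "c"], targets)
assert (best, score) == (1, 3)   # second target shares all three descriptors
```

On the GPU the pool is replaced by the launch grid, so all comparisons genuinely run concurrently instead of being scheduled over a few workers.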
Algorithms 2-4 are partitioned into fine-grained subtasks, in which one thread processes a section of descriptors. As the lower portion of the middle column of Figure 11 shows, we assign a target template to one block; inside a block, one thread corresponds to a set of descriptors in this template. This partition lets every block execute independently, with no data exchange required between different blocks. When all threads have completed their respective descriptor fractions, the sum of the intermediate results needs to be computed or compared. A parallel prefix sum algorithm, shown on the right of Figure 11, is used to calculate this sum. First, all odd-numbered threads compute the sums of consecutive pairs of results. Then, recursively, every first of i (= 4, 8, 16, 32, 64, ...) threads computes the prefix sum on the new results. The final result is saved at the first address, which has the same variable name as the first intermediate result.
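The pairwise reduction described above can be sketched sequentially (a minimal Python model of the in-place tree sum; on the GPU each inner addition is done by a separate thread in the same round):

```python
# Sketch of the in-place pairwise reduction: at each round, every
# (2*stride)-th slot accumulates its neighbor `stride` positions away,
# leaving the total in the first slot -- the same address that held the
# first intermediate result.
def parallel_sum(results):
    data = list(results)          # simulate the shared-memory buffer
    n = len(data)
    stride = 1
    while stride < n:
        # In CUDA these additions run concurrently, one per active thread.
        for i in range(0, n - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2
    return data[0]

assert parallel_sum([1, 2, 3, 4, 5, 6, 7, 8]) == 36
```

For n values the loop takes log2(n) rounds instead of n-1 sequential additions, which is the point of doing the sum in parallel.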
2.5.2 MAPPING INSIDE A BLOCK
In shift argument searching there are two schemes we can choose to map the task:
1) Map one pair of templates to all the threads in a block, so that every thread takes charge of a fraction of the descriptors and cooperates with the other threads.
2) Assign a single possible shift offset to each thread, so that all the threads compute independently, except that the final result must be compared across the possible offsets.
Because of the great number of sum and synchronization operations in every nearest-neighbor searching step, we choose the second method to parallelize the shift search. In the affine matrix generator, we map an entire parameter-set search to a thread: every thread randomly generates a set of parameters and tries them independently, and the generated iterations are assigned to all threads. The challenge of this step is that the randomly generated numbers might be correlated among threads. In the rotation and scale registration step, we used the Mersenne Twister pseudorandom number generator because it can use bitwise arithmetic and has a long period.
The Mersenne Twister, like most pseudorandom generators, is iterative; it is therefore hard to parallelize a single twister state-update step among several execution threads. To make sure that the thousands of threads in the launch grid generate uncorrelated random sequences, many simultaneous Mersenne Twisters need to run in parallel with different initial states. But even "very different" (by any definition) initial state values do not prevent the emission of correlated sequences by generators sharing identical parameters. To solve this problem, and to enable an efficient implementation of the Mersenne Twister on parallel architectures, we used a special offline tool for the dynamic creation of Mersenne Twister parameters, modified from the algorithm developed by Makoto Matsumoto and Takuji Nishimura.

In the registration and matching step, when searching for the nearest neighbor, a line segment that has already been matched with another should not be used again. In our approach, a flag variable denoting whether the line has been matched is stored in shared memory. To share the flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is instead to use a single thread in a block to process the matching.
FIG
FIG
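The per-thread independent-stream problem described above has a compact modern analogue. The sketch below uses NumPy (an assumption for illustration; the report itself uses Matsumoto and Nishimura's dynamic-creation tool, not NumPy): `SeedSequence.spawn` derives decorrelated child seeds, one per simulated thread, each driving its own MT19937 state.

```python
# Sketch: one independent Mersenne Twister stream per simulated thread,
# seeded via SeedSequence.spawn so the streams are statistically decorrelated
# even though every generator shares the same MT19937 parameters.
import numpy as np

n_threads = 4
root = np.random.SeedSequence(1234)
gens = [np.random.Generator(np.random.MT19937(s)) for s in root.spawn(n_threads)]

# Draw a short sequence from each "thread"; the streams do not coincide.
draws = [tuple(int(x) for x in g.integers(0, 2**31, size=8)) for g in gens]
assert len(set(draws)) == n_threads
```

This plays the same role as giving each GPU thread its own twister: identical algorithm, per-thread state, no shared iteration to serialize.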
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, and data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire target template set from the database without considering when the templates will be processed; therefore no data transfer from host to device occurs during the matching procedure. In global memory, the components of the descriptors y(ϕ1, ϕ2, ϕ3, x, y) and s(x, y, r, θ, ϕ, w) are stored separately. This guarantees that consecutive kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces the latency, frequent global memory access is still a slow way to get data, so in our kernel we loaded the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
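The layout choice above is the classic structure-of-arrays versus array-of-structures trade-off. A small NumPy sketch (hypothetical field names, for illustration only) shows why storing each descriptor component separately puts the values a kernel touches at successive addresses:

```python
# Array-of-structures vs. structure-of-arrays for descriptor components.
# SoA keeps one component contiguous across descriptors -- the precondition
# for coalesced loads by consecutive threads.
import numpy as np

n = 5
# AoS: the fields of one descriptor are adjacent, but one field across
# descriptors is strided by the whole record size.
aos = np.zeros(n, dtype=[("phi1", "f4"), ("phi2", "f4"), ("x", "f4"), ("y", "f4")])
# SoA: each component is its own contiguous array.
soa = {name: np.zeros(n, dtype="f4") for name in ("phi1", "phi2", "x", "y")}

assert aos["phi1"].strides == (16,)   # 4 fields x 4 bytes between elements
assert soa["phi1"].strides == (4,)    # unit stride: successive addresses
```

When thread t reads component phi1 of descriptor t, the SoA layout makes the warp's addresses consecutive, which is exactly the coalescing condition described above.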
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in the design of target detectors; in this work it is applied as a feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient and edge orientations of the vein pattern in the sclera region of an eye image.

To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of the gradient directions or edge orientations of its pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a larger block and using this value to normalize all the cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y):

m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2),  θ(x, y) = arctan(dy(x, y) / dx(x, y))
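A minimal sketch of these formulas (assumed illustration, not the report's code), computing per-pixel magnitude and unsigned orientation with NumPy:

```python
# Per-pixel gradient magnitude m and unsigned orientation theta (degrees)
# from the x- and y-direction gradients of a grayscale image.
import numpy as np

def gradient_mag_ori(img):
    img = img.astype(np.float32)
    dy, dx = np.gradient(img)                        # axis-0 then axis-1 gradients
    m = np.hypot(dx, dy)                             # m = sqrt(dx^2 + dy^2)
    theta = np.degrees(np.arctan2(dy, dx)) % 180.0   # fold into [0, 180)
    return m, theta

img = np.tile(np.arange(4, dtype=np.float32), (4, 1))  # horizontal intensity ramp
m, theta = gradient_mag_ori(img)
assert np.allclose(m, 1.0)       # unit horizontal gradient everywhere
assert np.allclose(theta, 0.0)   # orientation along the x axis
```

Using arctan2 rather than a plain arctan avoids division by zero on vertical edges, and the modulo folds opposite directions together, matching the unsigned binning used in the next step.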
Orientation binning is the second step of HOG and is used to create the cell histograms. Each pixel within a cell casts a weighted vote for the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular. The gradient orientations are binned over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the image contains illumination and contrast changes, the gradient strengths must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is improved by applying a Gaussian window to each block.
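The binning step above can be sketched as follows (a hypothetical illustration assuming the common choice of 9 bins over 0-180 degrees; bin count and cell size are parameters, not values stated in the report):

```python
# Magnitude-weighted orientation histogram for one HOG cell: each pixel
# votes into one of n_bins unsigned-orientation bins, weighted by its
# gradient magnitude.
import numpy as np

def cell_histogram(mag, theta, n_bins=9):
    bin_width = 180.0 / n_bins
    idx = (theta // bin_width).astype(int) % n_bins
    hist = np.zeros(n_bins, dtype=np.float32)
    np.add.at(hist, idx.ravel(), mag.ravel())   # scatter-add the votes
    return hist

mag = np.ones((2, 2), dtype=np.float32)
theta = np.array([[0.0, 90.0], [90.0, 170.0]])  # degrees, unsigned
hist = cell_histogram(mag, theta)
assert hist[0] == 1.0 and hist[4] == 2.0 and hist[8] == 1.0
assert hist.sum() == mag.sum()                  # every vote is accounted for
```

Concatenating such histograms over the overlapping R-HOG blocks, after block-level normalization, yields the final descriptor.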
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop; when code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.

MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, FORTRAN, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling; additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).

Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, sold separately by MathWorks, or through an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.

As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems It enables fast
development and execution With the MATLAB language you can
program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time MATLAB provides all the features of a traditional
programming language including arithmetic operators flow control data
structures data types object-oriented programming (OOP) and debugging
features
MATLAB lets you execute commands or groups of commands one
at a time without compiling and linking enabling you to quickly iterate to
the optimal solution For fast execution of heavy matrix and vector
computations MATLAB uses processor-optimized libraries For general-
purpose scalar computations MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology
This technology which is available on most platforms provides
execution speeds that rival those of traditional programming languages
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
Use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files, such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatterbubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical statistical and engineering
functions to support all common engineering and science operations These
functions developed by experts in mathematics are the foundation of the
MATLAB language The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW Discrete Fourier
Transform library Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports they execute
faster than the equivalent C or C++ code
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types
including doubles singles and integers
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications: the Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition method which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
1024 That means we can match our test template with up to 1024times1024
target templates at same time
Algorithms 2-4 will be partitioned into fine-grained subtasks which is
processed a section of descriptors in one thread As the lower portion of the
middle column shows in Figure 11 we assigned a target template to one
block Inside a block one thread corresponds a set of descriptors in this
template This partition makes every block execute independently and there
are no data exchange requirements between different blocks When all
threads complete their responding descriptor fractions the sum of the
intermediate results needs to be computed or compared A parallel prefix
sum algorithm is used to calculate the sum of intermediate results which is
show in right of Figure 11 Firstly all odd number threads compute the sum
of consecutive pairs of the results Then recursively every first of i (= 4 8
16 32 64 ) threads
compute the prefix sum on the new result The final result will be saved in
the first address which has the same variable name as the first intermediate
result
252 MAPPING INSIDE BLOCK
In shift argument searching there are two schemes we can choose to
map task
Mapping one pair of templates to all the threads in a block and then every
thread would take charge of a fraction of descriptors and cooperation with
other threads
Assigning a single possible shift offset to a thread and all the threads will
compute independently unless the final result should be compared with
other possible offset
Due to great number of sum and synchronization operations in every
nearest neighbor searching step we choose the second method to parallelize
shift searching In affine matrix generator we mapped an entire parameter
set searching to a thread and every thread randomly generated a set of
parameters and tried them independently The generated iterations were
assigned to all threads The challenge of this step is the randomly generated
numbers might be correlated among threads In the step of rotation and
scale registration generating we used the Mersenne Twister pseudorandom
number generator because it can use bitwise arithmetic and have long
period
The Mersenne twister as most of pseudorandom generators is iterative
Therefore itrsquos hard to parallelize a single twister state update step among
several execution threads To make sure that thousands of threads in the
launch grid generate uncorrelated random sequence many simultaneous
Mersenne twisters need to process with different initial states in parallel
But even ldquovery differentrdquo (by any definition) initial state values do not
prevent the emission of correlated sequences by each generator sharing
identical parameters To solve this problem and to enable efficient
implementation of Mersenne Twister on parallel architectures we used a
special offline tool for the dynamic creation of Mersenne Twisters
parameters modified from the algorithm developed by Makoto Matsumoto
and Takuji Nishimura In the registration and matching step when
searching the nearest neighbor a line segment that has already matched
with others should not be used again In our approach a flag
FIG
FIG
Variable denoting whether the line has been matched is stored in
shared memory To share the flags all the threads in a block should wait
synchronic operation at every query step Our solution is to use a single
thread in a block to process the matching
253 MEMORY MANAGEMENT
The bandwidth inside GPU board is much higher than the
bandwidth between host memory and device memory The data transfer
between host and device can lead to long latency As shown in Figure 11
we load the entire target templates set from database without considering
when they would be processed Therefore there was no data transfer from
host to device during the matching procedure In global memory the
components in descriptor y(ϕ1 ϕ2 ϕ3 x y) and s(x y rθ ϕw) were stored
separately This would guarantee contiguous kernels of Algorithm 2 to 4
can access their data in successive addresses Although such coalescing
access reduces the latency frequently global memory access was still a
slower way to get data In our kernel we loaded the test template to shared
memory to accelerate memory access Because the Algorithms 2 to 4
execute different number of iterations on same data the bank conflict does
not happen To maximize our texture memory space we set the system
cache to the lowest value and bonded our target descriptor to texture
memory Using this catchable memory our data access was accelerated
more
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
Histogram of oriented gradients is the feature descriptors It is primarily
applied to the design of target detection In this paper it is applied as the
feature for human recognition In the sclera region the vein patterns are the
edges of an image So HOG is used to determine the gradient orientation
and edge orientations of vein pattern in the sclera region of an eye image
To follow out this technique first of all divide the image into small
connected regions called cells For each cell compute the histogram of
gradient directions or edge orientations of the pixels Then the combination
of different histogram of different cell represents the descriptor To improve
accuracy histograms can be contrast normalized by calculating the intensity
from the block and then using this value normalizes all cells within the
block This normalization result shows that it is invariant to geometric and
photometric changes The gradient magnitude m(x y) and orientation 1050592(x
y) are calculated using x and y directions gradients dx (x y) and dy (x y)
Orientation binning is the second step of HOG This method utilized
to create cell histograms Each pixel within the cell used to give a weight to
the orientation which is found in the gradient computation Gradient
magnitude is used as the weight The cells are in the rectangular form The
binning of gradient orientation should be spread over 0 to 180 degrees and
opposite direction counts as the same In the Fig 8 depicts the edge
orientation of picture elements If the images have any illumination and
contrast changes then the gradient strength must be locally normalized For
that cells are grouped together into larger blocks These blocks are
overlapping so that each cell contributes more than once to the final
descriptor Here rectangular HOG (R-HOG) blocks are applied which are
mainly in square grids The performance of HOG is improved by putting
on a Gaussian window into each block
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
3.1 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and is popular amongst scientists involved in image processing.
The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, either as a script or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java™, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, and financial modeling and analysis. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB Command Window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, enabling fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
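The point that one whole-array expression can replace an explicit loop is not unique to MATLAB; the minimal NumPy sketch below (the helper names saxpy_loop and saxpy_vec are our own, purely for illustration) contrasts the two styles.

```python
import numpy as np

# Element-by-element loop, in the style of low-level C code
def saxpy_loop(a, x, y):
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out

# Vectorized form: one whole-array expression replaces the loop,
# much as MATLAB's matrix syntax does
def saxpy_vec(a, x, y):
    return a * np.asarray(x, dtype=float) + np.asarray(y, dtype=float)
```

Both compute a*x + y element by element; the vectorized form is shorter and is executed by optimized array routines rather than an interpreted loop.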
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology.
This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage
Designing Graphical User Interfaces
By using the interactive tool GUIDE (Graphical User Interface
Development Environment) to layout design and edit user interfaces
GUIDE lets you include list boxes pull-down menus push buttons radio
buttons and sliders as well as MATLAB plots and Microsoft
ActiveXreg controls Alternatively you can create GUIs programmatically
using MATLAB functions
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
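Several of the listed operations (thresholding, smoothing, basic statistics) reduce to one-liners in any array language. A hedged NumPy illustration, not taken from the report and using made-up sample data:

```python
import numpy as np

signal = np.array([0.0, 1.0, 5.0, 4.0, 1.0, 0.0, 6.0, 2.0])

# Thresholding: suppress samples below a chosen level
thresholded = np.where(signal >= 2.0, signal, 0.0)

# Smoothing: 3-point moving average via convolution
smoothed = np.convolve(signal, np.ones(3) / 3.0, mode="same")

# Basic statistics
mean, std = signal.mean(), signal.std()
```

In MATLAB, the same steps would use logical indexing, conv or filter, and mean/std.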
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific file formats such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
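As a rough illustration of the first and third items above (linear algebra and Fourier analysis), here is a NumPy sketch; in MATLAB the backslash operator and fft play the same roles. The numbers are made up for the example.

```python
import numpy as np

# Matrix manipulation and linear algebra: solve A x = b
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)          # MATLAB equivalent: x = A \ b

# Fourier analysis: the spectrum of a pure sinusoid has one dominant bin
n = 64
t = np.arange(n)
sig = np.sin(2.0 * np.pi * 5.0 * t / n)   # exactly 5 cycles over the window
peak_bin = int(np.argmax(np.abs(np.fft.rfft(sig))))
```

Both NumPy and MATLAB dispatch these calls to LAPACK/BLAS and FFT libraries, which is why such one-liners run at compiled-library speed.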
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language is very similar to standard linear algebra notation, but there are a few extensions that may cause some difficulty at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
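The grey-scale, Otsu-threshold, and binarization snapshots above correspond to a standard pipeline. The sketch below is plain NumPy with our own otsu_threshold helper, not the project's MATLAB code (which would use graythresh and im2bw); it shows the idea of picking the threshold that maximizes between-class variance and then binarizing.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the grey level that maximizes the
    between-class variance of the image histogram."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()       # class probabilities
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (levels[:t] * p[:t]).sum() / w0   # class means
        mu1 = (levels[t:] * p[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def to_binary(gray):
    """Binarize a grey-scale image at the Otsu threshold."""
    return (np.asarray(gray) >= otsu_threshold(gray)).astype(np.uint8)
```

On a strongly bimodal image (dark sclera vessels against a bright background), the chosen threshold falls between the two modes, separating vessel pixels from the rest.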
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method that employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor, a new feature extraction method that takes advantage of GPU structures, to narrow the search range and increase the matching efficiency. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which dramatically reduces data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A work flow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU-GPU system, down to the individual threads in the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
memory Using this catchable memory our data access was accelerated
more
FIG
26 HISTOGRAM OF ORIENTED GRADIENTS
Histogram of oriented gradients is the feature descriptors It is primarily
applied to the design of target detection In this paper it is applied as the
feature for human recognition In the sclera region the vein patterns are the
edges of an image So HOG is used to determine the gradient orientation
and edge orientations of vein pattern in the sclera region of an eye image
To follow out this technique first of all divide the image into small
connected regions called cells For each cell compute the histogram of
gradient directions or edge orientations of the pixels Then the combination
of different histogram of different cell represents the descriptor To improve
accuracy histograms can be contrast normalized by calculating the intensity
from the block and then using this value normalizes all cells within the
block This normalization result shows that it is invariant to geometric and
photometric changes The gradient magnitude m(x y) and orientation 1050592(x
y) are calculated using x and y directions gradients dx (x y) and dy (x y)
Orientation binning is the second step of HOG This method utilized
to create cell histograms Each pixel within the cell used to give a weight to
the orientation which is found in the gradient computation Gradient
magnitude is used as the weight The cells are in the rectangular form The
binning of gradient orientation should be spread over 0 to 180 degrees and
opposite direction counts as the same In the Fig 8 depicts the edge
orientation of picture elements If the images have any illumination and
contrast changes then the gradient strength must be locally normalized For
that cells are grouped together into larger blocks These blocks are
overlapping so that each cell contributes more than once to the final
descriptor Here rectangular HOG (R-HOG) blocks are applied which are
mainly in square grids The performance of HOG is improved by putting
on a Gaussian window into each block
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB(matrix laboratory) is a numerical
computing environment and fourth-generation programming language
Developed by Math Works MATLAB allows matrix manipulations
plotting of functions and data implementation of algorithms creation
of user interfaces and interfacing with programs written in other languages
including C C++ Java and Fortran
Although MATLAB is intended primarily for numerical computing an
optional toolbox uses the MuPAD symbolic engine allowing access
to symbolic computing capabilities An additional package Simulink adds
graphicalmulti-domainsimulationandModel-Based
Design for dynamic and embedded systems
In 2004 MATLAB had around one million users across industry
and academia MATLAB users come from various backgrounds
of engineering science and economics MATLAB is widely used in
academic and research institutions as well as industrial enterprises
MATLAB was first adopted by researchers and practitioners
in control engineering Littles specialty but quickly spread to many other
domains It is now also used in education in particular the teaching
of linear algebra and numerical analysis and is popular amongst scientists
involved in image processing The MATLAB application is built around the
MATLAB language The simplest way to execute MATLAB code is to type
it in the Command Window which is one of the elements of the MATLAB
Desktop When code is entered in the Command Window MATLAB can
be used as an interactive mathematical shell Sequences of commands can
be saved in a text file typically using the MATLAB Editor as a script or
encapsulated into a function extending the commands available
MATLAB provides a number of features for documenting and
sharing your work You can integrate your MATLAB code with other
languages and applications and distribute your MATLAB algorithms and
applications
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code files and data
Interactive tools for iterative exploration design and problem solving
Mathematical functions for linear algebra statistics Fourier analysis
filtering optimization and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB based algorithms with external
applications and languages such as C C++ FORTRAN Javatrade COM
and Microsoft Excel
MATLAB is used in vast area including signal and image
processing communications control design test and measurement
financial modeling and analysis and computational Add-on toolboxes
(collections of special-purpose MATLAB functions) extend the MATLAB
environment to solve particular classes of problems in these application
areas
MATLAB can be used on personal computers and powerful
server systems including the Cheaha compute cluster With the addition of
the Parallel Computing Toolbox the language can be extended with parallel
implementations for common computational functions including for-loop
unrolling Additionally this toolbox supports offloading computationally
intensive workloads to Cheaha the campus compute cluster MATLAB is
one of a few languages in which each variable is a matrix (broadly
construed) and knows how big it is Moreover the fundamental operators
(eg addition multiplication) are programmed to deal with matrices when
required And the MATLAB environment handles much of the bothersome
housekeeping that makes all this possible Since so many of the procedures
required for Macro-Investment Analysis involves matrices MATLAB
proves to be an extremely efficient language for both communication and
implementation
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C
programming language or FORTRAN A wrapper function is created
allowing MATLAB data types to be passed and returned The dynamically
loadable object files created by compiling such functions are termed MEX-
files (for MATLAB executable)
Libraries written in Java ActiveX or NET can be directly called
from MATLAB and many MATLAB libraries (for
example XML or SQL support) are implemented as wrappers around Java
or ActiveX libraries Calling MATLAB from Java is more complicated but
can be done with MATLAB extension which is sold separately by Math
Works or using an undocumented mechanism called JMI (Java-to-Mat lab
Interface) which should not be confused with the unrelated Java that is also
called JMI
As alternatives to the MuPAD based Symbolic Math Toolbox
available from Math Works MATLAB can be connected
to Maple or Mathematical
Libraries also exist to import and export MathML
Development Environment
Startup Accelerator for faster MATLAB startup on Windows especially on
Windows XP and for network installations
Spreadsheet Import Tool that provides more options for selecting and
loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in
the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development
tools that let you quickly develop and analyze your algorithms and
applications
The MATLAB Language
The MATLAB language supports the vector and matrix operations
that are fundamental to engineering and scientific problems It enables fast
development and execution With the MATLAB language you can
program and develop algorithms faster than with traditional languages
because you do not need to perform low-level administrative tasks such as
declaring variables specifying data types and allocating memory In many
cases MATLAB eliminates the need for lsquoforrsquo loops As a result one line of
MATLAB code can often replace several lines of C or C++ code
At the same time MATLAB provides all the features of a traditional
programming language including arithmetic operators flow control data
structures data types object-oriented programming (OOP) and debugging
features
MATLAB lets you execute commands or groups of commands one
at a time without compiling and linking enabling you to quickly iterate to
the optimal solution For fast execution of heavy matrix and vector
computations MATLAB uses processor-optimized libraries For general-
purpose scalar computations MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology
This technology which is available on most platforms provides
execution speeds that rival those of traditional programming languages
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
By using the interactive tool GUIDE (Graphical User Interface
Development Environment) to layout design and edit user interfaces
GUIDE lets you include list boxes pull-down menus push buttons radio
buttons and sliders as well as MATLAB plots and Microsoft
ActiveXreg controls Alternatively you can create GUIs programmatically
using MATLAB functions
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process from acquiring
data from external devices and databases through preprocessing
visualization and numerical analysis to producing presentation-quality
output
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from
files other applications databases and external devices You can read data
from popular file formats such as Microsoft Excel ASCII text or binary
files image sound and video files and scientific files such as HDF and
HDF5 Low-level binary file IO functions let you work with data files in
any format Additional functions let you read data from Web pages and
XML
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB These include 2-D and 3-D
plotting functions 3-D volume visualization functions tools for
interactively creating plots and the ability to export results to all popular
graphics formats You can customize plots by adding multiple axes
changing line colors and markers adding annotation Latex equations and
legends and drawing shapes
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatterbubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices 3-
D scalar and 3-D vector data You can use these functions to visualize and
understand large often complex multidimensional data Specifying plot
characteristics such as camera viewing angle perspective lighting effect
light source locations and transparency
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical statistical and engineering
functions to support all common engineering and science operations These
functions developed by experts in mathematics are the foundation of the
MATLAB language The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW Discrete Fourier
Transform library Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports they execute
faster than the equivalent C or C++ code
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types
including doubles singles and integers
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
4.2 SNAPSHOTS
FIG: ORIGINAL SCLERA IMAGE CONVERTED INTO A GREY-SCALE IMAGE
FIG: GREY-SCALE IMAGE CONVERTED INTO A BINARY IMAGE
FIG: EDGE DETECTION BY OTSU'S THRESHOLDING
FIG: SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG: SELECTED ROI PART
FIG: ENHANCEMENT OF THE SCLERA IMAGE
FIG: FEATURE EXTRACTION OF THE SCLERA IMAGE USING GABOR FILTERS
FIG: MATCHING WITH IMAGES IN THE DATABASE
FIG: DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
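The first two snapshot steps, grey-scale conversion and Otsu thresholding, are done in MATLAB in the project itself (e.g., with rgb2gray and graythresh); the sketch below is an illustrative plain-Python rendering of the same idea on a tiny synthetic image, not the project's code.

```python
def to_grey(rgb_pixels):
    """Luminosity grey-scale conversion for a list of (R, G, B) tuples."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in rgb_pixels]

def otsu_threshold(grey):
    """Return the threshold (0-255) maximizing between-class variance."""
    hist = [0] * 256
    for v in grey:
        hist[v] += 1
    total = len(grey)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var, w_b, sum_b = 0, -1.0, 0, 0.0
    for t in range(256):
        w_b += hist[t]                 # background pixel count
        if w_b == 0:
            continue
        w_f = total - w_b              # foreground pixel count
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b              # background mean
        m_f = (sum_all - sum_b) / w_f  # foreground mean
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

grey = [20] * 60 + [220] * 40          # synthetic bimodal "image"
t = otsu_threshold(grey)
binary = [1 if v > t else 0 for v in grey]
```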
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y-shape descriptor, a new feature extraction method that takes advantage of the GPU structures, to narrow the search range and increase the matching efficiency. We developed the WPL descriptor to incorporate mask information and make the method more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, down to the individual threads on the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
FIG
A variable denoting whether a line has been matched is stored in shared memory. To share these flags, all the threads in a block would have to wait on a synchronization operation at every query step; our solution is instead to use a single thread in each block to process the matching.
2.5.3 MEMORY MANAGEMENT
The bandwidth inside the GPU board is much higher than the bandwidth between host memory and device memory, so data transfer between host and device can lead to long latency. As shown in Figure 11, we load the entire set of target templates from the database without considering when they will be processed; therefore, there is no host-to-device data transfer during the matching procedure. In global memory, the components of the descriptors y(φ1, φ2, φ3, x, y) and s(x, y, r, θ, φ, w) are stored separately. This guarantees that the contiguous kernels of Algorithms 2 to 4 can access their data at successive addresses. Although such coalesced access reduces latency, frequent global memory access is still a slow way to get data, so in our kernel we load the test template into shared memory to accelerate memory access. Because Algorithms 2 to 4 execute different numbers of iterations on the same data, bank conflicts do not occur. To maximize our texture memory space, we set the system cache to the lowest value and bound our target descriptors to texture memory; using this cacheable memory, data access was accelerated further.
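The "components stored separately" layout described above is the classic structure-of-arrays arrangement that makes coalesced access possible: consecutive threads read consecutive addresses. A purely conceptual sketch in Python (the actual project code is CUDA; the field names here are hypothetical):

```python
# Array-of-structures: one record per line descriptor (hypothetical fields).
aos = [
    {"phi1": 0.1, "phi2": 0.5, "phi3": 0.9, "x": 10, "y": 20},
    {"phi1": 0.2, "phi2": 0.6, "phi3": 1.0, "x": 11, "y": 21},
    {"phi1": 0.3, "phi2": 0.7, "phi3": 1.1, "x": 12, "y": 22},
]

# Structure-of-arrays: each component packed into its own contiguous array,
# mirroring the separate storage of the y(...) and s(...) components.
soa = {key: [rec[key] for rec in aos] for key in aos[0]}

# Thread i of a kernel reading component "phi1" now touches soa["phi1"][i],
# so successive threads hit successive memory addresses (coalesced access).
```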
FIG
2.6 HISTOGRAM OF ORIENTED GRADIENTS
The histogram of oriented gradients (HOG) is a feature descriptor primarily applied in the design of target detection; in this paper it is applied as the feature for human recognition. In the sclera region, the vein patterns are the edges of the image, so HOG is used to determine the gradient orientation and edge orientations of the vein pattern in the sclera region of an eye image.
To carry out this technique, first divide the image into small connected regions called cells. For each cell, compute the histogram of gradient directions or edge orientations of its pixels; the combination of the histograms of the different cells then represents the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating the intensity over a block and then using this value to normalize all cells within the block. This normalization makes the descriptor invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y), as m(x, y) = sqrt(dx(x, y)^2 + dy(x, y)^2) and θ(x, y) = arctan(dy(x, y) / dx(x, y)).
Orientation binning is the second step of HOG; it is used to create the cell histograms. Each pixel within the cell gives a weight to the orientation bin found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the binning of gradient orientation is spread over 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientation of the picture elements. If the images have any illumination or contrast changes, the gradient strength must be locally normalized; for that, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks, arranged mainly in square grids, are applied. The performance of HOG is improved by applying a Gaussian window to each block.
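To make the two steps above concrete, here is a hedged plain-Python sketch of a single cell's magnitude-weighted, unsigned orientation histogram (the project itself works in MATLAB; the bin count and central-difference gradients are illustrative choices, not the report's exact parameters):

```python
import math

def hog_cell_histogram(cell, bins=9):
    """Magnitude-weighted orientation histogram over 0-180 degrees
    for one cell, given as a 2-D list of grey values."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = cell[y][x + 1] - cell[y][x - 1]   # central differences
            dy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(dx, dy)               # m = sqrt(dx^2 + dy^2)
            ang = math.degrees(math.atan2(dy, dx)) % 180.0  # unsigned
            hist[int(ang / 180.0 * bins) % bins] += mag     # weight by magnitude
    return hist

# A vertical step edge: all gradient energy falls in the 0-degree bin.
cell = [[0, 0, 100, 100]] * 4
hist = hog_cell_histogram(cell)
```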
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB(matrix laboratory) is a numerical
computing environment and fourth-generation programming language
Developed by Math Works MATLAB allows matrix manipulations
plotting of functions and data implementation of algorithms creation
of user interfaces and interfacing with programs written in other languages
including C C++ Java and Fortran
Although MATLAB is intended primarily for numerical computing an
optional toolbox uses the MuPAD symbolic engine allowing access
to symbolic computing capabilities An additional package Simulink adds
graphicalmulti-domainsimulationandModel-Based
Design for dynamic and embedded systems
In 2004 MATLAB had around one million users across industry
and academia MATLAB users come from various backgrounds
of engineering science and economics MATLAB is widely used in
academic and research institutions as well as industrial enterprises
MATLAB was first adopted by researchers and practitioners
in control engineering Littles specialty but quickly spread to many other
domains It is now also used in education in particular the teaching
of linear algebra and numerical analysis and is popular amongst scientists
involved in image processing The MATLAB application is built around the
MATLAB language The simplest way to execute MATLAB code is to type
it in the Command Window which is one of the elements of the MATLAB
Desktop When code is entered in the Command Window MATLAB can
be used as an interactive mathematical shell Sequences of commands can
be saved in a text file typically using the MATLAB Editor as a script or
encapsulated into a function extending the commands available
MATLAB provides a number of features for documenting and
sharing your work You can integrate your MATLAB code with other
languages and applications and distribute your MATLAB algorithms and
applications
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) and knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
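The point that every variable is a size-aware matrix, with the fundamental operators defined on whole matrices, can be sketched as follows (a hypothetical Matrix class in illustrative Python; MATLAB provides this behavior natively, so none of this code is needed there):

```python
class Matrix:
    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __add__(self, other):
        # Elementwise addition, as MATLAB's + does for same-size matrices.
        return Matrix([[a + b for a, b in zip(ra, rb)]
                       for ra, rb in zip(self.rows, other.rows)])

    def size(self):
        # Every variable "knows how big it is".
        return (len(self.rows), len(self.rows[0]))

A = Matrix([[1, 2], [3, 4]])
B = Matrix([[10, 20], [30, 40]])
C = A + B                      # one operator acts on the whole matrix
```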
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, and it enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks, such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithms efficiently. These include the following:
MATLAB Editor: provides standard editing and debugging features, such as setting breakpoints and single stepping.
Code Analyzer: checks your code for problems and recommends modifications to maximize performance and maintainability.
MATLAB Profiler: records the time spent executing each line of code.
Directory Reports: scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
The interactive tool GUIDE (Graphical User Interface Development Environment) lets you lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
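Two operations from the list above, smoothing and thresholding, can be sketched as follows (a hedged plain-Python illustration; in MATLAB these are one-liners via functions such as movmean and logical indexing):

```python
def moving_average(data, window=3):
    """Centered moving-average smoothing; edge samples are left unchanged."""
    half = window // 2
    out = list(data)
    for i in range(half, len(data) - half):
        out[i] = sum(data[i - half:i + half + 1]) / window
    return out

noisy = [1.0, 1.2, 0.8, 1.1, 5.0, 1.0, 0.9]
smooth = moving_average(noisy)
# Thresholding: indices of samples exceeding a fixed level.
spikes = [i for i, v in enumerate(noisy) if v > 2.0]
```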
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files, such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
structures data types object-oriented programming (OOP) and debugging
features
MATLAB lets you execute commands or groups of commands one
at a time without compiling and linking enabling you to quickly iterate to
the optimal solution For fast execution of heavy matrix and vector
computations MATLAB uses processor-optimized libraries For general-
purpose scalar computations MATLAB generates machine-code
instructions using its JIT (Just-In-Time) compilation technology
This technology which is available on most platforms provides
execution speeds that rival those of traditional programming languages
Development Tools
MATLAB includes development tools that help you implement
your algorithm efficiently These include the following
MATLAB Editor
Provides standard editing and debugging features such as setting
breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to
maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency file
differences file dependencies and code coverage
Designing Graphical User Interfaces
By using the interactive tool GUIDE (Graphical User Interface
Development Environment) to layout design and edit user interfaces
GUIDE lets you include list boxes pull-down menus push buttons radio
buttons and sliders as well as MATLAB plots and Microsoft
ActiveXreg controls Alternatively you can create GUIs programmatically
using MATLAB functions
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process from acquiring
data from external devices and databases through preprocessing
visualization and numerical analysis to producing presentation-quality
output
Data Analysis
MATLAB provides interactive tools and command-line functions for data
analysis operations including
Interpolating and decimating
Extracting sections of data scaling and averaging
Thresholding and smoothing
Correlation Fourier analysis and filtering
1-D peak valley and zero finding
Basic statistics and curve fitting
Matrix analysis
Data Access
MATLAB is an efficient platform for accessing data from
files other applications databases and external devices You can read data
from popular file formats such as Microsoft Excel ASCII text or binary
files image sound and video files and scientific files such as HDF and
HDF5 Low-level binary file IO functions let you work with data files in
any format Additional functions let you read data from Web pages and
XML
Visualizing Data
All the graphics features that are required to visualize engineering
and scientific data are available in MATLAB These include 2-D and 3-D
plotting functions 3-D volume visualization functions tools for
interactively creating plots and the ability to export results to all popular
graphics formats You can customize plots by adding multiple axes
changing line colors and markers adding annotation Latex equations and
legends and drawing shapes
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatterbubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices 3-
D scalar and 3-D vector data You can use these functions to visualize and
understand large often complex multidimensional data Specifying plot
characteristics such as camera viewing angle perspective lighting effect
light source locations and transparency
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical statistical and engineering
functions to support all common engineering and science operations These
functions developed by experts in mathematics are the foundation of the
MATLAB language The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW Discrete Fourier
Transform library Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports they execute
faster than the equivalent C or C++ code
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types
including doubles singles and integers
CHAPTER 4
IMPLEMENTATION
41 GENERAL
Matlab is a program that was originally designed to simplify the
implementation of numerical linear algebra routines It has since grown into
something much bigger and it is used to implement numerical algorithms
for a wide range of applications The basic language used is very similar to
standard linear algebra notation but there are a few extensions that will
likely cause you some problems at first
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc
Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc
Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition
method which employees a two stage parallel approach for registration and
matching Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using CUDA
GPU architecture the parallel strategies developed in this research can be
applied to design parallel solutions to other sclera vein recognition methods
and general pattern recognition methods We designed the Y shape
descriptor to narrow the search range to increase the matching efficiency
which is a new feature extraction method to take advantage of the GPU
structures We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing which can
dramatically reduce data transferring and computation We then carefully
mapped our algorithms to GPU threads and blocks which is an important
step to achieve parallel computation efficiency using a GPU A work flow
which has high arithmetic intensity to hide the memory access latency was
designed to partition the computation task to the heterogeneous system of
CPU and GPU even to the threads in GPU The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy
62 REFERENCES
[1] C W Oyster The Human Eye Structure and Function Sunderland
Sinauer Associates 1999
[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object
detection for real-time augmented reality applications in a GPGPUrdquo IEEE
Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012
[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep
big simple neural nets for handwritten digit recognitionrdquo Neural Comput
vol 22 no 12 pp 3207ndash3220 2010
[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris
recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242
[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput
Archit High Perform Comput 2008 pp 81ndash88
[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-
Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE
Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011
[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo
Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004
[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognitionrdquo
Comput Speech Lang vol 23 no 4 pp 510ndash526 2009
[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of
the Eye 2003
[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner
ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4
no 4 pp 812ndash823 Dec 2009
[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular
biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash
1869 Oct 2012
[12] W Wenying Z Dongming Z Yongdong L Jintao and G
Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel
implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp
1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004
pp 4ndash15
[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for
real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control
vol 58 no 12 pp 2631ndash2645 Dec 2011
[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4
pp 619ndash631 Jul 2013
[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2
pp 181ndash198 2013
[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human
identification method Sclera recognitionrdquo IEEE Trans Syst Man
Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012
edges of an image. HOG is therefore used to determine the gradient and edge orientations of the vein pattern in the sclera region of an eye image. To carry out this technique, first divide the image into small connected regions, called cells. For each cell, compute the histogram of the gradient directions or edge orientations of its pixels. The concatenation of the histograms of the different cells then forms the descriptor. To improve accuracy, the histograms can be contrast-normalized by calculating an intensity measure over a larger block and using this value to normalize all the cells within the block. This normalization makes the descriptor largely invariant to geometric and photometric changes. The gradient magnitude m(x, y) and orientation θ(x, y) are calculated from the x- and y-direction gradients dx(x, y) and dy(x, y).
Orientation binning is the second step of HOG and is used to create the cell histograms. Each pixel within a cell casts a weighted vote for the orientation bin corresponding to the orientation found in the gradient computation, with the gradient magnitude used as the weight. The cells are rectangular, and the gradient orientations are binned over the range 0 to 180 degrees, with opposite directions counted as the same. Fig. 8 depicts the edge orientations of the picture elements. If the images contain illumination or contrast changes, the gradient strengths must be locally normalized; for this, cells are grouped together into larger blocks. These blocks overlap, so that each cell contributes more than once to the final descriptor. Here rectangular HOG (R-HOG) blocks are applied, which are mainly square grids. The performance of HOG is further improved by applying a Gaussian window to each block.
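The steps above (gradient computation, unsigned orientation binning with magnitude-weighted votes, and normalization) can be sketched in code. The report implements its pipeline in MATLAB; the following is an illustrative Python/NumPy version, and the cell size of 8, the 9 bins, and the simplified single-block L2 normalization are assumptions for the sketch, not details taken from the report:

```python
import numpy as np

def hog_cell_histograms(image, cell_size=8, n_bins=9):
    """Per-cell histograms of gradient orientations (unsigned, 0-180 degrees).
    Each pixel votes for its orientation bin, weighted by gradient magnitude."""
    img = image.astype(np.float64)
    # x- and y-direction gradients via central differences
    dy, dx = np.gradient(img)
    magnitude = np.hypot(dx, dy)
    # unsigned orientation: opposite directions count as the same (0..180)
    orientation = np.rad2deg(np.arctan2(dy, dx)) % 180.0

    h, w = img.shape
    cells_y, cells_x = h // cell_size, w // cell_size
    hist = np.zeros((cells_y, cells_x, n_bins))
    bin_width = 180.0 / n_bins
    for cy in range(cells_y):
        for cx in range(cells_x):
            sl = np.s_[cy*cell_size:(cy+1)*cell_size,
                       cx*cell_size:(cx+1)*cell_size]
            bins = np.minimum((orientation[sl] / bin_width).astype(int),
                              n_bins - 1)
            # magnitude-weighted vote into the orientation bins of this cell
            np.add.at(hist[cy, cx], bins.ravel(), magnitude[sl].ravel())
    # simplified normalization: one L2 block over the whole cell grid
    descriptor = hist.ravel()
    return descriptor / (np.linalg.norm(descriptor) + 1e-12)
```

On a horizontal intensity ramp, all gradient energy falls in the 0-degree bin, which is a quick sanity check for the binning logic.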
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB (matrix laboratory) is a numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds in engineering, science, and economics, and MATLAB is widely used in academic and research institutions as well as industrial enterprises. MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and is popular amongst scientists involved in image processing.
The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
32 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, FORTRAN, Java™, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for macro-investment analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
321 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or by using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, enabling fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks, such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
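The point about one vectorized line replacing an explicit loop nest can be illustrated with a small sketch. The report's environment is MATLAB; this example uses Python/NumPy as a stand-in, and the moving-average function is an invented example, not code from the report:

```python
import numpy as np

def moving_average_loop(x, k):
    """C-style implementation with explicit loops over the data."""
    out = np.empty(len(x) - k + 1)
    for i in range(len(out)):
        s = 0.0
        for j in range(k):
            s += x[i + j]
        out[i] = s / k
    return out

def moving_average_vectorized(x, k):
    """One-line equivalent in the vectorized style MATLAB encourages."""
    return np.convolve(x, np.ones(k) / k, mode='valid')
```

Both functions compute the same result; the vectorized form is shorter and delegates the inner loops to optimized library code.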
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (Just-In-Time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
Use the interactive tool GUIDE (Graphical User Interface Development Environment) to lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
322 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
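Several of the operations listed above (smoothing, thresholding, 1-D peak finding, basic statistics, and curve fitting) can be sketched together on a synthetic signal. This is a Python/NumPy illustration of the generic techniques, not the report's MATLAB code, and the signal, window size, and threshold level are assumed values:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
signal = np.sin(2 * np.pi * 5 * t) + 0.1 * rng.standard_normal(200)

# Smoothing: simple 5-point moving average
smooth = np.convolve(signal, np.ones(5) / 5, mode='same')

# Thresholding: suppress values below a chosen level
thresholded = np.where(smooth > 0.5, smooth, 0.0)

# 1-D peak finding: samples strictly higher than both neighbors
interior = smooth[1:-1]
peaks = np.flatnonzero((interior > smooth[:-2]) & (interior > smooth[2:])) + 1

# Basic statistics and a linear curve fit
mean, std = signal.mean(), signal.std()
slope, intercept = np.polyfit(t, signal, 1)
```

A 5 Hz sinusoid over one second should yield roughly five peaks after smoothing, a near-zero mean, and a near-zero fitted slope.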
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files, such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
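The low-level binary file I/O mentioned above works the same way in most environments: the writer emits raw bytes with no header, and the reader must know the element type and layout to reconstruct the data. Here is a Python/NumPy sketch analogous to MATLAB's fwrite/fread; the file name is an invented example:

```python
import numpy as np
import tempfile
import os

# Write a matrix as raw float64 binary, then read it back at a low level.
data = np.arange(12, dtype=np.float64).reshape(3, 4)

path = os.path.join(tempfile.mkdtemp(), 'matrix.bin')
data.tofile(path)                      # raw bytes, no header or metadata

raw = np.fromfile(path, dtype=np.float64)
restored = raw.reshape(3, 4)           # the reader must supply the layout
```

Because the raw format carries no metadata, the dtype and shape passed to the reader are part of the file's implicit contract.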
Visualizing Data
All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
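Two of the categories above, linear algebra and Fourier analysis, can be sketched briefly. NumPy is used here as a stand-in for MATLAB: np.linalg.solve calls into LAPACK much as MATLAB's backslash operator does, and np.fft plays the role of MATLAB's FFTW-backed fft. The matrices and signal are made-up examples:

```python
import numpy as np

# Linear algebra: solve A x = b via LAPACK-backed routines
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)              # solves 3x+y=9, x+2y=8

# Fourier analysis: locate the dominant frequency of a pure cosine
n = 64
t = np.arange(n)
signal = np.cos(2 * np.pi * 8 * t / n)  # 8 cycles over 64 samples
spectrum = np.abs(np.fft.rfft(signal))
dominant = int(np.argmax(spectrum))     # expected at frequency bin 8
```

Checking the solution by substitution, and checking that the spectral peak lands at the known cycle count, are the standard sanity tests for both operations.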
CHAPTER 4
IMPLEMENTATION
41 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
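The first snapshot steps, converting the grey-scale sclera image to a binary image via Otsu's thresholding, can be sketched in code. This is an illustrative Python/NumPy re-implementation of the standard Otsu method (choose the level that maximizes between-class variance), not the report's MATLAB code:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu threshold of an 8-bit grayscale image: the level that maximizes
    the between-class variance of foreground and background."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    omega = np.cumsum(p)                    # background class probability
    mu = np.cumsum(p * np.arange(256))      # cumulative class mean
    mu_t = mu[-1]                           # global mean
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)        # undefined at empty classes
    return int(np.argmax(sigma_b))

def to_binary(gray):
    """Grayscale image -> binary image, as in the snapshots above."""
    return (gray > otsu_threshold(gray)).astype(np.uint8)
```

On a bimodal test image with two well-separated intensity populations, the threshold should fall between the two modes and split the image cleanly.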
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM, credit card, physical access control, cellular phone, PDA, medical records management, distance learning, etc.
Government applications, such as national ID card, correctional facility, driver's license, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Premium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
62 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
FIG
CHAPTER 3
SOFTWARE SPECIFICATION
31 GENERAL
MATLAB(matrix laboratory) is a numerical
computing environment and fourth-generation programming language
Developed by Math Works MATLAB allows matrix manipulations
plotting of functions and data implementation of algorithms creation
of user interfaces and interfacing with programs written in other languages
including C C++ Java and Fortran
Although MATLAB is intended primarily for numerical computing an
optional toolbox uses the MuPAD symbolic engine allowing access
to symbolic computing capabilities An additional package Simulink adds
graphicalmulti-domainsimulationandModel-Based
Design for dynamic and embedded systems
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises.
MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved in image processing. The MATLAB application is built around the MATLAB language. The simplest way to execute MATLAB code is to type it in the Command Window, which is one of the elements of the MATLAB Desktop. When code is entered in the Command Window, MATLAB can be used as an interactive mathematical shell. Sequences of commands can be saved in a text file, typically using the MATLAB Editor, as a script or encapsulated into a function, extending the commands available.
MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.
3.2 FEATURES OF MATLAB
High-level language for technical computing
Development environment for managing code, files, and data
Interactive tools for iterative exploration, design, and problem solving
Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration
2-D and 3-D graphics functions for visualizing data
Tools for building custom graphical user interfaces
Functions for integrating MATLAB-based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel
MATLAB is used in a vast range of areas, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations for common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of a few languages in which each variable is a matrix (broadly construed) and knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created, allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be directly called from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data
Readability and navigation improvements to warning and error messages in the MATLAB command window
Automatic variable and function renaming in the MATLAB Editor
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems. It enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks, such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops. As a result, one line of MATLAB code can often replace several lines of C or C++ code.
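As a concrete illustration of that claim (sketched in Python, since the report does not reproduce its own MATLAB code), the explicit double loop below is what a single vectorized MATLAB statement such as `C = A + B` replaces:

```python
def matrix_add(A, B):
    """Element-by-element sum of two equally sized matrices, written as
    the explicit double loop a C programmer would use. In MATLAB the same
    operation is the single line  C = A + B."""
    rows, cols = len(A), len(A[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            C[i][j] = A[i][j] + B[i][j]
    return C

print(matrix_add([[1, 2], [3, 4]], [[10, 20], [30, 40]]))
# [[11, 22], [33, 44]]
```

The low-level bookkeeping (sizing the result, iterating over indices) is exactly the housekeeping the MATLAB environment handles automatically.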
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology.
This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability
MATLAB Profiler
Records the time spent executing each line of code
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage
Designing Graphical User Interfaces
Using the interactive tool GUIDE (Graphical User Interface Development Environment), you can lay out, design, and edit user interfaces. GUIDE lets you include list boxes, pull-down menus, push buttons, radio buttons, and sliders, as well as MATLAB plots and Microsoft ActiveX controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process: from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
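As one example from the list above, 1-D peak and valley finding reduces to scanning a sequence for strict local extrema. A minimal sketch (in Python for illustration; in MATLAB one would typically reach for `findpeaks` from the Signal Processing Toolbox):

```python
def find_peaks(x):
    """Indices of strict local maxima in a 1-D sequence
    (a simple stand-in for MATLAB-style 1-D peak finding)."""
    return [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]

def find_valleys(x):
    """Indices of strict local minima."""
    return [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] < x[i + 1]]

signal = [0, 2, 1, 3, 0, 1]
print(find_peaks(signal))    # local maxima at indices 1 and 3
print(find_valleys(signal))  # local minima at indices 2 and 4
```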
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific files, such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
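The low-level binary I/O described above (MATLAB's `fopen`/`fwrite`/`fread` family) can be mimicked in a few lines; this Python sketch, using only the standard library, round-trips raw little-endian doubles through a file:

```python
import os
import struct
import tempfile

# Write a few float64 samples to a raw binary file and read them back,
# the way MATLAB's fwrite/fread let you work with files "in any format".
samples = [1.5, -2.25, 3.0]

path = os.path.join(tempfile.mkdtemp(), "samples.bin")
with open(path, "wb") as f:
    f.write(struct.pack("<3d", *samples))  # three little-endian doubles

with open(path, "rb") as f:
    restored = list(struct.unpack("<3d", f.read()))

print(restored)  # [1.5, -2.25, 3.0]
```

The `"<3d"` format string plays the role of MATLAB's precision argument (e.g. `fread(fid, 3, 'double')`).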
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized to the different platforms that MATLAB supports, they execute faster than the equivalent C or C++ code.
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language used is very similar to standard linear algebra notation, but there are a few extensions that are likely to cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
FIG
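The report does not list its thresholding code; the following minimal Python sketch of Otsu's method (standard library only, operating on a flat list of 8-bit gray levels) shows the computation the snapshot above refers to: pick the threshold that maximizes the between-class variance, then binarize.

```python
def otsu_threshold(gray):
    """Otsu's method on a flat list of 8-bit gray levels: choose the
    threshold t that maximizes the between-class variance
    w0 * w1 * (mu0 - mu1)^2."""
    hist = [0] * 256
    for g in gray:
        hist[g] += 1
    total = len(gray)
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 = sum(hist[:t + 1])            # pixels at or below t
        w1 = total - w0                   # pixels above t
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum(g * hist[g] for g in range(t + 1)) / w0
        mu1 = sum(g * hist[g] for g in range(t + 1, 256)) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Two well-separated gray populations: the threshold lands between them.
pixels = [20] * 60 + [200] * 40
t = otsu_threshold(pixels)
binary = [1 if p > t else 0 for p in pixels]
print(t, sum(binary))
```

In MATLAB itself this whole step is typically `graythresh` followed by `imbinarize`.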
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
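For the Gabor-filter feature extraction step above, a minimal sketch of building one real-valued Gabor kernel is shown below (Python, standard library only; the kernel size and parameter values are illustrative assumptions, not the report's settings):

```python
import math

def gabor_kernel(ksize, sigma, theta, lam, gamma=0.5):
    """Real part of a Gabor filter: a Gaussian envelope times a cosine
    carrier, oriented at angle theta. Kernels like this, at several
    orientations, bring out oriented vein patterns in the sclera."""
    half = ksize // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr)
                                / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / lam))
        kernel.append(row)
    return kernel

k = gabor_kernel(9, sigma=2.0, theta=0.0, lam=4.0)
print(len(k), len(k[0]), round(k[4][4], 3))  # 9 9 1.0
```

Feature extraction then convolves the image with a bank of such kernels at several values of theta and keeps the responses.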
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
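The matching step pictured above is not spelled out in this chapter; as a generic, simplified sketch (the paper itself matches WPL line descriptors, not raw bit vectors), nearest-template matching with a normalized Hamming distance looks like this:

```python
def hamming_distance(a, b):
    """Fraction of positions where two equal-length binary feature
    vectors disagree (0.0 = identical)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def match(probe, database, threshold=0.25):
    """Compare a probe feature vector against every enrolled template
    and report the best match, or None if nothing is close enough
    (the 'MATCHED OR NOT MATCHED' decision)."""
    best_id, best_d = None, 1.0
    for ident, template in database.items():
        d = hamming_distance(probe, template)
        if d < best_d:
            best_id, best_d = ident, d
    return best_id if best_d <= threshold else None

db = {"subject_A": [1, 0, 1, 1, 0, 0, 1, 0],
      "subject_B": [0, 1, 0, 0, 1, 1, 0, 1]}
probe = [1, 0, 1, 1, 0, 1, 1, 0]   # one bit away from subject_A
print(match(probe, db))             # subject_A
```

The identifiers, vector length, and acceptance threshold here are all hypothetical; the design point is only that matching reduces to a distance computation plus a threshold test.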
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Privium scheme at the Amsterdam airport, for example, employs iris scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions to other sclera vein recognition methods and to general pattern recognition methods. We designed the Y-shape descriptor to narrow the search range and increase the matching efficiency; it is a new feature extraction method that takes advantage of the GPU structures. We developed the WPL descriptor to incorporate mask information and make it more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, which is an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide the memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, and even across the threads in the GPU. The proposed method dramatically improves the matching efficiency without compromising recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," in Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions) extend the MATLAB environment to solve particular classes of problems in these application areas.
MATLAB can be used on personal computers and powerful server systems, including the Cheaha compute cluster. With the addition of the Parallel Computing Toolbox, the language can be extended with parallel implementations of common computational functions, including for-loop unrolling. Additionally, this toolbox supports offloading computationally intensive workloads to Cheaha, the campus compute cluster. MATLAB is one of the few languages in which each variable is a matrix (broadly construed) that knows how big it is. Moreover, the fundamental operators (e.g., addition, multiplication) are programmed to deal with matrices when required, and the MATLAB environment handles much of the bothersome housekeeping that makes all this possible. Since so many of the procedures required for Macro-Investment Analysis involve matrices, MATLAB proves to be an extremely efficient language for both communication and implementation.
3.2.1 INTERFACING WITH OTHER LANGUAGES
MATLAB can call functions and subroutines written in the C programming language or Fortran. A wrapper function is created that allows MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed MEX-files (for MATLAB executable).
Libraries written in Java, ActiveX, or .NET can be called directly from MATLAB, and many MATLAB libraries (for example, XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB extension, which is sold separately by MathWorks, or through an undocumented mechanism called JMI (Java-to-MATLAB Interface), which should not be confused with the unrelated Java Metadata Interface that is also called JMI.
As an alternative to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica. Libraries also exist to import and export MathML.
Development Environment
Startup Accelerator for faster MATLAB startup on Windows, especially on Windows XP, and for network installations.
Spreadsheet Import Tool that provides more options for selecting and loading mixed textual and numeric data.
Readability and navigation improvements to warning and error messages in the MATLAB command window.
Automatic variable and function renaming in the MATLAB Editor.
Developing Algorithms and Applications
MATLAB provides a high-level language and development tools that let you quickly develop and analyze your algorithms and applications.
The MATLAB Language
The MATLAB language supports the vector and matrix operations that are fundamental to engineering and scientific problems, and it enables fast development and execution. With the MATLAB language, you can program and develop algorithms faster than with traditional languages because you do not need to perform low-level administrative tasks such as declaring variables, specifying data types, and allocating memory. In many cases, MATLAB eliminates the need for 'for' loops; as a result, one line of MATLAB code can often replace several lines of C or C++ code.
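To make the claim concrete, here is an illustrative (not project-specific) comparison of loop-style and vectorized MATLAB code; the variable names are arbitrary:

```matlab
% Sum of squares of a scaled vector: explicit loop vs. one vectorized line.
x = 1:1000;

% Loop style, as one might write it in C:
s = 0;
for k = 1:numel(x)
    s = s + (2 * x(k))^2;
end

% Vectorized style -- no loop, no indexing, no pre-declared accumulator:
s2 = sum((2 * x).^2);
```

Both forms compute the same value; the vectorized one is also the form that MATLAB's processor-optimized libraries can accelerate.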
At the same time, MATLAB provides all the features of a traditional programming language, including arithmetic operators, flow control, data structures, data types, object-oriented programming (OOP), and debugging features.
MATLAB lets you execute commands or groups of commands one at a time, without compiling and linking, enabling you to quickly iterate to the optimal solution. For fast execution of heavy matrix and vector computations, MATLAB uses processor-optimized libraries. For general-purpose scalar computations, MATLAB generates machine-code instructions using its JIT (just-in-time) compilation technology. This technology, which is available on most platforms, provides execution speeds that rival those of traditional programming languages.
Development Tools
MATLAB includes development tools that help you implement your algorithm efficiently. These include the following:
MATLAB Editor
Provides standard editing and debugging features, such as setting breakpoints and single stepping.
Code Analyzer
Checks your code for problems and recommends modifications to maximize performance and maintainability.
MATLAB Profiler
Records the time spent executing each line of code.
Directory Reports
Scan all the files in a directory and report on code efficiency, file differences, file dependencies, and code coverage.
Designing Graphical User Interfaces
By using the interactive tool GUIDE (Graphical User Interface
Development Environment) to layout design and edit user interfaces
GUIDE lets you include list boxes pull-down menus push buttons radio
buttons and sliders as well as MATLAB plots and Microsoft
ActiveXreg controls Alternatively you can create GUIs programmatically
using MATLAB functions
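As a minimal sketch of the programmatic alternative to GUIDE (the window name, positions, and callback below are all illustrative):

```matlab
% A figure with one axes and a push button; clicking the button plots data.
f  = figure('Name', 'Demo GUI', 'NumberTitle', 'off');
ax = axes('Parent', f, 'Position', [0.1 0.3 0.8 0.6]);
uicontrol('Parent', f, 'Style', 'pushbutton', 'String', 'Plot', ...
          'Units', 'normalized', 'Position', [0.4 0.05 0.2 0.1], ...
          'Callback', @(src, evt) plot(ax, rand(1, 20)));
```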
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
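A short sketch showing several of the listed operations on a synthetic noisy signal (the signal and all parameters are invented for illustration):

```matlab
% Data analysis operations on a noisy 1-D test signal.
t = linspace(0, 1, 101);
y = sin(2*pi*3*t) + 0.1*randn(size(t));    % synthetic noisy signal

yi = interp1(t, y, linspace(0, 1, 201));   % interpolating
ys = conv(y, ones(1, 5)/5, 'same');        % smoothing (moving average)
yt = y .* (abs(y) > 0.2);                  % thresholding
Y  = fft(y);                               % Fourier analysis
p  = polyfit(t, y, 3);                     % curve fitting
m  = mean(y); sd = std(y);                 % basic statistics
```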
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats, such as Microsoft Excel; ASCII text or binary files; image, sound, and video files; and scientific formats such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
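For example (the file names below are placeholders, not files from this project):

```matlab
% Reading data from common formats.
A   = imread('eye.jpg');               % image file
num = xlsread('measurements.xls');     % Microsoft Excel spreadsheet

% Low-level binary I/O for data files in arbitrary formats:
fid = fopen('raw.dat', 'r');
v   = fread(fid, Inf, 'uint8');
fclose(fid);
```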
Visualizing Data
All the graphics features that are required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
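A compact sketch of several of these plot types on random data:

```matlab
% 2-D plotting examples in one figure window.
x = 0:0.1:2*pi;
subplot(2, 2, 1); plot(x, sin(x));                    % line chart
subplot(2, 2, 2); bar(rand(1, 5));                    % bar chart
subplot(2, 2, 3); hist(randn(1, 1000), 20);           % histogram
subplot(2, 2, 4); scatter(rand(1, 50), rand(1, 50));  % scatter plot
```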
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
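For instance, using the built-in peaks test function (the view and lighting settings are arbitrary choices):

```matlab
% Surface, contour, and mesh views of the same 2-D matrix.
[X, Y, Z] = peaks(30);
subplot(1, 3, 1); surf(X, Y, Z); camlight; lighting gouraud;
subplot(1, 3, 2); contour(X, Y, Z, 15);
subplot(1, 3, 3); mesh(X, Y, Z); view(45, 30);   % camera azimuth/elevation
```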
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
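A few of these function families in use (the systems solved below are arbitrary examples, not taken from the project):

```matlab
% Core numeric computations backed by LAPACK/BLAS and FFTW.
A = [4 1; 1 3];
b = [1; 2];
x = A \ b;              % solve the linear system A*x = b
[V, D] = eig(A);        % eigendecomposition
F = fft([1 0 0 0]);     % discrete Fourier transform

q = quad(@(t) exp(-t.^2), 0, 1);            % numerical integration
[ts, ys] = ode45(@(t, y) -2*y, [0 1], 1);   % ODE: y' = -2y, y(0) = 1
```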
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger, and it is used to implement numerical algorithms for a wide range of applications. The basic language is very similar to standard linear algebra notation, but there are a few extensions that will likely cause you some problems at first.
4.2 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY-SCALE IMAGE
GREY-SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
EDGE DETECTION IS DONE BY OTSU'S THRESHOLDING
SELECTING THE REGION OF INTEREST (SCLERA PART)
SELECTED ROI PART
ENHANCEMENT OF SCLERA IMAGE
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR FILTERS
MATCHING WITH IMAGES IN DATABASE
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
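The snapshot sequence can be sketched in MATLAB roughly as follows. This is a simplified reconstruction for illustration only, not the project's actual code: the input file name, Gabor parameters, and matching threshold are all assumptions.

```matlab
% Illustrative sclera-processing pipeline following the snapshots above.
rgb  = imread('sclera.jpg');        % input eye image (placeholder name)
grey = rgb2gray(rgb);               % grey-scale conversion
lvl  = graythresh(grey);            % Otsu's threshold
bw   = im2bw(grey, lvl);            % binary image

roi = grey;                         % region of interest: sclera part only
roi(~bw) = 0;                       % mask out non-sclera pixels

enh = adapthisteq(roi);             % enhancement (adaptive hist. equalization)

% Gabor filtering for vein features (hand-built kernel; parameters assumed):
[gx, gy] = meshgrid(-7:7, -7:7);
lambda = 8; theta = pi/4; sigma = 3;
xp =  gx*cos(theta) + gy*sin(theta);
yp = -gx*sin(theta) + gy*cos(theta);
gab  = exp(-(xp.^2 + yp.^2) / (2*sigma^2)) .* cos(2*pi*xp/lambda);
feat = imfilter(double(enh), gab, 'symmetric');

% Matching against a stored template by 2-D correlation:
template = feat;                    % stand-in for a database image
score = corr2(feat, template);
if score > 0.8, disp('MATCHED'); else disp('NOT MATCHED'); end
```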
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATMs, credit cards, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Privium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor, a new feature extraction method that takes advantage of the GPU structures, to narrow the search range and increase matching efficiency. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous system of CPU and GPU, down to the individual threads in the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
mapped our algorithms to GPU threads and blocks which is an important
step to achieve parallel computation efficiency using a GPU A work flow
which has high arithmetic intensity to hide the memory access latency was
designed to partition the computation task to the heterogeneous system of
CPU and GPU even to the threads in GPU The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy
62 REFERENCES
[1] C W Oyster The Human Eye Structure and Function Sunderland
Sinauer Associates 1999
[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object
detection for real-time augmented reality applications in a GPGPUrdquo IEEE
Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012
[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep
big simple neural nets for handwritten digit recognitionrdquo Neural Comput
vol 22 no 12 pp 3207ndash3220 2010
[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris
recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242
[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput
Archit High Perform Comput 2008 pp 81ndash88
[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-
Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE
Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011
[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo
Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004
[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognitionrdquo
Comput Speech Lang vol 23 no 4 pp 510ndash526 2009
[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of
the Eye 2003
[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner
ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4
no 4 pp 812ndash823 Dec 2009
[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular
biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash
1869 Oct 2012
[12] W Wenying Z Dongming Z Yongdong L Jintao and G
Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel
implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp
1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004
pp 4ndash15
[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for
real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control
vol 58 no 12 pp 2631ndash2645 Dec 2011
[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4
pp 619ndash631 Jul 2013
[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2
pp 181ndash198 2013
[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human
identification method Sclera recognitionrdquo IEEE Trans Syst Man
Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012
buttons and sliders, as well as MATLAB plots and Microsoft ActiveX® controls. Alternatively, you can create GUIs programmatically using MATLAB functions.
3.2.2 ANALYZING AND ACCESSING DATA
MATLAB supports the entire data analysis process, from acquiring data from external devices and databases, through preprocessing, visualization, and numerical analysis, to producing presentation-quality output.
Data Analysis
MATLAB provides interactive tools and command-line functions for data analysis operations, including:
Interpolating and decimating
Extracting sections of data, scaling, and averaging
Thresholding and smoothing
Correlation, Fourier analysis, and filtering
1-D peak, valley, and zero finding
Basic statistics and curve fitting
Matrix analysis
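The "basic statistics and curve fitting" item above can be illustrated with a minimal sketch. The report's own code is MATLAB; this plain-Python version (function names are ours, purely illustrative) fits a least-squares line the way MATLAB's polyfit(x, y, 1) would.

```python
# Minimal sketch of basic statistics and curve fitting, in plain Python
# for illustration (the report itself works in MATLAB).

def mean(xs):
    return sum(xs) / len(xs)

def linear_fit(xs, ys):
    """Least-squares fit of y = a*x + b (what polyfit(x, y, 1) does in MATLAB)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx          # slope
    b = my - a * mx        # intercept
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # data lying exactly on y = 2x + 1
a, b = linear_fit(xs, ys)
print(a, b)
```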
Data Access
MATLAB is an efficient platform for accessing data from files, other applications, databases, and external devices. You can read data from popular file formats such as Microsoft Excel, ASCII text or binary files, image, sound, and video files, and scientific formats such as HDF and HDF5. Low-level binary file I/O functions let you work with data files in any format. Additional functions let you read data from Web pages and XML.
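As a rough analogue of the file-based data access described above, here is a plain-Python sketch using an in-memory stand-in for a data file (the report's examples are MATLAB readers; the column names are invented for illustration):

```python
# Reading tabular numeric data, sketched in plain Python. The CSV text below
# is a toy in-memory stand-in for a real data file.
import csv
import io

text = "t,signal\n0,0.0\n1,0.5\n2,1.0\n"
rows = list(csv.DictReader(io.StringIO(text)))
signal = [float(r["signal"]) for r in rows]   # extract one numeric column
print(signal)
```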
Visualizing Data
All the graphics features required to visualize engineering and scientific data are available in MATLAB. These include 2-D and 3-D plotting functions, 3-D volume visualization functions, tools for interactively creating plots, and the ability to export results to all popular graphics formats. You can customize plots by adding multiple axes; changing line colors and markers; adding annotations, LaTeX equations, and legends; and drawing shapes.
2-D Plotting
Visualize vectors of data with 2-D plotting functions that create:
Line, area, bar, and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatter/bubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices, 3-D scalar data, and 3-D vector data. You can use these functions to visualize and understand large, often complex, multidimensional data, specifying plot characteristics such as camera viewing angle, perspective, lighting effects, light source locations, and transparency.
3-D plotting functions include:
Surface, contour, and mesh plots
Image plots
Cone, slice, stream, and isosurface plots
3.2.3 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical, statistical, and engineering functions to support all common engineering and science operations. These functions, developed by experts in mathematics, are the foundation of the MATLAB language. The core math functions use the LAPACK and BLAS linear algebra subroutine libraries and the FFTW discrete Fourier transform library. Because these processor-dependent libraries are optimized for the different platforms that MATLAB supports, they execute faster than equivalent C or C++ code.
MATLAB provides the following types of functions for performing mathematical operations and analyzing data:
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types, including doubles, singles, and integers.
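A tiny example of the "matrix manipulation and linear algebra" category: solving a linear system Ax = b, which MATLAB writes as x = A \ b. This plain-Python sketch handles only the 2×2 case via Cramer's rule and is purely illustrative:

```python
# Solving a 2x2 linear system A x = b in plain Python (MATLAB: x = A \ b).

def solve2(A, b):
    """Solve a 2x2 system by Cramer's rule; assumes A is non-singular."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    x1 = (b[0] * a22 - a12 * b[1]) / det
    x2 = (a11 * b[1] - b[0] * a21) / det
    return [x1, x2]

# The system 2x + y = 5, x + 3y = 10.
x = solve2([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0])
print(x)
```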
CHAPTER 4
IMPLEMENTATION
4.1 GENERAL
MATLAB is a program that was originally designed to simplify the implementation of numerical linear algebra routines. It has since grown into something much bigger and is used to implement numerical algorithms for a wide range of applications. The basic language is very similar to standard linear algebra notation, but there are a few extensions that may cause some difficulty at first.
4.2 SNAPSHOTS
FIG. ORIGINAL SCLERA IMAGE CONVERTED INTO A GREYSCALE IMAGE
FIG. GREYSCALE IMAGE CONVERTED INTO A BINARY IMAGE
FIG. EDGE DETECTION BY OTSU'S THRESHOLDING
FIG. SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG. SELECTED ROI PART
FIG. ENHANCEMENT OF THE SCLERA IMAGE
FIG. FEATURE EXTRACTION OF THE SCLERA IMAGE USING GABOR FILTERS
FIG. MATCHING WITH IMAGES IN THE DATABASE
FIG. DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
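The binarization step in the snapshots above relies on Otsu's threshold. As a hedged illustration (the report uses MATLAB, e.g. graythresh; the pure-Python function and the toy pixel data below are ours):

```python
# Minimal pure-Python sketch of Otsu's method: pick the threshold that
# maximizes between-class variance of an 8-bit grayscale histogram.

def otsu_threshold(pixels, levels=256):
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * hist[i] for i in range(levels))
    best_t, best_var = 0, -1.0
    w_b = sum_b = 0
    for t in range(levels):
        w_b += hist[t]                      # background pixel count
        if w_b == 0 or w_b == total:
            continue
        sum_b += t * hist[t]
        w_f = total - w_b                   # foreground pixel count
        m_b = sum_b / w_b                   # background mean
        m_f = (total_sum - sum_b) / w_f     # foreground mean
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A strongly bimodal "image": dark pixels near 20, bright pixels near 200.
pixels = [20] * 50 + [25] * 50 + [200] * 50 + [205] * 50
t = otsu_threshold(pixels)
print(t)
```

On this toy data the chosen threshold falls between the two clusters, so thresholding at t cleanly separates dark from bright pixels.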
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups:
Commercial applications, such as computer network login, electronic data security, e-commerce, Internet access, ATM and credit card use, physical access control, cellular phones, PDAs, medical records management, distance learning, etc.
Government applications, such as national ID cards, correctional facilities, driver's licenses, social security, welfare disbursement, border control, passport control, etc.
Forensic applications, such as corpse identification, criminal investigation, terrorist identification, parenthood determination, missing children, etc.
Traditionally, commercial applications have used knowledge-based systems (e.g., PINs and passwords), government applications have used token-based systems (e.g., ID cards and badges), and forensic applications have relied on human experts to match biometric features. Biometric systems are being increasingly deployed in large-scale civilian applications. The Schiphol Privium scheme at the Amsterdam airport, for example, employs iris-scan cards to speed up the passport and visa control procedures.
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition method, which employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed in this research can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor, a new feature extraction method that takes advantage of the GPU structures, to narrow the search range and increase matching efficiency. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU-GPU system, down to the individual GPU threads. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.
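The coarse-then-fine matching strategy summarized above (a cheap descriptor to narrow the search range, then full matching over the surviving candidates) can be sketched sequentially. Everything below is a toy stand-in: the descriptors, distances, and names are illustrative, not the report's actual Y-shape/WPL descriptors or its CUDA implementation.

```python
# Two-stage matching sketch: stage 1 filters the gallery with a cheap coarse
# key; stage 2 runs the expensive distance only on the reduced candidate set.
# (In the report, both stages run in parallel on GPU threads and blocks.)

def coarse_key(desc):
    # Toy stand-in for a Y-shape-style summary: the descriptor's mean value.
    return sum(desc) / len(desc)

def fine_distance(a, b):
    # Toy stand-in for full descriptor matching: sum of absolute differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def match(probe, gallery, coarse_tol=1.0):
    # Stage 1: keep only gallery entries whose coarse key is close to the probe's.
    candidates = [(name, d) for name, d in gallery.items()
                  if abs(coarse_key(d) - coarse_key(probe)) <= coarse_tol]
    if not candidates:
        return None
    # Stage 2: full matching over the reduced candidate set.
    return min(candidates, key=lambda nd: fine_distance(probe, nd[1]))[0]

gallery = {"subject_a": [1.0, 2.0, 3.0], "subject_b": [5.0, 5.0, 6.0]}
result = match([1.1, 2.0, 2.9], gallery)
print(result)
```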
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," in Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason., Ferroelectr., Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.
interactively creating plots and the ability to export results to all popular
graphics formats You can customize plots by adding multiple axes
changing line colors and markers adding annotation Latex equations and
legends and drawing shapes
2-D Plotting
Visualizing vectors of data with 2-D plotting functions that create
Line area bar and pie charts
Direction and velocity plots
Histograms
Polygons and surfaces
Scatterbubble plots
Animations
3-D Plotting and Volume Visualization
MATLAB provides functions for visualizing 2-D matrices 3-
D scalar and 3-D vector data You can use these functions to visualize and
understand large often complex multidimensional data Specifying plot
characteristics such as camera viewing angle perspective lighting effect
light source locations and transparency
3-D plotting functions include
Surface contour and mesh
Image plots
Cone slice stream and isosurface
323 PERFORMING NUMERIC COMPUTATION
MATLAB contains mathematical statistical and engineering
functions to support all common engineering and science operations These
functions developed by experts in mathematics are the foundation of the
MATLAB language The core math functions use the LAPACK and BLAS
linear algebra subroutine libraries and the FFTW Discrete Fourier
Transform library Because these processor-dependent libraries are
optimized to the different platforms that MATLAB supports they execute
faster than the equivalent C or C++ code
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types
including doubles singles and integers
CHAPTER 4
IMPLEMENTATION
41 GENERAL
Matlab is a program that was originally designed to simplify the
implementation of numerical linear algebra routines It has since grown into
something much bigger and it is used to implement numerical algorithms
for a wide range of applications The basic language used is very similar to
standard linear algebra notation but there are a few extensions that will
likely cause you some problems at first
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc
Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc
Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition
method which employees a two stage parallel approach for registration and
matching Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using CUDA
GPU architecture the parallel strategies developed in this research can be
applied to design parallel solutions to other sclera vein recognition methods
and general pattern recognition methods We designed the Y shape
descriptor to narrow the search range to increase the matching efficiency
which is a new feature extraction method to take advantage of the GPU
structures We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing which can
dramatically reduce data transferring and computation We then carefully
mapped our algorithms to GPU threads and blocks which is an important
step to achieve parallel computation efficiency using a GPU A work flow
which has high arithmetic intensity to hide the memory access latency was
designed to partition the computation task to the heterogeneous system of
CPU and GPU even to the threads in GPU The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy
62 REFERENCES
[1] C W Oyster The Human Eye Structure and Function Sunderland
Sinauer Associates 1999
[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object
detection for real-time augmented reality applications in a GPGPUrdquo IEEE
Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012
[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep
big simple neural nets for handwritten digit recognitionrdquo Neural Comput
vol 22 no 12 pp 3207ndash3220 2010
[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris
recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242
[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput
Archit High Perform Comput 2008 pp 81ndash88
[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-
Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE
Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011
[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo
Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004
[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognitionrdquo
Comput Speech Lang vol 23 no 4 pp 510ndash526 2009
[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of
the Eye 2003
[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner
ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4
no 4 pp 812ndash823 Dec 2009
[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular
biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash
1869 Oct 2012
[12] W Wenying Z Dongming Z Yongdong L Jintao and G
Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel
implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp
1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004
pp 4ndash15
[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for
real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control
vol 58 no 12 pp 2631ndash2645 Dec 2011
[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4
pp 619ndash631 Jul 2013
[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2
pp 181ndash198 2013
[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human
identification method Sclera recognitionrdquo IEEE Trans Syst Man
Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012
MATLAB provides the following types of functions for performing
mathematical operations and analyzing data
Matrix manipulation and linear algebra
Polynomials and interpolation
Fourier analysis and filtering
Data analysis and statistics
Optimization and numerical integration
Ordinary differential equations (ODEs)
Partial differential equations (PDEs)
Sparse matrix operations
MATLAB can perform arithmetic on a wide range of data types
including doubles singles and integers
CHAPTER 4
IMPLEMENTATION
41 GENERAL
Matlab is a program that was originally designed to simplify the
implementation of numerical linear algebra routines It has since grown into
something much bigger and it is used to implement numerical algorithms
for a wide range of applications The basic language used is very similar to
standard linear algebra notation but there are a few extensions that will
likely cause you some problems at first
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc
Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc
Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition
method which employees a two stage parallel approach for registration and
matching Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using CUDA
GPU architecture the parallel strategies developed in this research can be
applied to design parallel solutions to other sclera vein recognition methods
and general pattern recognition methods We designed the Y shape
descriptor to narrow the search range to increase the matching efficiency
which is a new feature extraction method to take advantage of the GPU
structures We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing which can
dramatically reduce data transferring and computation We then carefully
mapped our algorithms to GPU threads and blocks which is an important
step to achieve parallel computation efficiency using a GPU A work flow
which has high arithmetic intensity to hide the memory access latency was
designed to partition the computation task to the heterogeneous system of
CPU and GPU even to the threads in GPU The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy
62 REFERENCES
[1] C W Oyster The Human Eye Structure and Function Sunderland
Sinauer Associates 1999
[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object
detection for real-time augmented reality applications in a GPGPUrdquo IEEE
Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012
[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep
big simple neural nets for handwritten digit recognitionrdquo Neural Comput
vol 22 no 12 pp 3207ndash3220 2010
[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris
recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242
[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput
Archit High Perform Comput 2008 pp 81ndash88
[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-
Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE
Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011
[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo
Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004
[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognitionrdquo
Comput Speech Lang vol 23 no 4 pp 510ndash526 2009
[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of
the Eye 2003
[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner
ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4
no 4 pp 812ndash823 Dec 2009
[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular
biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash
1869 Oct 2012
[12] W Wenying Z Dongming Z Yongdong L Jintao and G
Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel
implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp
1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004
pp 4ndash15
[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for
real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control
vol 58 no 12 pp 2631ndash2645 Dec 2011
[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4
pp 619ndash631 Jul 2013
[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2
pp 181ndash198 2013
[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human
identification method Sclera recognitionrdquo IEEE Trans Syst Man
Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012
CHAPTER 4
IMPLEMENTATION
41 GENERAL
Matlab is a program that was originally designed to simplify the
implementation of numerical linear algebra routines It has since grown into
something much bigger and it is used to implement numerical algorithms
for a wide range of applications The basic language used is very similar to
standard linear algebra notation but there are a few extensions that will
likely cause you some problems at first
42 SNAPSHOTS
ORIGINAL SCLERA IMAGE IS CONVERTED INTO GREY SCALE IMAGE
FIG
GREY SCALE IMAGE IS CONVERTED INTO BINARY IMAGE
FIG
EDGE DETECTON IS DONE BY OTSUrsquoS THRESHOLDING
FIG
SELECTING THE REGION OF INTEREST (SCLERA PART)
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc
Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc
Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION
In this work, we proposed a new parallel sclera vein recognition method that employs a two-stage parallel approach for registration and matching. Even though the research focused on developing a parallel sclera matching solution for the sequential line-descriptor method using the CUDA GPU architecture, the parallel strategies developed here can be applied to design parallel solutions for other sclera vein recognition methods and for general pattern recognition methods. We designed the Y-shape descriptor, a new feature extraction method that takes advantage of GPU structures, to narrow the search range and thereby increase matching efficiency. We developed the WPL descriptor to incorporate mask information and make the data more suitable for parallel computing, which can dramatically reduce data transfer and computation. We then carefully mapped our algorithms to GPU threads and blocks, an important step in achieving parallel computation efficiency on a GPU. A workflow with high arithmetic intensity, designed to hide memory access latency, partitions the computation task across the heterogeneous CPU-GPU system, down to the individual threads on the GPU. The proposed method dramatically improves matching efficiency without compromising recognition accuracy.
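The two-stage idea above (a cheap coarse descriptor prunes the search range; an expensive fine matcher runs in parallel on the survivors) can be illustrated with a minimal CPU sketch. This is not the report's CUDA implementation: the descriptor vectors, subject names, and distance functions below are hypothetical stand-ins for the Y-shape and WPL descriptors, and a thread pool stands in for GPU blocks.

```python
from concurrent.futures import ThreadPoolExecutor

def coarse_distance(a, b):
    """Stage 1: cheap descriptor distance used to prune the search range."""
    return sum(abs(x - y) for x, y in zip(a, b))

def fine_score(a, b):
    """Stage 2: more expensive per-pair matching score (placeholder for
    fine line-descriptor matching); higher means a better match."""
    return 1.0 / (1.0 + sum((x - y) ** 2 for x, y in zip(a, b)))

def match(probe, gallery, coarse_keep=0.7, workers=4):
    # Stage 1 (cheap, serial): rank the gallery by coarse distance
    # and keep only the closest fraction of candidates.
    ranked = sorted(gallery.items(), key=lambda kv: coarse_distance(probe, kv[1]))
    survivors = ranked[: max(1, int(len(ranked) * coarse_keep))]
    # Stage 2 (expensive, parallel): score survivors concurrently,
    # the way GPU thread blocks would each handle one candidate.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda kv: (kv[0], fine_score(probe, kv[1])),
                               survivors))
    return max(scores, key=lambda kv: kv[1])[0]

# Toy 4-D descriptors for three enrolled subjects (illustrative only).
gallery = {
    "subject_A": [0.50, 0.50, 0.50, 0.50],
    "subject_B": [0.20, 0.80, 0.50, 0.10],
    "subject_C": [0.88, 0.12, 0.42, 0.69],
}
print(match([0.89, 0.11, 0.41, 0.70], gallery))  # prints: subject_C
```

The pruning stage is what the Y-shape descriptor provides in the actual method: it keeps the expensive stage's workload small enough that the parallel hardware stays saturated with useful work.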
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117-125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207-3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237-242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81-88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135-1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311-1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510-526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812-823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860-1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308-1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 58, no. 12, pp. 2631-2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619-631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181-198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571-583, May 2012.
[Figures: edge detection using Otsu's thresholding; selecting the region of interest (sclera part); selected ROI; enhancement of the sclera image; feature extraction from the sclera image using Gabor filters; matching with images in the database; displaying the result (matched or not matched).]
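The first step of the pipeline shown in the figures, Otsu's thresholding, picks the gray level that best separates the image histogram into two classes by maximizing the between-class variance. A minimal pure-Python sketch on a toy bimodal "image" (the pixel values below are illustrative, not taken from the report's dataset):

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes between-class variance (Otsu)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))  # sum of all intensities
    best_t, best_var = 0, -1.0
    w0 = 0       # pixel count of the "background" class (<= t)
    sum0 = 0.0   # intensity sum of the background class
    for t in range(levels):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0 = sum0 / w0                   # background mean
        mu1 = (total_sum - sum0) / w1     # foreground mean
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Toy bimodal histogram: dark vein pixels near 40-45, bright sclera near 200-210.
pixels = [40] * 60 + [45] * 40 + [200] * 80 + [210] * 70
t = otsu_threshold(pixels)
vein_mask = [p <= t for p in pixels]  # binarization: dark side = vein candidates
```

The threshold lands in the valley between the two intensity modes, which is exactly what makes Otsu's method a good automatic choice for separating the sclera region from vein pixels before the ROI and Gabor stages.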
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc
Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc
Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition
method which employees a two stage parallel approach for registration and
matching Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using CUDA
GPU architecture the parallel strategies developed in this research can be
applied to design parallel solutions to other sclera vein recognition methods
and general pattern recognition methods We designed the Y shape
descriptor to narrow the search range to increase the matching efficiency
which is a new feature extraction method to take advantage of the GPU
structures We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing which can
dramatically reduce data transferring and computation We then carefully
mapped our algorithms to GPU threads and blocks which is an important
step to achieve parallel computation efficiency using a GPU A work flow
which has high arithmetic intensity to hide the memory access latency was
designed to partition the computation task to the heterogeneous system of
CPU and GPU even to the threads in GPU The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy
62 REFERENCES
[1] C W Oyster The Human Eye Structure and Function Sunderland
Sinauer Associates 1999
[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object
detection for real-time augmented reality applications in a GPGPUrdquo IEEE
Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012
[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep
big simple neural nets for handwritten digit recognitionrdquo Neural Comput
vol 22 no 12 pp 3207ndash3220 2010
[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris
recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242
[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput
Archit High Perform Comput 2008 pp 81ndash88
[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-
Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE
Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011
[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo
Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004
[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognitionrdquo
Comput Speech Lang vol 23 no 4 pp 510ndash526 2009
[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of
the Eye 2003
[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner
ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4
no 4 pp 812ndash823 Dec 2009
[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular
biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash
1869 Oct 2012
[12] W Wenying Z Dongming Z Yongdong L Jintao and G
Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel
implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp
1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004
pp 4ndash15
[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for
real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control
vol 58 no 12 pp 2631ndash2645 Dec 2011
[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4
pp 619ndash631 Jul 2013
[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2
pp 181ndash198 2013
[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human
identification method Sclera recognitionrdquo IEEE Trans Syst Man
Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012
FIG
SELECTED ROI PART
FIG
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc
Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc
Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition
method which employees a two stage parallel approach for registration and
matching Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using CUDA
GPU architecture the parallel strategies developed in this research can be
applied to design parallel solutions to other sclera vein recognition methods
and general pattern recognition methods We designed the Y shape
descriptor to narrow the search range to increase the matching efficiency
which is a new feature extraction method to take advantage of the GPU
structures We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing which can
dramatically reduce data transferring and computation We then carefully
mapped our algorithms to GPU threads and blocks which is an important
step to achieve parallel computation efficiency using a GPU A work flow
which has high arithmetic intensity to hide the memory access latency was
designed to partition the computation task to the heterogeneous system of
CPU and GPU even to the threads in GPU The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy
62 REFERENCES
[1] C W Oyster The Human Eye Structure and Function Sunderland
Sinauer Associates 1999
[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object
detection for real-time augmented reality applications in a GPGPUrdquo IEEE
Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012
[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep
big simple neural nets for handwritten digit recognitionrdquo Neural Comput
vol 22 no 12 pp 3207ndash3220 2010
[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris
recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242
[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput
Archit High Perform Comput 2008 pp 81ndash88
[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-
Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE
Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011
[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo
Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004
[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognitionrdquo
Comput Speech Lang vol 23 no 4 pp 510ndash526 2009
[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of
the Eye 2003
[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner
ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4
no 4 pp 812ndash823 Dec 2009
[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular
biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash
1869 Oct 2012
[12] W Wenying Z Dongming Z Yongdong L Jintao and G
Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel
implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp
1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004
pp 4ndash15
[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for
real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control
vol 58 no 12 pp 2631ndash2645 Dec 2011
[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4
pp 619ndash631 Jul 2013
[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2
pp 181ndash198 2013
[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human
identification method Sclera recognitionrdquo IEEE Trans Syst Man
Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012
FIG
ENHANCEMENT OF SCLERA IMAGE
FIG
FEATURE EXTRACTION OF SCLERA IMAGE USING GABOR
FILTERS
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc
Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc
Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition
method which employees a two stage parallel approach for registration and
matching Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using CUDA
GPU architecture the parallel strategies developed in this research can be
applied to design parallel solutions to other sclera vein recognition methods
and general pattern recognition methods We designed the Y shape
descriptor to narrow the search range to increase the matching efficiency
which is a new feature extraction method to take advantage of the GPU
structures We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing which can
dramatically reduce data transferring and computation We then carefully
mapped our algorithms to GPU threads and blocks which is an important
step to achieve parallel computation efficiency using a GPU A work flow
which has high arithmetic intensity to hide the memory access latency was
designed to partition the computation task to the heterogeneous system of
CPU and GPU even to the threads in GPU The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy
62 REFERENCES
[1] C W Oyster The Human Eye Structure and Function Sunderland
Sinauer Associates 1999
[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object
detection for real-time augmented reality applications in a GPGPUrdquo IEEE
Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012
[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep
big simple neural nets for handwritten digit recognitionrdquo Neural Comput
vol 22 no 12 pp 3207ndash3220 2010
[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris
recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242
[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput
Archit High Perform Comput 2008 pp 81ndash88
[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-
Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE
Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011
[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo
Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004
[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognitionrdquo
Comput Speech Lang vol 23 no 4 pp 510ndash526 2009
[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of
the Eye 2003
[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner
ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4
no 4 pp 812ndash823 Dec 2009
[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular
biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash
1869 Oct 2012
[12] W Wenying Z Dongming Z Yongdong L Jintao and G
Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel
implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp
1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004
pp 4ndash15
[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for
real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control
vol 58 no 12 pp 2631ndash2645 Dec 2011
[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4
pp 619ndash631 Jul 2013
[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2
pp 181ndash198 2013
[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human
identification method Sclera recognitionrdquo IEEE Trans Syst Man
Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012
FIG
MATCHING WITH IMAGES IN DATABASE
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc
Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc
Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition
method which employees a two stage parallel approach for registration and
matching Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using CUDA
GPU architecture the parallel strategies developed in this research can be
applied to design parallel solutions to other sclera vein recognition methods
and general pattern recognition methods We designed the Y shape
descriptor to narrow the search range to increase the matching efficiency
which is a new feature extraction method to take advantage of the GPU
structures We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing which can
dramatically reduce data transferring and computation We then carefully
mapped our algorithms to GPU threads and blocks which is an important
step to achieve parallel computation efficiency using a GPU A work flow
which has high arithmetic intensity to hide the memory access latency was
designed to partition the computation task to the heterogeneous system of
CPU and GPU even to the threads in GPU The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy
62 REFERENCES
[1] C W Oyster The Human Eye Structure and Function Sunderland
Sinauer Associates 1999
[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object
detection for real-time augmented reality applications in a GPGPUrdquo IEEE
Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012
[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep
big simple neural nets for handwritten digit recognitionrdquo Neural Comput
vol 22 no 12 pp 3207ndash3220 2010
[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris
recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242
[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput
Archit High Perform Comput 2008 pp 81ndash88
[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-
Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE
Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011
[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo
Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004
[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognitionrdquo
Comput Speech Lang vol 23 no 4 pp 510ndash526 2009
[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of
the Eye 2003
[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner
ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4
no 4 pp 812ndash823 Dec 2009
[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular
biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash
1869 Oct 2012
[12] W Wenying Z Dongming Z Yongdong L Jintao and G
Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel
implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp
1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004
pp 4ndash15
[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for
real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control
vol 58 no 12 pp 2631ndash2645 Dec 2011
[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4
pp 619ndash631 Jul 2013
[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2
pp 181ndash198 2013
[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human
identification method Sclera recognitionrdquo IEEE Trans Syst Man
Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012
FIG
DISPLAYING THE RESULT (MATCHED OR NOT MATCHED)
FIG
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc
Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc
Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper we proposed a new parallel sclera vein recognition
method which employees a two stage parallel approach for registration and
matching Even though the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using CUDA
GPU architecture the parallel strategies developed in this research can be
applied to design parallel solutions to other sclera vein recognition methods
and general pattern recognition methods We designed the Y shape
descriptor to narrow the search range to increase the matching efficiency
which is a new feature extraction method to take advantage of the GPU
structures We developed the WPL descriptor to incorporate mask
information and make it more suitable for parallel computing which can
dramatically reduce data transferring and computation We then carefully
mapped our algorithms to GPU threads and blocks which is an important
step to achieve parallel computation efficiency using a GPU A work flow
which has high arithmetic intensity to hide the memory access latency was
designed to partition the computation task to the heterogeneous system of
CPU and GPU even to the threads in GPU The proposed method
dramatically improves the matching efficiency without compromising
recognition accuracy
62 REFERENCES
[1] C W Oyster The Human Eye Structure and Function Sunderland
Sinauer Associates 1999
[2] C Cuevas D Berjon F Moran and N Garcia ldquoMoving object
detection for real-time augmented reality applications in a GPGPUrdquo IEEE
Trans Consum Electron vol 58 no 1 pp 117ndash125 Feb 2012
[3] D C Cirean U Meier L M Gambardella and J Schmidhuber ldquoDeep
big simple neural nets for handwritten digit recognitionrdquo Neural Comput
vol 22 no 12 pp 3207ndash3220 2010
[4] F Z Sakr M Taher and A M Wahba ldquoHigh performance iris
recognition system on GPUrdquo in Proc ICCES 2011 pp 237ndash242
[5] G Poli J H Saito J F Mari and M R Zorzan ldquoProcessing
neocognitron of face recognition on high performance environment based
on GPU with CUDA architecturerdquo in Proc 20th Int Symp Comput
Archit High Perform Comput 2008 pp 81ndash88
[6] J Antikainen J Havel R Josth A Herout P Zemcik and M Hauta-
Kasari ldquoNonnegative tensor factorization accelerated using GPGPUrdquo IEEE
Trans Parallel Distrib Syst vol 22 no 7 pp 1135ndash1141 Feb 2011
[7] K-S Oh and K Jung ldquoGPU implementation of neural networksrdquo
Pattern Recognit vol 37 no 6 pp 1311ndash1314 2004
[8] P R Dixon T Oonishi and S Furui ldquoHarnessing graphics processors
for the fast computation of acoustic likelihoods in speech recognitionrdquo
Comput Speech Lang vol 23 no 4 pp 510ndash526 2009
[9] P Kaufman and A Alm ldquoClinical applicationrdquo Adlerrsquos Physiology of
the Eye 2003
[10] R N Rakvic B J Ulis R P Broussard R W Ives and N Steiner
ldquoParallelizing iris recognitionrdquo IEEE Trans Inf Forensics Security vol 4
no 4 pp 812ndash823 Dec 2009
[11] S Crihalmeanu and A Ross ldquoMultispectral scleral patterns for ocular
biometric recognitionrdquo Pattern Recognit Lett vol 33 no 14 pp 1860ndash
1869 Oct 2012
[12] W Wenying Z Dongming Z Yongdong L Jintao and G
Xiaoguang ldquoRobust spatial matching for object retrieval and its parallel
implementation on GPUrdquo IEEE Trans Multimedia vol 13 no 6 pp
1308ndash1318 Dec 2011Multimedia Sec Magdeburg Germany Sep 2004
pp 4ndash15
[13] Y Xu S Deka and R Righetti ldquoA hybrid CPU-GPGPU approach for
real-time elastographyrdquo IEEE Trans Ultrason Ferroelectr Freq Control
vol 58 no 12 pp 2631ndash2645 Dec 2011
[14] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
multimodal eye recognitionrdquo Signal Image Video Process vol 7 no 4
pp 619ndash631 Jul 2013
[15] Z Zhou E Y Du N L Thomas and E J Delp ldquoA comprehensive
approach for sclera image quality measurerdquo Int J Biometrics vol 5 no 2
pp 181ndash198 2013
[16] Z Zhou E Y Du N L Thomas and E J Delp ldquoA new human
identification method Sclera recognitionrdquo IEEE Trans Syst Man
Cybern A Syst Humans vol 42 no 3 pp 571ndash583 May 2012
CHAPTER 5
APPLICATIONS
The applications of biometrics can be divided into the following three main groups
Commercial applications such as computer network login electronic data security ecommerce Internet access ATM credit card physical access control cellular phone PDA medical records management distance learning etc
Government applications such as national ID card correctional facility driverrsquos license social security welfare-disbursement border control Passport control etc
Forensic applications such as corpse identification criminal investigation terrorist identification parenthood determination missing children etc Traditionally commercial applications have used knowledge-based systems (eg PIN sand passwords) government applications have used token-based systems (eg ID cards and badges) and forensic applications have relied on human experts to match biometric features Biometric systems are being increasingly deployed in large scale civilian applications The Schiphol Premium scheme at the Amsterdam airport for example employs iris scan cards to speed up the passport and visa control procedures
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
61 CONCLUSION
In this paper, we proposed a new parallel sclera vein recognition
method that employs a two-stage parallel approach for registration and
matching. Although the research focused on developing a parallel sclera
matching solution for the sequential line-descriptor method using the
CUDA GPU architecture, the parallel strategies developed in this research
can be applied to design parallel solutions for other sclera vein recognition
methods and for general pattern recognition methods. We designed the
Y-shape descriptor, a new feature extraction method that takes advantage
of the GPU structures, to narrow the search range and thereby increase
matching efficiency. We developed the WPL descriptor to incorporate mask
information and make the data more suitable for parallel computing, which
dramatically reduces data transfer and computation. We then carefully
mapped our algorithms to GPU threads and blocks, an important step in
achieving parallel computation efficiency on a GPU. A workflow with high
arithmetic intensity, designed to hide memory access latency, partitions
the computation task across the heterogeneous system of CPU and GPU,
down to the individual GPU threads. The proposed method dramatically
improves matching efficiency without compromising recognition accuracy.
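The two-stage scheme summarized above (a cheap coarse filter that narrows the search range, followed by fine matching on the survivors) can be illustrated with a minimal NumPy sketch. This is not the authors' CUDA implementation: the descriptor contents, dimensions, and all names (`gallery_y`, `gallery_wpl`, the candidate count) are hypothetical placeholders, and a single vectorized NumPy operation stands in for the grid of GPU threads that would each score one gallery entry.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gallery of N enrolled templates: a short "Y-shape" vector used for the
# coarse stage and a longer "WPL" vector used for the fine stage.
N, Y_DIM, WPL_DIM = 1000, 8, 128
gallery_y = rng.random((N, Y_DIM))
gallery_wpl = rng.random((N, WPL_DIM))

# Probe: a slightly noisy copy of gallery entry 42.
probe_y = gallery_y[42] + 0.01 * rng.random(Y_DIM)
probe_wpl = gallery_wpl[42] + 0.01 * rng.random(WPL_DIM)

# Stage 1: coarse filter on the cheap descriptor. On a GPU, one thread
# would score one gallery entry; here one vectorized op does all N.
coarse_dist = np.linalg.norm(gallery_y - probe_y, axis=1)
candidates = np.argsort(coarse_dist)[:50]   # keep only the best 5%

# Stage 2: fine matching on the expensive descriptor, but only for the
# candidates that survived stage 1.
fine_dist = np.linalg.norm(gallery_wpl[candidates] - probe_wpl, axis=1)
best = candidates[np.argmin(fine_dist)]
print(best)
```

The design point the sketch captures is that the expensive stage-2 comparison runs on only 5% of the gallery, so most of the work is the cheap, fully data-parallel stage-1 scan.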
6.2 REFERENCES
[1] C. W. Oyster, The Human Eye: Structure and Function. Sunderland, MA: Sinauer Associates, 1999.
[2] C. Cuevas, D. Berjon, F. Moran, and N. Garcia, "Moving object detection for real-time augmented reality applications in a GPGPU," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 117–125, Feb. 2012.
[3] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural Comput., vol. 22, no. 12, pp. 3207–3220, 2010.
[4] F. Z. Sakr, M. Taher, and A. M. Wahba, "High performance iris recognition system on GPU," in Proc. ICCES, 2011, pp. 237–242.
[5] G. Poli, J. H. Saito, J. F. Mari, and M. R. Zorzan, "Processing neocognitron of face recognition on high performance environment based on GPU with CUDA architecture," in Proc. 20th Int. Symp. Comput. Archit. High Perform. Comput., 2008, pp. 81–88.
[6] J. Antikainen, J. Havel, R. Josth, A. Herout, P. Zemcik, and M. Hauta-Kasari, "Nonnegative tensor factorization accelerated using GPGPU," IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 7, pp. 1135–1141, Feb. 2011.
[7] K.-S. Oh and K. Jung, "GPU implementation of neural networks," Pattern Recognit., vol. 37, no. 6, pp. 1311–1314, 2004.
[8] P. R. Dixon, T. Oonishi, and S. Furui, "Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition," Comput. Speech Lang., vol. 23, no. 4, pp. 510–526, 2009.
[9] P. Kaufman and A. Alm, "Clinical application," in Adler's Physiology of the Eye, 2003.
[10] R. N. Rakvic, B. J. Ulis, R. P. Broussard, R. W. Ives, and N. Steiner, "Parallelizing iris recognition," IEEE Trans. Inf. Forensics Security, vol. 4, no. 4, pp. 812–823, Dec. 2009.
[11] S. Crihalmeanu and A. Ross, "Multispectral scleral patterns for ocular biometric recognition," Pattern Recognit. Lett., vol. 33, no. 14, pp. 1860–1869, Oct. 2012.
[12] W. Wenying, Z. Dongming, Z. Yongdong, L. Jintao, and G. Xiaoguang, "Robust spatial matching for object retrieval and its parallel implementation on GPU," IEEE Trans. Multimedia, vol. 13, no. 6, pp. 1308–1318, Dec. 2011.
[13] Y. Xu, S. Deka, and R. Righetti, "A hybrid CPU-GPGPU approach for real-time elastography," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, vol. 58, no. 12, pp. 2631–2645, Dec. 2011.
[14] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive multimodal eye recognition," Signal Image Video Process., vol. 7, no. 4, pp. 619–631, Jul. 2013.
[15] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A comprehensive approach for sclera image quality measure," Int. J. Biometrics, vol. 5, no. 2, pp. 181–198, 2013.
[16] Z. Zhou, E. Y. Du, N. L. Thomas, and E. J. Delp, "A new human identification method: Sclera recognition," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 42, no. 3, pp. 571–583, May 2012.