Evaluation of an Algorithm for Finding a Match of a Distorted Texture Pattern in a Large Image Database

N. VUJOVIC and D. BRZAKOVIC
Lehigh University

Evaluation of an algorithm for finding a match for a random texture pattern in a large image database is presented. The algorithm was designed assuming that the random pattern may be subject to misregistration relative to its representation in the database and that it may have missing parts. The potential applications involve authentication of legal documents, bank notes, or credit cards, where thin fibers are embedded randomly into the document medium during medium fabrication. The algorithm achieves image matching by a three-step hierarchical procedure, which starts by matching parts of fiber patterns while solving the misregistration problem and ends by matching complete fiber patterns. Performance of the algorithm is studied both theoretically and experimentally. The theoretical analysis covers the probability that two documents have the same pattern, the probability that the algorithm establishes a wrong match, and the algorithm’s performance in terms of processing time. Experiments involving over 250,000 trials using databases of synthetic documents, containing up to 100,000 documents, were used to confirm the theoretical predictions. In addition, experiments involving a database of real images were conducted to confirm that the algorithm has potential in real applications.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; I.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods; I.4.3 [Image Processing]: Enhancement—filtering; grayscale manipulation; I.4.7 [Image Processing]: Feature Measurement—texture

General Terms: Algorithms, Experimentation

Additional Key Words and Phrases: Image database, image matching, misregistration, random pattern, presentation of information

1. INTRODUCTION

This article considers the problem of finding a match for a texture pattern in a large image database. The specific motivation behind this study is authentication of random texture patterns embedded in document media, such as paper in legal documents and banknotes or plastic used for credit cards. The method under study extends to other problems where the objective is to establish a match between a given pattern and its representation in a database, e.g., fingerprint identification and DNA matching.

Authors’ address: Department of Electrical Engineering and Computer Science, Lehigh University, Bethlehem, PA 18015; email: {nsv2; dbrzakov}@vision.eecs.lehigh.edu.
Permission to make digital / hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and / or a fee.
© 1998 ACM 1046-8188/98/0100–0031 $03.50

ACM Transactions on Information Systems, Vol. 16, No. 1, January 1998, Pages 31–60.

Methodologically, document authentication shares common attributes with problems that require automated search through large databases. This research area is rapidly growing due to the increasing importance of video databases. Typically, the objective of the search is to retrieve similar images based on an example or query provided by the user; for example, see Petrakis and Orphanoudakis [1993], Picard and Kabir [1993], and Soga et al. [1993]. Efficient solutions utilize transformations that map perceptually similar images into a feature space where similarity translates into proximity. The feature extraction involves identification of meaningful intensity variations and landmarks. In the case of purely texture patterns, landmarks are difficult to define, and perceptual similarity is usually established globally, based on texture descriptors [Picard and Kabir 1993].

A class of search problems requires matching a given image/pattern with a specific image/pattern in the database. In some cases, matching pertains to finding the most similar image, e.g., problems dealing with face recognition [Pentland et al. 1994]. Such problems require establishing similarity metrics, but in addition require dealing with complex image-processing issues, including segmentation (Chellappa et al. [1993] discuss these issues). In other cases, it is necessary to establish an exact match between a given image/pattern and an entry in the database. This type of matching is required in DNA analysis, where the objective is to identify subpatterns [Guigo and Smith 1993; Henikoff and Henikoff 1995], and in fingerprint matching, where the objective is to identify the exact match [Mehtre 1993; Thomopoulos and Reisman 1994].

The establishment of an exact match between images frequently requires solving the registration problem. The problem of registering image pairs and image sequences has been of interest for many years (a comprehensive review of the approaches can be found in Brown [1992]). A class of registration techniques that is of interest to this work is referred to as point pattern matching. These techniques estimate the mapping relating sets of feature points in two images [Skea and Barrodale 1991; Skea et al. 1993]. The proposed solutions to the point pattern-matching problem capitalize on the fact that the form of the mapping function between two sets of points is known. The most general problems pertain to matching related to 3D scene understanding, e.g., recognition of 3D objects of known geometry. Such problems may be solved using geometric hashing [Lamdan and Wolfson 1988]. While this formulation of the 3D recognition problem handles occlusions and missing parts, it is sensitive to noise and requires extensive computations [Grimson and Huttenlocher 1990]. An alternative solution to 3D matching involving translation, rotation, and limited scaling is proposed in Zheng and Chellappa [1993]. This method is appropriate for a class of stereo problems. The solutions to predominantly 2D problems concentrate primarily on mappings involving translation and rotation. An approach to mapping limited to translation was proposed in Kahl et al. [1980] and further extended in Ranade and Rosenfeld [1980]. The essence of the approach is relaxation and the establishment of a match between two sets of points based on the highest figure of merit computed for each possible translation. Some authors have considered the point pattern-matching problem in the framework of graph matching. Proposed solutions include complete matching (e.g., Vaidya [1989]), matching a sample graph with a subgraph of a larger model graph (e.g., Pearce et al. [1994]), or matching graphs whose topologies are corrupted by errors in early stages of image processing (e.g., Wilson and Hancock [1993]).

The common characteristic shared by the different methodologies proposed to solve point pattern-matching problems is that they exhaustively search for the best match between two sets of points using highly redundant information. The algorithm considered in this article matches a test image with its representation in a large database, possibly consisting of billions of documents. Since matching requires the consideration of all database entries, it is computationally prohibitive to use approaches that exhaustively attempt to match two images. Instead, it is necessary first to identify a candidate match and then search for the transformation that relates the two images. The algorithm analyzed in this work is based on concepts introduced in Brzakovic and Vujovic [1996]. It comprises three hierarchically organized steps; the matching starts by establishing correspondence among subpatterns, and in the last step the complete texture patterns are matched. The algorithm is designed to identify a match assuming that there is misalignment (translational and rotational) between a test pattern and its representation in the database. In addition, it is assumed that the test pattern may be damaged and have missing parts. The methodology employed in the first step of the hierarchy is unique and provides for finding candidate matches; the processing performed in the second step may be replaced by a point-matching algorithm that exhaustively searches for matches, such as the ones discussed above.

This article concentrates on evaluating the algorithm’s performance and adapting it to real applications. The problem and the algorithm are discussed in Section 2. The crux of the present work is described in Sections 3 and 4, which concentrate on the algorithm’s performance: its ability to retrieve the correct document and its efficiency. The performance is studied first theoretically, and then the theoretical results are confirmed experimentally using large databases of synthetic images. Important issues arising in dealing with real images are discussed in Section 5, which describes the algorithm’s performance on a small set of real images.

2. DOCUMENT AUTHENTICATION: BACKGROUND

In this work we consider documents encoded by short fibers that are embedded randomly into the document medium during medium fabrication.¹ The resulting random texture pattern consists of intersecting elongated structures (fibers). Examples of images of a real and a synthetic fiber pattern are shown in Figure 1. It can be seen from the figure that an image comprises a background and a fiber pattern. In authentication, only fibers carry useful information; consequently, the first step of authentication is the extraction of fibers from the background. Thus, the texture patterns considered in this work are binary. The related image-processing aspects are discussed in Section 5.

The algorithm is developed assuming (1) rigid documents, i.e., that the effects of medium stretching are negligible, and (2) identical acquisition conditions, i.e., that digitization is performed using the same sensor at the same resolution. It is further assumed that documents are subject to two types of distortions frequently seen in practice. The first distortion pertains to damage that a document may experience in its lifetime, and the second pertains to misalignment problems associated with document redigitization. Specifically, the distortions are described as follows:

—Distortion 1: Missing Fibers. A document under investigation may contain only a subset of the fibers present in its representation in the database, i.e., parts of a document may be damaged or missing. In general, only a portion of the original document pattern may be seen in later acquisitions. It is assumed that at most $N_{max}$ randomly placed fibers or a continuous part of a document may be missing. Missing fibers pose theoretical problems in terms of establishing a correct match because different documents may have the same subset of fibers. However, in view of document uniqueness (Section 3.1.1) this is a very unlikely event, and the confidence of the established match is very high.

—Distortion 2: Misalignment. A test document may be translated and/or rotated relative to its representation in the database, i.e., document acquisition is subject to misalignment. Maximum misalignment is described by the maximum translations $\Delta x_{max}$ (x direction) and $\Delta y_{max}$ (y direction) and the maximum rotation $\theta_{max}$. In the most general case, for an $N \times N$ image, $-\pi \le \theta_{max} \le \pi$ and $-N < \Delta x_{max}, \Delta y_{max} < N$. Large translations may violate the limits imposed by Distortion 1 because large portions of the document may be out of the field of view.

¹ The term “short fibers” refers to fibers whose length is less than 10% of a document’s dimension (width or length).

Fig. 1. Random fiber patterns used for document authentication; (a) real sample; (b) synthetic image.

2.1 Algorithm Description

The proposed algorithm capitalizes on the fact that the assumed document distortions preserve distances between fibers. In order to reduce the amount of data to be retrieved and processed, the algorithm is organized into three levels of hierarchy. Each of the levels uses a separate database. Levels 1 and 2 utilize fiber end points and their relationships, and level 3 uses the complete fibers. At the first level the algorithm exhaustively attempts to match parts of the end point pattern of the document under investigation with any database document; at levels 2 and 3 only specific documents are examined. If a match established at level 1 fails at level 2 or 3, the algorithm returns to processing at level 1.
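The control flow just described (an exhaustive level-1 search that resumes whenever level 2 or 3 rejects a candidate) can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `authenticate`, `level2`, and `level3` are hypothetical names, and level 1 is assumed to yield candidate documents lazily together with their matched center-point pairs.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def authenticate(
    candidates: Iterable[Tuple[Any, Any]],
    level2: Callable[[Any, Any], Optional[Any]],
    level3: Callable[[Any, Any], bool],
) -> Optional[Any]:
    """Return the first database document accepted by all three levels.

    `candidates` (hypothetical) yields (document, matched_center_pairs) in the
    order level 1 finds them. `level2` returns an alignment such as
    (theta, x0, y0), or None on failure; `level3` checks complete fibers.
    """
    for doc, center_pairs in candidates:
        alignment = level2(doc, center_pairs)
        if alignment is not None and level3(doc, alignment):
            return doc
        # rejected at level 2 or 3: fall through and resume the level-1 search
    return None  # level 1 exhausted: no match in the database
```

If `candidates` is a generator, the expensive level-1 search advances only as far as needed to find an accepted document.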

2.1.1 Authentication Level 1. The objective of the authentication algorithm at level 1 is to establish whether three end point subpatterns of any document in the database match three end point subpatterns in the test document. A subpattern pertains to a pattern of fiber end points lying in a circle, $\mathcal{C}$, of radius $R$ centered at an end point (Figure 2(a)). A subpattern is represented as a sequence of ascending distances between the center end point and all end points in $\mathcal{C}$, i.e., a subpattern centered at an end point $P(x, y)$ is represented by the coordinates $(x, y)$ and $\mathcal{S} = (d_1, d_2, \ldots, d_k)$, $d_{i-1} \le d_i$, $i = 2, \ldots, k$, where $d_i$ represents the distance between $P$ and the $i$th closest end point in $\mathcal{C}$. For efficiency, the radius $R$ must encompass a sufficient number of points, $k$, for the sequence of distances of a point to be unique. This, in turn, is determined by the density of fibers and the expected number of missing fibers. In principle, the larger $R$, the more unique the distance sequence; however, large values of $R$ encompassing most of a document require considerable processing. As a general rule, in view of document uniqueness (Section 3.1.1), selecting $R$ so that $\mathcal{C}$ encompasses about 1/30 of the document area is appropriate, i.e., for a document of size $a \times a$, $R = a/8$.

Fig. 2. Matching at level 1; (a) illustration of a subpattern that is matched between images; (b) division of an image into four quadrants.
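The subpattern representation above (center coordinates plus the ascending distances to the other end points inside the circle of radius R) can be sketched in a few lines. The function name `subpattern` is hypothetical, and counting points exactly on the circle as inside is an assumption.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def subpattern(center: Point, end_points: Sequence[Point],
               radius: float) -> Tuple[Point, List[float]]:
    """Return the center and its ascending distance sequence d_1 <= ... <= d_k
    to all other end points lying within `radius` of the center."""
    dists = []
    for q in end_points:
        if q == center:
            continue                  # the center end point itself is not counted
        d = math.dist(center, q)
        if d <= radius:               # assumption: the boundary counts as inside
            dists.append(d)
    return center, sorted(dists)
```

Following the rule of thumb in the text, an $a \times a$ document would use `radius = a / 8`.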

A document in the database is represented by $J$ subpatterns per quadrant, stored for each of the four document quadrants shown in Figure 2(b).² Each of the three matched subpatterns must be centered in a different quadrant.³

The selection of the value $J$ is dictated by the assumptions about the number of missing fibers in the test document. For $N_{max}$ missing fibers arbitrarily positioned in the document, it is necessary to ensure that three out of four quadrants provide at least one matched point each. In the worst case, assuming that all $N_{max}$ missing points belong to the points stored in the database and are shared equally between any two quadrants, a sufficient number of matches will be obtained if these two quadrants jointly store $2J > N_{max}$ points, i.e., each quadrant stores $J \ge (N_{max}+1)/2$ end points.⁴ The matching algorithm proceeds in three steps at level 1 as follows:

(1) Step 1: This step involves matching between subpatterns in the first or second quadrants. In order for two subpatterns to be considered a match, the coordinates of their centers, $P_1$ and $P'_1$, must be within the maximum assumed misalignment, and the sequence of distances for the test document point $P_1$, $\mathcal{S}_1 = (d_1, d_2, \ldots, d_k)$, must correspond to the sequence of distances $\mathcal{S}'_1 = (d'_1, d'_2, \ldots, d'_{k'})$ for $P'_1$ in the database document. Taking missing fibers into account, the correspondence between the sequences exists if $k \le k'$ and if all distances in $\mathcal{S}_1$ are within the threshold value $T_d$ of the corresponding distances in $\mathcal{S}'_1$. (The threshold value $T_d$ is required because of digitization effects, and it is a function of resolution; in this work we use $T_d = \pm 1$.)

To establish a match the search is done exhaustively, and the algorithm attempts to match each end point subpattern in the first (or second) quadrant of the test document with all candidate subpatterns in the database documents. When the first subpattern match is found for a document in the database, the second subpattern match is attempted for the same document in the next quadrant. Failure to establish the first subpattern match indicates that there is no match for the test document in the database.

(2) Step 2: Given that the subpatterns centered at points $P_1(x_1, y_1)$ and $P'_1(x'_1, y'_1)$ (in the $\ell$th database document) match, the objective in this step is to establish the second subpattern match in the next quadrant of the same document. A database document subpattern centered at point $P'_2(x'_2, y'_2)$ potentially matches a subpattern centered at $P_2(x_2, y_2)$ in the test document if $P_2$ lies on the circle centered at $P_1$ with radius equal to the distance between $P'_1$ and $P'_2$. In essence, the second subpattern match capitalizes on the preservation of distances under the assumed distortions, as illustrated in Figure 3(a). In practice, because of digitization effects, the center of the second subpattern match lies on an annulus whose width is determined by the selected resolution. Only the relevant part of the annulus, determined by the assumptions about the maximum misregistration, is searched for the center point. (In this work, the annulus is three pixels wide.) A potential match is considered a match if the sequence of distances for $P_2(x_2, y_2)$ corresponds to the sequence of distances for $P'_2(x'_2, y'_2)$ in the sense described for the first point match. If a second subpattern match cannot be established for the $\ell$th database document, the search continues by establishing a new first subpattern match (Step 1).

(3) Step 3: When both the first and second subpattern matches are established for the $\ell$th database document, a third match is attempted for the same document in the next quadrant. The locations of the candidate center points in the test document are determined based on the distances between points $P'_1$ and $P'_2$ and each of the end points stored in the next quadrant of the same database document. Consequently, the third center point location is determined by examining $J$ small neighborhoods in the test document, as illustrated in Figure 3(b). The match is established based on distance sequences in the same way as for the first and second subpattern matches.

² Considering that the algorithm aims at establishing matches between three pairs of points, where each pair belongs to a different part of an image, division into four quadrants provides a framework that can easily handle the relationships between coordinates of points belonging to different subregions.

³ The extent of a matching quadrant is determined by the assumed maximum misalignment and in the worst case extends over a whole image.

⁴ Each of the J points belongs to a different fiber and is located close to the typical center of the quadrant to ensure that, for expected misalignments, the fibers appear in the redigitized image.
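The two tests used in Steps 1 and 2 can be sketched as follows. This is a sketch under assumptions: "corresponding distances" is interpreted as an order-preserving match of every test distance to a distinct database distance within $T_d$ (found greedily, which is sufficient for sorted sequences with a fixed tolerance); the optional `max_missing` mirrors the limit $p$ used in Section 3.1.2; and the three-pixel annulus is modeled as a half-width of 1.5 pixels. The function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def sequences_correspond(test_seq: Sequence[float], db_seq: Sequence[float],
                         td: float = 1.0, max_missing: Optional[int] = None) -> bool:
    """Step-1 test: each test distance must lie within td of a distinct
    database distance, preserving the ascending order of both sequences."""
    if len(test_seq) > len(db_seq):
        return False                      # the test pattern cannot have extra points
    if max_missing is not None and len(db_seq) - len(test_seq) > max_missing:
        return False                      # too many entries missing
    j = 0
    for d in test_seq:
        while j < len(db_seq) and db_seq[j] < d - td:
            j += 1                        # too small for d and for every later d
        if j == len(db_seq) or db_seq[j] > d + td:
            return False                  # no database distance within +/- td of d
        j += 1                            # consume the matched database distance
    return True


def on_annulus(p1: Point, p2: Point, db_p1: Point, db_p2: Point,
               width: float = 3.0) -> bool:
    """Step-2 filter: P2 is a candidate if |P1 P2| equals |P'1 P'2| to within
    half the annulus width."""
    return abs(math.dist(p1, p2) - math.dist(db_p1, db_p2)) <= width / 2
```

A candidate $P_2$ that passes `on_annulus` would then be confirmed with `sequences_correspond` on the two distance sequences.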

2.2 Authentication Level 2

At this level the authentication algorithm establishes matches between all remaining end points in the test document and the database document selected at level 1. After the match is established between three subpatterns, i.e., between the three center points, any two pairs of matched center points may be used to determine the translation, $x_0$ and $y_0$, and rotation, $\theta$, between the matched documents. In order to improve accuracy, the rotation and translation parameters are computed for three combinations of matched pairs of points, and the consistent results are averaged.



The calculated values $\theta$, $x_0$, and $y_0$ are used to check for matches for all remaining end points in the test document. A $k \times k$ neighborhood (in this work $k = 3$), centered at a location determined by the coordinates of the point in the test document and the rotation and translation parameters, is searched for the match. The neighborhood is considered instead of the actual coordinates because of digitization effects and calculation errors. Two documents are matched at level 2 if each of the existing end points in the test document has a match in the database document.
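Level 2 can be sketched as two small functions: one estimating $(\theta, x_0, y_0)$ from the three matched center points, and one checking every test end point in a $k \times k$ neighborhood. Assumptions, all illustrative: the transform maps test coordinates into database coordinates as $\text{db} = R(\theta)\,\text{test} + (x_0, y_0)$; the three pairwise rotation estimates are combined by a circular mean and the translation is averaged over the three points, with no separate consistency check; database end points are integer pixel coordinates. Function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Iterable, Sequence, Set, Tuple

Point = Tuple[float, float]


def estimate_alignment(test_pts: Sequence[Point],
                       db_pts: Sequence[Point]) -> Tuple[float, float, float]:
    """Estimate (theta, x0, y0) with db ~= R(theta) @ test + (x0, y0) from
    three matched center points."""
    thetas = []
    for a, b in combinations(range(3), 2):          # the three point pairs
        t_ang = math.atan2(test_pts[b][1] - test_pts[a][1], test_pts[b][0] - test_pts[a][0])
        d_ang = math.atan2(db_pts[b][1] - db_pts[a][1], db_pts[b][0] - db_pts[a][0])
        thetas.append(d_ang - t_ang)
    # circular mean, so estimates near +pi and -pi average correctly
    theta = math.atan2(sum(map(math.sin, thetas)), sum(map(math.cos, thetas)))
    c, s = math.cos(theta), math.sin(theta)
    pairs = list(zip(test_pts[:3], db_pts[:3]))
    x0 = sum(dx - (c * tx - s * ty) for (tx, ty), (dx, dy) in pairs) / 3
    y0 = sum(dy - (s * tx + c * ty) for (tx, ty), (dx, dy) in pairs) / 3
    return theta, x0, y0


def level2_match(test_end_points: Iterable[Point], db_end_points: Set[Tuple[int, int]],
                 theta: float, x0: float, y0: float, k: int = 3) -> bool:
    """True if every mapped test end point has a database end point in the
    k x k neighborhood of its mapped pixel."""
    c, s = math.cos(theta), math.sin(theta)
    h = k // 2
    for tx, ty in test_end_points:
        u = round(c * tx - s * ty + x0)
        v = round(s * tx + c * ty + y0)
        if not any((u + i, v + j) in db_end_points
                   for i in range(-h, h + 1) for j in range(-h, h + 1)):
            return False
    return True
```

Extra database end points (from fibers missing in the test document) do not cause a rejection, consistent with Distortion 1.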

Fig. 3. Illustration of Steps 2 and 3; (a) Step 2: only subpatterns centered at points lying on a circle of radius D are considered. The circle is centered at the point determined in the first point match, and its radius is determined by the distance between point $P'_1$ and a point $P'_2$ stored as the representative of the next quadrant; (b) Step 3: the location of $P_3$ is determined by distances between the candidate match in the database document and the already established matches. In order to establish the match, the distance sequences must match in the same way as for the first and second match.



2.3 Authentication Level 3

If the fiber end points match at level 2, the third step in the authentication procedure is performed: fiber matching by correlation. A whole fiber match is established on a pixel basis using the coordinates of a point in the test document and the calculated values of $\theta$, $x_0$, and $y_0$. A match is successful if each fiber in the test document matches a fiber in the database image. Matching may involve the binary fiber image or its skeleton. Given a fiber point in one image, the coordinates of its corresponding point are calculated based on the parameters $x_0$, $y_0$, and $\theta$ calculated at level 2, and a $k \times k$ neighborhood is searched for the match. It should be noted that, due to missing fibers, the database image may have fibers that do not have a correspondence in the test image.
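A pixel-level version of the level-3 check can be sketched as below. Assumptions, labeled as such: the database fiber image (or skeleton) is available as a dictionary from pixel to a fiber identifier; a test fiber matches when all of its mapped pixels find, within the $k \times k$ neighborhood, pixels of one and the same database fiber; the transform convention is the same as in level 2. The name `level3_match` is hypothetical.

```python
import math
from typing import Dict, Iterable, List, Optional, Set, Tuple

Pixel = Tuple[int, int]


def level3_match(test_fibers: Iterable[List[Tuple[float, float]]],
                 db_fiber_of: Dict[Pixel, int],
                 theta: float, x0: float, y0: float, k: int = 3) -> bool:
    """True if every test fiber maps onto a single database fiber.

    db_fiber_of maps each database fiber pixel to its fiber id; database
    fibers with no counterpart (missing fibers) are simply ignored."""
    c, s = math.cos(theta), math.sin(theta)
    h = k // 2
    for fiber in test_fibers:
        consistent: Optional[Set[int]] = None   # db fibers compatible so far
        for x, y in fiber:
            u, v = round(c * x - s * y + x0), round(s * x + c * y + y0)
            near = {db_fiber_of[(u + i, v + j)]
                    for i in range(-h, h + 1) for j in range(-h, h + 1)
                    if (u + i, v + j) in db_fiber_of}
            consistent = near if consistent is None else consistent & near
            if not consistent:
                return False                     # no single db fiber explains this fiber
    return True
```

Requiring one consistent database fiber per test fiber, rather than any fiber pixel nearby, prevents a test fiber from being "matched" piecewise across crossing fibers.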

3. ALGORITHM PERFORMANCE: ESTABLISHING THE CORRECT MATCH

In the following we consider the algorithm’s ability to identify the correct match in the database when such a match exists. First, in Section 3.1 we evaluate the algorithm’s accuracy theoretically, and then in Section 3.2 we confirm the obtained results experimentally.

3.1 Theoretical Considerations

In the following we evaluate the algorithm’s performance with respect to two types of distortions. First, we consider the probability that the algorithm may identify a wrong document when the test document is subject to misalignment, i.e., Distortion 2 only. Next, we analyze the algorithm’s ability to identify the correct document if the test document is also subject to missing fibers, i.e., Distortion 1.

3.1.1 Document Uniqueness. Assuming Distortion 2 only, the algorithm will identify a wrong document at level 3 if two or more documents in the database have identical patterns. Consequently, it is necessary to consider the probability that two documents may have identical patterns. The inverse of this probability is document uniqueness, which in practical terms determines the potential database size.

The uniqueness is determined by fiber density,⁵ and in the following we consider documents in digital form and differentiate between the number of possible documents, $K_1$, and the number of detectable (differentiable) documents, $K$. If the documents are not densely covered by fibers, then $K \approx K_1$. For high fiber densities, the saturation effect is significant; if $K_2$ represents the saturation effect, the number of detectable documents is $K = K_1 - K_2$. For simplicity, we first consider the case where fibers can adequately be modeled by line segments whose orientation is a random variable uniformly distributed in the range [0°, 180°), i.e., $\mathrm{pdf}(\alpha) = \frac{1}{180}[u(\alpha) - u(\alpha - 180)]$, where $u(z)$ denotes the step function and $\alpha$ models fiber orientation. It should be pointed out that the calculations are approximate and give lower bounds on document uniqueness.

⁵ Additional parameters that determine document uniqueness are fiber length and width, which for the purpose of this study are assumed to be fixed.

For a document of size $(p \times q)$ cm², with $d$ fibers/cm², pixel size $(r \times r)$ cm², and fiber length $l$ (in pixels), the number of pixels per document is $A = \mathrm{int}(pq/r^2)$, and the number of fibers per document is $n = pqd$. (The symbol int denotes the nearest lower integer.) If $s_1$ is the number of different fibers with a fixed central point, then a single fiber can take approximately $N = s_1 A$ different positions in a document. Therefore,

$$K_1 = \binom{N}{n} = \binom{s_1 A}{pqd}. \qquad (1)$$

To estimate the number of differentiable documents, we calculate the number of documents that cannot be differentiated and then subtract this number from the total number of documents. Given a document $\mathcal{D}_i$ that contains $n$ fibers, another document $\mathcal{D}_j$ can be differentiated from $\mathcal{D}_i$ if at least one of the fibers in $\mathcal{D}_j$ falls in an area that is not completely occupied by the fibers in $\mathcal{D}_i$. If fiber density is small, the saturation effect is insignificant, since total overlap between the fibers is an unlikely event. As fiber density increases, patches totally covered by fibers are formed. The size of the patches increases as density increases, thus causing significant saturation effects, because any combination of fibers in the patches produces the same document from the authentication point of view. Considering the existence of $j$ such patches in a document, each covering an area $A_i$, $i = 1, 2, \ldots, j$, and containing at least $n(A_i/A)$ fibers, where $A$ denotes the area of a document, the number of fibers in all patches is at least

$$n \sum_{i=1}^{j} \frac{A_i}{A}.$$

Since only a part of the fibers is contained in patches,

$$n \sum_{i=1}^{j} \frac{A_i}{A} \le n \, \frac{\text{area occupied by all fibers}}{A} \le n \, \frac{m_1}{A},$$

where $m_1$ is the maximum number of pixels occupied by fibers, i.e., $m_1 = nl$. Thus, to estimate the saturation effect we calculate the number of possible combinations of $n_1 = n(m_1/A)$ fibers over the area of $m_1$ pixels; therefore,

$$K_2 = \binom{N_1}{n_1} = \binom{s_1 m_1}{n m_1 / A}, \qquad (2)$$

where $N_1 = s_1 m_1$ is the number of different positions a single fiber can take over an area of $m_1$ pixels. It should be noted that the above considerations are only approximate, and for simplicity we have assumed that all patches form one large patch, i.e., a continuous region of $m_1$ pixels, thus largely overestimating the saturation effect. The number of different (detectable) documents is $K = K_1 - K_2$.

Considerably higher uniqueness is obtained when considering fibers of arbitrary shape. Assuming that curved fibers are locally modeled by linear functions and starting with an end fiber point, the second fiber point is one of the eight neighboring points, and each remaining point is one of five neighboring points (in the direction determined by the second fiber point). Thus, for a fixed end point a fiber may have $s_2 = 1 \times 8 \times 5^{l-2}$ different shapes. To calculate $K_1$ and $K_2$ for documents consisting of fibers of arbitrary shapes, $s_1$ is replaced by $s_2$ in Eqs. (1) and (2). For small fiber densities, $K_2$ can be neglected, and $K \approx K_1$. When fiber density increases, $K_2$ also increases. For high fiber densities, $K_1$ and $K_2$ become very large, comparable numbers, and $K_1 - K_2$ becomes of the type $\infty - \infty$. To gain some insight into how $K$ changes with fiber density $d$, it is convenient to study the behavior of $\log(K_1/K_2)$, shown in Figure 4, as a function of fiber density. As expected, $\log(K_1/K_2)$ first increases with $d$; then the increase becomes slower, and finally $\log(K_1/K_2)$ starts to decrease. The saturation effect is significant in regions with negative slopes. Using the graph in Figure 4, it is possible to determine the optimal fiber density by defining a cost function that incorporates the number of documents, the cost of embedding fibers, and the cost of authentication.

Fig. 4. Graphical representation of document uniqueness for high fiber densities when fibers are modeled by linear segments (solid curve) and when fibers are of arbitrary shape (dashed curve).
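Because $K_1$ and $K_2$ overflow any floating-point type, Eqs. (1) and (2) are best evaluated in log space. The sketch below computes $\log_{10}(K_1/K_2)$ for given document parameters. Assumptions: $n$ and $n_1$ are rounded to integers, the model is only evaluated while $m_1 \le A$, and since no numerical value of $s_1$ for straight segments is given in the text, the example uses $s_2$ for arbitrary shapes. All function names are hypothetical.

```python
import math


def log10_binom(N: float, n: int) -> float:
    """log10 C(N, n) for possibly astronomically large N and moderate n.

    Summing log10(N - i) avoids the cancellation that
    lgamma(N + 1) - lgamma(N - n + 1) suffers when N is huge."""
    return sum(math.log10(N - i) for i in range(n)) - math.lgamma(n + 1) / math.log(10)


def s2(l: int) -> float:
    """Number of arbitrary fiber shapes with a fixed end point: 1 * 8 * 5**(l - 2)."""
    return 8.0 * 5.0 ** (l - 2)


def log10_k1_over_k2(d: float, p: float, q: float, r: float, l: int, s: float) -> float:
    """log10(K1 / K2) from Eqs. (1) and (2); s is s1 or s2, l is in pixels."""
    A = int(p * q / r ** 2)            # pixels per document
    n = int(p * q * d)                 # fibers per document (rounded down)
    m1 = n * l                         # maximum number of pixels occupied by fibers
    if m1 > A:
        raise ValueError("the model assumes fibers occupy at most the whole document")
    n1 = round(n * m1 / A)             # fibers assumed to lie in the merged patch
    return log10_binom(s * A, n) - log10_binom(s * m1, n1)
```

A sweep such as `[log10_k1_over_k2(d, 6, 15.6, 0.0215, 35, s2(35)) for d in range(1, 61)]` traces $\log_{10}(K_1/K_2)$ against density for a dollar-bill-sized document; the upper limit of 60 fibers/cm² keeps $m_1 \le A$ for these parameters.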

To illustrate a feasible range of values for $K$, consider a document the size of a dollar bill, i.e., 6 cm × 15.6 cm, fiber density $d = 6$ fibers/cm², pixel size (0.0215 × 0.0215) cm², and fiber length $l = 35$ pixels. Then $A \approx 203{,}000$ pixels, $N = 8.8 \times 10^7$, and $n \approx 561$ fibers. The estimated number of different documents is $K \approx 10^{3150}$; this number establishes the upper bound of the database size containing documents of the given specifications. The same specifications for a document the size of a credit card, i.e., 5.4 cm × 8.6 cm, yield $K \approx 10^{1560}$.
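The orders of magnitude quoted above can be checked from Eq. (1). This sketch uses the stated $N = 8.8 \times 10^7$ for the dollar bill; for the credit card, $N$ is not given, so it assumes (hypothetically) the same $s_1 = N/A$ implied by the dollar-bill figures. It evaluates $K_1$ only, ignoring the saturation term $K_2$.

```python
import math


def log10_binom(N: float, n: int) -> float:
    """log10 C(N, n), accurate for huge N and moderate n."""
    return sum(math.log10(N - i) for i in range(n)) - math.lgamma(n + 1) / math.log(10)


PIXEL = 0.0215     # cm, side of a square pixel
DENSITY = 6        # fibers per cm^2

# Dollar bill: A and n from the stated dimensions; N = 8.8e7 as quoted.
A_bill = int(6 * 15.6 / PIXEL ** 2)      # about 202,500 pixels
n_bill = int(6 * 15.6 * DENSITY)         # 561 fibers
N_bill = 8.8e7
print(f"dollar bill: log10 K1 = {log10_binom(N_bill, n_bill):.1f}")

# Credit card: hypothetical s1 = N / A carried over from the dollar bill.
s1 = N_bill / A_bill                     # roughly 435
A_card = int(5.4 * 8.6 / PIXEL ** 2)
n_card = int(5.4 * 8.6 * DENSITY)        # 278 fibers
print(f"credit card: log10 K1 = {log10_binom(s1 * A_card, n_card):.1f}")
```

The printed exponents come out at about 3157 and 1564, the same order as the quoted $10^{3150}$ and $10^{1560}$.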

3.1.2 Impact of Missing Fibers. Document uniqueness is affected by missing fibers because the test document may contain only a subset of the fibers existing in its original version. To guarantee that the correct match can be established, the algorithm requires storing $J$ fibers per image quadrant, where $J \ge (N_{max} + 1)/2$, and $N_{max}$ is the maximum number of randomly placed fibers that may be missing from a document. For a large $N_{max}$ it may be impractical to store the necessary number of end points. In the following we analyze the effects of storing an insufficient number of end points on the algorithm’s performance, i.e., the probability that a match will be established if the number of missing fibers is larger than assumed.
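The bound $J \ge (N_{max}+1)/2$ can be checked by brute force in the worst case, where every missing point is a stored one: a match on three quadrants survives as long as no two quadrants are emptied. The function names are hypothetical.

```python
from itertools import product


def min_points_per_quadrant(n_max: int) -> int:
    """Smallest J satisfying 2J > n_max, i.e. J >= (n_max + 1) / 2."""
    return n_max // 2 + 1


def survives_worst_case(j: int, n_max: int) -> bool:
    """True if no deletion of at most n_max of the 4*j stored points empties
    two or more quadrants, so at least three quadrants keep a point."""
    for removed in product(range(j + 1), repeat=4):   # points lost per quadrant
        if sum(removed) <= n_max and sum(r == j for r in removed) >= 2:
            return False
    return True
```

For instance, tolerating $N_{max} = 10$ missing stored points requires $J = 6$ points per quadrant, and `survives_worst_case(5, 10)` is `False`.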

For simplicity and without loss of generality, we assume that $J = 1$ end point is stored per quadrant and that no more than $p = 2$ distances are allowed to be missing from the distance sequences when establishing the first, second, and third point matches. The overall effect of missing fibers is modeled as the probability $P$ that a match cannot be established, and

$$P = P_1 + P_2 + P_3,$$

where

—$P_1$ is the probability that at least two of the points stored in the database are missing,

—$P_2$ is the probability that one of the points stored in the database is missing and at least $p + 1 = 3$ points from at least one of the remaining three database distance sequences are missing, and

—$P_3$ is the probability that none of the database points is missing and at least $p + 1 = 3$ points from at least two distance sequences are missing,

when $n$ randomly placed points are missing from the end point image. To illustrate the procedure for calculating $P_i$, $i = 1, 2, 3$, we consider as an example a document containing 120 fibers per image, i.e., $N = 240$ end points. A total of $n$ out of the $N$ points may be missing, and some of them may be stored in some form in the database. Using the general formula

$$P(n_1, \ldots, n_p; N_1, \ldots, N_p) = \frac{n!}{n_1! \cdots n_p!} \cdot \frac{N_1(N_1-1)\cdots(N_1-n_1+1) \cdots N_p(N_p-1)\cdots(N_p-n_p+1)}{N(N-1)\cdots(N-n+1)}$$

and

$$\sum_{i=1}^{p} n_i = n, \qquad \sum_{i=1}^{p} N_i = N,$$

where $(n_1, \ldots, n_p; N_1, \ldots, N_p)$ denotes the event that $n_i$ points out of the $N_i$, $i = 1, 2, \ldots, p$, are missing from the document, we evaluate the probabilities of interest, i.e., $P_1$, $P_2$, and $P_3$, as follows:

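$$
\begin{aligned}
P_1 &= \mathrm{Prob}\{\text{at least 2 database pts are missing}\} \\
&= 1 - \mathrm{Prob}\{\text{at most 1 database pt is missing}\} \\
&= 1 - [\mathrm{Prob}\{\text{0 database pts missing}\} + \mathrm{Prob}\{\text{1 database pt missing}\}] \\
&= 1 - [P(0, n; 4, 236) + P(1, n-1; 4, 236)],
\end{aligned}
$$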

where 4 is the number of points per document stored in the database ($J = 1$), and 236 is the number of other points in the document.

Probability P2 is derived as follows:

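$$
\begin{aligned}
P_2 &= \mathrm{Prob}\{\text{1 database pt missing AND at least 3 entries from at least 1 of 3 remaining dist. seq. are missing}\} \\
&= \mathrm{Prob}\{\text{1 database pt missing}\} \times \mathrm{Prob}\{\text{at least 3 entries from at least 1 of 3 dist. seq. are missing} \mid \text{1 db pt missing}\} \\
&= \mathrm{Prob}\{\text{1 database pt missing}\} \times [1 - \mathrm{Prob}\{\text{up to 2 entries from each of 3 dist. seq. are missing} \mid \text{1 db pt missing}\}] \\
&= \mathrm{Prob}\{\text{1 database pt missing}\} \times [1 - (\mathrm{Prob}\{\text{up to 2 entries from 1 dist. seq. are missing} \mid \text{1 db pt missing}\})^3] \\
&= \mathrm{Prob}\{\text{1 database pt missing}\} \times \big[1 - (\mathrm{Prob}\{\text{0 entries from 1 dist. seq. missing} \mid \text{1 db pt missing}\} \\
&\qquad + \mathrm{Prob}\{\text{1 entry from 1 dist. seq. missing} \mid \text{1 db pt missing}\} + \mathrm{Prob}\{\text{2 entries from 1 dist. seq. missing} \mid \text{1 db pt missing}\})^3\big] \\
&= P(1, n-1; 4, 236) \left[ 1 - \left( \frac{P(1, 0, n-1; 4, 18, 218) + P(1, 1, n-2; 4, 18, 218) + P(1, 2, n-3; 4, 18, 218)}{P(1, n-1; 4, 236)} \right)^{3} \right],
\end{aligned}
$$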



where 18 is the average number of points in a distance sequence.

Using a similar strategy, probability $P_3$ is calculated as

$$\begin{aligned}
P_3 &= \text{Prob}\{\text{0 database pts missing AND at least 3 entries from at least 2 of 4 remaining dist. seq. are missing}\}\\
&= \text{Prob}\{\text{0 database pts missing}\}\times\text{Prob}\{\text{at least 3 entries from at least 2 of 4 dist. seq. are missing}\mid\text{0 db pts missing}\}\\
&= \text{Prob}\{\text{0 database pts missing}\}\times\left[1-\text{Prob}\{\text{up to 2 entries from each of 4 dist. seq. are missing}\mid\text{0 db pts missing}\}\right]\\
&= \text{Prob}\{\text{0 database pts missing}\}\times\left[1-\left(\text{Prob}\{\text{up to 2 entries from 1 dist. seq. are missing}\mid\text{0 db pts missing}\}\right)^4\right]\\
&= \text{Prob}\{\text{0 database pts missing}\}\times\Big[1-\big(\text{Prob}\{\text{0 entries from 1 dist. seq. missing}\mid\text{0 db pts missing}\}\\
&\qquad+\text{Prob}\{\text{1 entry from 1 dist. seq. missing}\mid\text{0 db pts missing}\}+\text{Prob}\{\text{2 entries from 1 dist. seq. missing}\mid\text{0 db pts missing}\}\big)^4\Big]\\
&= P(0,n;4,236)\left[1-\left(\frac{P(0,0,n;4,18,218)+P(0,1,n-1;4,18,218)+P(0,2,n-2;4,18,218)}{P(0,n;4,236)}\right)^4\right].
\end{aligned}$$
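$P_2$ and $P_3$ can be evaluated directly from the last lines of their derivations, as in the minimal Python sketch below (function names are illustrative; the hypergeometric helper is included so the block runs on its own). Two points are assumptions on our part: the total $P$ plotted in Figure 5 is taken to be $P_1 + P_2 + P_3$, since the three cases (at least two, exactly one, and no database points missing) are disjoint, and the sequence terms are computed exactly as written above, with the 3 or 4 sequences treated independently.

```python
from math import comb

def hypergeom(ns, Ns):
    """Multivariate hypergeometric probability P(n_1,...,n_p; N_1,...,N_p)."""
    if any(k < 0 or k > m for k, m in zip(ns, Ns)):
        return 0.0
    num = 1
    for k, m in zip(ns, Ns):
        num *= comb(m, k)
    return num / comb(sum(Ns), sum(ns))

DB, OTHER = 4, 236   # database points (J = 1) and all remaining end points
SEQ, REST = 18, 218  # points in one distance sequence, and the rest

def prob_p1(n):
    return 1.0 - (hypergeom((0, n), (DB, OTHER)) + hypergeom((1, n - 1), (DB, OTHER)))

def prob_p2(n):
    """One database point missing, combined with the 3-sequence term of P2."""
    base = hypergeom((1, n - 1), (DB, OTHER))
    if base == 0.0:
        return 0.0
    q = sum(hypergeom((1, j, n - 1 - j), (DB, SEQ, REST)) for j in range(3)) / base
    return base * (1.0 - q ** 3)

def prob_p3(n):
    """No database point missing, combined with the 4-sequence term of P3."""
    base = hypergeom((0, n), (DB, OTHER))
    if base == 0.0:
        return 0.0
    q = sum(hypergeom((0, j, n - j), (DB, SEQ, REST)) for j in range(3)) / base
    return base * (1.0 - q ** 4)

def prob_total(n):
    """Assumed: P = P1 + P2 + P3 for n missing points."""
    return prob_p1(n) + prob_p2(n) + prob_p3(n)

for n in (0, 1, 10, 50):
    print(n, round(prob_total(n), 4))
```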

The probability $P$ as a function of $n$ (the number of missing points) is shown in Figure 5 together with the experimentally obtained curve.

3.2 Experiments

In order to verify the theoretical analysis and the utility of the proposed algorithm, intensive testing was carried out using synthetic images. The synthetic images were generated to resemble the small number of available real samples. Based on the appearance of these samples (Figure 1(a)), the fibers were modeled by polynomial segments of the first and second order, i.e., by line segments and circle arcs. Part of the experiments involved extracting fibers from the background and studying the effects of image-processing techniques on the algorithm’s performance. The background was modeled by a long-crested wave model [Longuet-Higgins 1957] of varying complexity to test the preprocessing algorithms, in particular, the separation of fibers from the background. These experiments are described in Vujovic [1992]. In this section we concentrate on images containing fibers only. Effects of the background and the most important aspects of preprocessing are addressed in Section 5.

Experiments with fiber images considered various fiber shapes, lengths, and densities. Fiber length, circle radius, and circle arc were selected to be uncorrelated pseudorandom variables, uniformly distributed with adjustable means and standard deviations. A 32-bit random-number generator was used in the simulations, which allowed the generation of a large number of synthetic documents. Examples of generated documents are shown in Figure 6.
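A minimal sketch of such a generator is given below. Everything specific in it is an assumption for illustration: Python's `random.Random` (a Mersenne Twister) stands in for the 32-bit generator, the default means and standard deviations are arbitrary, and fibers are not clipped to the image. A uniform variable with mean $\mu$ and standard deviation $\sigma$ is drawn from $[\mu-\sqrt{3}\sigma,\ \mu+\sqrt{3}\sigma]$, and the second end point of an arc of radius $r$ and swept angle $\phi$ is obtained from its chord, whose length is $2r\sin(\phi/2)$.

```python
import math
import random

def uniform_with(rng, mean, std):
    """Uniform variate with the requested mean and standard deviation."""
    half_width = math.sqrt(3.0) * std   # a uniform on [a, b] has std (b - a) / sqrt(12)
    return rng.uniform(mean - half_width, mean + half_width)

def synthetic_document(seed, n_fibers=120, size=128, p_arc=0.5,
                       length=(12.0, 3.0), radius=(20.0, 5.0), arc=(0.6, 0.15)):
    """Return the 2 * n_fibers end points (x, y) of randomly placed fibers.

    length, radius, and arc are (mean, std) pairs; arc is in radians.
    """
    rng = random.Random(seed)           # the seed fully determines the document
    points = []
    for _ in range(n_fibers):
        x0, y0 = rng.uniform(0, size), rng.uniform(0, size)
        heading = rng.uniform(0.0, 2.0 * math.pi)
        if rng.random() < p_arc:
            # Circle arc: radius and swept angle drawn independently.
            r, sweep = uniform_with(rng, *radius), uniform_with(rng, *arc)
            chord = 2.0 * r * math.sin(sweep / 2.0)
            direction = heading + sweep / 2.0
        else:
            # Line segment: the chord is the fiber itself.
            chord, direction = uniform_with(rng, *length), heading
        x1 = x0 + chord * math.cos(direction)
        y1 = y0 + chord * math.sin(direction)
        points += [(x0, y0), (x1, y1)]
    return points

doc_a = synthetic_document(seed=1)
doc_b = synthetic_document(seed=2)
print(len(doc_a), doc_a[:2])
```

Changing `seed` yields a new, unrelated document, which corresponds to how Test 1 below obtains documents with no counterpart in the database.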

In total, the algorithm was tested on over 250,000 images involving different databases and parameters, with images varying in size from 64 × 64 to 512 × 512 pixels. The purpose of the tests was twofold: (1) to evaluate the ability of the algorithm to recognize a particular fiber pattern belonging to a large database, subject to Distortion 2 only, and (2) to evaluate the potential of the algorithm to establish the match when fibers are missing from a document, i.e., when the test document is subject to a combination of Distortions 1 and 2. Table I summarizes the number of trials and the parameters used in the experiments. The Neighbor Size entry gives the cumulative effect of the parameters $\Delta x_{max}$, $\Delta y_{max}$, and $\theta_{max}$, expressed as the maximum displacement in pixels. All tests were performed on a Sun Sparc 5 workstation.

3.2.1 Correct Document Identification. The correctness of the algorithm was tested by forming a document database and performing two types of tests on it:

Fig. 5. Impact of missing fibers on the algorithm’s ability to identify the correct document, assuming 120 fibers per image and one point stored for each quadrant.



—Test 1: Generation of a new document by changing the seed of the random-number generator, and matching of this document to the database.

—Test 2: Modification of a document from the database (including rotation and/or translation), and search for its match in the database. The rotation and translation were determined interactively or by using the random-number generator, and the rotation was performed using bilinear interpolation.
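Rotation with bilinear interpolation, as used in Test 2, can be sketched with NumPy by inverse mapping: each output pixel is traced back to a (generally non-integer) source position, and the four surrounding source pixels are blended with weights given by the fractional offsets. The rotation center (the image center), the zero fill outside the image, and the function name `rotate_translate` are assumptions for illustration.

```python
import numpy as np

def rotate_translate(img, theta, dx=0.0, dy=0.0):
    """Rotate img by theta radians about its center, then shift by (dx, dy) pixels.

    Inverse mapping with bilinear interpolation; samples that fall outside the
    source image are treated as 0 (background).
    """
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Undo the translation, then apply the inverse rotation about the center.
    xr, yr = xs - cx - dx, ys - cy - dy
    c, s = np.cos(theta), np.sin(theta)
    src_x = c * xr + s * yr + cx
    src_y = -s * xr + c * yr + cy
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    fx, fy = src_x - x0, src_y - y0              # fractional offsets in [0, 1)
    padded = np.pad(img.astype(float), 1)        # one-pixel zero border

    def sample(yy, xx):
        # Shift into padded coordinates; anything outside lands on the zero border.
        return padded[np.clip(yy + 1, 0, h + 1), np.clip(xx + 1, 0, w + 1)]

    return ((1 - fx) * (1 - fy) * sample(y0, x0) + fx * (1 - fy) * sample(y0, x0 + 1)
            + (1 - fx) * fy * sample(y0 + 1, x0) + fx * fy * sample(y0 + 1, x0 + 1))

# Usage: a bright bar rotated by 10 degrees and shifted by (3, -2) pixels.
bar = np.zeros((64, 64))
bar[30:34, 10:54] = 1.0
moved = rotate_translate(bar, np.deg2rad(10.0), dx=3.0, dy=-2.0)
```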

Regarding Test 1, the algorithm never matched any database document with the test document. Regarding Test 2, in all cases where the distortions were within the assumed limits, the algorithm correctly matched the test document (this includes rows 1, 2, and 4–7 in Table I). Furthermore, the algorithm correctly identified documents even when the distortions were larger than the assumed values.

3.2.2 Effects of Missing Fibers. The objective of these experiments was to determine the effects of missing fibers on the algorithm’s ability to identify the correct document. The experiments comprised two parts. In the first part, a database of 100,000 documents was extensively tested by randomly removing up to $N_{max}$ fibers. In all cases, which involved more than 100,000 trials (part of the experiments described in row 3 of Table I), the algorithm performed satisfactorily.

Fig. 6. Examples of binary synthetic documents: (a) documents generated by changing the seed of the random-number generator; (b) documents modified by translation and rotation.

In the second part of the test, we progressively removed fibers beyond the assumed limits and recorded the number of documents that were correctly identified, documents that could not be authenticated, and documents for which wrong matches were established. These experiments were conducted on a database containing 10,000 documents, with the following results. In none of the cases did the algorithm match the altered test document with a wrong database document, i.e., the algorithm either identified a document correctly or reported that the test document could not be matched. Figure 5 shows the rate of documents that were not identified (solid line) as a function of the number of missing fibers. The figure shows that the algorithm, although designed under the assumption that a single fiber is missing, performs well even when 10 fibers are missing (the correct detection rate is about 97%). In some cases the algorithm correctly matched documents even when as many as 50 (i.e., almost half) of the document fibers were missing.

4. ALGORITHM PERFORMANCE: RETRIEVAL TIME

Considering that databases may potentially comprise billions of documents, it is necessary to evaluate the algorithm’s efficiency. In Sections 4.1 and 4.2 we analyze the retrieval time theoretically and experimentally. In both cases we consider a random database organization. In practice, it may be possible to index database entries; however, such organization is application dependent and is not considered in this work.

4.1 Theoretical Considerations

One of the major advantages of the proposed algorithm is that only level 1 of the authentication procedure considers all documents. Thus, the bulk of the processing required to authenticate a document is done at this level, in particular, at the first step (the first point match). In order to establish this fact, we first estimate in Section 4.1.1 the probability of a first point match being wrong, and then we analyze the retrieval time in Sections 4.1.2 and 4.1.3.

Table I. List of Experiments and Parameters Used

| Test | Number of Experiments | Database Size | Image Size | Neighbor Size | Fibers/Image | Points/Quadrant | Missing Fibers |
|---|---|---|---|---|---|---|---|
| 1 | 3000 | 100 | 256 | 50 | 200 | 4 | n/a |
| 2 | 40000 | 10³–10⁵ | 128 | 10 | 120 | 1 | n/a |
| 3 | 140000 | 10⁴–10⁵ | 128 | 0 | 120 | 1 | 0–51 |
| 4 | 17000 | 10³–10⁵ | 128 | 10 | 120 | 1 | n/a |
| 5 | 9000 | 10³–10⁵ | 128 | 10 | 120 | 1 | n/a |
| 6 | 10000 | 10³–10⁵ | 128 | 10 | 120 | 1 | n/a |
| 7 | 7000 | 10⁴ | 128 | varies | 120 | 1 | n/a |
| 8 | 24000 | 10³–10⁴ | 64 | 10 | 30 | 1 | n/a |
| 9 | 1000 | 5 | 512 | 20 | 800 | 8 | n/a |

Image Size is the side of the square image in pixels. Cumulative effects of translation and rotation are described by the maximum displacement (in pixels) shown in column Neighbor Size. The tests pertain to the following: 1: validation; 2: effects of misalignment and missing fibers on average retrieval time; 3: effects of missing fibers; 4: effects of number of documents on average retrieval time; 5: effects of rotation on average retrieval time; 6: effects of translation on average retrieval time; 7: effects of neighborhood size on average retrieval time; 8: correlation; 9: real images.

4.1.1 Probability of a Wrong First Point Match. In the following we consider the probability of establishing a wrong first point match and show that this is not a common event, thus establishing that the retrieval time depends primarily on the time required to establish the first point match. Based on the considerations in Section 2.1, the first point match is wrong (in the sense of identifying the wrong document for further comparison) if a point in the test document (that represents the first or second quadrant) has the same sequence of ordered distances as a point representing a database document. Suppose that a database document is represented by a point $P$ and distances $\mathcal{S}' = \{d'_1, d'_2, \ldots, d'_{k'}\}$. If a point $Q$ in the test document is a match for $P$, then the $k$ distances in the sequence $\mathcal{S}$ representing $Q$ must match distances in $\mathcal{S}'$, and $k \le k'$. Assuming points/fibers randomly distributed over the document, the probability that a point at distance $d_i$ from $Q$ can be found in the test document is $p_1$ = (area of the ring of radius $d_i$)/(area of the document). The ring extends by the tolerance, which in our case is 1.5 pixels, on either side of radius $d_i$. Therefore,

$$p_1(d_i)=\frac{\left[(d_i+1.5)^2-(d_i-1.5)^2\right]\pi}{A}=\frac{6d_i\pi}{A},$$

and its expected value is

$$\bar{p}_1=\frac{6\bar{d}_i\pi}{A}=\frac{3\pi R}{A},$$

where $\bar{d}_i = R/2$, and $R$ is the radius of the circle inside which the distances are computed. The average number $n_a$ of points in the circle of radius $R$ for a given number $n$ of fibers per document is

$$n_a=\frac{R^2\pi}{A}\,2n.$$

Furthermore, assuming documents represented by $J$ points per quadrant and $N_{max}$ missing fibers, the estimated probability of a wrong initial point match is given by

$$\text{Prob}=2\times J\times J\times(\bar{p}_1)^{\,n_a-2N_{max}-1}.$$

For example, for images of size 512 × 512, $n = 1000$, $N_{max} = 7$, $R = 60$, and $J = 4$, we have Prob = 1.6 × 10⁻¹⁸⁸, and for images of size 128 × 128, $n = 120$, $N_{max} = 1$, $R = 20$, and $J = 1$, we have Prob = 1.6 × 10⁻²⁹.
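Both numerical examples can be reproduced from the three expressions above. In the sketch below, $n_a$ is rounded down to an integer before it is used as an exponent; the rounding is our assumption, but with it the sketch agrees with both quoted probabilities. The function name is illustrative.

```python
import math

def wrong_first_match_prob(size, n, n_max, R, J):
    """Estimated probability of a wrong first point match for a size x size image."""
    A = size * size                                  # document area in pixels
    p1_bar = 3.0 * math.pi * R / A                   # expected ring probability (mean d_i = R/2)
    n_a = math.floor(R * R * math.pi / A * 2 * n)    # points in the circle, rounded down (assumption)
    return 2 * J * J * p1_bar ** (n_a - 2 * n_max - 1)

print(f"{wrong_first_match_prob(512, 1000, 7, 60, 4):.2g}")  # 1.6e-188
print(f"{wrong_first_match_prob(128, 120, 1, 20, 1):.2g}")   # 1.6e-29
```

Because the exponent grows with the number of points inside the circle, larger $R$ or denser fiber patterns drive this probability down very quickly.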



4.1.2 Average Retrieval Time to Establish the First Point Match. To establish the first point match in a randomly organized database of $N_{doc}$ documents, the authentication procedure considers, on average, $\nu = n_p (N_{doc}/2) J$ sequence comparisons. The symbol $n_p$ denotes the number of candidate points in the document under investigation that correspond to the first quadrant of the database document, and $J$ is the number of points (sequences) stored per document quadrant. Assuming sequential processing, the comparison of two sequences is abandoned, on average, after $K_{ave}$ individual comparisons, where $K_{ave}$ denotes the average number of elements in stored sequences. Consequently, identification of the first match requires

$$\nu K_{ave}=n_p\,\frac{N_{doc}}{2}\,J\,K_{ave}\qquad(3)$$

integer comparisons. For a document of area $A$ (in pixels) and a circle of radius $R$, $K_{ave} = (A/R^2\pi)$.
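The early abandonment assumed above can be illustrated with a two-pointer comparison of ascending integer distance sequences. In this hypothetical sketch, `tol` is the distance tolerance and `max_missing` bounds how many database entries may be absent from the test sequence; both names and the exact acceptance rule are assumptions. A test distance with no database counterpart ends the comparison at once, which is the kind of early exit that keeps the number of integer comparisons per sequence pair small.

```python
def compare_sequences(test_seq, db_seq, tol=1, max_missing=2):
    """Compare two ascending integer distance sequences.

    Returns (match, comparisons). Every test distance must match a database
    distance within tol; at most max_missing database distances may be absent
    from the test sequence (e.g., because of missing fibers).
    """
    i = j = 0
    missing = comparisons = 0
    while j < len(db_seq):
        if i == len(test_seq):
            missing += len(db_seq) - j   # remaining database entries are unmatched
            break
        comparisons += 1
        diff = test_seq[i] - db_seq[j]
        if abs(diff) <= tol:             # distances agree: advance both sequences
            i += 1
            j += 1
        elif diff > 0:                   # database distance absent from the test sequence
            missing += 1
            j += 1
            if missing > max_missing:
                return False, comparisons    # abandon early
        else:                            # test distance with no database counterpart
            return False, comparisons
    return missing <= max_missing and i == len(test_seq), comparisons

print(compare_sequences([12, 19, 33, 41], [12, 20, 27, 33, 41]))  # (True, 5)
print(compare_sequences([12, 15, 33], [12, 20, 27, 33, 41]))      # (False, 2)
```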

4.1.3 Factors that Determine Retrieval Time. From Eq. (3) it follows that the retrieval time increases linearly with the database size. It should also be noted that the image size and fiber density do not play a role in determining the algorithm’s efficiency.⁶ Furthermore, the algorithm’s ability to identify a correct document does not depend on the misalignment parameters $\Delta x_{max}$, $\Delta y_{max}$, and $\theta_{max}$ (as long as the assumed parameters reflect the worst case). However, these parameters determine the number of points $n_p$ considered in establishing the first point match and thus determine the processing speed. For example, for $\theta_{max} = \pm\pi$ combined with translation, $n_p = 2n$, where $n$ is the total number of fibers per document, i.e., all points in the test document are considered as candidate points for the first point match, in contrast to cases involving small misalignments, where only small portions of the document are examined.
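Eq. (3) can be evaluated directly to show the two effects described above: linear growth with $N_{doc}$ and a cost proportional to $n_p$. The values below are illustrative only: $K_{ave} = 18$ borrows the average sequence length used in the missing-fiber analysis, and $n_p = 10$ is an arbitrary stand-in for a small misalignment.

```python
def first_match_comparisons(n_p, n_doc, J, k_ave):
    """Eq. (3): average number of integer comparisons to find the first point match."""
    nu = n_p * (n_doc / 2) * J       # average number of sequence comparisons
    return nu * k_ave

# Illustrative values: J = 1 and K_ave = 18; n_p = 10 stands for a small misalignment.
small = first_match_comparisons(n_p=10, n_doc=100_000, J=1, k_ave=18)
# theta_max = +/-pi with translation: every end point is a candidate, n_p = 2n with n = 120.
worst = first_match_comparisons(n_p=2 * 120, n_doc=100_000, J=1, k_ave=18)
print(small, worst, worst / small)
```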

4.2 Experiments

The most extensive testing of the algorithm’s efficiency was conducted with images of size 128 × 128 pixels with 120 fibers per image and databases containing 10,000 and 100,000 documents (rows 2 and 4–7 in Table I). In these experiments, a document quadrant was represented by only one point, which is the worst case from the authentication point of view. The reported retrieval time is an overestimate because it includes additional processing, such as image generation. This was done because most of the experiments involved very large databases: instead of storing images at level 3, we stored the random-number generator specifications, and a database document and its end point image were generated when required by the authentication algorithm (the database at level 1 was stored in complete form). The experiments considered unaltered documents and documents altered by translation and rotation. Missing fibers were not included in these experiments because they do not affect retrieval time. Distorted images were generated by randomly assigning translation and rotation parameters to images stored in the database. The distortions were such that no pixel deviated from its original position by more than 5% of the image size, i.e., a 10 × 10 neighborhood was searched for the first point match.

⁶The complete authentication requires correlation between two images, and thus these parameters play a role in the total retrieval time. However, the correlation is usually carried out between only two images and, for large databases, has little impact on the total retrieval time.

Figure 7 shows the average document retrieval time as a function of database size. Parameters used in the experiment are listed in Table I, row 4. Document retrieval time increases linearly with the number of documents, as theoretically predicted in Section 4.1.2. Figure 7 also indicates that retrieving a distorted document takes about twice as long as retrieving a nondistorted document. As an illustration, for a database of 100,000 documents the proposed algorithm, on average, retrieves an arbitrarily distorted document in less than 10 minutes.

Further studies concentrated on the effects of document misalignment on the retrieval time. Parameters used in this experiment are listed in rows 5 and 6 of Table I. Results of the experiments are summarized in Figure 8 (squares for rotation and triangles for translation). As expected, translation has a significantly larger impact on retrieval time than rotation.

Fig. 7. Average retrieval time as a function of database size without distortion (solid curve) and when the test image was subjected to arbitrary distortions (dashed curve). Parameters used in the experiment are listed in Table I, row 4.

Another set of experiments was designed to determine the effect of the size of the neighborhood searched for the first point match, i.e., the maximum assumed misalignment, on the average retrieval time. The experiment was conducted on the database containing 10,000 documents (see Table I, row 7). The tested documents were not altered. The result of the experiment is shown in Figure 9. As expected, the retrieval time increases with the neighborhood size. It should be pointed out, however, that the algorithm’s ability to correctly identify a document is not affected by the maximum distortion; only the retrieval time is affected.

In summary, the processing speed of the proposed algorithm does not depend on image size and fiber density. Assuming random organization of the database, the retrieval time is directly proportional to the number of stored documents and the assumed number of missing fibers. Consequently, the proposed algorithm has advantages over correlation techniques, whose speed also depends on the image size. Figure 10 compares the retrieval time of the proposed algorithm with that of standard correlation. These experiments were conducted using 64 × 64 images containing 30 fibers per image, and the neighborhood searched for the first point match was 10 × 10, assuming a maximum displacement of ±5 pixels (Table I, row 8). The databases contained 1000, 5000, 8000, and 10,000 documents. The correlation experiments considered translation only, and the correlation coefficient was computed for each possible displacement. The graph shows that correlation requires about three orders of magnitude more processing time.

Fig. 8. Average retrieval time as a function of rotation (squares) and translation (triangles). Parameters used in the experiment are listed in Table I, rows 5 and 6.
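For comparison, the exhaustive correlation search can be sketched as follows: for every displacement within ±5 pixels (as in row 8 of Table I), NumPy's `np.corrcoef` computes the correlation coefficient between the overlapping parts of the two images, and the best displacement is kept. The function name and the handling of the image borders are assumptions for illustration.

```python
import numpy as np

def best_translation(test, ref, max_shift=5):
    """Exhaustive search over translations within +/-max_shift pixels.

    Returns (coefficient, dx, dy) maximizing the correlation coefficient between
    the overlapping parts of ref[y, x] and test[y + dy, x + dx].
    """
    h, w = ref.shape
    best = (-np.inf, 0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            r = ref[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
            t = test[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            if r.std() == 0 or t.std() == 0:
                continue                  # coefficient undefined for constant windows
            coeff = np.corrcoef(r.ravel(), t.ravel())[0, 1]
            if coeff > best[0]:
                best = (coeff, dx, dy)
    return best

rng = np.random.default_rng(0)
ref = rng.random((32, 32))
test = np.roll(ref, shift=(2, 3), axis=(0, 1))   # test[y + 2, x + 3] == ref[y, x]
print(best_translation(test, ref))
```

Applied to one database document, this search evaluates $(2 \cdot 5 + 1)^2 = 121$ correlation coefficients, each over nearly the whole image, so its cost grows with the image area, whereas the first point match of the proposed algorithm compares short integer sequences.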

Some of the experiments assumed that, in addition to $N_{max}$ missing fibers, a contiguous part of a document (up to 25%) may be missing. This assumption required modifying the algorithm at levels 2 and 3 to allow for a contiguous region missing from the test document. The algorithm performed satisfactorily in all cases in which the distortion was within the prespecified limits.

5. PROCESSING REAL IMAGES

Considering that the algorithm requires almost exact matches between image pairs, it was necessary to confirm that images of real samples meet such strict requirements. In particular, it was necessary to evaluate the effects of noise, blurring, and artifacts, which are normally encountered in practice, on the algorithm’s performance. Because preprocessing becomes an integral part of the algorithm when real textures are used, the objective of testing was, in part, to establish the ability of the algorithm to compensate for inadequate preprocessing.

Fig. 9. Average retrieval time as a function of the size of the neighborhood searched for the first point match.



5.1 Sample Characteristics

A limited number of paper samples was available for testing. The samples were fabricated specifically for related research; however, the resulting patterns differ in two important aspects from the synthetic images used for initial testing:

—Varying Visibility of Fibers: This problem occurred due to the considerable sample thickness relative to the fiber diameter (which in turn is due to the manual fabrication of the samples). The resulting effect was a decrease in the contrast of fibers (relative to the background) with increasing sample thickness (relative to the sensor). The decreasing contrast raised the concern that standard image-processing algorithms would be unable to extract the fibers completely and accurately.⁷

—Significant Variation of Fiber Length: Due to the sample thickness, the fibers did not always lie along the surface of the sample; instead, the positioning of the fibers within a sample varied in 3D. Consequently, some fibers were only partially visible, including cases where only one end point was visible. This violated the assumption of small variance in fiber length. Moreover, fiber length could not be used for guidance in fiber extraction.

⁷Acquiring images of both sides of a sample simultaneously can reduce the problems associated with the low contrast of some fibers. However, we limited the present tests to a single image acquisition.

Fig. 10. Comparison of the authentication algorithms with respect to the average retrieval time: proposed algorithm (solid curve) and standard correlation, assuming only translation (dashed curve). Parameters used in the experiment are listed in Table I, row 8. The processing time was computed for database sizes of 1000, 5000, 8000, and 10,000 documents, which causes an artifactual change of curvature at 5000 documents.

A total of 50 real images was available for the study, some of which corresponded to the same sample acquired at different times. No specific effort was made to control the acquisition conditions, apart from keeping the resolution the same. Images were acquired as 512 × 512 pixel arrays, digitized to 256 gray levels, with an average fiber length and diameter of 35 and 5 pixels, respectively.

5.2 Preprocessing Steps

Because complex preprocessing steps have limited usefulness (changes in sample characteristics or acquisition conditions call for redesigning the preprocessing), the objective of the tests was to use standard image-processing algorithms for preprocessing and to attempt to relax the requirements of the authentication algorithm. The preprocessing was limited to the following three steps:

—Step 1 (Suppression of Noise and Uneven Lighting): This step uses morphological filtering [Gonzalez and Wintz 1987] that replaces a pixel value with the minimum gray-level value of its neighborhood. The neighborhood size is chosen to be larger than the average fiber diameter, thus effectively erasing both the fibers and the noise. Next, the filtered image is subtracted from the original image, thus removing the effects of uneven lighting.

—Step 2 (Regional Thresholding): This step extracts fibers by using the modified optimum thresholding algorithm described in Otsu [1979]. The algorithm is applied locally to overlapping regions, and the threshold value is automatically adjusted if it assigns a larger-than-expected percentage of pixels to fibers. Thresholding turns each of the regions into a binary image, and the output image is obtained by combining the regional results. The algorithm is applied iteratively, and at each iteration it extracts a new group of fibers of lower contrast. The processing stops automatically when no new fibers can be extracted. The thresholded image is used in two ways: to remove noise in the skeleton image and to match images at level 3 of the authentication algorithm. Examples of an original image and the thresholded image are shown in Figures 11(a) and 11(b), respectively.

—Step 3 (Fiber Skeleton Extraction): The objective of this step is to extract the central point (along the fiber diameter) and to represent each of the fibers by its skeleton. Considering that the fiber diameter is relatively small, we have implemented the extraction of the central points as the extraction of the roof edge center point in 1D signals. The gray-level image obtained in Step 1 is scanned in the horizontal and vertical directions, and the central positions of the roof edges are recorded. This procedure extracts the fiber central points and, occasionally, noise. The noise points are eliminated by applying a logical AND operation to the output of this step and the output of Step 2. An example of the skeleton image is shown in Figure 11(c).
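Steps 1 and 2 can be approximated with NumPy as follows. This is a simplified sketch under stated assumptions, not the authors' implementation: the minimum filter uses `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later), the block size, overlap, and maximum fiber fraction are arbitrary, the threshold adjustment simply raises the threshold one gray level at a time, and the iterative extraction of progressively lower-contrast fibers is omitted. The default 7-pixel window is chosen to exceed the 5-pixel fiber diameter of the real images.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def suppress_background(img, size=7):
    """Step 1: subtract a local-minimum (gray-level erosion) image from img.

    size (odd) should exceed the fiber diameter so that the minimum filter
    erases the fibers and keeps only the slowly varying background.
    """
    pad = size // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    local_min = sliding_window_view(padded, (size, size)).min(axis=(-2, -1))
    return img - local_min        # non-negative: each window contains its own center

def otsu_threshold(region):
    """Otsu's threshold for 8-bit data: pixels > t form the brighter class."""
    hist = np.bincount(region.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # weight of the class of levels <= t
    mu = np.cumsum(prob * np.arange(256))     # first moment of that class
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu[-1] * omega - mu) ** 2 / denom   # between-class variance
    between[denom <= 0] = 0.0
    return int(np.argmax(between))

def regional_threshold(img, block=64, step=32, max_fraction=0.2):
    """Step 2 (simplified): Otsu on overlapping blocks, OR-ing the binary results.

    Assumes (side - block) is a multiple of step so the blocks cover the image.
    """
    fibers = np.zeros(img.shape, dtype=bool)
    h, w = img.shape
    for y in range(0, max(h - block, 0) + 1, step):
        for x in range(0, max(w - block, 0) + 1, step):
            region = img[y:y + block, x:x + block]
            t = otsu_threshold(region)
            # Raise the threshold while too many pixels would be labeled as fiber.
            while t < 255 and (region > t).mean() > max_fraction:
                t += 1
            fibers[y:y + block, x:x + block] |= region > t
    return fibers

# Usage: a 128 x 128 ramp (uneven lighting) crossed by one bright horizontal fiber.
img = np.tile(np.arange(128, dtype=float), (128, 1)) * 0.5
img[60, 10:100] += 100
flat = suppress_background(img)
mask = regional_threshold(np.clip(np.rint(flat), 0, 255).astype(np.uint8))
```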

Fiber end points are determined by template matching, using the skeleton images (obtained in Step 3 of preprocessing). Binary templates are generated to represent all possible pixel configurations within neighborhoods of a specific size. The lack of “ground truth” images did not allow a precise evaluation of the preprocessing results. In general, all the fibers that were visually identifiable were extracted; however, some of the very low contrast fibers (visible only under enhancement) were missed or were extracted only partially.
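The roof edge centers of Step 3 and the end point detection can be sketched as follows. Two substitutions are ours: a roof center is taken to be a strict local maximum along a row or a column (so a flat top two pixels wide is missed), and, instead of the binary templates described above, an end point is taken to be a skeleton pixel with exactly one 8-connected skeleton neighbor, a common shortcut for one-pixel-wide skeletons.

```python
import numpy as np

def roof_centres(gray):
    """Strict local maxima along rows or along columns (1D roof-edge centers)."""
    g = gray.astype(float)
    centres = np.zeros(g.shape, dtype=bool)
    # Horizontal scan: brighter than both the left and the right neighbor.
    centres[:, 1:-1] |= (g[:, 1:-1] > g[:, :-2]) & (g[:, 1:-1] > g[:, 2:])
    # Vertical scan: brighter than both the upper and the lower neighbor.
    centres[1:-1, :] |= (g[1:-1, :] > g[:-2, :]) & (g[1:-1, :] > g[2:, :])
    return centres

def skeleton(gray, fiber_mask):
    """Logical AND with the thresholded image (Step 2) removes noise points."""
    return roof_centres(gray) & fiber_mask

def end_points(skel):
    """Skeleton pixels with exactly one 8-connected skeleton neighbor."""
    h, w = skel.shape
    s = np.pad(skel.astype(int), 1)
    neighbors = sum(s[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
    ys, xs = np.nonzero(skel & (neighbors == 1))
    return list(zip(ys.tolist(), xs.tolist()))

# Usage: a horizontal fiber three pixels wide with a roof-shaped profile.
gray = np.zeros((20, 20))
gray[9, 3:16] = gray[11, 3:16] = 50
gray[10, 3:16] = 100
skel = skeleton(gray, gray > 0)
print(end_points(skel))  # [(10, 3), (10, 15)]
```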

Fig. 11. Various stages of document processing: (a) original image (suppressed background); (b) result of regional thresholding; (c) skeletons of extracted objects in (b).



5.3 Algorithm Modifications

Since the preprocessing did not ensure accurate fiber extraction, it was necessary to modify the original algorithm. Alternatives included improving the acquisition conditions or the preprocessing algorithms. However, taking into account that it is difficult in general to ensure 100% correct extraction of fibers, and in view of the discussion above, we opted to modify the algorithm and check the robustness of the new version. The two main concerns regarding algorithm performance were that Steps 2 and 3 generated images that (1) violated the assumption that no new fibers appear in the newly acquired image (because it is possible to detect fibers that were not previously detected) and (2) did not provide the accuracy of fiber end point detection achieved with the synthetic images; the main causes of this problem were poor contrast and considerable fiber width. It should be pointed out that the concern regarding accuracy did not materialize and that the neighborhood size searched for a match was adequate, as specified in Section 3. Taking the first issue into account, the algorithm was modified as follows:

—The new version does not require that all of the end points (at level 2) and all of the fibers in the test document (at level 3) have a match in the database image. Instead, the requirement is that at least 90% of the fibers in the test image match fibers in the database image. This modification implies that we expect difficulties with the extraction of about 10% of the fiber population.

—The procedure for comparing the end point sequences was changed to allow a match to be established even when the sequence representing a point in the test image is longer than the sequence representing a candidate match in the database. This change affected the processing speed: the sequence comparison for establishing the first point match took, on average, more than half of the average sequence length.
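The relaxed 90% acceptance rule can be expressed as a simple count. In this hypothetical sketch, fibers are represented by already registered points, a test point counts as matched if some database point lies within `tol` pixels in both coordinates, and integer arithmetic avoids rounding issues exactly at the 90% boundary; the point representation and the tolerance are assumptions, since the text specifies only the percentage.

```python
def meets_fraction(test_pts, db_pts, tol=1, percent=90):
    """True if at least `percent`% of test points have a database point within tol pixels."""
    if not test_pts:
        return False
    matched = sum(
        1 for (x, y) in test_pts
        if any(abs(x - u) <= tol and abs(y - v) <= tol for (u, v) in db_pts))
    # Integer comparison avoids floating-point trouble exactly at the boundary.
    return matched * 100 >= percent * len(test_pts)

db = [(10 * i, 5 * i) for i in range(10)]
print(meets_fraction(db[:9] + [(300, 300)], db))                # 9 of 10 match: True
print(meets_fraction(db[:8] + [(300, 300), (400, 400)], db))    # 8 of 10 match: False
```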

5.4 Results

The tests were done using a database containing five different samples, and the remaining images were used for testing. The experiments considered different combinations of the stored documents in order to avoid possible artifacts due to a small database. The testing comprised the following:

—Test 1: Matching the images that had a correspondence in the database. Two types of tests were conducted in this case: (1) matching the images acquired at different times (misaligned relative to their representation in the database) and (2) matching the images acquired at different times and further modified by removing parts of documents (no documents with missing fibers were available). The fiber removal was performed on binary images, i.e., following preprocessing, and the fibers were removed interactively using the cursor.

—Test 2: Authenticating documents that did not have a correspondence in the database.

Both sets of tests were conducted with the following selection of parameters: $J = 8$ (points) representing each document quadrant; $R = 60$ (pixels), the radius of the circle used to generate distance sequences; $T_d = \pm 1$, modeling digitization effects; $k = 3$, the neighborhood size for search; 90% of fibers in the test document had to match with 90% of fibers in the database image; maximum misalignment $\theta_{max} = \pm 30$; $\Delta x_{max} = 20$ (pixels); and $\Delta y_{max} = 20$ (pixels). Regarding missing fibers, it is assumed that a part of a document may be missing (up to 25% of the whole document) and that, in addition, up to 15 individual fibers may be missing.

The bulk of the experiments, over 900, were conducted in Test 1. In all cases the images that had a correspondence in the database were correctly identified. It is pointed out that the images were always acquired from the same side of the document to ensure visibility of the same fibers. Also, in all cases of image modification within the prespecified limits, image retrieval was successful. Since only a limited number of documents was available for this study, the number of experiments involving documents not stored in the database was very limited. In all of these cases, no match was established.

Using different combinations of stored documents had no effect on the retrieval time or on the number of successful first point matches between different documents. A conclusive evaluation requires a considerably larger sample set and controlled experimentation with acquisition conditions, including sensor calibration.

6. CONCLUSIONS

The proposed algorithm was designed to establish a match between a distorted test document and its representation in a large database, possibly containing billions of documents. Matching is based on a texture pattern obtained by randomly embedding short fibers into the document medium during its fabrication. Such random patterns guarantee very high uniqueness of documents and are, therefore, efficient anticounterfeiting measures. The algorithm uses a three-level hierarchy to solve the problems of image registration and matching simultaneously, taking advantage of the fact that distances remain invariant under the assumed distortions. It is important to point out that the algorithm is applicable to other problems where feature points can be identified, e.g., centers of gravity of objects, intersection points, etc. The algorithm may be extended to other types of distortions, e.g., scaling, by changing the matching criteria.

The major advantages of the algorithm are

(1) it guarantees a correct match for cases where document distortions are within the assumed limits,

(2) it performs satisfactorily even in cases when a document is subject to larger distortions than assumed, and

(3) it is computationally more efficient than standard correlation because it initially requires comparing only a set of integer numbers, and the retrieval time does not depend on image size and fiber density.

A disadvantage of the algorithm is the storage overhead caused by using a separate database to establish a match at level 1.⁸ The storage overhead per document is $M$ sequences of $K_{ave}$ integer values. As an example, for $R = 60$, 1000 fibers per document, $M = 4$, and a 512 × 512 image, this implies storage of approximately 1400 integers per image. The overhead is relatively small, and it is possible to keep the whole database, or significant parts of it, readily available without significant I/O activity.

The experiments have shown that the algorithm has the potential to be applied to authentication problems. It should be pointed out that for very large databases it may be impractical to organize the database randomly, and it may be necessary to index documents to limit the number of documents to be considered and thereby minimize the average computation time. The type of indexing is problem dependent, and in some cases other document identifiers, e.g., credit card numbers, can be used to obtain direct access to a particular document or a group of documents that are a likely match. Development of an indexing procedure that is not affected by missing parts of a document is of particular interest. Using the proposed approach for document authentication requires further experimentation with real images. Important theoretical extensions include the relaxation of rigidity requirements and the inclusion of elastic deformations that are likely to appear in document media, e.g., paper stretching.

APPENDIX

List of Symbols

A          area of the document
A_i        area of the ith fiber patch
d_i        ith distance in a sequence of distances
J          number of points that represent a quadrant of a database document
K          number of detectable (different) digital documents
K_ave      average number of elements in stored sequences
K_1        number of possible documents
K_2        number of nondetectable documents
l          fiber length
m_1        maximum area of a document occupied by fibers
M          number of points that represent a database document
n          number of fibers in a database document
n_a        average number of points in the circle of radius R
n_p        number of candidate points in the test document for the first point match
n_1        number of fibers in a test document
N          number of different positions for a fiber in a document; image size
N_doc      number of documents in the database
N_max      maximum number of missing fibers in a document
N_1        number of different fibers over an area of m_1 pixels
p          length of a document in inches; number of entries in which two matching sequences may differ
p_b        probability that a randomly chosen point from an end point image belongs neither to a database point nor to a distance sequence
p_c        probability that a randomly chosen point from an end point image belongs to a database point
p_s        probability that a randomly chosen point from an end point image belongs to a distance sequence
p_1(d_i)   probability that a point can be found at distance d_i from a database point
P          probability that a match cannot be established due to missing fibers
P(x, y)    fiber end point located at (x, y)
q          width of a document in inches
r          pixel size in inches
R          radius of a circle centered at an end point
s_1        number of straight fibers passing through a fixed central point
s_2        number of fibers of arbitrary shape with one end point fixed
6          sequence of ordered distances
T_d        digitization threshold value
x, y       Cartesian coordinates
x_0        translation in x direction
y_0        translation in y direction
d          fiber density
Dx_max     maximum translation in x direction
Dy_max     maximum translation in y direction
n          average number of sequence comparisons to establish first point match
u          rotation angle
u_max      maximum rotation

8 Databases used at levels 2 and 3 may be kept separately or as a single database containing binary fiber patterns.
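The notation above pairs misregistration parameters (translations x_0, y_0 and rotation angle u) with end points P(x, y) and ordered sequences of distances d_i collected inside a circle of radius R. A minimal sketch, assuming a rigid misregistration (rotation by u about the origin, then translation by (x_0, y_0)) and assuming a distance sequence is simply the sorted distances from one end point to the other end points within R, shows why such sequences suit matching under misregistration: they are unchanged when the pattern is rotated and translated. The functions `misregister` and `distance_sequence` are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # an end point P(x, y)


def misregister(points: List[Point], x0: float, y0: float, u: float) -> List[Point]:
    """Rotate every end point by angle u (radians) about the origin, then
    translate it by (x0, y0). Rotate-then-translate is an assumed convention."""
    c, s = math.cos(u), math.sin(u)
    return [(c * x - s * y + x0, s * x + c * y + y0) for x, y in points]


def distance_sequence(points: List[Point], k: int, R: float) -> List[float]:
    """Ordered distances d_1 <= d_2 <= ... from end point k to every other
    end point lying inside the circle of radius R centered at end point k."""
    xk, yk = points[k]
    dists = [math.hypot(x - xk, y - yk) for i, (x, y) in enumerate(points) if i != k]
    return sorted(d for d in dists if d <= R)


if __name__ == "__main__":
    # Hypothetical end point coordinates (in pixels) for one document.
    ends = [(10.0, 12.0), (14.0, 15.0), (30.0, 8.0), (11.0, 20.0), (60.0, 60.0)]
    moved = misregister(ends, x0=3.5, y0=-2.0, u=math.radians(4.0))
    before = distance_sequence(ends, 0, R=25.0)
    after = distance_sequence(moved, 0, R=25.0)
    print([round(d, 3) for d in before])  # [5.0, 8.062, 20.396]
    print([round(d, 3) for d in after])   # [5.0, 8.062, 20.396]
```

With real scans, digitization at pixel size r perturbs end-point coordinates, so sequences would presumably be compared with a tolerance rather than exactly, and a point lying near the circle boundary may enter or leave a sequence; this is consistent with the notation allowing two matching sequences to differ in up to p entries.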

ACKNOWLEDGMENT

The authors thank J. M. Ramsey of Oak Ridge National Laboratory for bringing this problem to their attention and providing images of real samples.

REFERENCES

BROWN, L. G. 1992. A survey of image registration techniques. ACM Comput. Surv. 24, 4.

BRZAKOVIC, D. AND VUJOVIC, N. 1996. Authentication of random patterns by finding a match in an image database. Image Vision Comput. 14, 485–499.

CHELLAPPA, R., WILSON, C. L., AND SIROHEY, S. 1995. Human and machine recognition of faces: A survey. Proc. IEEE 83, 5, 705–740.



GONZALEZ, R. C. AND WINTZ, P. 1987. Digital Image Processing. Addison-Wesley, Reading, Mass.

GRIMSON, E. L. AND HUTTENLOCHER, D. P. 1990. On the sensitivity of geometric hashing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, New York, 334–338.

GUIGO, R. AND SMITH, T. F. 1993. Inferring correlation between database queries: Analysis of protein sequence patterns. IEEE Trans. Patt. Anal. Mach. Intell. 15, 10, 1030–1040.

HENIKOFF, S. AND HENIKOFF, J. G. 1995. Protein family classification method for analysis of large DNA sequences. In Proceedings of the 27th Hawaii International Conference on System Sciences. Vol. 5. IEEE, New York.

KAHL, D. J., ROSENFELD, A., AND DANKER, A. 1980. Some experiments in point pattern matching. IEEE Trans. Syst. Man Cybernet. SMC-10, 2.

LAMDAN, Y. AND WOLFSON, H. J. 1988. Geometric hashing: A general and efficient model-based recognition scheme. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, New York.

LONGUET-HIGGINS, M. S. 1957. Statistical properties of an isotropic random surface. Phil. Trans. Roy. Soc. London A, 250.

MEHTRE, B. M. 1993. Fingerprint image analysis for automatic identification. Mach. Vision Appl. 6, 2–3, 124–139.

OTSU, N. 1979. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybernet. SMC-9, 1, 62–66.

PEARCE, A., CAELLI, T., AND BISCHOF, W. F. 1994. Rulegraphs for graph matching in pattern recognition. Patt. Recog. 27, 9, 1231–1247.

PENTLAND, A., MOGHADDAM, B., STARNER, T., AND TURK, M. 1994. View-based and modular eigenspaces for face recognition. In Proceedings of the Computer Society Conference on Computer Vision and Pattern Recognition. 84–91.

PETRAKIS, E. G. M. AND ORPHANOUDAKIS, S. C. 1993. Methodology for the representation, indexing and retrieval of images by content. Image Vision Comput. 11, 504–521.

PICARD, R. W. AND KABIR, T. 1993. Finding similar patterns in large image databases. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. Vol. 5. IEEE, New York.

RANADE, S. AND ROSENFELD, A. 1980. Point pattern matching by relaxation. Patt. Recog. 12, 269–275.

SKEA, D. AND BARRODALE, I. 1991. Control point matching study. Contractors Rep. 91-11, Defense Research Establishment Pacific, Victoria, B.C., Canada.

SKEA, D., BARRODALE, I., KUWAHARA, R., AND POECKERT, R. 1993. A control point matching algorithm. Patt. Recog. 26, 2, 269–276.

SOGA, M., TOYODA, J., NITTA, Y., IMANAKA, T., AND YANAGIDA, M. 1993. Generation of a hierarchical representation for graphic patterns based on grouping. Syst. Comput. Japan 24, 70–87.

THOMOPOULOS, S. C. AND REISMAN, J. G. 1994. Fusion-based, high-volume automated fingerprint identification system. Proc. SPIE 2093.

VAIDYA, P. M. 1989. Geometry helps in matching. SIAM J. Comput. 18, 6, 1201–1225.

VUJOVIC, N. 1992. Document authentication. M.S. thesis, Univ. of Tennessee, Knoxville, Tenn.

WILSON, R. AND HANCOCK, E. R. 1993. Matching features in aerial images by relaxation labeling. In Proceedings of the 7th International Conference on Image Analysis and Processing. Vol. 3.

ZHENG, Q. AND CHELLAPPA, R. 1993. Computational vision approach to image registration. IEEE Trans. Image Process. 2, 3, 311–326.

Received March 1996; revised August 1996 and February 1997; accepted February 1997


