A system for processing handwritten bank checks automatically


Rafael Palacios* and Amar Gupta**

*Universidad Pontificia Comillas, Madrid, Spain

**Massachusetts Institute of Technology, Cambridge, MA, U.S.A.

Abstract

In the US and many other countries, bank checks are preprinted with the account number and the check number in MICR ink and format; as such, these two numeric fields can be easily read and processed using automated techniques. However, the amount field on a filled-in check is usually read by human eyes, and involves significant time and cost, especially when one considers that about 68 billion checks are processed per annum in the US alone.

The system described in this paper uses the scanned image of a bank check to "read" the check. There are four main stages in the system that focus on: the detection of the courtesy amount block within the image; the segmentation of the string into characters; the recognition of isolated characters; and the postprocessing stage that ensures correct recognition.

The detection of courtesy amounts is performed using heuristic rules. These rules are applied after processing the image to translate the set of gray pixels into groups of pixels, which are considered as strings. The segmentation of the courtesy amount is the most difficult part of the process because of the largely unconstrained nature of handwritten amounts on checks. The proposed segmentation has been implemented as a recursive process that interacts with the recognition module. The recognition of the isolated characters is based on an array of neural networks that has been demonstrated to be very accurate and computationally efficient. Finally, the postprocessing module is used to minimize the incidence of incorrect readings by verifying that the sequence of numbers, periods and commas matches the correct syntax for the recognized value of the check. Several of the algorithms included in the system are generalizable and can also be applied to address other applications involving reading of handwritten inputs.

1 Introduction

The paper check is the most popular form of non-cash payment, with 68 billion checks processed in 1999 in the United States alone [67]. Paper checks account for two-thirds of all banking transactions, and the overall number is still growing, despite the rapid growth in payment by credit and debit cards and other electronic means of payment. Since most of the checks need to be partially processed by hand, there is significant interest in the banking industry in new approaches that can read paper checks automatically.

Character recognition systems are of two types: on-line systems, where the sequence in which the characters are written is known; and off-line systems, where only the final image is available [69]. In most information technology applications, on-line processing requires a greater effort because of time constraints; however, in optical character recognition the most difficult area is off-line reading [3].

The account number and the bank code are printed on the checks in magnetic ink (MICR) and are the only fields that can be processed automatically with near-perfect accuracy. Since the MICR character set is a special type font, these fields can be easily read using magnetic machines or optical (OCR) systems [13]. The other fields may be handwritten, typed, or printed; they contain the name of the recipient, the date, the amount to be paid (textual format), the courtesy amount (numerical format) and the signature of the person who wrote the check.

The official value of the check is the amount written in words, while the amount written in numbers is supposed to be for courtesy purposes only and is therefore called the "courtesy amount". Nevertheless, employees at the banks usually read only the amount from the courtesy amount field and ignore the other field altogether. The amount of the check is entered by the employee into the computer, which then prints the amount at the bottom of the check in magnetic ink. The latter printed amount is used in all subsequent processing operations.

A system based on emerging optical character recognition techniques and neural network techniques can help to process checks automatically in a faster and less expensive manner. Automatic check processing has been an area of research in image processing for a long time, but it has only been in recent years that complete systems, with reading accuracy in the range 20–60% and reading error in the range 1–3%, have begun to be installed [38].


Figure 1: Description of key steps: binarization of the image; detection of the courtesy amount (location of the courtesy amount string); segmentation (Min-Max contour analysis, Hybrid Drop Fall, Extended Drop Fall) with segment normalization; recognition (neural networks); and post-processing (structural checking, syntactic verification).

The system described in this paper has been applied to read checks from Brazilian and U.S. banks. It performs all the tasks necessary to translate the image of the check coming from an optical scanner into a format that can be readily processed by the central computer of the bank. The system locates and reads the courtesy amount, which is the main field that banks use to process the check. Other researchers have also described or implemented systems to read the courtesy amount on checks [31, 36], and some of these systems are geared to a particular writing language; for example, [35] has been developed for Korean checks, [43, 47] for checks written in French, and [1, 34] for U.S. checks. Further, some check processing systems focus on reading the legal amount [30]; see [2] for Brazilian checks, [41] for the English language, and [43, 26, 27] for French and English. Also, the date is checked in the system at Concordia University (Canada) [20, 64]. Finally, a Japanese system for automatic verification of bank checks is based on the extraction and recognition of the seal imprints [66]. This illustrates the broad, almost universal, interest in the area of automatic reading of bank checks.

In general, forms intended to be read by document understanding systems (DUS) are designed to include identification codes and position marks [9]. Most of these forms require the writer to use black pen only; this is not the case, however, with the checks processed in the US. Individuals can use pens with inks of any color. Often the checks contain guidelines or boxes to specify the location where a particular piece of information should be recorded. Also, there may be restrictions on the size of the strings, both in terms of height and length. A more limiting constraint is to preprint forms with individual boxes for each character; this restriction minimizes the incidence of connections among adjacent characters. Before the information on documents can be "read", several preprocessing steps (binarization, noise reduction and line removal [62, 25, 14]) can be applied. After such preprocessing, the characters can be segmented and recognized more easily. While the recognition of isolated characters is not very difficult, segmenting strings of numbers into individual digits is a formidable task. Segmentation is easy when one imposes the type of restrictions described above; however, the banks have been reluctant to adopt such restrictive measures in paper checks. Checks present the full challenge of totally unconstrained writing because they have not been designed for processing by document understanding systems.

Our approach for automated check reading is summarized in Figure 1. The first step in the process is to detect the courtesy amount within the image of the check. This involves a conversion from the gray-scale image into a binary data format, using the binarization module described in Section 2. Then, several algorithms are applied to accurately select the area of the image that corresponds to the courtesy amount field (Section 3). The most challenging part of the process is the segmentation process (Section 4), which involves dissecting the courtesy amount field into individual characters. The latter task is performed using a feedback mechanism that helps to determine if the sets of segments, produced by different dividing methods, are correctly recognized. The recognition module (Section 5) uses a combination of neural networks and structural techniques to classify digits with very high levels of confidence. The final post-processing module is described in Section 6; it verifies the syntax of the amount to minimize the instances involving incorrect readings.

2 Binarization of the image

The digital information produced by black and white scanners is maintained within the computer as a matrix where each value represents the brightness of the corresponding point on the page. The size of the matrix depends on the size of the document and the resolution of the scanner. The latter is defined in terms of the number of dots per inch (dpi). The values are generally in the range of 0 to 255, where 0 is the darkest shade of black, 255 is the brightest shade of white, and intermediate values represent different levels of gray (0 to 255 is the scale for 8-bit depth, also called color resolution). So a bank check with dimensions of 2.75x6 inches, scanned at a resolution of 300 dpi, will generate an 825x1800 matrix of values that collectively require nearly 1.5 MB of memory space.


Figure 2: Gray scale image of the check

In order to reduce the storage space and the corresponding processing times, the pixels are frequently encoded as "black" or "white"; all intermediate gray scales are mapped to one of these two extreme values.

Most of the information contained in the image of a scanned check is associated with the tones of the background drawing and provides negligible help for reading the check. Therefore, the first stage in the recognition process is to distinguish which pixels constitute the background and which ones represent useful textual elements. The system developed at MIT (as well as most other systems) associates a value of zero to the pixels representing the background and a value of one to the pixels representing characters, either handwritten or machine-printed. In the check shown in Figure 2, the background drawing is lighter than the character strings. As such, by determining a threshold in the luminance value, one can easily establish the conversion of the raw inputs into a series of zeros and ones. This conversion process is called binarization, because the result is represented entirely in binary format.

Figure 3 represents the histogram of the image in Figure 2. It shows the number of pixels in the image for every level (from 0 to 255) of brightness. A large number of pixels have high values, which correspond to the background, and a smaller set of pixels have darker values, which correspond to the written part. A value between the two peaks (labeled text and background in the image) divides the histogram into background color (to the right) and text color (to the left).


Figure 3: Histogram of Image of Check

Defining as zero the pixels with values over 160 (on a scale from 0 to 255), and as one the pixels with values under this threshold, produces the image shown in Figure 4, where the zeros are drawn in white and the ones are shown in black. In this figure, the superfluous information related to the background has been removed and the data related to the characters have been simplified into a set of black pixels. For the purpose of optical character recognition, the variations in tone within the strokes are not significant. As such, this simplification facilitates the recognition process.
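As an illustration, the thresholding step can be sketched in a few lines of code. This is a minimal sketch with an illustrative function name; the fixed threshold of 160 follows the example above, whereas in practice the value would be derived from the histogram:

import numpy as np

def binarize(gray, threshold=160):
    """Map a gray-scale image (0 = black, 255 = white) to a binary
    matrix where 1 marks character pixels and 0 marks background."""
    return (gray < threshold).astype(np.uint8)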


Figure 4: Image of check converted into black and white pixels

A filter could be applied to remove all the isolated points of the image; a good filter for this purpose is the median filter [48]. However, filtering is not necessary if the procedure used to identify the courtesy amount is based on connected components, which eliminates this type of noise as well.

3 Detection of courtesy amount

Bank checks do not conform to a single global standard. Even within the same country, checks come in a variety of sizes and background colors. In the United States, checks come in several sizes, and each bank offers its own collection of background drawings. On an international basis, the courtesy amounts are not located in the same place on the check, and cannot be located in terms of coordinates alone; no special box or icon is provided to help locate the courtesy amount. A number of countries require that the courtesy amount field appear on the right half of the check. However, there are exceptions to this rule. So a system intended to handle international checks cannot benefit from this requirement.

One of the approaches developed by our team for identifying the courtesy amount string involves three stages: organizing the information in blocks of connected components; identifying potential string candidates; and formulating the decision of which string represents the amount of the check. Figure 5 shows the successive steps in this process, from the connected components, to the groups of components, and finally to the boxes around strings drawn over the original binarized image of the check. The concept of the minimum bounding rectangle (MBR) is utilized as described in the following paragraphs.


Figure 5: Connected components, groups of components and strings

3.1 Organize Information into Component Blocks

There are two approaches available to extract characters from an image: bottom-up and top-down. In the bottom-up method [62, 61, 21], characters are first located. These characters are then grouped together to form words and subsequently lines of text. The search for characters is based on looking for pixels connected with neighboring pixels. The method suggested by Srihari [61] focuses on the strokes. It first estimates the average thickness of the strokes in the document and then builds vertical strokes of that width by grouping pieces together. After getting the vertical strokes, the characters are obtained by adding the pixels connected to the vertical strokes. Other common bottom-up methods start by directly identifying the connected components in the image [62, 21]. The connections can be established by looking at the eight nearest positions of one pixel, or alternatively by looking just at the four adjacent positions in the vertical and horizontal directions but not along the two diagonals.


Figure 6: Adjacent pixels considered as connected components

Figure 7: Connected components

The connected components, which represent single characters or groups of characters, can be represented by their minimum bounding rectangles (MBRs). The MBRs are the smallest rectangles that contain all the pixels of one component, and are described in terms of two pairs of coordinates. MBRs can help to identify if a connected component is a graphic representation or a form line, instead of being a character. MBRs also simplify the process of grouping characters into strings. Fletcher and Kasturi [21] accept or reject each component as a character on the basis of its size, black pixel density, aspect ratio, area, and position in the image. Bounding rectangles of similar size, which are horizontally adjacent to each other and close enough, are considered to belong to the same word [62]. One can also apply the Hough transform [25, 19] to the centroids of the characters to determine which of them are collinear.

In the top-down method, the system tries to determine the overall structure of a page, then decomposes the page into regions, lines, words and finally characters. This method is suitable for document pages that consist primarily of dense lines of text, employing only a few type sizes. As such, the previous method (bottom-up) is more suitable for our application and was used to locate the courtesy amount in the check. First, the connected components are extracted from the image, and then the resulting MBRs are grouped together to generate strings.

Textual characters may consist of multiple connected components, while a single component may include more than one character; the latter occurs primarily in handwritten text. The method chosen to extract connected components from the image of the checks was a depth-first search (DFS) [16]. To increase the speed of the system, this algorithm was implemented using a run-length representation of the image [32]. Figure 6 shows four examples of sets of pixels that are considered to be connected components.
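A minimal sketch of connected-component extraction with DFS is shown below. It operates directly on the binary matrix rather than on the run-length representation used in the actual system, assumes 8-connectivity, and returns the minimum bounding rectangle of each component:

def connected_components(img):
    """Return the MBR (top, left, bottom, right) of every 8-connected
    component of black (value 1) pixels in a binary image."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] == 1 and not seen[r][c]:
                # Depth-first search from this seed pixel.
                stack = [(r, c)]
                seen[r][c] = True
                top, left, bottom, right = r, c, r, c
                while stack:
                    y, x = stack.pop()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and img[ny][nx] == 1 and not seen[ny][nx]):
                                seen[ny][nx] = True
                                stack.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes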

In most of the cases, the results produced by this algorithm comprise blocks of individual characters. But they can also be pieces of characters, groups of joined characters, graphics or lines, or groups of isolated characters that are touching the same reference line preprinted on the check. Figure 7 shows examples of connected components and the corresponding minimum bounding rectangles. Most of the blocks which are very big or have an aspect ratio (width/height) greater than 15 can be eliminated. This process removes most of the graphic elements, lines, or text connected to lines, which are not part of the courtesy amount.

3.2 Extraction of strings

Characters can be grouped into strings using any of the criteria for determining adjacency discussed above. The important objective is that the digits and symbols within the courtesy amount be grouped together into the same string, and that this string not contain any additional character or erroneous material.

The method used for building strings is based on horizontal collinearity and proximity. Both characteristics can be obtained efficiently from the coordinates of the MBRs.



Figure 8: Number 9 clearly overlaps number 5

Horizontal collinearity indicates whether two characters have been written in the same line of text. It can be measured by looking at the horizontal projection of one character over the other. The basic criterion is whether any part of one character projects onto any part of the other. A more reliable criterion is to determine whether the centroid of one character projects onto some part of the other; this criterion may need to be checked in both directions because it is not symmetrical. The most restrictive criterion is to have both centroids within a maximum vertical distance. In order to locate the courtesy amount of checks, the second criterion (involving centroids) was selected, after taking into account that small symbols like commas or periods need to be recognized properly in order to facilitate the post-processing step. Since these small symbols are written at the baseline, the collinearity property is especially difficult to establish here.

Proximity helps to determine if two characters are part of the same string or if they belong to different words. The distance between two characters is measured from the coordinates of the respective MBRs, and then compared with a threshold. A global distance threshold was not used because smaller characters need to be closer together in order to be considered part of the same string. The dynamic threshold (thr) used to establish the proximity measure is calculated as follows:

thr = (2 · avg + h) / 3    (1)

where avg is the mean height of all characters in the image, and h is the height of the character under analysis.

Using this expression, we avoid the problems of considering small characters, like a decimal period, as a different string. After the grouping process, all small isolated MBRs are erased because they are considered to be noise, and all other connected groups are converted into new objects called strings.
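A sketch of the proximity test follows; thr is computed per Equation (1), the MBR layout matches the components example above, and the function name and horizontal-gap measure are our own simplified illustration rather than the production rule set:

def proximate(box_a, box_b, avg_height):
    """Decide whether two MBRs are close enough to belong to the same
    string, using the dynamic threshold of Equation (1)."""
    h = box_b[2] - box_b[0]            # height of the character under analysis
    thr = (2 * avg_height + h) / 3     # Equation (1)
    gap = box_b[1] - box_a[3]          # horizontal gap between the MBRs
    return gap <= thr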

3.3 Identification of Courtesy Amount String

The process of identifying the courtesy amount string is performed in two steps: the first one involves distinguishing between handwritten text and machine-printed text; and the second involves identifying the courtesy amount block, irrespective of whether the amount has been handwritten or machine-printed.

In order to guide the system properly in these tasks, a search context was created in the form of a set of heuristic rules. Only a subset of these rules usually suffices to identify the correct block, but it was decided to establish these rules as tests that each string must pass or fail. The probability that a particular string is indeed the courtesy amount field depends on the number of tests passed. If the likelihood obtained is not high enough, the system will transfer the check for analysis by human operators in order to minimize the number of errors. In the banking environment, it is better to read most of the checks with few mistakes than to read all the checks with a higher error rate.

The rules used to identify if the text is handwritten are as follows:

(i) If adjacent connected components overlap, then the block is handwritten. In general, machine-printed text consists of characters separated by vertical lines of white pixels. In unconstrained handwritten numeral strings, however, the right end of a character may appear to be in the same vertical column as the left end of the next character, as shown in Figure 8.


Figure 9: Printed text without kerning (no overlap), with kerning and in italics.

This rule does not hold when the text is printed using the kerning information of the font or when the text is printed in italics (see Figure 9). Kerning only impacts textual characters, but italics or large paper skew will impact numerical characters as well.

(ii) More regularity implies machine-printed; less regularity generally implies that characters are handwritten.

This regularity can be measured in terms of: the linearity of the bottom-most points of all the characters in the line; the constancy of the width and the height of characters; and the constancy of the mean pixel density. In the case of printed words, only the last measurement, which is difficult to evaluate, holds. But in the case of printed numbers, all of the suggested measurements hold, especially the ones related to constant width and height of characters.

The rules created to locate the courtesy amount string are as follows:

(i) The courtesy amount should have at least two characters.

(ii) A box may surround the courtesy amount.

(iii) A decimal point may appear in the courtesy amount.

(iv) Two digits exist after the decimal point in the courtesy amount. (This is true for the U.S., U.K. and many other countries. This condition can be modified when dealing with countries that utilize other specifications.)

(v) Some digits may be smaller than others.

(vi) The courtesy amount is located in a fixed position for a particular set of checks (see comment of (iv) above).

These rules can be used to locate the courtesy amount string in either category, handwritten or machine-printed. Since nowadays most checks are handwritten, an important performance improvement is attained when the strings are pre-classified into a category; the location rules are then applied only to handwritten strings.
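The pass/fail character of these tests lends itself to a simple scoring scheme. The sketch below is our own illustration of the idea, with hypothetical test predicates and an arbitrary acceptance threshold; the actual system's weighting is not specified here:

def courtesy_amount_score(string_block, tests):
    """Score a candidate string by the fraction of heuristic tests it
    passes; tests is a list of predicates over the block."""
    return sum(1 for test in tests if test(string_block)) / len(tests)

def pick_courtesy_amount(strings, tests, min_score=0.7):
    """Return the best candidate, or None to route the check to a
    human operator when no candidate is likely enough."""
    best = max(strings, key=lambda s: courtesy_amount_score(s, tests))
    if courtesy_amount_score(best, tests) < min_score:
        return None
    return best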

3.4 Location of courtesy amount on Brazilian checks

The above procedure can be applied to locate the courtesy amount on virtually any kind of check. It has been used in the version of our prototype system that is tailored for U.S. checks. In the case of checks from Brazil, the courtesy amount is always located at the upper right corner and is delimited by vertical and/or horizontal lines. Several combinations of lines may be used to delimit the courtesy amount area, as shown in Figure 10. As such, one can use a simplified version of the algorithm described in the previous subsections.


Figure 10: Different styles of lines surrounding the courtesy amount

In the current version, the location of the courtesy amount is accomplished by searching for horizontal and vertical lines in the upper right part of the image. Based on the relative position of these lines, the area of interest is then defined as a rectangle embedded within the lines. Since the size and the aspect ratio of this rectangle are fairly constant, the amount field can be accurately determined, even if the area is not delimited by multiple lines.

4 Segmentation of Courtesy Amount

Various methods and strategies for segmentation are discussed in [12]. Since the courtesy amount field may be written in different formats, sizes, and styles (as shown in Figure 11), the segmentation process is complicated. It may involve the separation of touching characters and the merging of character fragments with other pieces. These are difficult tasks, especially because multiple ways exist to combine fragments of characters into individual characters. This problem exists in other languages, sometimes even in printed text; the segmentation of Arabic characters is analyzed in [10].

Instead of trying to segment the word into isolated letters, some researchers have attempted to recognize complete words [40]. The latter approach can only be applied in situations that involve a predetermined set of valid strings of characters. More often, the images are divided into digits using structure-based techniques [60, 15, 18] and then recognized as individual characters; this approach is called segment-then-recognize.

A different approach is segment-by-recognition [49, 50, 37], which works by applying the recognition algorithm within a sliding window that is moved along the text line. If the window covers portions of two consecutive characters, then the recognition module produces the result "unknown". However, when the sliding window takes the position and the size that match a complete character, then recognition occurs. The coordinates and the size of the segments are obtained from the position of the window. Since these systems make frequent calls to the recognition module, they need to utilize very fast and very accurate algorithms. Another drawback is that slanted text yields lower accuracy, as a consequence of using rectangular windows.

The software used by postal services to recognize zip codes exploits the fact that a zip code consists of exactly five digits; ten characters (including the dash) in the case of extended US zip codes [15]. Since dollar amounts are strings of variable length, the algorithms used to recognize zip codes cannot be applied directly to bank checks. Nevertheless, there are some restrictions on the number of characters between commas and periods. The latter type of domain knowledge has been utilized to reduce the number of read errors within the post-processing module (discussed later in this paper) and not as part of the segmentation strategy.


Figure 11: Possible styles for writing courtesy amounts.


With respect to U.S. checks, a finite state automaton was proposed in [1] and [34] to segment and analyze the variety of styles found in Figure 11. But in the case of Brazilian checks, the decimal part of the amount is rarely found in a style other than scientific format. A major characteristic of Brazilian checks is the extensive use of delimiters as suffixes or prefixes of the amounts (some examples are shown in Figure 10).

4.1 Feedback Strategy for Segmentation

The segmentation algorithm implemented by our team makes a best guess to generate a sequence of digits and symbols, and then receives feedback from the recognition module to readjust the splitting functions if necessary. The segmentation is assumed to be correct if all the individual digits are recognized with adequate confidence. The system begins the segmentation process by making a few obvious separations of characters, so the digits which are not recognized properly are considered to be connected numbers. Rejected blocks are split and recognized again in a feedback loop until a solution is found with all the segments recognized. (The general scheme of the system was shown earlier in Figure 1.)

The primitives obtained as the result of segmentation are classified as digit, fragment, multiple, punctuation, nonsense or horizontal, according to the size, aspect ratio, weight (number of pixels) and relative position [23]. Then the fragments are merged with digits or other fragments into new segments before going into the recognition stage.

This approach is similar to the strategy proposed by Congedo [15] and Dimauro [18], which also alternates between segmentation and recognition; however, their systems lack the ability to correct fragmented digits.

4.2 Dividing Blocks

Using the string selected by the courtesy amount location module as its input, the segmentation module identifies the isolated blocks and applies contour splitting algorithms [6, 12] to find possible paths to separate touching characters. Several successive algorithms are used for this task: a Min-Max contour analysis, based on the one described by Blumenstein et al. [7], then the Hybrid Drop Fall (HDF) algorithm [39, 17], and finally the Extended Drop Fall (EDF) algorithm [57].

Drop fall algorithms simulate the path produced by a drop of acid falling from above the character and sliding downwardalong the contour. When the drop gets stuck, it "melts" the character's line and then continues to fall.


Figure 12: Movement rules for Drop Fall algorithms

Figure 13: Different results of the drop fall algorithm depending on the starting point and direction.

The dividing path found by a drop fall method depends on three aspects: the starting point, the movement rules, and the orientation. If the starting point is selected far from the middle line between the two characters, then the drop will fall around the exterior contour of one of the digits, cutting out a little piece of just one of the digits or not cutting anything at all. The movement rules can be represented in pseudocode as follows:

Rule 1: if down is white, then move down

The fluid falls. This is the normal movement through the background.

Rule 2: if down-right is white, then move down-right

Diagonal move to the right. This is the movement around the contour of one character, leaving it to the left.

Rule 3: if down-left is white, then move down-left

Diagonal move to the left. If the diagonal movement to the right was not possible, then the fluid tries a diagonal movement to the left.

Rule 4: if right is white, then move right

Horizontal move before cut. If the fluid reaches a flat area, it tries to reach the contour by a horizontal move before beginning to divide the character.

Rule 5: if left is white and the drop does not come from the left, then move left

In a flat area the right move is tried first. If it is not possible, then horizontal left movement is performed.

Rule 6: else move down //cut

The six situations in which these rules apply are shown in Figure 12. The additional expression in Rule 5 protects against infinite back-and-forth movement when dealing with flat contours.
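One step of these movement rules can be sketched as follows. This is our own illustration, not the production code: coordinates grow downward and to the right, img[y][x] == 0 denotes a white pixel, the top-left orientation is assumed, and came_from_left records whether the previous move was a rightward one:

def drop_fall_step(img, y, x, came_from_left):
    """Apply Rules 1-6 to compute the next position of the drop.
    Returns the new (y, x) and whether the move cuts a black pixel."""
    def white(dy, dx):
        ny, nx = y + dy, x + dx
        return 0 <= ny < len(img) and 0 <= nx < len(img[0]) and img[ny][nx] == 0

    if white(1, 0):                      # Rule 1: fall through background
        return (y + 1, x), False
    if white(1, 1):                      # Rule 2: slide down-right
        return (y + 1, x + 1), False
    if white(1, -1):                     # Rule 3: slide down-left
        return (y + 1, x - 1), False
    if white(0, 1):                      # Rule 4: horizontal move right
        return (y, x + 1), False
    if white(0, -1) and not came_from_left:
        return (y, x - 1), False         # Rule 5: horizontal move left
    return (y + 1, x), True              # Rule 6: cut through the stroke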

There are four possible orientations that generally produce four different paths to divide the symbol into two characters. They can start on the left or right side and can evolve downwards or upwards (see Figure 13). One of the four is likely to produce the right result.

The movement rules are different depending on the corner where the algorithm begins, but all four possible paths are easily obtained by applying the basic set of rules to the original image and to the image flipped around its vertical and horizontal axes.


Figure 14: Best results when segmenting double zeros: (a) with Hybrid Drop Fall algorithm, (b) with Extended Drop Fall algorithm.

A more sophisticated algorithm for splitting characters, called Extended Drop Fall (EDF), has been incorporated in recent versions of the check processing system prototype. It is a variant of the Hybrid Drop Fall (HDF) algorithm proposed by Khan [39]. The new variant improves the segmentation of connected zeros, which were previously segmented incorrectly. Pairs of zero digits are frequently written for round values such as $100, but mainly in the decimal part of the amount. A random sampling of over 400 checks showed that about 50% of all connected digits consisted of a pair of zeros, and 80-85% of successive zeros were connected together [57].

In general, drop fall algorithms tend to produce short cuts in the vertical direction, so dividing pairs of slanted zeros leads to one zero and a partial loop of the other zero. If the loop is open to the left, it can be read as a '7' or '2', and if it is open to the right it can be read as a '6', although the confidence level will not be high in either case. Therefore, the common string "00" is frequently read as "07", "02" or "60" if divided using drop fall algorithms. To divide connected zeros properly, especially those slanted at an angle, it is necessary to use long and tilted paths along the junction.

The only difference between the two drop fall algorithms is their behavior while cutting through a series of black pixels. The movement rules are the same while the drop moves through white pixels, but instead of moving downward (Rules 1 and 6) or trying to escape out of the stroke (Rules 2 and 3), the movement in the case of the Extended Drop Fall algorithm is controlled by the following set of rules:

if down is black, then move down //cut

if down-right is black, then move down-right //remain on black

if down-left is black, then move down-left //remain on black

These three rules are used to guide the movement of the drop as it cuts the concatenated set of characters. They perform the same function as Rules 1, 2 and 3 described in the previous section. These rules do not allow horizontal movement, but they generate long tilted paths in connected zeros because they extend cuts diagonally through a contour. So the drop follows the contour until it needs to enter the stroke (black) area. Then it goes down cutting the character, but it will follow the contour from the inside before getting out.
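Continuing the earlier sketch, the EDF behavior while inside the stroke can be written as follows (again our own illustration; black(dy, dx) tests for a black pixel relative to the current position):

def extended_drop_fall_cut(img, y, x):
    """EDF movement while cutting: prefer to stay on black pixels,
    which extends the cut diagonally along a slanted junction."""
    def black(dy, dx):
        ny, nx = y + dy, x + dx
        return 0 <= ny < len(img) and 0 <= nx < len(img[0]) and img[ny][nx] == 1

    if black(1, 0):        # cut straight down
        return y + 1, x
    if black(1, 1):        # remain on black, down-right
        return y + 1, x + 1
    if black(1, -1):       # remain on black, down-left
        return y + 1, x - 1
    return y + 1, x        # stroke ends: fall back to the white-pixel rules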

The implementation of the EDF algorithm yielded very accurate results when dealing with blocks of connected zeros, where large cuts are usually needed (Figure 14). The correct segmentation of connected zeros was achieved by the EDF algorithm in twice the number of cases as compared to an equivalent situation relying only on the HDF algorithm [57].


4.3 Selection of best dividing path

The application of the above algorithms leads to a set of 9 possible paths to divide a connected component into individual digits: one from the contour analysis, four from HDF, and four from EDF. The second part of the split process involves deciding which path is the correct one. This determination can be made using a neural network approach, as proposed in [42]. An alternative is to undertake an exhaustive search and to select the path on the basis of the confidence levels provided by the recognition module [46]. For performance reasons, the current system avoids unnecessary recognition attempts as much as possible; therefore, it uses a heuristic approach to choose a path and then makes good use of the information coming from the recognition module in a feedback scheme.

Paths are ranked heuristically using structural features, taking into account the kind of algorithm used to generate the path. The most important criterion is the number of cuts made. In most cases, only one cut is necessary; for example, just one long cut divides connected zeros better than two cuts. Additionally, one considers the type of junction (concave/convex corner) and the length of the cut. A better evaluation is obtained if the cut is made on the same side (upper or lower) where the drop fall algorithm began, because top-left and top-right orientations tend to make erroneous cuts in the lower half of the image, while lower-left and lower-right orientations tend to make errors in the top half of the component.

The segmentation module interacts with the recognition module to identify both digits that may comprise the particular segment. The segmentation is assumed to be correct if both digits are recognized correctly. If any character is not recognized as a digit with a high level of confidence, then alternative paths are tested. In the event that only one of the two parts is recognized, then the other part is assumed to be comprised of connected characters, and the segmentation process is repeated recursively for that part until a complete solution is found.

Merging of pieces of characters is attempted only when segments are identified as "fragments of character". In this case, the fragment is merged with the neighboring segment, taking into account relevant criteria such as proximity and overlap. The vertical overlap of segments takes precedence over proximity because the former yields better results. One example is the number '5' with a disjoined top stroke. This stroke overlaps the rest of the number '5', and it could be located very close to the following digit. Nevertheless, if the digits are not recognized, a different merging option will be attempted in the next iteration of the feedback loop.

During the recognition of the courtesy amount, the system makes multiple attempts at merging and separation, and frequently it has to undo some of them to explore alternative segmentation scenarios.

5 Recognition of Individual Characters

The recognition module incorporates neural network techniques. In order to use a small and efficient neural network algorithm, a set of preprocessing algorithms is applied to the character images prior to the actual recognition process. The preprocessing of isolated character images involves slant correction, size normalization and thickness normalization. As these transformations are not linear, applying the algorithms in a different order will yield a different result. Figure 15 shows two normalization examples. In the first one, slant normalization is followed by size normalization and then thickness normalization, while in the second case, the size normalization is not performed until the end. The quality looks slightly better in the second case of this example; nevertheless, the first approach was selected for its speed, because the thickness normalization algorithm performs much faster on small images. After testing the algorithms in Matlab, it was determined that the process of thickness normalization takes over 90% of the total normalization time. Accordingly, the second normalization example was found to require double the computing time as compared to the first.

The normalization algorithms are explained in detail in the following sub-sections.


Figure 15: Two different ways of digit normalization: (slant+size+thick) and (slant+thick+size)


5.1 Slant Correction

This preprocessing is applied to obtain the same kind of image for the same set of numbers irrespective of the personal writing tilt. Most recognition methods are tolerant to minor variations from the training set; however, their accuracy is degraded by the presence of slant. The algorithm used is based on the idea that if a numeral is rotated through a series of slanted positions, it usually attains its minimum width when it is least slanted.

Considering α to be the angle that one wants to rotate the digit in the anti-clockwise direction (and −α clockwise), the equation to transform a bit-map by that angle is [8]:

x' = x + y · tan(α),    y' = y    (2)

where y is the height of the row above the baseline.

No pixel is moved to a different row, and all the pixels within the same row are moved by the same distance. This means that every row is displaced to the right or to the left (depending on the sign of α) by an amount that increases linearly with the height of the row over the baseline.

The algorithm to obtain the least slanted position uses a binary search strategy to minimize the width. The algorithm begins with an initial value for α and tries positive and negative slants until it finds the image with the minimum width. When the same solution is repeated, the angle is divided by two in each iteration, until a precision of 1 degree is reached.

Based on extensive tests, the algorithm has been limited to a maximum correction angle of 45 degrees in any direction. This eliminates potential errors like connected zeros being read as heavily slanted eights. The imposition of the angle limit has increased the accuracy slightly and improved the overall performance by 6.1% [57].
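A compact sketch of this search is given below (our own rendering, not the production code): the shear of Equation (2) is applied per row, width() measures the horizontal extent of the black pixels, and the search is a simplified variant that always halves the step:

import math
import numpy as np

def shear(img, alpha):
    """Apply Equation (2): shift each row by an amount proportional
    to its height above the baseline (the bottom row)."""
    h, w = img.shape
    out = np.zeros((h, w + 2 * h), dtype=img.dtype)   # padded canvas
    for row in range(h):
        shift = h + int(round((h - 1 - row) * math.tan(alpha)))
        out[row, shift:shift + w] = img[row]
    return out

def width(img):
    cols = np.where(img.any(axis=0))[0]
    return cols[-1] - cols[0] + 1 if len(cols) else 0

def deslant(img, alpha0=math.radians(32), limit=math.radians(45)):
    """Binary search over the slant angle down to 1-degree precision,
    limited to +/- 45 degrees as described above."""
    best, step = 0.0, alpha0
    while step >= math.radians(1):
        candidates = [a for a in (best, best + step, best - step)
                      if abs(a) <= limit]
        best = min(candidates, key=lambda a: width(shear(img, a)))
        step /= 2
    return shear(img, best)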

5.2 Size Normalization

In general, smaller images of the characters perform faster in the neural network. The images of each digit are rescaled to a standard size of 16x16 pixels, and these 256 values serve as the inputs to the neural network. The process of scaling to fit in a predefined size may change the aspect ratio of the image, as the scale factors in the horizontal and vertical directions may be different. The resulting character exactly fills the area it is supposed to scale into.

Size normalization usually involves a reduction of the digit size, since check images are usually scanned at a resolution of 300 dpi, which yields digits of about 80 pixels in height. In the case of very low resolution images, the enlargement algorithm samples the image and stores the values in the normalized matrix, replicating pixels when necessary.

For size reduction, our algorithm weights the color of the set of pixels in the original image that map over the same pixel in the 16x16 matrix. In this process, it is not critical if the resulting digits look thinner or thicker than the original, as the module described in the following section takes care of this problem.
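The reduction step can be sketched as a block average, our own illustration of the weighting idea: each output pixel averages the values of all source pixels that map onto it.

import numpy as np

def resize_to_16x16(img):
    """Area-weighted reduction of a character image to the 16x16
    input matrix expected by the neural network."""
    h, w = img.shape
    out = np.zeros((16, 16))
    for i in range(16):
        for j in range(16):
            r0, r1 = i * h // 16, max((i + 1) * h // 16, i * h // 16 + 1)
            c0, c1 = j * w // 16, max((j + 1) * w // 16, j * w // 16 + 1)
            out[i, j] = img[r0:r1, c0:c1].mean()
    return out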


5.3 Thickness Normalization

Thickness normalization is performed in two steps. First, a thinning algorithm is applied to reduce the strokes to a thickness of one pixel or, sometimes, a few pixels. Then, a uniform re-thickening process is performed to obtain a thickness of several pixels.

The thinning process involves the transformation of the raw black and white image into a line drawing of unit thickness by deleting redundant pixels without damaging the appearance of the character. This process, also called skeletonization, must preserve the basic structure and connectivity of the original pattern. Algorithms proposed in the literature to solve this problem can be classified into two main categories: sequential and parallel [44]. In the sequential method, the value of a pixel at the nth iteration depends on pixels from the nth iteration, and usually also on pixels from earlier iterations. In the parallel method, each pixel is obtained as a function of values from the previous iteration only (so the computation can be performed in parallel for every pixel of the image). The algorithm used by us is a parallel method based on the algorithm described in [68]. As evaluated in [45], this algorithm offers one of the best results in terms of performance and accuracy, as compared to nine other thinning algorithms. Since that evaluation, the algorithm has been improved even further by eliminating a number of time-consuming steps [53]. Additional improvements on this algorithm were attained by Carrasco and Forcada in [11] to obtain more elegant skeletons.

Definitions:

• The neighbors of point p are the 8 pixels that surround p, and they are numbered from 0 to 7 in the clockwise direction, beginning with the point directly above p.

• One pixel is considered to be a contour point if at least one of its neighbors is white.

• A contour loop is defined as a set of contour points which are connected into a loop.

The algorithm is applied sequentially, eliminating points of one contour loop at a time. In each iteration, not all contour points are deleted, but only those that satisfy the following conditions.

For the first and odd iterations, the contour point p is eliminated if:

Condition 1) B(p) > 1, and B(p) < 7

Condition 2) A(p)=1, or C(p)=1

Condition 3) E(p)=0

For the second and even iterations, the contour point p is eliminated if:

Condition 1) B(p) > 1, and B(p) < 7

Condition 2) A(p)=1, or D(p)=1

Condition 3) F(p)=0

where:

B(p) is the number of white neighbors of p,

A(p) is the number of white-to-black transitions of the neighbors of p in the clockwise direction.

C(p) is equal to one only if any of the following conditions is true:

p(0)=p(1)=p(2)=p(5)=0 and p(4)=p(6)=1

p(2)=p(3)=p(4)=p(7)=0 and p(6)=p(0)=1;

D(p) is equal to one only if any of the following conditions is true:

p(1)=p(4)=p(5)=p(6)=0 and p(0)=p(2)=1;

p(0)=p(3)=p(6)=p(7)=0 and p(2)=p(4)=1;

E(p)=(p(2)+p(4))*p(0)*p(6);

F(p)=(p(0)+p(6))*p(2)*p(4);
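These conditions map directly into code. In the sketch below, a transcription of our own, n is the list of the 8 neighbor values of p, numbered clockwise from the top as defined above, with 1 denoting black; the function names mirror the symbols in the text:

def B(n):
    """Number of white neighbors of p."""
    return sum(1 for v in n if v == 0)

def A(n):
    """Number of white-to-black transitions around p, clockwise."""
    return sum(1 for i in range(8) if n[i] == 0 and n[(i + 1) % 8] == 1)

def C(n):
    return int((n[0] == n[1] == n[2] == n[5] == 0 and n[4] == n[6] == 1) or
               (n[2] == n[3] == n[4] == n[7] == 0 and n[6] == n[0] == 1))

def D(n):
    return int((n[1] == n[4] == n[5] == n[6] == 0 and n[0] == n[2] == 1) or
               (n[0] == n[3] == n[6] == n[7] == 0 and n[2] == n[4] == 1))

def E(n):
    return (n[2] + n[4]) * n[0] * n[6]

def F(n):
    return (n[0] + n[6]) * n[2] * n[4]

def delete_point(n, odd_iteration):
    """Conditions 1-3 for eliminating a contour point p."""
    if not (1 < B(n) < 7):                       # Condition 1
        return False
    if odd_iteration:
        return (A(n) == 1 or C(n) == 1) and E(n) == 0
    return (A(n) == 1 or D(n) == 1) and F(n) == 0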


The best performance is obtained by rastering all the black pixels in the image and checking whether they belong to the contour, instead of walking along the contour. The number of operations performed is smaller, and the algorithm is converted into a parallel method.

After scaling the input pattern down to one pixel thickness, it is necessary to re-thicken the skeleton uniformly to obtain a standard thickness. Therefore, the final thickness of the digit is independent of the type of pen used, and at the same time the character appears clear of noise at the edges.

The re-thickening algorithm simply sets to 1 (black) all of the neighbor points of every pixel in the skeleton. Usually there are 6 white pixels surrounding every black pixel, but there can be 7 if the pixel is an ending, or fewer than 6 if the pixel is in a joint. The result of this approach exhibits a uniform thickness of about 3 pixels.

5.4 Neural Network Based Recognition

Template matching, structural analysis and neural networks have been the most popular classification methods in character recognition, but neural networks have proved to offer better and more reliable accuracy for handwriting recognition [59, 29]. No single neural network model appears to be inherently better than the others to a significant extent; instead, higher accuracy rates are achieved by tailoring network models to the particular problem environment. A network trained to recognize just digits offers better accuracy for check amount recognition than a generalized neural network trained to recognize both letters and numbers.

The structure selected for this module is the multilayer perceptron (MLP), which is the most widely used type of network for character recognition [58]. Other researchers have employed radial basis function networks (RBFN) and time delay neural networks (TDNN) for character recognition in French checks [47], and some of them have attained similar results [59]. The recognition module is implemented as an array of 4 neural networks working in parallel. Their results are analyzed by an arbiter function, and a final checking is performed at the end. The structure is shown in Figure 16.


Figure 16: General scheme of Recognition Module: segment normalization (slant correction, size normalization, thickness normalization), structural feature extraction (DDD), four parallel networks (MLP #1, MLP #2, MLP-DDD #1, MLP-DDD #2), an arbiter function, and structural checking.

All the networks have a similar structure, based on a three-layered, fully connected, feed-forward MLP, with 256 nodes in the input layer, 80 nodes in the hidden layer and 11 outputs. The input is a 16x16 matrix of the normalized image of the segment; however, two of the networks also incorporate a structural feature as an input. The structural feature considered is the directional distance distribution (DDD) feature discussed by Oh and Suen [55]. The outputs correspond to the digits from '0' to '9', the symbol '#', which is frequently used as a delimiter in Brazil, and the NOT_RECOGNIZE value. The segmentation feedback loop requires a NOT_RECOGNIZE value in the case of an incorrectly segmented digit (Figure 17) or if the segment contains connected digits. The recognition module was trained with digits and segments produced by the segmentation module from images of real checks. In contrast, some researchers have opted to train their recognition systems with huge databases of digits in order to attain very high accuracy levels [65]. These systems used digits from the NIST database of handwritten forms [24] or other databases of pre-segmented digits. Such systems bypass the problems related to segmentation, and cannot be compared to the recognition module used for reading bank checks in an equitable manner. In our experience, the handwriting styles in Brazil and the U.S. vary significantly for some digits, especially the numbers 1 and 7. Also, the recognition module must be trained with segments containing fragments of digits and segments containing connected digits to be able to produce NOT_RECOGNIZE values properly in these cases.
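The 256-80-11 topology can be sketched as follows. This is a minimal rendering of our own in PyTorch; the activation functions are assumptions, since they are not specified above:

import torch
import torch.nn as nn

# One of the four parallel networks: 256 inputs (the flattened 16x16
# normalized image), 80 hidden nodes, 11 outputs (assumed sigmoid
# hidden units and a softmax over the output confidences).
mlp = nn.Sequential(
    nn.Linear(256, 80),
    nn.Sigmoid(),
    nn.Linear(80, 11),
    nn.Softmax(dim=-1),   # 11 confidence parameters, one per output
)

x = torch.rand(1, 256)    # a flattened, normalized segment image
confidences = mlp(x)      # passed on to the arbiter function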


Figure 17: Connected digits may be incorrectly separated by the segmentation loop; in this case each segment should be classified as NOT_RECOGNIZE

                                   Correct   Incorrect   Rejected
MLP                                 79.9%      10.2%       9.9%
MLP-DDD                             85.2%       6.0%       8.7%
Arbiter                             86.8%       6.5%       6.6%
After structural post-processing    84.0%       5.5%      10.5%

Table 1: Accuracy of the recognition module using a database of segments from real checks that includes normal digits, delimiter symbols, and segments of touching and overlapping digits.

Neural networks were trained by the back-propagation method using 3103 segments from real checks, and were tested for accuracy using 1444 segments. The proportion of each digit, '#' symbols, and bad segments was chosen to match the corresponding rates that exist in real checks. The accuracy for the MLP networks was 80% correct readings with 10% incorrect readings and 10% rejections for low confidence, while the use of MLP-DDD techniques improved the accuracy to 85% correct readings with 6% incorrect readings.

Two networks of the same structure trained with exactly the same data result in different parameters if the initial weights are assigned randomly. As such, it is possible to have several networks that in theory classify in the same way, but in practice produce slightly different results. By running multiple classification systems in parallel, one can increase the accuracy of the classifier [58, 63, 5, 3]. Accordingly, the system was designed to use two MLP networks and two MLP-DDD networks.

Each network produces a list of 11 confidence parameters associated with the possible outputs. Then another module, called the arbiter, analyzes this information and produces the global output. The structure of the neural network recognition is shown in Figure 16. The arbiter can decide that the segment has not been recognized properly if the global confidence has not attained the prescribed threshold or if the values of the two highest confidence parameters are similar. Using these characteristics, the overall recognition rate produced by the arbiter is 87% correct readings and 6.6% incorrect readings.
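The arbiter logic can be sketched as follows; this is a simplified illustration with assumed threshold and margin values, since the exact decision rule is not given above:

import numpy as np

def arbiter(confidence_lists, min_conf=0.9, min_margin=0.1):
    """Combine the 11-value confidence vectors of the four networks.
    Reject the segment when the global confidence is too low or the
    two best classes are too close to call."""
    global_conf = np.mean(confidence_lists, axis=0)  # average over networks
    ranked = np.sort(global_conf)[::-1]
    if ranked[0] < min_conf or ranked[0] - ranked[1] < min_margin:
        return None                                  # NOT_RECOGNIZE
    return int(np.argmax(global_conf))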

In addition to the array of networks used to enhance the overall accuracy, a structural post-processing was added to the recognition module to verify that the recognized value is correct. The structural analysis minimizes the number of incorrect readings by checking specific structural properties in the image for the value produced by the arbiter in cases where the global confidence is lower than 90%. It proved to be especially useful at checking cases involving '0', and also at eliminating some cases where the number '5' was incorrectly read as '6' or as '3'. By applying the structural post-processing algorithms, several incorrect readings are eliminated. However, as a side effect, some correct reading cases are also rejected because they do not pass the strict structural test. Since the goal is to have the minimum number of errors even though the reject rate may increase, we accepted this consequence as being consistent with our design objectives. The final overall accuracy of the recognition module is 84% correct readings and 5.5% incorrect readings. The statistical results are shown in Table 1.


6 Post Processing

The final step in reading the courtesy amount is aimed at syntactic verification. In this post-processing stage, the resulting string is analyzed to determine whether the recognized value is a valid amount for a bank check. Special attention is paid to the punctuation in order to eliminate obvious incorrect results such as "1,23,45".

Contextual knowledge can often be exploited to enhance the recognition accuracy in most OCR applications. In the case of the Italian bank check processing system [18], the location of the courtesy amount is based on two '#' delimiters generally attached at both ends of the number. The courtesy amount can also be verified against the legal amount, written as a text sentence [26, 30, 36]. In systems designed to read postal codes from letters [14], the zip code is searched for at the end of one of the lower lines. The segmentation of zip codes can be based on the fact that the number of digits that form the code is fixed [15, 51, 14, 52]. The segmentation of dates into day, month and year can be based on special separation characters like dash, space and slash, which are commonly used in date strings [20, 64]. Some systems that read monetary amounts exploit contextual information via syntax verification. This involves the notion of a pattern parser [56] that determines the validity of a string in terms of monetary amounts. The A2iA Interchange system [43] includes the verifier module within the recognition module to read French checks, while in another French system [31] the syntactic analysis takes place after the recognition stage. In the check recognition system proposed by Hussein et al. [34], a parser based on a Deterministic Finite Automaton (DFA) is proposed within a module called the "segmentation critic module"; this module is used to ensure proper segmentation before recognition.

In the system described in this paper, the syntactic verifier is applied after recognition and is intended to minimize the number of wrong readings and to present the values in a standard format [33]. The post processing module is based on a set of rules that incorporate the meanings of the decimal separator and the grouping symbol. There is no distinction between period and comma; both are treated as separators, and the analysis is based on the number of digits in each group. This approach is taken because the period sign and the comma sign are difficult to distinguish in handwritten form, and because they have opposite connotations in different countries. For example, the norms for the new Euro currency [54] allow for the use of the decimal symbol and the digit grouping symbols according to the national rules and practices of each country.

The syntactic verifier has three components: string formatter; punctuation checker; and punctuation restyle.

6.1 String Formatter

The string formatter checks that the string generated by the recognition process does not intermix delimiters and digits. The correct syntax of the string can be expressed as the following regular expression:

[#]*[0-9,]*[#]*

This expression means that the amount string may begin or end with zero, one, or more special delimiter characters. The delimiters are represented as the character '#' after recognition, although some people use asterisks, a single line, a double line, an ampersand or other symbols as delimiters. Between the delimiters there may be any combination of digits from 0 to 9 and comma separators. Delimiters are not very common on American checks because the standard format for writing dollar amounts uses the currency symbol ($) as a prefix and two decimal digits at the end. Using this format, it is not possible to add numbers at the beginning or at the end of the amount to change its value, so people are not in the habit of writing delimiters other than the dollar sign, which is pre-printed on the check. But in Brazil and many European countries, it is important to detect and eliminate delimiters.

The string formatter is concerned only with the format of the delimiters in the input string, not with their semantics. After checking the position of the delimiters, they are removed before the string enters the following step of postprocessing.
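A minimal sketch of this step, assuming the recognizer emits '#' for every delimiter shape; the function name and the rejection behavior are our own illustration:

    import re

    # Accept only strings matching [#]*[0-9,]*[#]* and strip the delimiters.
    FORMAT_RE = re.compile(r'^#*([0-9,]*)#*$')

    def format_string(recognized):
        match = FORMAT_RE.match(recognized)
        if match is None:
            return None           # delimiters intermixed with digits: reject
        return match.group(1)     # pass only digits and commas onward

    # format_string('##123,45#') -> '123,45'
    # format_string('12#3')      -> None (delimiter inside the number)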

6.2 Punctuation Checker

The input string of the punctuation checker contains any combination of digits and commas. The purpose of this function is to determine whether the syntax of the number is correct. To do so, it analyzes the sequence of digits and punctuation, paying attention to the number of digits found between punctuation symbols. It is important to note that the style for writing values is very flexible: the amount may or may not contain a fractional component (the cents), and it may or may not contain punctuation grouping the integer part into 3-tuples. The only accepted styles (for values lower than 1,000,000.00) are those that conform to the following regular expressions [22]:

Regular expressions for amounts with decimal part:

^[0-9]*,[0-9][0-9]$

^[0-9],[0-9][0-9][0-9],[0-9][0-9]$

^[0-9][0-9],[0-9][0-9][0-9],[0-9][0-9]$

^[0-9][0-9][0-9],[0-9][0-9][0-9],[0-9][0-9]$


Figure 18: Deterministic Finite Automaton to classify and check amount strings

Regular expressions for amounts without decimal part:

^[0-9]*[,]*$

^[0-9],[0-9][0-9][0-9][,]*$

^[0-9][0-9],[0-9][0-9][0-9][,]*$

^[0-9][0-9][0-9],[0-9][0-9][0-9][,]*$

All these expressions mean that if a decimal part exists, then it must contain 2 digits. If the decimal part is not present, then the decimal separator is optional. In any case it is possible to group characters within the integer part, but these groups must be comprised of exactly three digits. If no grouping is done, all the digits can be written sequentially.
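As an illustration, the eight accepted styles can be condensed into two patterns; the consolidation below is our own and is not code from the original system:

    import re

    # With decimal part: an ungrouped or fully grouped integer part followed
    # by a separator and exactly two decimal digits (values < 1,000,000.00).
    DECIMAL_RE = re.compile(r'^([0-9]*|[0-9]{1,3},[0-9]{3}),[0-9]{2}$')
    # Without decimal part: digits, optionally grouped in threes, with the
    # optional trailing separator(s) allowed by the original expressions.
    INTEGER_RE = re.compile(r'^([0-9]+|[0-9]{1,3},[0-9]{3}),*$')

    def is_valid_amount(s):
        return bool(DECIMAL_RE.match(s) or INTEGER_RE.match(s))

    # is_valid_amount('34,56')   -> True    (34.56)
    # is_valid_amount('34,567')  -> True    (grouped integer)
    # is_valid_amount('1,23,45') -> False   (ill-formed grouping)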

The algorithm that checks the correct syntax is essentially the Deterministic Finite Automaton (DFA) shown in Figure 18. It reads the string from right to left, and the states change depending on the kind of symbol found. The alphabet Σ is the set of symbols that the string is comprised of; in this case the alphabet has only two elements: D for digit and P for punctuation. V is a finite set of valid states in the system, and F ⊆ V is the set of final states, which are represented as double circles in the graph. The transitions from one state to another depend on the characters found in the string and are clearly marked in the graph.

After the analysis of a few characters, one of four possible cases is identified. Case 1 means that the string ends with a punctuation mark; this is typical of an integer amount (without decimal part) ended by a period and a line. That line was identified as a delimiter in the previous step and removed from the string, so the algorithm detects the period and must then read the number. In this case one has to read a number that may or may not have grouping punctuation, which is done using the Integer DFA (q4) in Figure 19. Case 2 means that the string has 2 decimal digits preceded by a punctuation character, so one again has to read any kind of integer (state q4). In Case 3, a punctuation mark precedes the right-most 3 digits; the string therefore has no decimal part, and the punctuation is understood as the grouping character. Hence the integer part is required to use grouping symbols and is analyzed using the Grouping DFA (q5) in Figure 19. Finally, Case 4 is reached after reading four consecutive digits without punctuation. In this case, no decimal part or grouping symbol has been detected; from this point on, the system does not expect to see any punctuation, and the string is checked using the DFA for state q6 in Figure 19.


Figure 19: Deterministic Finite Automaton to check different cases in the integer part

In the DFA algorithms for the integer part (Figure 19), all punctuation symbols must be related to the grouping of digits, so they can appear only every three digits. In the general algorithm for the integer part, starting at q4 in Figure 19, it is necessary to read at least three consecutive digits to establish whether grouping characters are being used. At that point, the state either becomes q5, so the algorithm proceeds to analyze the string with the Grouping algorithm (which requires grouping separators), or it becomes q6 and proceeds with the No-Grouping algorithm (which does not allow grouping separators). If the DFA does not reach an error state and the string is exhausted while the current state q ∈ F, then the string is accepted.
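The complete right-to-left check can be reconstructed as a table-driven DFA. In the sketch below the state names follow Figures 18 and 19, while the choice of final states is our inference from the text; a missing entry in the transition table plays the role of the Error state:

    TRANSITIONS = {
        # Figure 18: classify the string by its rightmost symbols.
        ('q0', 'D'): 'q1', ('q0', 'P'): 'q4',   # Case 1: trailing separator
        ('q1', 'D'): 'q2',                      # single decimal digit: error on P
        ('q2', 'D'): 'q3', ('q2', 'P'): 'q4',   # Case 2: two decimal digits
        ('q3', 'D'): 'q6', ('q3', 'P'): 'q5',   # Case 4 / Case 3
        # Figure 19, Integer DFA: grouping still undecided.
        ('q4', 'D'): 'q7',
        ('q7', 'D'): 'q8',
        ('q8', 'D'): 'q9',
        ('q9', 'D'): 'q6', ('q9', 'P'): 'q5',   # decide no-grouping / grouping
        # Figure 19, No-Grouping DFA: only digits allowed from here on.
        ('q6', 'D'): 'q6',
        # Figure 19, Grouping DFA: groups of exactly three digits.
        ('q5', 'D'): 'q10',
        ('q10', 'D'): 'q11',
        ('q11', 'D'): 'q12',
        ('q12', 'P'): 'q5',
    }
    FINAL = {'q1', 'q2', 'q3', 'q6', 'q7', 'q8', 'q9', 'q10', 'q11', 'q12'}

    def accepts(amount):
        state = 'q0'
        for ch in reversed(amount):              # read from the end
            symbol = 'D' if ch.isdigit() else 'P'
            state = TRANSITIONS.get((state, symbol))
            if state is None:                    # no transition: Error state
                return False
        return state in FINAL

    # accepts('34,56') -> True, accepts('34,567') -> True,
    # accepts('34,5')  -> False, accepts('1,23,45') -> False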

Consequently, a value like 34.5 is not accepted because the decimal part contains just one digit. The value 34.567 is accepted as 34,567.00 because it is not possible to have three decimals, so the period in this case is interpreted as the grouping character. Unambiguous string values like 34.56 and 34,567.89 will be immediately accepted as such.

6.3 Punctuation Restyle

If the string is accepted, the last step in the post-processing module is to print the value in a standard way: adding a period and two zeros if the amount does not have cents, and adding grouping separators in any case (for values greater than 999.99). Figure 20 shows an example of the prototype analyzing a check from Brazil; the result of the Punctuation Restyle module is the number that appears in the text box labeled Value.
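A minimal sketch of this restyling step, assuming the accepted string uses commas as its only separators (as produced by the punctuation checker); the function name is illustrative:

    def restyle(accepted):
        """Rewrite an accepted amount in the standard output format:
        grouping separators every three digits and exactly two decimals."""
        digits = accepted.replace(',', '')             # drop all separators
        if len(accepted) >= 3 and accepted[-3] == ',':
            integer, cents = digits[:-2], digits[-2:]  # had a decimal part
        else:
            integer, cents = digits, '00'              # add the missing cents
        return '{:,}.{}'.format(int(integer or '0'), cents)

    # restyle('34,56')  -> '34.56'
    # restyle('34,567') -> '34,567.00'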


Figure 20: Recognition of a Brazilian check using the prototype application

7 Conclusion

This paper has presented an overall approach for check recognition that can be utilized as the foundation for "reading" checks from many countries. This approach has been applied to read American and Brazilian bank checks, with appropriate differences between the two implementations. The algorithms and techniques used to read bank checks can be applied to other environments that can benefit from automatic reading or image processing. Image pre-processing involves: conversion from gray scale into black and white pixels; organization of the information into connected components; and location of strings in a page with unspecified format. The output of the preprocessor serves as the input to the segmentation module. The latter takes advantage of the results of the recognition module via a feedback mechanism. Several splitting algorithms based on the drop-fall methodology have been presented in this paper.

In the section on recognition of individual characters, the need for normalization of the size and stroke thickness of characters has been highlighted. In addition, an approach has been proposed for slant correction. The neural network architecture employs a set of four neural networks of different types that are run in parallel to minimize the likelihood of erroneous readings. This architecture offers superior accuracy and performance compared to structures comprised of single nets only.

Finally, the post processing module takes advantage of contextual information in monetary amounts. This module has been developed as a generalized syntactic checker based on deterministic finite automata techniques. Although this module could also be used to help the segmentation process, it was decided to use it within the post processing module, with the primary objective of reducing the incidence of reading errors. The post processing module also provides the amount string in a standard format, hence providing the information needed for processing financial transactions on an international basis.

8 Acknowledgments

The authors thank Professor Patrick Wang and other members of the project team for their valuable contributions. We also thank representatives of a number of banks located in North America, South America, Europe and Asia for assisting with research related to their specific domains. Parts of the research described in this paper are covered by a patent [28]; we thank the co-inventors and others who were involved in the process.

9 References

[1] A. Agarwal, A. Gupta, K. Hussein, "Bank Check Analysis and Recognition by Computers", in Handbook of Character Recognition and Document Image Analysis, Editors: H. Bunke and P.S.P. Wang, World Scientific, 1997.

[2] C.O. de Almendra Freitas, A. el Yacoubi, F. Bortolozzi, R. Sabourin, "Brazilian Bank Check Handwritten Legal Amount Recognition", Proceedings of the XIII Brazilian Symposium on Computer Graphics and Image Processing: 97-104, 2000.

[3] N. Arica and F.T. Yarman-Vural, "An Overview of Character Recognition Focused on Off-Line Handwriting", IEEE Transactions on Systems, Man, and Cybernetics – Part C: Applications and Reviews, 31(2):216-233, 2001.

[4] A. Agarwal, K. Hussein, A. Gupta, and P.S.P. Wang, "Detection of Courtesy Amount Block on Bank Checks", Journal of Electronic Imaging, Vol. 5(2), April 1996, pp. 214-224.

[5] A.S. Atukorale, P.N. Suganthan, "Combining Classifiers Based on Confidence Values", Proceedings of the 5th International Conference on Document Analysis and Recognition, pp. 37-40, 1999.

[6] H. Baird, H. Bunke, P.S.P. Wang. Document Analysis and Recognition, World Scientific, Singapore, 1994.

[7] M. Blumenstein and S. Verma, "A Neural Based Segmentation and Recognition Technique for Handwritten Words", IEEE International Conference on Neural Networks, Vol. 3, 1738-1742, 1998.

[8] R. Bozinovic and S. Srihari, "Off-line cursive script word recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 2, pp. 68-83, 1989.

[9] H. Bunke and P.S.P. Wang. Handbook of Character Recognition and Document Image Analysis. World Scientific. ISBN 981-02-2270-X, 1997.

[10] B.M.F. Bushofa and M. Spann, "Segmentation and Recognition of Arabic Characters by Structural Classification", Image and Vision Computing, 15:167-179, 1997.

[11] R.C. Carrasco and M.L. Forcada, "A note on the Nagendraprasad-Wang-Gupta thinning algorithm", Pattern Recognition Letters, Vol 16, pp. 539-541, 1995.

[12] R.G. Casey and E. Lecolinet, "A Survey of Methods and Strategies in Character Segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(7):690-706, 1996.

[13] F. Chin, F. Wu, "A Microprocessor-Based Optical Character Recognition Check Reader", Proceedings of the Third International Conference on Document Analysis and Recognition, Vol 2:982-985, 1995.

[14] E. Cohen, J.J. Hull, and S.N. Srihari, "Understanding handwritten text in a structured environment: determining zip codes from addresses", in Character and Handwriting Recognition: Expanding Frontiers, Editor: P.S.P. Wang, pp. 221-264, World Scientific Publishing Co., 1991.

[15] G. Congedo, G. Dimauro, S. Impedovo, G. Pirlo, "Segmentation of Numeric Strings", Proceedings of the Third International Conference on Document Analysis and Recognition, Vol. II: 1038-1041, 1995.

[16] T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms, MIT Press and McGraw-Hill, Cambridge, MA, and New York, 1991.

[17] S. Dey, "Adding Feedback to Improve Segmentation and Recognition of Handwritten Numerals", Master Thesis, Massachusetts Institute of Technology, 1999.

[18] G. Dimauro, S. Impedovo, G. Pirlo, and A. Salzo, "Automatic Bankcheck Processing: A New Engineered System", International Journal of Pattern Recognition and Artificial Intelligence, Vol 11(4), pp. 467-503, 1997.

[19] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.


[20] R. Fan, L. Lam, and C.Y. Suen, "Processing of date information on cheques", Progress in Handwriting Recognition, Editors: A.C. Downton and C. Impedovo, World Scientific, 1997, pp. 473-479.

[21] L. Fletcher and R. Kasturi, "A robust algorithm for text string separation from mixed text/graphics images", IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6):910-918, Nov 1988.

[22] J.E.F. Friedl, Mastering Regular Expressions: Powerful Techniques for Perl and Other Tools, O'Reilly and Associates, ISBN 1-56592-257-3, 1997.

[23] K.S. Fu, Syntactic Pattern Recognition and Applications, Prentice-Hall, Englewood Cliffs, NJ, 1982.

[24] M.D. Garris et al., NIST Form-Based Handwritten Recognition System: Release 2.0, US Dept. of Commerce, Technology Administration, National Institute of Standards and Technology, 1997.

[25] R.C. Gonzales and P. Wintz, Digital Image Processing, Addison-Wesley, Reading, MA, 1987.

[26] N. Gorski, V. Anisimov, E. Augustin, O. Baret, D. Price, and J.C. Simon, "A2iA Check Reader: A Family of Bank Check Recognition Systems", Fifth International Conference on Document Analysis and Recognition, pp. 523-526, Bangalore, India, 1999.

[27] D. Guillevic and C.Y. Suen, "HMM-KNN Word Recognition Engine for Bank Check Processing", Proceedings of the International Conference on Pattern Recognition, Vol 2, 1998.

[28] A. Gupta, M.V. Nagendraprasad, P.S.P. Wang, "System and method for character recognition with normalization", U.S. Patent No. 5633954, 1997.

[29] I. Guyon and P.S.P. Wang. Advances in Pattern Recognition Systems using Neural Network Technologies. World Scientific. ISBN 981-02-144-8, 1993.

[30] K. Han and I. Sethi, "An off-line cursive handwritten word recognition system and its application to legal amount interpretation", Automatic Bankcheck Processing, World Scientific Press, 1997, pp. 295-308.

[31] L. Heutte, P. Barbosa-Pereira, O. Bougeois, J.V. Moreau, B. Plessis, P. Courtellemont, and Y. Lecourtier, "Multi-Bank Check Recognition System: Consideration on the Numeral Amount Recognition Module", International Journal of Pattern Recognition and Artificial Intelligence, Vol 11(4), pp. 595-618, 1997.

[32] S. Hinds, J. Fisher, and D. D'Amato, "A document skew detection method using run length encoding and the Hough transform", Proceedings of the 10th International Conference on Pattern Recognition, 2:464-468, Atlantic City, NJ, June 1990.

[33] F. Hu, "A Syntax Verification Module for the Winbank System", Advanced Undergraduate Project, Massachusetts Institute of Technology, 1999.

[34] K. Hussein, A. Agarwal, A. Gupta, and P.S.P. Wang, "A Knowledge-Based Segmentation Algorithm for Enhanced Recognition of Handwritten Courtesy Amounts", Pattern Recognition, Vol 32(2), pp. 305-316, Feb 1999.

[35] T.C. Jee, E.J. Kim, Y. Lee, "Error Correction of Korean Courtesy Amounts in Bank Slips using Rule Information and Cross-Referencing", Proceedings of the International Conference on Document Analysis and Recognition, pp. 95-98, 1999.

[36] G. Kaufmann and H. Bunke, "A system for the automated reading of check amounts – Some key ideas", Proceedings of the 3rd International Association of Pattern Recognition Workshop on Document Analysis Systems, Nagano, Japan, 1998, pp. 302-315.

[37] J. Keeler, D. Rumelhart, "A Self Organizing Integrated Segmentation and Recognition Neural Network", Proceedings of SPIE: The International Society for Optical Engineering, Vol. 1710, 744-755, 1992.

[38] S. Kelland and S. Wesolkowski, "A Comparison of Research and Production Architectures for Check Reading Systems", Fifth International Conference on Document Analysis and Recognition, pp. 99-102, Bangalore, India, 1999.

[39] S. Khan, "Character Segmentation Heuristics of Check Amount Verification", Master Thesis, Massachusetts Institute of Technology, 1998.

[40] J.H. Kim, K.K. Kim, C.P. Nadal, and C.Y. Suen, "A methodology of combining HMM and MLP classifiers for cursive word recognition", Proceedings of the International Conference on Pattern Recognition, Barcelona, Spain, Vol 2, pp. 319-322, 2000.

[41] K.K. Kim, J.H. Kim, Y.K. Chung, and C.Y. Suen, "Legal Amount Recognition Based on the Segmentation Hypotheses for Bank Check Processing", Proceedings of the Sixth International Conference on Document Analysis and Recognition: 964-967, 2001.


[42] F. Kimura and M. Shridhar, "Segmentation-Recognition Algorithm for Zip Code Field Recognition", Machine Vision and Applications, 5(3):199-210, 1992.

[43] S. Knerr, V. Anisimov, O. Baret, N. Gorski, D. Price and J.C. Simon, "The A2iA Intercheque System: Courtesy Amount and Legal Amount Recognition for French Checks", International Journal of Pattern Recognition and Artificial Intelligence, Vol 11(4), pp. 505-547, 1997.

[44] L. Lam, S.W. Lee, and C.Y. Suen, "Thinning Methodologies – A Comprehensive Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(9):869-885, 1992.

[45] L. Lam and C.Y. Suen, "An Evaluation of Parallel Thinning Algorithms for Character Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(9):914-919, 1995.

[46] S.W. Lee, D.J. Lee, H.S. Park, "A New Methodology for Gray-Scale Character Segmentation and Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(10):1045-1050, 1996.

[47] E. Lethelier, M. Leroux, M. Gilloux, "An Automatic Reading System for Handwritten Numeral Amounts on French Checks", 3rd International Conference on Document Analysis and Recognition, Vol 1:92-97, 1995.

[48] J.S. Lim, Two-Dimensional Signal and Image Processing, Prentice Hall, Englewood Cliffs, NJ, pp. 469-476, 1990.

[49] G. Martin, "Centered-Object Integrated Segmentation and Recognition of Overlapping Handprinted Characters", Neural Computation, 5:419-429, 1993.

[50] G. Martin, R. Mosfeq, J. Pittman, "Integrated Segmentation and Recognition Through Exhaustive Scans or Learned Saccadic Jumps", International Journal of Pattern Recognition and Artificial Intelligence, 7(4):831-847, 1993.

[51] O. Matan et al., "Reading Handwritten Digits: A Zip Code Recognition System", Computer, Vol. 25(7), pp. 59-64, 1992.

[52] B.T. Mitchell and A.M. Gillies, "A Model-Based Computer Vision System for Recognizing Handwritten Zip Codes", Machine Vision & Applications, Vol 2(4), pp. 231-243, 1989.

[53] M.V. Nagendraprasad, P.S.P. Wang, and A. Gupta, "Algorithms for Thinning and Rethickening Digital Patterns", Digital Signal Processing, 3:97-102, 1993.

[54] Official page on "Questions & Answers on the euro and European Economic and Monetary Union", http://europa.eu.int/euro/quest/, Jan 2002.

[55] I. Oh and C.Y. Suen, "A Feature for Character Recognition Based on Directional Distance Distribution", Proceedings of the 4th International Conference on Document Analysis and Recognition (ICDAR), 1997.

[56] M. Prundaru and I. Prundaru, "Syntactic Handwritten Numeral Recognition by Multi-Dimensional Grammars", Proceedings of SPIE: The International Society for Optical Engineering, Vol 3365, pp. 302-309, 1998.

[57] J. Punnoose, "An Improved Segmentation Module for Identification of Handwritten Numerals", Master Thesis, Massachusetts Institute of Technology, 1999.

[58] E. Roman, A. Gupta, J. Riordan, "Integration of Traditional Imaging, Expert Systems, and Neural Network Techniques for Enhanced Recognition of Handwritten Information", IFSRC No. 124-90, 1990.

[59] A. Sinha, "An Improved Recognition Module for the Identification of Handwritten Digits", Master Thesis, Massachusetts Institute of Technology, 1999.

[60] P.L. Sparks, M.V. Nagendraprasad, and A. Gupta, "An Algorithm for Segmenting Handwritten Numerals", Second International Conference on Automation, Robotics, and Computer Vision, Singapore, 1992.

[61] S. Srihari, "Feature extraction for locating address blocks on mail pieces", in From Pixels to Features, pp. 261-275, North Holland, Amsterdam, 1987.

[62] S. Srihari, C. Wang, P. Palumbo, and J. Hull, "Recognizing address blocks on mail pieces", AI Magazine, 8(4):25-40, Dec. 1987.

[63] C.Y. Suen, C. Nadal, R. Legault, T. Mai, and L. Lam, "Computer Recognition of Unconstrained Handwritten Numerals", Proceedings of the IEEE, 80(7):1162-1180, 1992.

[64] C.Y. Suen, Q. Xu, and L. Lam, "Automatic Recognition of Handwritten Data on Cheques – Fact or Fiction?", Pattern Recognition Letters, 20:1287-1295, 1999.

[65] C.Y. Suen, J. Kim, K. Kim, Q. Xu, and L. Lam, "Handwriting Recognition – The Last Frontiers", Proceedings of the International Conference on Pattern Recognition, Vol. 4:1-10, 2000.

[66] K. Ueda, T. Mutoh and K. Matsuo, "Automatic Verification System for Seal Imprints on Japanese Bankchecks", Proceedings of the International Conference on Pattern Recognition, Vol 1, 1998.


[67] United States Check Study: Results of 1999 Research, edited by The Green Sheet, Inc., http://www.greensheet.com/CheckStudy/index99.html, Oct 25, 2001.

[68] P.S.P. Wang and Y.Y. Zhang, "A fast and flexible thinning algorithm", IEEE Transactions on Computers, 38(5), 1989.

[69] P.S.P. Wang. Character and Handwriting Recognition: Expanding Frontiers. World Scientific. ISBN 981-02-0710-7, 1991.