Presentation Review 5

7/27/2019 Presentation Review 5

1/48

PRESENTATION

on

HANDWRITTEN URDU SCRIPT NUMERALS RECOGNITION

By

Sartaj Khan

M Tech (Sequential) IV Sem

Roll No : 6410110020

Supervisor

Mr Hitendra Garg

MCA, MS (BITS, Pilani), Ph.D.*


2/48

CONTENTS

INTRODUCTION

AIM

OCR

DATA SET COLLECTION

PREPROCESSING

NORMALIZATION

NOISE REMOVAL

FEATURE EXTRACTION

ZONING

DENSITY

CONCAVITY

CONTOUR

RESULTS

GENETIC ALGORITHM

COMPARING RESULTS

CONCLUSION

REFERENCES


3/48

INTRODUCTION

Handwritten numeral recognition is in general a

benchmark problem of Pattern Recognition and

Artificial Intelligence.

Compared to the problem of printed numeral recognition, the problem of handwritten numeral

recognition is compounded due to variations in

shapes and sizes of handwritten characters.

Considering all these, the problem of handwritten numeral recognition is addressed

under the present work in respect to handwritten

Urdu numerals.


4/48

AIM

Density, Concavity and Contour features are extracted; the best results

are reported using a combination of density and concavity. To find

the optimal feature subset, a genetic algorithm is suggested so as to

reduce computational effort and increase recognition accuracy.

On experimentation with a database of 20000 samples, the

technique yields an average recognition rate of 97.8% evaluated

after three-fold cross validation of results.

It is useful for applications related to OCR of handwritten Urdu numerals and can also be extended to include OCR of handwritten

characters of the Urdu alphabet.


5/48

OCR

Optical character recognition, usually abbreviated to

OCR, is the mechanical or electronic translation of

scanned images of handwritten, typewritten or printed text into machine-encoded text.

Contd..


6/48

OCR

In on-line character recognition systems, the computer recognizes symbols as they are drawn.

Off-line recognition, by contrast, is performed after writing or printing is completed.

Contd..


7/48

APPLICATIONS

Assigning ZIP codes to letter mail.

Reading data entered in forms, e.g. tax forms.

Automatic accounting procedures used in processing utilities bills.

Verification of account numbers and courtesy amounts on bank checks.

Automatic accounting of airline passenger tickets.

Automatic validation of passports.


8/48

DATA SET COLLECTION

Our objective is to obtain a set of handwritten samples of

Urdu numerals that capture variations in handwriting

between and within writers.

Therefore, we need numeral samples from multiple writers,

as well as multiple samples from each writer.

Contd..


9/48

CRITERIA FOR SELECTION OF NUMERALS

The different numerals are written in the specified blocks, as shown below. The persons writing the numerals

are free to use pens of different quality, different ink colors, etc.

Each numeral should be written inside its grid cell: it should not touch the grid lines and

should lie entirely within the specified boundary. If a numeral fails this criterion, the algorithm

removes the parts that lie outside the boundary.

Contd.


10/48

CRITERIA FOR SELECTION OF NUMERALS

Each person would write 1-10 (in Urdu script) ten times in each row. Thus, each numeral would be written 10 times and one person would write 100 numerals.

Using the above criteria, we collected samples from 200 persons (samples of one numeral: 200 x 10 = 2000).

This resulted in 2000 samples. Out of these, 1000 samples

(10 x 100) were randomly selected and stored in the database, and 1000 were used as test images.

Contd.


11/48

BLANK FORMAT FOR COLLECTING HAND WRITTEN URDU NUMERAL


12/48

SAMPLE DATA SHEET AFTER BEING DULY FILLED


13/48


14/48

NORMALIZATION

Normalization is the process of standardizing the size of each image.

Steps in Normalization Process:

Start with the image frame of Xsize0 x Ysize0 pixels that fits the

isolated numeral, obtained by removing blank rows and columns.

Rescale the image to Xsize x Ysize pixels, where both sides equal

the larger of Xsize0 and Ysize0, i.e.

Xsize = max(Xsize0, Ysize0)

Ysize = max(Xsize0, Ysize0)
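
The two normalization steps can be sketched as follows. This is a minimal illustration, not the code used in the work: it assumes a binary image stored as a list of rows with 1 for black and 0 for white, and it assumes nearest-neighbour sampling for the rescale, which the slides do not specify. The function names are hypothetical.

```python
def crop_to_bounding_box(img):
    """Remove blank rows and columns around the numeral (1 = black, 0 = white)."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:  # completely blank image: nothing to crop
        return [row[:] for row in img]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def rescale_to_square(img):
    """Rescale to side x side pixels, where side = max(Xsize0, Ysize0).

    Nearest-neighbour sampling is an assumption made for this sketch.
    """
    height, width = len(img), len(img[0])
    side = max(height, width)
    return [[img[r * height // side][c * width // side] for c in range(side)]
            for r in range(side)]


def normalize(img):
    """Crop to the numeral's bounding box, then rescale to a square frame."""
    return rescale_to_square(crop_to_bounding_box(img))
```

For example, a 3 x 4 image containing a two-pixel horizontal stroke is cropped to 1 x 2 pixels and then stretched to 2 x 2 pixels.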

Contd..


15/48

NORMALIZATION

Image before normalization Image after normalization


16/48

NOISE REMOVAL

During the scanning process some noise is introduced. The reasons for such noise could be specks of dust on the

scanner, poor quality of the paper on which the numerals are

written, etc.

If there is a single black pixel, or a connected group of two or three black pixels, it is treated as noise and removed, i.e. the noise

pixels are converted into white pixels.
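
A minimal sketch of this rule, assuming a binary image stored as a list of rows (1 = black, 0 = white). It treats "continuously" as 8-connectivity, which is an assumption; the function name and the max_size parameter are hypothetical.

```python
def remove_small_specks(img, max_size=3):
    """Whiten every connected group of black pixels with at most max_size pixels."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if img[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Collect the 8-connected group of black pixels containing (r0, c0).
            stack, group = [(r0, c0)], []
            seen[r0][c0] = True
            while stack:
                r, c = stack.pop()
                group.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and img[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            stack.append((nr, nc))
            # Groups of one, two or three pixels are treated as noise.
            if len(group) <= max_size:
                for r, c in group:
                    out[r][c] = 0
    return out
```

Larger groups, such as the strokes of the numeral itself, are left untouched.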

With noise Without Noise


17/48

FEATURE EXTRACTION

In the feature extraction stage each character

is represented as a feature vector, which

becomes its identity.

The major goal of feature extraction is to

extract a set of features that maximizes

the recognition rate with the least number of elements.


18/48

REQUIREMENTS OF A GOOD FEATURE SET

It should have a good discriminating power in order to enable

the correct identification even among very similar symbols.

It should not be too time consuming to compute.

As far as possible, the feature set should be rotation, scaling

and translation invariant so that the recognition is independent

of font, size and pitch.

The feature set should accord some immunity to noise.

The feature set must offer a complete description of the

character set to be recognized.


19/48

DIFFERENT FEATURES EXTRACTION

ZONING

DENSITY

CONCAVITY

CONTOUR


20/48

ZONING

The character image is divided into NxM zones.

From each zone features are extracted to form the

feature vector. The goal of zoning is to obtain the

local characteristics instead of global characteristics.
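
As an illustration, zoning can be sketched as a function that cuts the image into N x M sub-images. The sketch assumes a binary image stored as a list of rows whose height and width are exact multiples of N and M; the function name is hypothetical.

```python
def split_into_zones(img, n_rows, n_cols):
    """Split the image into n_rows x n_cols zones, listed row by row."""
    zone_h = len(img) // n_rows
    zone_w = len(img[0]) // n_cols
    zones = []
    for zr in range(n_rows):
        for zc in range(n_cols):
            zone = [row[zc * zone_w:(zc + 1) * zone_w]
                    for row in img[zr * zone_h:(zr + 1) * zone_h]]
            zones.append(zone)
    return zones
```

Any local feature can then be computed per zone and the results concatenated into one feature vector.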


21/48

DENSITY

Density Feature is Calculated as

Contd.

The number of dark pixels in each cell is considered a feature.

Darker squares indicate higher density of zone pixels.


22/48

DENSITY

Steps for Density Feature:

Divide the bounding box into thirty-six equal zones of 6 x 6 pixels each.

Compute the density of pixels in each zone.

Store these features in a file.
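
The first two steps can be sketched as below, assuming a 36 x 36 binary image stored as a list of rows (1 = black, 0 = white). Density is taken here as the fraction of black pixels in a zone; using the raw count instead would only change the scale. The function name is hypothetical.

```python
def density_features(img, zone_size=6):
    """Return one density value per zone, scanning zones row by row."""
    features = []
    for top in range(0, len(img), zone_size):
        for left in range(0, len(img[0]), zone_size):
            black = sum(img[r][c]
                        for r in range(top, top + zone_size)
                        for c in range(left, left + zone_size))
            features.append(black / (zone_size * zone_size))
    return features
```

For a 36 x 36 image this yields the 36-element density feature vector.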

Contd.


23/48

CONCAVITY

These features are used to highlight the topological and

geometrical properties of the digit classes. Each concavity

feature represents the number of white pixels that belong to a

specific concavity configuration.

The label for each white pixel is chosen based on the Freeman

code with four directions. Each direction is explored until a

black pixel or the limit imposed by the digit bounding box

is encountered. A white pixel is labeled if at least two consecutive

directions find black pixels. Thus, we have 9 possible concavity configurations. Moreover, we consider four more configurations,

in order to detect more precisely the presence of loops. The

total length of this feature vector is then 13.
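
The labeling rule for the 9 basic configurations can be sketched as follows. This is an illustration under assumptions: the image is a list of rows with 1 for black and 0 for white, the four directions are taken as up, right, down and left, and the four extra loop configurations are not covered. All names are hypothetical.

```python
# Four Freeman directions in clockwise order: up, right, down, left.
DIRECTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def hits_black(img, r, c, dr, dc):
    """Walk from (r, c) in one direction until a black pixel or the box edge."""
    rows, cols = len(img), len(img[0])
    r, c = r + dr, c + dc
    while 0 <= r < rows and 0 <= c < cols:
        if img[r][c] == 1:
            return True
        r, c = r + dr, c + dc
    return False


def is_labelled(flags):
    """True if at least two consecutive directions (cyclically) found black."""
    return any(flags[k] and flags[(k + 1) % 4] for k in range(4))


# Every direction pattern that qualifies; there are exactly 9 of them.
CONFIGURATIONS = [
    flags
    for flags in ((bool(m & 1), bool(m & 2), bool(m & 4), bool(m & 8))
                  for m in range(16))
    if is_labelled(flags)
]


def concavity_features(img):
    """Count the white pixels that fall into each of the 9 configurations."""
    counts = {flags: 0 for flags in CONFIGURATIONS}
    for r, row in enumerate(img):
        for c, pixel in enumerate(row):
            if pixel == 0:
                flags = tuple(hits_black(img, r, c, dr, dc)
                              for dr, dc in DIRECTIONS)
                if flags in counts:
                    counts[flags] += 1
    return [counts[flags] for flags in CONFIGURATIONS]
```

The count of 9 follows from the rule itself: 4 patterns with two consecutive directions, 4 patterns with three directions, and 1 pattern with all four.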

Contd.


24/48

Figure: the 9 concavity configurations and the 4 additional configurations for false loops

Contd.


25/48

CONCAVITY

Contd.


26/48

CONTOUR

The number of interior and exterior contours is extracted from the chain

code representation of the image.

Connectivity features extracted for a line

Contd.


27/48

CONTOUR

To extract the direction of the numeral's contour, the normalized image (36 x 36

pixels) is divided into 6 x 6 cells. The size of each cell is 6 x 6 pixels.

There are 4 feature windows of 3 x 3 pixels, one for each of 4 directions:

horizontal (A), vertical (B), left diagonal (C) and right diagonal (D).

XC = (X1A, X1B, X1C, X1D, X2A, X2B, ..., X36A, X36B, X36C, X36D)
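
The layout of XC (36 cells x 4 directions = 144 values) can be sketched as below. The exact 3 x 3 windows are not given in the slides, so this sketch substitutes a simple stand-in: for each cell and direction it counts black pixels whose neighbour in that direction is also black. The neighbour offsets, the reading of "left" and "right" diagonal, and the function name are all assumptions.

```python
# Neighbour offset examined for each direction (assumed reading of the windows):
# A horizontal, B vertical, C left diagonal, D right diagonal.
NEIGHBOURS = {"A": (0, 1), "B": (1, 0), "C": (1, 1), "D": (1, -1)}


def contour_direction_features(img, cell=6):
    """Return XC = (X1A, X1B, X1C, X1D, ..., X36A, ..., X36D) for a square image."""
    size = len(img)
    features = []
    for top in range(0, size, cell):
        for left in range(0, size, cell):
            for dr, dc in NEIGHBOURS.values():
                count = 0
                for r in range(top, top + cell):
                    for c in range(left, left + cell):
                        nr, nc = r + dr, c + dc
                        if (img[r][c] == 1
                                and 0 <= nr < size and 0 <= nc < size
                                and img[nr][nc] == 1):
                            count += 1
                features.append(count)
    return features
```

A short horizontal stroke therefore contributes only to the A component of the cell it lies in.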


28/48

TEST RESULTS

DENSITY

CONCAVITY

CONTOUR

DENSITY & CONCAVITY

DENSITY & CONTOUR

CONCAVITY & CONTOUR

DENSITY , CONCAVITY & CONTOUR


29/48

DENSITY FEATURE'S RESULT 86.9%

NUMERAL (rows) / Recognized As (columns)

99 0 0 0 0 0 0 1 0 0

0 93 0 0 3 0 4 0 0 0

0 0 86 2 2 2 0 2 1 5

0 0 1 89 4 1 1 0 1 3

0 3 1 2 70 7 0 7 1 12

2 1 1 5 6 80 0 0 0 5

0 0 0 0 0 0 89 0 1 10

3 1 0 1 4 1 5 80 1 4

0 1 1 1 2 0 0 0 93 2

0 0 1 1 2 0 0 6 0 90
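
The headline figure can be checked from the diagonal of the matrix above. The sketch assumes 100 test samples per numeral (1000 test images over 10 numerals), so each diagonal entry is the number of correctly recognized samples of that numeral; the function name is hypothetical.

```python
CONFUSION = [
    [99, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 93, 0, 0, 3, 0, 4, 0, 0, 0],
    [0, 0, 86, 2, 2, 2, 0, 2, 1, 5],
    [0, 0, 1, 89, 4, 1, 1, 0, 1, 3],
    [0, 3, 1, 2, 70, 7, 0, 7, 1, 12],
    [2, 1, 1, 5, 6, 80, 0, 0, 0, 5],
    [0, 0, 0, 0, 0, 0, 89, 0, 1, 10],
    [3, 1, 0, 1, 4, 1, 5, 80, 1, 4],
    [0, 1, 1, 1, 2, 0, 0, 0, 93, 2],
    [0, 0, 1, 1, 2, 0, 0, 6, 0, 90],
]


def recognition_rate(matrix, samples_per_class=100):
    """Percentage of correctly recognized samples (sum of the diagonal)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / (samples_per_class * len(matrix))


rate = recognition_rate(CONFUSION)  # 869 correct out of 1000, i.e. 86.9%
hardest_row = min(range(len(CONFUSION)), key=lambda i: CONFUSION[i][i])
```

The fifth row has the lowest diagonal entry (70), so that numeral is the one most often confused with others.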


30/48

CONCAVITY FEATURE'S RESULT 86.9%

NUMERAL (rows) / Recognized As (columns)

99 0 0 0 0 0 0 1 0 0

0 93 0 0 3 0 4 0 0 0

0 0 86 2 2 2 0 2 1 5

0 0 1 89 4 1 1 0 1 3

0 3 1 2 70 7 0 7 1 12

2 1 1 5 6 80 0 0 0 5

0 0 0 0 0 0 89 0 1 10

3 1 0 1 4 1 5 80 1 4

0 1 1 1 2 0 0 0 93 2

0 0 1 1 2 0 0 6 0 90


31/48

CONTOUR FEATURE'S RESULT 86.9%

NUMERAL (rows) / Recognized As (columns)

99 0 0 0 0 0 0 1 0 0

0 93 0 0 3 0 4 0 0 0

0 0 86 2 2 2 0 2 1 5

0 0 1 89 4 1 1 0 1 3

0 3 1 2 70 7 0 7 1 12

2 1 1 5 6 80 0 0 0 5

0 0 0 0 0 0 89 0 1 10

3 1 0 1 4 1 5 80 1 4

0 1 1 1 2 0 0 0 93 2

0 0 1 1 2 0 0 6 0 90


32/48

DENSITY & CONCAVITY FEATURE'S RESULT 86.9%

NUMERAL (rows) / Recognized As (columns)

99 0 0 0 0 0 0 1 0 0

0 93 0 0 3 0 4 0 0 0

0 0 86 2 2 2 0 2 1 5

0 0 1 89 4 1 1 0 1 3

0 3 1 2 70 7 0 7 1 12

2 1 1 5 6 80 0 0 0 5

0 0 0 0 0 0 89 0 1 10

3 1 0 1 4 1 5 80 1 4

0 1 1 1 2 0 0 0 93 2

0 0 1 1 2 0 0 6 0 90


33/48

DENSITY & CONTOUR FEATURE'S RESULT 86.9%

NUMERAL (rows) / Recognized As (columns)

99 0 0 0 0 0 0 1 0 0

0 93 0 0 3 0 4 0 0 0

0 0 86 2 2 2 0 2 1 5

0 0 1 89 4 1 1 0 1 3

0 3 1 2 70 7 0 7 1 12

2 1 1 5 6 80 0 0 0 5

0 0 0 0 0 0 89 0 1 10

3 1 0 1 4 1 5 80 1 4

0 1 1 1 2 0 0 0 93 2

0 0 1 1 2 0 0 6 0 90


34/48

CONCAVITY & CONTOUR FEATURE'S RESULT 86.9%

NUMERAL (rows) / Recognized As (columns)

99 0 0 0 0 0 0 1 0 0

0 93 0 0 3 0 4 0 0 0

0 0 86 2 2 2 0 2 1 5

0 0 1 89 4 1 1 0 1 3

0 3 1 2 70 7 0 7 1 12

2 1 1 5 6 80 0 0 0 5

0 0 0 0 0 0 89 0 1 10

3 1 0 1 4 1 5 80 1 4

0 1 1 1 2 0 0 0 93 2

0 0 1 1 2 0 0 6 0 90


35/48

DENSITY, CONCAVITY & CONTOUR FEATURE'S RESULT 86.9%

NUMERAL (rows) / Recognized As (columns)

99 0 0 0 0 0 0 1 0 0

0 93 0 0 3 0 4 0 0 0

0 0 86 2 2 2 0 2 1 5

0 0 1 89 4 1 1 0 1 3

0 3 1 2 70 7 0 7 1 12

2 1 1 5 6 80 0 0 0 5

0 0 0 0 0 0 89 0 1 10

3 1 0 1 4 1 5 80 1 4

0 1 1 1 2 0 0 0 93 2

0 0 1 1 2 0 0 6 0 90


36/48

SUMMARY OF THE RESULTS USING VARIOUS FEATURES

Name of Features                 Size of Feature Vector   Results
CONCAVITY + DENSITY              90                       92.6%
CONCAVITY + CONTOUR              144                      88.6%
CONTOUR + DENSITY                180                      88.6%
CONCAVITY                        54                       87%
DENSITY                          36                       86.9%
CONTOUR
CONTOUR + CONCAVITY + DENSITY    234                      88.7%


37/48

GENETIC ALGORITHM

The GA is a search process based on the laws of natural selection and genetics. Usually, a simple GA consists of three operations: Selection,

Genetic Operation, and Replacement.

Contd.


38/48

GENETIC ALGORITHM

Contd.


39/48

GENETIC OPERATION

Crossover is a recombination operator that combines subparts of two parent chromosomes

to produce offspring that contain some parts of

both parents' genetic material.
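
A minimal sketch of crossover on bit-string chromosomes. The slides do not say which crossover variant was used; single-point crossover is assumed here, and the function name is hypothetical.

```python
import random


def single_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length parents at a random cut point."""
    point = rng.randint(1, len(parent_a) - 1)
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2
```

Each offspring starts with one parent's bits and ends with the other's, so both contain material from both parents.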

Contd.


40/48

GENETIC OPERATION

Mutation is an operator that introduces

variations into the chromosome. It randomly

alters the value of a string position. Each bit

of a bitstring is replaced by a randomly

generated bit.
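
A minimal sketch of this operator, assuming that each bit is replaced by a randomly generated bit with a given mutation probability. The function name and the default probability are illustrative choices.

```python
import random


def mutate(chromosome, p_mutation=0.01, rng=random):
    """With probability p_mutation, replace a bit by a randomly generated bit."""
    return [rng.randint(0, 1) if rng.random() < p_mutation else bit
            for bit in chromosome]
```

With a small probability most bits survive unchanged, which keeps the variation introduced per generation low.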

Contd.


41/48

FEATURES SUBSET SELECTION USING GA

Representation of chromosome:-

A string of 90 binary numbers is taken as a representation of the subset of

features selected.

Position: 1 2 3 4 ... 89 90

Example bits: 1 1 0 0 0 0 1 1 0 1

If a 1 appears in the string at position i, the

feature corresponding to position i is selected for the subset

being formed; if it is a 0, the corresponding feature is not selected.
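
Applying a chromosome to a feature vector amounts to a simple mask, sketched below. The function name is hypothetical.

```python
def select_features(feature_vector, chromosome):
    """Keep the features whose chromosome bit is 1."""
    if len(feature_vector) != len(chromosome):
        raise ValueError("feature vector and chromosome must have equal length")
    return [value for value, bit in zip(feature_vector, chromosome) if bit == 1]
```

With a 4-element vector and the string 1 0 0 1, only the first and fourth features are kept.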

Contd.


42/48

Contd.

SELECTION OF PARAMETERS

Population Size: 10

Number of generations: 1000

Probability of crossover: 0.9

Probability of mutation: 0.01
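
A sketch of how a simple GA with these parameter values could be put together. Several parts are assumptions, since the slides do not specify them: binary tournament selection, single-point crossover, and replacement of the whole population while keeping the best string found so far. In the actual work the fitness would be the recognition rate obtained with the selected feature subset; toy_fitness below is a hypothetical stand-in so that the sketch runs on its own.

```python
import random


def tournament(population, fitness, rng):
    """Selection: return the fitter of two randomly drawn chromosomes."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def run_ga(fitness, n_bits=90, pop_size=10, generations=1000,
           p_crossover=0.9, p_mutation=0.01, seed=0):
    """Return the best bit string found (1 = feature selected)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size - 1:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            # Genetic operation 1: single-point crossover.
            if rng.random() < p_crossover:
                point = rng.randint(1, n_bits - 1)
                p1, p2 = p1[:point] + p2[point:], p2[:point] + p1[point:]
            # Genetic operation 2: mutation, bit by bit.
            for child in (p1, p2):
                offspring.append([rng.randint(0, 1)
                                  if rng.random() < p_mutation else bit
                                  for bit in child])
        # Replacement: offspring replace the parents, but the best string
        # found so far is always carried over.
        population = offspring[:pop_size - 1] + [best]
        best = max(population, key=fitness)
    return best


def toy_fitness(chromosome):
    """Hypothetical fitness: pretend the first 45 features help and the rest hurt."""
    return sum(chromosome[:45]) - sum(chromosome[45:])
```

Because the best string is carried over each generation, the best fitness can never decrease from one generation to the next.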


43/48

COMPARISON OF RESULTS

(one row per numeral)

Result (90 Features)   For Best String out of 90
                       No. of Features   % Results
98%                    44                100
96%                    50                99
92%                    53                91
96%                    49                97
87%                    48                93
92%                    53                93
98%                    47                100
83%                    40                93
97%                    48                99
87%                    52                95
Average: 92.6%                           96%


44/48

CONCLUSION

To improve the recognition of Urdu script numerals, we developed a number of features, namely Density,

Concavity and Contour. We applied these features, and

combinations of them, to the sample data. We

also used a genetic algorithm for the same purpose and calculated the difference in accuracy between

the genetic algorithm and the other feature sets developed. The work

presented a GA-based method for the optimal selection

of a subset of features to increase the recognition

accuracy and speed of recognition of Urdu script numerals.



45/48

REFERENCES

J. Sadri et al., Application of Support Vector Machines for recognition of handwritten Arabic/Persian digits, Proceedings of the Second Conference on Machine Vision and Image Processing & Applications (MVIP), Vol. 1, Feb. 2003, Iran, pp. 300-307.

Harifi et al., A New Pattern for Handwritten Persian/Arabic Digit Recognition, International Journal of Information Technology, Vol. 1, Number 4, pp. 174-177.

S.V. Rajashekararadhya et al., Efficient Zone Feature Extraction Algorithm for Handwritten Numerals Recognition of Four South Indian Scripts, Journal of Theoretical and Applied Information Technology, 2008.

M. Hanmandlu et al., Input fuzzy for the recognition of handwritten Hindi numeral:a, International Conference on Informational Technology, 2007.

Al-Taani Ahmad et al., Recognition of On-line Handwritten Arabic Digits Using Structural Features and Transition Network, Informatica, 2008.

Contd.


46/48

REFERENCES

Kam-Fai Chan and Dit-Yan Yeung, Recognizing on-line handwritten alphanumeric characters through flexible structural matching, Pattern Recognition, Vol. 32, pp. 1099-1114, 1999.

M.I. Razzak, Muhammad Sher, S.A. Hussain, Z.S. Khan, Combining online and offline preprocessing for online Urdu character recognition, IMECS 09.

M. Pechwitz, V. Märgner, Baseline Estimation For Arabic Handwritten Words, IWFHR02.

Javad Sadri et al., State of the art in Farsi script recognition, Signal Processing and its application, 2007.

Faouzi Bouchiareb, Mouldi Bedda, Salim Ouchetai, New Preprocessing Methods for Handwritten Arabic Word, Asian Journal of Information Technology.

A. Amin, Off-line Arabic character recognition: The state of the art, Pattern Recognition, vol. 31, pp. 517-530, 1998.

Contd.


47/48

REFERENCES

S. Mori, C. Y. Suen and K. Yamamoto, Historical review of OCR research and development, Proceedings of the IEEE, vol. 80, pp. 1029-1058, 1992.

V. K. Govindan and A. P. Shivprasad, Character Recognition - A Review, Pattern Recognition, vol. 23, no. 7, pp. 671-683, 1990.

B. B. Chaudhuri and U. Pal, A complete printed Bangla OCR system, Pattern Recognition, vol. 31, pp. 531-549, 1998.


48/48

Thanks
