Presentation Review 5
7/27/2019 Presentation Review 5
1/48
PRESENTATION
on
HANDWRITTEN URDU SCRIPT NUMERALS RECOGNITION
By
Sartaj Khan
M Tech (Sequential) IV Sem
Roll No : 6410110020
Supervisor
Mr Hitendra Garg
MCA, MS (BITS, Pilani), Ph.D.*
-
CONTENTS
INTRODUCTION
AIM
OCR
DATA SET COLLECTION
PREPROCESSING
NORMALIZATION
NOISE REMOVAL
FEATURE EXTRACTION
ZONING
DENSITY
CONCAVITY
CONTOUR
RESULTS
GENETIC ALGORITHM
COMPARING RESULTS
CONCLUSION
REFERENCES
-
INTRODUCTION
Handwritten numeral recognition is, in general, a benchmark problem of Pattern Recognition and Artificial Intelligence.
Compared to printed numeral recognition, handwritten numeral recognition is compounded by variations in the shapes and sizes of handwritten characters.
Considering all these, the present work addresses the problem of handwritten numeral recognition with respect to handwritten Urdu numerals.
-
AIM
Density, Concavity and Contour features are extracted; the best results are reported using a combination of density and concavity. To find the optimal feature subset, a genetic algorithm is suggested so as to reduce computational effort and increase recognition accuracy.
On experimentation with a database of 20000 samples, the technique yields an average recognition rate of 97.8%, evaluated after three-fold cross validation of results.
It is useful for applications related to OCR of handwritten Urdu numerals and can also be extended to include OCR of handwritten characters of the Urdu alphabet.
-
OCR
Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text.
Contd..
-
OCR
In on-line character recognition systems, the computer recognizes symbols as they are drawn.
Off-line recognition is performed after writing or printing is completed.
Contd..
-
APPLICATIONS
Assigning ZIP codes to letter mail.
Reading data entered in forms, e.g. tax forms.
Automatic accounting procedures used in processing utility bills.
Verification of account numbers and courtesy amounts on bank checks.
Automatic accounting of airline passenger tickets.
Automatic validation of passports.
-
DATA SET COLLECTION
Our objective is to obtain a set of handwritten samples of
Urdu numerals that capture variations in handwriting
between and within writers.
Therefore, we need numeral samples from multiple writers,
as well as multiple samples from each writer.
Contd..
-
CRITERIA FOR SELECTION OF NUMERALS
The different numerals are written in the specified block as shown below. The persons writing the numbers are free to use pens of different quality, different ink colors, etc.
They should try to write the numerals within the specified grids. The numerals should not touch the grid lines, and each numeral should be written entirely within the specified boundary; if it fails this criterion, the algorithm used will remove the parts that lie outside the boundary.
Contd.
-
CRITERIA FOR SELECTION OF NUMERALS
Each person would write 1-10 (in Urdu script) ten times, one set per row. Thus, each numeral would be written 10 times and one person would write 100 numerals.
On the above criteria we collected samples from 200 persons (samples of one numeral: 200 x 10 = 2000).
This resulted in 2000 samples per numeral. Out of these, 1000 samples (10 x 100) were randomly selected and stored in the database, and 1000 were used as test images.
Contd.
-
BLANK FORMAT FOR COLLECTING HAND WRITTEN URDU NUMERAL
-
SAMPLE DATA SHEET AFTER BEING DULY FILLED
-
NORMALIZATION
Normalization is the process of standardizing the size of each image.
Steps in the normalization process:
Start with an image frame of Xsize0 x Ysize0 pixels that fits the isolated numeral, obtained by removing blank rows and columns.
Rescale the image to Xsize x Ysize pixels, where both sides equal the larger of Xsize0 and Ysize0, i.e.
Xsize = max(Xsize0, Ysize0)
Ysize = max(Xsize0, Ysize0)
Contd..
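The two normalization steps can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the image is assumed to be a list of rows with 1 for black and 0 for white, the function names are hypothetical, and the square frame is produced by centering the cropped numeral on a white background rather than by stretching it.

```python
def crop_to_content(img):
    """Remove blank rows and columns around the numeral (1 = black)."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:                      # completely blank image
        return [[0]]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def pad_to_square(img):
    """Place the cropped image in a frame of side max(Xsize0, Ysize0)."""
    h, w = len(img), len(img[0])
    side = max(h, w)
    top, left = (side - h) // 2, (side - w) // 2
    out = [[0] * side for _ in range(side)]
    for r in range(h):
        for c in range(w):
            out[top + r][left + c] = img[r][c]
    return out
```

A further resize to a fixed working size (the contour slide mentions 36 x 36 pixels) would follow this step.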
-
NORMALIZATION
Image before normalization Image after normalization
-
NOISE REMOVAL
During the scanning process some noise is introduced. The reasons for such noise could be specks of dust on the scanner, poor quality of the paper on which the numerals are written, etc.
If there is a single black pixel, or a run of two or three connected black pixels, it is treated as noise and removed, i.e. the noise pixels are converted to white pixels.
With noise Without Noise
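One way to apply this rule is to find connected groups of black pixels and erase every group of three pixels or fewer. The sketch below assumes a binary image (1 = black, 0 = white) and 8-connectivity; both the connectivity and the function name are assumptions, since the slide does not state them.

```python
def remove_small_specks(img, max_size=3):
    """Turn every connected black group of at most max_size pixels white."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Flood fill to collect one connected component.
            stack, component = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            if len(component) <= max_size:
                for y, x in component:
                    out[y][x] = 0      # convert the noise pixel to white
    return out
```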
-
FEATURE EXTRACTION
In the feature extraction stage each character is represented as a feature vector, which becomes its identity.
The major goal of feature extraction is to extract a set of features that maximizes the recognition rate with the least number of elements.
-
REQUIREMENTS OF A GOOD FEATURE SET
It should have a good discriminating power in order to enable
the correct identification even among very similar symbols.
It should not be too time consuming to compute.
As far as possible, the feature set should be invariant to rotation, scaling and translation, so that the recognition is independent of font, size and pitch.
The feature set should accord some immunity to noise.
The feature set must offer a complete description of the
character set to be recognized.
-
DIFFERENT FEATURES EXTRACTION
ZONING
DENSITY
CONCAVITY
CONTOUR
-
ZONING
The character image is divided into N x M zones. From each zone, features are extracted to form the feature vector. The goal of zoning is to obtain local characteristics instead of global characteristics.
-
DENSITY
The density feature is calculated as follows:
The number of dark pixels in each cell is considered a feature.
Darker squares indicate a higher density of zone pixels.
Contd.
-
DENSITY
Steps for the density feature:
Divide the box into thirty-six equal zones (6 x 6).
Compute the density of pixels in each zone.
Store these features in a file.
Contd.
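The density steps can be sketched as below, assuming a square binary image (1 = black) whose side is a multiple of 6, such as the 36 x 36 normalized image. The function name is hypothetical, and the raw count of dark pixels per zone is used as the feature, as the previous slide describes.

```python
def density_features(img, zones=6):
    """Return the number of dark pixels in each of zones x zones cells."""
    cell = len(img) // zones
    features = []
    for zr in range(zones):
        for zc in range(zones):
            count = sum(
                img[r][c]
                for r in range(zr * cell, (zr + 1) * cell)
                for c in range(zc * cell, (zc + 1) * cell)
            )
            features.append(count)
    return features
```

For a 36 x 36 image this yields 36 values, one per zone, read row by row.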
-
CONCAVITY
These features are used to highlight the topological and geometrical properties of the digit classes. Each concavity feature represents the number of white pixels that belong to a specific concavity configuration.
The label for each white pixel is chosen based on the Freeman code with four directions. Each direction is explored until a black pixel is encountered or the limits imposed by the digit bounding box are reached. A white pixel is labeled if at least two consecutive directions find black pixels. Thus, we have 9 possible concavity configurations. Moreover, we consider four more configurations in order to detect the presence of loops more precisely. The total length of this feature vector is then 13.
Contd.
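The count of 9 configurations follows from the labeling rule: with four directions, 4 pairs of adjacent directions, 4 triples and 1 full set contain at least two consecutive hits. The sketch below illustrates only this 9-bin part; the four extra loop configurations are not reconstructed here. The image is assumed to be binary (1 = black) and already cropped to the bounding box, and all names are hypothetical.

```python
# Clockwise order, so neighbouring entries are "consecutive" directions.
DIRECTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def _has_adjacent_hits(mask):
    bits = [(mask >> i) & 1 for i in range(4)]
    return any(bits[i] and bits[(i + 1) % 4] for i in range(4))


# Bit masks of hit directions that qualify as a concavity configuration.
CONFIGS = [m for m in range(16) if _has_adjacent_hits(m)]


def ray_hits_black(img, r, c, dr, dc):
    """Walk from (r, c) in one direction until black or the box limit."""
    h, w = len(img), len(img[0])
    r, c = r + dr, c + dc
    while 0 <= r < h and 0 <= c < w:
        if img[r][c] == 1:
            return True
        r, c = r + dr, c + dc
    return False


def concavity_histogram(img):
    """Count white pixels per concavity configuration."""
    index = {m: i for i, m in enumerate(CONFIGS)}
    hist = [0] * len(CONFIGS)
    for r, row in enumerate(img):
        for c, pixel in enumerate(row):
            if pixel == 1:
                continue
            mask = 0
            for i, (dr, dc) in enumerate(DIRECTIONS):
                if ray_hits_black(img, r, c, dr, dc):
                    mask |= 1 << i
            if mask in index:          # unlabeled pixels are skipped
                hist[index[mask]] += 1
    return hist
```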
-
Showing the 9 concavity configurations and also 4 configurations for false loop
Contd.
-
CONCAVITY
Contd.
-
CONTOUR
The number of interior and exterior contours is extracted from the chain
code representation of the image.
Connectivity features extracted for a line
Contd.
-
CONTOUR
To extract the direction of the numeral's contour, the normalized image (36 x 36 pixels) is divided into 6 x 6 cells. The size of each cell is 6 x 6 pixels.
There are 4 feature windows of 3 x 3 pixels, one for each of 4 directions: horizontal (A), vertical (B), left diagonal (C) and right diagonal (D).
XC = (X1A, X1B, X1C, X1D, X2A, X2B, ..., X36A, X36B, X36C, X36D)
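The layout of XC (36 cells x 4 directions = 144 values) can be sketched as below. The exact 3 x 3 window patterns are not given in the slide, so this sketch assumes a simple match: a black pixel counts toward a direction when both of its neighbours along that direction are also black. The reading of "left diagonal" as top-left to bottom-right is likewise an assumption, the names are hypothetical, and the contour tracing step is omitted (the input is taken to be the contour image, 1 = black).

```python
# For each direction, the two neighbour offsets that must both be black.
DIRECTION_NEIGHBOURS = {
    "A": ((0, -1), (0, 1)),    # horizontal
    "B": ((-1, 0), (1, 0)),    # vertical
    "C": ((-1, -1), (1, 1)),   # left diagonal (assumed top-left to bottom-right)
    "D": ((-1, 1), (1, -1)),   # right diagonal (assumed top-right to bottom-left)
}


def contour_direction_features(img, zones=6):
    """Return [X1A, X1B, X1C, X1D, X2A, ...] for a square binary image."""
    size = len(img)
    cell = size // zones

    def is_black(r, c):
        return 0 <= r < size and 0 <= c < size and img[r][c] == 1

    features = []
    for zr in range(zones):
        for zc in range(zones):
            counts = dict.fromkeys(DIRECTION_NEIGHBOURS, 0)
            for r in range(zr * cell, (zr + 1) * cell):
                for c in range(zc * cell, (zc + 1) * cell):
                    if img[r][c] != 1:
                        continue
                    for name, ((r1, c1), (r2, c2)) in DIRECTION_NEIGHBOURS.items():
                        if is_black(r + r1, c + c1) and is_black(r + r2, c + c2):
                            counts[name] += 1
            features.extend(counts[name] for name in "ABCD")
    return features
```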
-
TEST RESULTS
DENSITY
CONCAVITY
CONTOUR
DENSITY & CONCAVITY
DENSITY & CONTOUR
CONCAVITY & CONTOUR
DENSITY , CONCAVITY & CONTOUR
-
DENSITY FEATURE'S RESULT 86.9%
Confusion matrix (rows: numeral written; columns: recognized as)
99 0 0 0 0 0 0 1 0 0
0 93 0 0 3 0 4 0 0 0
0 0 86 2 2 2 0 2 1 5
0 0 1 89 4 1 1 0 1 3
0 3 1 2 70 7 0 7 1 12
2 1 1 5 6 80 0 0 0 5
0 0 0 0 0 0 89 0 1 10
3 1 0 1 4 1 5 80 1 4
0 1 1 1 2 0 0 0 93 2
0 0 1 1 2 0 0 6 0 90
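The 86.9% figure can be reproduced from the diagonal of this matrix, assuming 100 test images per numeral. The sketch below uses the matrix exactly as transcribed; the function name is hypothetical. Note that the fifth row as transcribed sums to 103 rather than 100, so dividing by the total of all entries would give a slightly lower value.

```python
DENSITY_CONFUSION = [
    [99, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 93, 0, 0, 3, 0, 4, 0, 0, 0],
    [0, 0, 86, 2, 2, 2, 0, 2, 1, 5],
    [0, 0, 1, 89, 4, 1, 1, 0, 1, 3],
    [0, 3, 1, 2, 70, 7, 0, 7, 1, 12],
    [2, 1, 1, 5, 6, 80, 0, 0, 0, 5],
    [0, 0, 0, 0, 0, 0, 89, 0, 1, 10],
    [3, 1, 0, 1, 4, 1, 5, 80, 1, 4],
    [0, 1, 1, 1, 2, 0, 0, 0, 93, 2],
    [0, 0, 1, 1, 2, 0, 0, 6, 0, 90],
]


def recognition_rate(confusion, samples_per_class=100):
    """Percentage of correctly recognized samples (sum of the diagonal)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * correct / (len(confusion) * samples_per_class)


rate = recognition_rate(DENSITY_CONFUSION)
```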
-
CONCAVITY FEATURE'S RESULT 86.9%
Confusion matrix (rows: numeral written; columns: recognized as)
99 0 0 0 0 0 0 1 0 0
0 93 0 0 3 0 4 0 0 0
0 0 86 2 2 2 0 2 1 5
0 0 1 89 4 1 1 0 1 3
0 3 1 2 70 7 0 7 1 12
2 1 1 5 6 80 0 0 0 5
0 0 0 0 0 0 89 0 1 10
3 1 0 1 4 1 5 80 1 4
0 1 1 1 2 0 0 0 93 2
0 0 1 1 2 0 0 6 0 90
-
CONTOUR FEATURE'S RESULT 86.9%
Confusion matrix (rows: numeral written; columns: recognized as)
99 0 0 0 0 0 0 1 0 0
0 93 0 0 3 0 4 0 0 0
0 0 86 2 2 2 0 2 1 5
0 0 1 89 4 1 1 0 1 3
0 3 1 2 70 7 0 7 1 12
2 1 1 5 6 80 0 0 0 5
0 0 0 0 0 0 89 0 1 10
3 1 0 1 4 1 5 80 1 4
0 1 1 1 2 0 0 0 93 2
0 0 1 1 2 0 0 6 0 90
-
DENSITY & CONCAVITY FEATURE'S RESULT 86.9%
Confusion matrix (rows: numeral written; columns: recognized as)
99 0 0 0 0 0 0 1 0 0
0 93 0 0 3 0 4 0 0 0
0 0 86 2 2 2 0 2 1 5
0 0 1 89 4 1 1 0 1 3
0 3 1 2 70 7 0 7 1 12
2 1 1 5 6 80 0 0 0 5
0 0 0 0 0 0 89 0 1 10
3 1 0 1 4 1 5 80 1 4
0 1 1 1 2 0 0 0 93 2
0 0 1 1 2 0 0 6 0 90
-
DENSITY & CONTOUR FEATURE'S RESULT 86.9%
Confusion matrix (rows: numeral written; columns: recognized as)
99 0 0 0 0 0 0 1 0 0
0 93 0 0 3 0 4 0 0 0
0 0 86 2 2 2 0 2 1 5
0 0 1 89 4 1 1 0 1 3
0 3 1 2 70 7 0 7 1 12
2 1 1 5 6 80 0 0 0 5
0 0 0 0 0 0 89 0 1 10
3 1 0 1 4 1 5 80 1 4
0 1 1 1 2 0 0 0 93 2
0 0 1 1 2 0 0 6 0 90
-
CONCAVITY & CONTOUR FEATURE'S RESULT 86.9%
Confusion matrix (rows: numeral written; columns: recognized as)
99 0 0 0 0 0 0 1 0 0
0 93 0 0 3 0 4 0 0 0
0 0 86 2 2 2 0 2 1 5
0 0 1 89 4 1 1 0 1 3
0 3 1 2 70 7 0 7 1 12
2 1 1 5 6 80 0 0 0 5
0 0 0 0 0 0 89 0 1 10
3 1 0 1 4 1 5 80 1 4
0 1 1 1 2 0 0 0 93 2
0 0 1 1 2 0 0 6 0 90
-
DENSITY, CONCAVITY & CONTOUR FEATURE'S RESULT 86.9%
Confusion matrix (rows: numeral written; columns: recognized as)
99 0 0 0 0 0 0 1 0 0
0 93 0 0 3 0 4 0 0 0
0 0 86 2 2 2 0 2 1 5
0 0 1 89 4 1 1 0 1 3
0 3 1 2 70 7 0 7 1 12
2 1 1 5 6 80 0 0 0 5
0 0 0 0 0 0 89 0 1 10
3 1 0 1 4 1 5 80 1 4
0 1 1 1 2 0 0 0 93 2
0 0 1 1 2 0 0 6 0 90
-
SUMMARY OF THE RESULTS USING VARIOUS FEATURES
Name of Features                 Size of Feature Vector   Results
CONCAVITY + DENSITY              90                       92.6%
CONCAVITY + CONTOUR              144                      88.6%
CONTOUR + DENSITY                180                      88.6%
CONCAVITY                        54                       87%
DENSITY                          36                       86.9%
CONTOUR
CONTOUR + CONCAVITY + DENSITY    234                      88.7%
-
GENETIC ALGORITHM
The GA is a search process based on the laws of natural selection and genetics. Usually, a simple GA consists of three operations: Selection, Genetic Operation, and Replacement.
Contd.
-
GENETIC ALGORITHM
Contd.
-
GENETIC OPERATION
Crossover is a recombination operator that combines subparts of two parent chromosomes to produce offspring that contain some parts of both parents' genetic material.
Contd.
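A minimal sketch of crossover on bit strings is given below. Single-point crossover is assumed here, since the slide does not say which variant was used; the function name and the optional cut argument are hypothetical.

```python
import random


def single_point_crossover(parent_a, parent_b, cut=None, rng=random):
    """Swap the tails of two equal-length chromosomes at one cut point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if cut is None:
        cut = rng.randint(1, len(parent_a) - 1)   # never cut at the ends
    child_1 = parent_a[:cut] + parent_b[cut:]
    child_2 = parent_b[:cut] + parent_a[cut:]
    return child_1, child_2
```

Each child keeps the head of one parent and the tail of the other, so both parents contribute genetic material.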
-
GENETIC OPERATION
Mutation is an operator that introduces variations into the chromosome. It randomly alters the value of a string position: with a small probability, each bit of the bitstring is replaced by a randomly generated bit.
Contd.
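Mutation on a bit string can be sketched as follows. This sketch flips a bit with probability `rate`, a common variant of replacing it with a random bit; the default of 0.01 matches the mutation probability listed among the GA parameters, and the function name is hypothetical.

```python
import random


def mutate(chromosome, rate=0.01, rng=random):
    """Flip each bit independently with probability rate."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]
```

With a rate this low, most offspring pass through unchanged, and the occasional flipped bit adds or drops a single feature.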
-
FEATURES SUBSET SELECTION USING GA
Representation of chromosome:-
A string of 90 binary numbers is taken as a representation of the subset of
features selected.
1 2 3 4 89 90
1 1 0 0 0 0 1 1 0 1
If a 1 appears in the string at position i, the feature corresponding to position i is selected to be in the subset being formed; if it is a 0, the corresponding feature is not selected.
Contd.
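Decoding a chromosome into a feature subset can be sketched as below. The function name is hypothetical, and a 4-element vector stands in for the 90-element feature vector to keep the example short.

```python
def select_features(feature_vector, chromosome):
    """Keep the features whose chromosome bit is 1."""
    if len(feature_vector) != len(chromosome):
        raise ValueError("one bit is needed per feature")
    return [value for value, bit in zip(feature_vector, chromosome) if bit == 1]


subset = select_features([10, 20, 30, 40], [1, 0, 0, 1])
```

Here `subset` holds the first and fourth features; the number of 1 bits is the size of the selected subset.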
-
Contd.
SELECTION OF PARAMETERS
Population Size: 10
Number of generations: 1000
Probability of crossover: 0.9
Probability of mutation: 0.01
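A GA loop using these four parameters might look like the sketch below. Several choices are assumptions, since the slides do not specify them: binary tournament selection, single-point crossover, bit-flip mutation, and replacement that carries the best string over unchanged. The fitness function is supplied by the caller; in the feature selection setting it would be the recognition rate obtained with the selected subset, while the usage line below uses a toy fitness only to show the call.

```python
import random


def tournament(population, fitness, rng):
    """Pick two distinct individuals at random and return the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def run_ga(fitness, n_bits=90, pop_size=10, generations=1000,
           p_cross=0.9, p_mut=0.01, seed=0):
    """Return the best bit string found; fitness maps a bit list to a number."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]                        # keep the best string
        while len(children) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            if rng.random() < p_cross:              # crossover
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [1 - b if rng.random() < p_mut else b for b in child]
            children.append(child)
        population = children                       # replacement
        best = max(population, key=fitness)
    return best


# Toy fitness (number of 1 bits), used only to demonstrate the call.
best_string = run_ga(sum, generations=50, seed=1)
```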
-
COMPARISON RESULTS
Each row corresponds to one numeral; the last row gives the averages.
Result (90 Features)   Best String out of 90: No. of Features   Best String out of 90: % Results
98%                    44                                       100
96%                    50                                       99
92%                    53                                       91
96%                    49                                       97
87%                    48                                       93
92%                    53                                       93
98%                    47                                       100
83%                    40                                       93
97%                    48                                       99
87%                    52                                       95
92.6%                                                           96%
-
CONCLUSION
To improve the recognition of Urdu script numerals, we developed a number of features, namely Density, Concavity and Contour. We applied these features, individually and in combination, to the sample data. We also used a genetic algorithm for this purpose and calculated the difference in accuracy between the genetic algorithm and the other features developed. The work presented a GA-based method for the optimal selection of a subset of features to increase the recognition accuracy and speed of recognition of Urdu script numerals.
-
REFERENCES
J. Sadri et al., Application of Support Vector Machines for recognition of handwritten Arabic/Persian digits, Proceedings of the Second Conference on Machine Vision and Image Processing & Applications (MVIP), Vol. 1, Feb. 2003, Iran, pp. 300-307.
Harifi et al., A New Pattern for Handwritten Persian/Arabic Digit Recognition, International Journal of Information Technology, Vol. 1, No. 4, pp. 174-177.
S.V. Rajashekararadhya et al., Efficient Zone Feature Extraction Algorithm for Handwritten Numerals Recognition of Four South Indian Scripts, Journal of Theoretical and Applied Information Technology, 2008.
M. Hanmandlu et al., Input fuzzy for the recognition of handwritten Hindi numerals, International Conference on Informational Technology, 2007.
Al-Taani Ahmad et al., Recognition of On-line Handwritten Arabic Digits Using Structural Features and Transition Network, Informatica, 2008.
Contd.
-
REFERENCES
Kam-Fai Chan and Dit-Yan Yeung, Recognizing on-line handwritten alphanumeric characters through flexible structural matching, Pattern Recognition, Vol. 32, pp. 1099-1114, 1999.
M.I. Razzak, Muhammad Sher, S.A. Hussain, Z.S. Khan, Combining online and offline preprocessing for online Urdu character recognition, IMECS 09.
M. Pechwitz, V. Märgner, Baseline Estimation For Arabic Handwritten Words, IWFHR 02.
Javad Sadri et al., State of the art in Farsi script recognition, Signal Processing and its Applications, 2007.
Faouzi Bouchiareb, Mouldi Bedda, Salim Ouchetai, New Preprocessing Methods for Handwritten Arabic Word, Asian Journal of Information Technology.
A. Amin, Off-line Arabic character recognition: The state of the art, Pattern Recognition, vol. 31, pp. 517-530, 1998.
Contd.
-
REFERENCES
S. Mori, C. Y. Suen and K. Yamamoto, Historical review of OCR research and development, Proceedings of the IEEE, vol. 80, pp. 1029-1058, 1992.
V. K. Govindan and A. P. Shivprasad, Character Recognition - A Review, Pattern Recognition, vol. 23, no. 7, pp. 671-683, 1990.
B. B. Chaudhuri and U. Pal, A complete printed Bangla OCR system, Pattern Recognition, vol. 31, pp. 531-549, 1998.
-
Thanks