Presentation Review 5

  • 7/27/2019 Presentation Review 5

    1/48

    PRESENTATION

    on

    HANDWRITTEN URDU SCRIPT NUMERALS RECOGNITION

    By

    Sartaj Khan, M Tech (Sequential) IV Sem

    Roll No : 6410110020

    Supervisor

    Mr Hitendra Garg, MCA, MS (BITS, Pilani), Ph.D.*

    CONTENTS

    INTRODUCTION

    AIM

    OCR

    DATA SET COLLECTION

    PREPROCESSING

    NORMALIZATION

    NOISE REMOVAL

    FEATURE EXTRACTION

    ZONING

    DENSITY

    CONCAVITY

    CONTOUR

    RESULTS

    GENETIC ALGORITHM

    COMPARING RESULTS

    CONCLUSION

    REFERENCES

    INTRODUCTION

    Handwritten numeral recognition is, in general, a benchmark problem of Pattern Recognition and Artificial Intelligence.

    Compared to printed numeral recognition, handwritten numeral recognition is compounded by variations in the shapes and sizes of handwritten characters.

    Considering all these, the present work addresses handwritten numeral recognition with respect to handwritten Urdu numerals.

    AIM

    Density, Concavity and Contour features are extracted; the best results are reported using a combination of density and concavity. To find the optimal feature subset, a genetic algorithm is suggested, so as to reduce computational effort and increase recognition accuracy.

    On experimentation with a database of 20000 samples, the technique yields an average recognition rate of 97.8%, evaluated after three-fold cross-validation of the results.

    It is useful for applications related to OCR of handwritten Urdu numerals and can also be extended to include OCR of handwritten characters of the Urdu alphabet.

    OCR

    Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text.

    In on-line character recognition systems, the computer recognizes symbols as they are drawn, while off-line recognition is performed after writing or printing is completed.

    APPLICATIONS

    Assigning ZIP codes to letter mail.

    Reading data entered in forms, e.g. tax forms.

    Automatic accounting procedures used in processing utility bills.

    Verification of account numbers and courtesy amounts on bank checks.

    Automatic accounting of airline passenger tickets.

    Automatic validation of passports.

    DATA SET COLLECTION

    Our objective is to obtain a set of handwritten samples of

    Urdu numerals that capture variations in handwriting

    between and within writers.

    Therefore, we need numeral samples from multiple writers,

    as well as multiple samples from each writer.

    CRITERIA FOR SELECTION OF NUMERALS

    The different numerals are written in the specified blocks, as shown below. The persons writing the numerals are free to use pens of different quality, different ink colors, etc.

    Each numeral should be written inside its grid cell: it should not touch the grid lines and should lie entirely within the specified boundary. If a numeral fails this criterion, the algorithm removes the parts that lie outside the boundary.

    Each person writes the numerals 1-10 (in Urdu script) ten times in each row. Thus, each numeral is written 10 times and one person writes 100 numerals.

    On the above criteria, we collected samples from 200 persons (samples of one numeral: 200 x 10 = 2000).

    This resulted in 2000 samples. Out of these, 1000 samples (10 x 100) were randomly selected and stored in the database, and 1000 were used as test images.

    BLANK FORMAT FOR COLLECTING HAND WRITTEN URDU NUMERAL

    SAMPLE DATA SHEET AFTER BEING DULY FILLED

    NORMALIZATION

    Normalization is the process of standardizing the size of each image.

    Steps in the normalization process:

    Start with the image frame of Xsize0 x Ysize0 pixels that fits the isolated numeral, obtained by removing blank rows and columns.

    Rescale the image to Xsize x Ysize pixels, where both dimensions equal the larger of Xsize0 and Ysize0, i.e.

    Xsize = max(Xsize0, Ysize0)

    Ysize = max(Xsize0, Ysize0)
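The two normalization steps can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the image is a list of rows with 1 for black and 0 for white, the function names are hypothetical, and nearest-neighbour sampling is assumed for the rescaling step because the slides do not name an interpolation method.

```python
def crop_to_bounding_box(image):
    """Remove blank rows and columns around the numeral (1 = black, 0 = white)."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    if not rows:
        return []  # completely blank image
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def rescale_to_square(image):
    """Rescale to size x size with size = max(height, width).

    Nearest-neighbour sampling is an assumption of this sketch.
    """
    height, width = len(image), len(image[0])
    size = max(height, width)
    return [
        [image[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


raw = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
normalized = rescale_to_square(crop_to_bounding_box(raw))
```

Here the cropped frame is 3 rows by 2 columns, so the normalized image is 3 x 3.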

    Figure: image before normalization and image after normalization.

    NOISE REMOVAL

    During the scanning process some noise is introduced. The reasons for such noise could be specks of dust on the scanner, poor quality of the paper on which the numerals are written, etc.

    If there is a single black pixel, or a run of two or three contiguous black pixels, it is treated as noise and removed, i.e. the noise pixels are converted into white pixels.

    Figure: numeral image with noise and without noise.
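A minimal sketch of this rule is given below. It treats every connected group of at most three black pixels as noise and turns it white. The function name, the 1 = black / 0 = white encoding and the use of 8-connectivity are assumptions of the sketch, not details taken from the slides.

```python
def remove_small_specks(image, max_size=3):
    """Turn connected groups of at most max_size black pixels white."""
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill the connected component that contains (r, c).
            component, stack = [], [(r, c)]
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            # Components of one, two or three pixels are treated as noise.
            if len(component) <= max_size:
                for y, x in component:
                    cleaned[y][x] = 0
    return cleaned


noisy = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
]
clean = remove_small_specks(noisy)
```

In this example the four-pixel stroke survives, while the isolated pixel and the two-pixel speck are removed.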

    FEATURE EXTRACTION

    In the feature extraction stage, each character is represented as a feature vector, which becomes its identity.

    The major goal of feature extraction is to extract a set of features that maximizes the recognition rate with the least number of elements.

    REQUIREMENTS OF A GOOD FEATURE SET

    It should have a good discriminating power in order to enable

    the correct identification even among very similar symbols.

    It should not be too time consuming to compute.

    As far as possible, the feature set should be rotation, scaling and translation invariant, so that the recognition is independent of font, size and pitch.

    The feature set should accord some immunity to noise.

    The feature set must offer a complete description of the

    character set to be recognized.

    DIFFERENT FEATURES EXTRACTION

    ZONING

    DENSITY

    CONCAVITY

    CONTOUR

    ZONING

    The character image is divided into N x M zones. From each zone, features are extracted to form the feature vector. The goal of zoning is to obtain local characteristics instead of global characteristics.

    DENSITY

    The density feature is calculated as follows.

    The number of dark pixels in each cell is considered a feature.

    Darker squares indicate a higher density of zone pixels.

    Steps for the density feature:

    Break the box into thirty-six equal regions (zones) of size 6 x 6 each.

    Compute the density of pixels in each zone.

    Store these features in a file.
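The steps above can be sketched as follows, assuming a 36 x 36 binary image stored as a list of rows (1 = dark pixel) and using the dark-pixel count of each zone as its density. The function name is hypothetical, and writing the vector to a file is left out.

```python
def density_features(image, zones_per_side=6):
    """Return one dark-pixel count per zone, scanning zones row by row."""
    size = len(image)
    zone = size // zones_per_side  # zone edge length in pixels (6 for 36 x 36)
    features = []
    for zr in range(zones_per_side):
        for zc in range(zones_per_side):
            count = sum(
                image[r][c]
                for r in range(zr * zone, (zr + 1) * zone)
                for c in range(zc * zone, (zc + 1) * zone)
            )
            features.append(count)
    return features


# Example: a vertical stroke in pixel column 7 of an otherwise white image.
image = [[1 if c == 7 else 0 for c in range(36)] for r in range(36)]
vector = density_features(image)
```

The resulting vector has 36 elements, one per zone, which matches the feature vector size listed for DENSITY in the summary of results.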

    CONCAVITY

    These features are used to highlight the topological and geometrical properties of the digit classes. Each concavity feature represents the number of white pixels that belong to a specific concavity configuration.

    The label for each white pixel is chosen based on the Freeman code with four directions. Each direction is explored until a black pixel is encountered or the limit imposed by the digit bounding box is reached. A white pixel is labeled if at least two consecutive directions find black pixels. Thus, we have 9 possible concavity configurations. Moreover, we consider four more configurations, in order to detect the presence of loops more precisely. The total length of this feature vector is then 13.
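The labeling rule for the 9 basic configurations can be sketched as below. The sketch assumes the four directions are taken in the cyclic order north, east, south, west, so "two consecutive directions" means neighbours in that cycle; the function names are hypothetical. The four additional loop configurations are not implemented here, so the sketch returns 9 values rather than 13.

```python
import itertools

DIRECTIONS = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1)}
ORDER = "NESW"  # cyclic order assumed for "consecutive" directions


def hits_black(image, r, c, dr, dc):
    """Walk from (r, c) in one direction until a black pixel or the border."""
    height, width = len(image), len(image[0])
    r, c = r + dr, c + dc
    while 0 <= r < height and 0 <= c < width:
        if image[r][c] == 1:
            return True
        r, c = r + dr, c + dc
    return False


def has_two_consecutive(hit):
    """hit holds four booleans in the order N, E, S, W."""
    return any(hit[i] and hit[(i + 1) % 4] for i in range(4))


# All direction patterns with at least two consecutive hits: 4 + 4 + 1 = 9.
CONFIGURATIONS = [
    hit for hit in itertools.product((False, True), repeat=4)
    if has_two_consecutive(hit)
]


def concavity_features(image):
    """Count the white pixels that fall into each of the 9 configurations."""
    counts = {hit: 0 for hit in CONFIGURATIONS}
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            if pixel == 1:
                continue
            hit = tuple(hits_black(image, r, c, *DIRECTIONS[d]) for d in ORDER)
            if hit in counts:
                counts[hit] += 1
    return [counts[hit] for hit in CONFIGURATIONS]


# A "U" shape: both white pixels see black to the east, south and west.
u_shape = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
]
features = concavity_features(u_shape)
```

Opposite pairs such as north and south alone do not count as consecutive, which is why exactly 9 of the 16 possible patterns are kept.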

    Figure: the 9 concavity configurations and the 4 configurations for false loops.
    CONTOUR

    The number of interior and exterior contours is extracted from the chain

    code representation of the image.

    Figure: connectivity features extracted for a line.

    To extract the direction of the numeral's contour, the normalized image (36 x 36 pixels) is divided into 6 x 6 cells. The size of each cell is 6 x 6 pixels.

    There are 4 feature windows of 3 x 3 pixels, one for each of 4 directions: horizontal (A), vertical (B), left diagonal (C) and right diagonal (D).

    XC = (X1A, X1B, X1C, X1D, X2A, X2B, ..., X36A, X36B, X36C, X36D)
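One possible reading of this layout is sketched below; the slides do not define the four 3 x 3 windows exactly, so the details are assumptions. For every black pixel, the sketch checks whether the 3 x 3 neighbourhood contains a black neighbour along each direction, and counts the matches per cell. Taking C as the top-left to bottom-right diagonal and D as the top-right to bottom-left diagonal is also assumed, as is the function name.

```python
# Direction offsets (row step, column step); the C and D orientations are assumed.
OFFSETS = {"A": (0, 1), "B": (1, 0), "C": (1, 1), "D": (1, -1)}


def direction_features(image, cells_per_side=6):
    """Return 4 direction counts (A, B, C, D) for each cell, cell by cell."""
    size = len(image)
    cell = size // cells_per_side

    def black(r, c):
        return 0 <= r < size and 0 <= c < size and image[r][c] == 1

    features = []
    for cr in range(cells_per_side):
        for cc in range(cells_per_side):
            counts = dict.fromkeys(OFFSETS, 0)
            for r in range(cr * cell, (cr + 1) * cell):
                for c in range(cc * cell, (cc + 1) * cell):
                    if image[r][c] != 1:
                        continue
                    for name, (dr, dc) in OFFSETS.items():
                        # A neighbour on either side along this direction counts.
                        if black(r + dr, c + dc) or black(r - dr, c - dc):
                            counts[name] += 1
            features.extend(counts[name] for name in "ABCD")
    return features


# Example: a horizontal line in pixel row 3 of a 36 x 36 image.
image = [[1 if r == 3 else 0 for c in range(36)] for r in range(36)]
xc = direction_features(image)
```

The vector has 36 x 4 = 144 elements, ordered as in XC above: four direction values for cell 1, then four for cell 2, and so on.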

    TEST RESULTS

    DENSITY

    CONCAVITY

    CONTOUR

    DENSITY & CONCAVITY

    DENSITY & CONTOUR

    CONCAVITY & CONTOUR

    DENSITY , CONCAVITY & CONTOUR

    DENSITY FEATURE RESULT: 86.9%

    Confusion matrix (rows: actual numeral; columns: recognized as)

    99 0 0 0 0 0 0 1 0 0

    0 93 0 0 3 0 4 0 0 0

    0 0 86 2 2 2 0 2 1 5

    0 0 1 89 4 1 1 0 1 3

    0 3 1 2 70 7 0 7 1 12

    2 1 1 5 6 80 0 0 0 5

    0 0 0 0 0 0 89 0 1 10

    3 1 0 1 4 1 5 80 1 4

    0 1 1 1 2 0 0 0 93 2

    0 0 1 1 2 0 0 6 0 90
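The 86.9% figure can be reproduced from the matrix by summing the diagonal (correctly recognized samples) and dividing by the number of test images. The sketch below assumes the 1000 test images mentioned in the data set collection slides; the variable names are illustrative.

```python
confusion = [
    [99, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 93, 0, 0, 3, 0, 4, 0, 0, 0],
    [0, 0, 86, 2, 2, 2, 0, 2, 1, 5],
    [0, 0, 1, 89, 4, 1, 1, 0, 1, 3],
    [0, 3, 1, 2, 70, 7, 0, 7, 1, 12],
    [2, 1, 1, 5, 6, 80, 0, 0, 0, 5],
    [0, 0, 0, 0, 0, 0, 89, 0, 1, 10],
    [3, 1, 0, 1, 4, 1, 5, 80, 1, 4],
    [0, 1, 1, 1, 2, 0, 0, 0, 93, 2],
    [0, 0, 1, 1, 2, 0, 0, 6, 0, 90],
]

TEST_IMAGES = 1000  # assumed from the data set collection slides

# Diagonal entries are the samples recognized as their own class.
correct = sum(confusion[i][i] for i in range(len(confusion)))
accuracy = 100 * correct / TEST_IMAGES

# The class with the fewest correct recognitions (0-based row index).
weakest_row = min(range(len(confusion)), key=lambda i: confusion[i][i])

print(f"correct = {correct}, accuracy = {accuracy:.1f}%")
```

The weakest class is the fifth row, with 70 correct recognitions and 12 samples assigned to the last class.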

    CONCAVITY FEATURE RESULT: 86.9%

    Confusion matrix (rows: actual numeral; columns: recognized as)

    99 0 0 0 0 0 0 1 0 0

    0 93 0 0 3 0 4 0 0 0

    0 0 86 2 2 2 0 2 1 5

    0 0 1 89 4 1 1 0 1 3

    0 3 1 2 70 7 0 7 1 12

    2 1 1 5 6 80 0 0 0 5

    0 0 0 0 0 0 89 0 1 10

    3 1 0 1 4 1 5 80 1 4

    0 1 1 1 2 0 0 0 93 2

    0 0 1 1 2 0 0 6 0 90

    CONTOUR FEATURE RESULT: 86.9%

    Confusion matrix (rows: actual numeral; columns: recognized as)

    99 0 0 0 0 0 0 1 0 0

    0 93 0 0 3 0 4 0 0 0

    0 0 86 2 2 2 0 2 1 5

    0 0 1 89 4 1 1 0 1 3

    0 3 1 2 70 7 0 7 1 12

    2 1 1 5 6 80 0 0 0 5

    0 0 0 0 0 0 89 0 1 10

    3 1 0 1 4 1 5 80 1 4

    0 1 1 1 2 0 0 0 93 2

    0 0 1 1 2 0 0 6 0 90

    DENSITY & CONCAVITY FEATURES RESULT: 86.9%

    Confusion matrix (rows: actual numeral; columns: recognized as)

    99 0 0 0 0 0 0 1 0 0

    0 93 0 0 3 0 4 0 0 0

    0 0 86 2 2 2 0 2 1 5

    0 0 1 89 4 1 1 0 1 3

    0 3 1 2 70 7 0 7 1 12

    2 1 1 5 6 80 0 0 0 5

    0 0 0 0 0 0 89 0 1 10

    3 1 0 1 4 1 5 80 1 4

    0 1 1 1 2 0 0 0 93 2

    0 0 1 1 2 0 0 6 0 90

    DENSITY & CONTOUR FEATURES RESULT: 86.9%

    Confusion matrix (rows: actual numeral; columns: recognized as)

    99 0 0 0 0 0 0 1 0 0

    0 93 0 0 3 0 4 0 0 0

    0 0 86 2 2 2 0 2 1 5

    0 0 1 89 4 1 1 0 1 3

    0 3 1 2 70 7 0 7 1 12

    2 1 1 5 6 80 0 0 0 5

    0 0 0 0 0 0 89 0 1 10

    3 1 0 1 4 1 5 80 1 4

    0 1 1 1 2 0 0 0 93 2

    0 0 1 1 2 0 0 6 0 90

    CONCAVITY & CONTOUR FEATURES RESULT: 86.9%

    Confusion matrix (rows: actual numeral; columns: recognized as)

    99 0 0 0 0 0 0 1 0 0

    0 93 0 0 3 0 4 0 0 0

    0 0 86 2 2 2 0 2 1 5

    0 0 1 89 4 1 1 0 1 3

    0 3 1 2 70 7 0 7 1 12

    2 1 1 5 6 80 0 0 0 5

    0 0 0 0 0 0 89 0 1 10

    3 1 0 1 4 1 5 80 1 4

    0 1 1 1 2 0 0 0 93 2

    0 0 1 1 2 0 0 6 0 90

    DENSITY, CONCAVITY & CONTOUR FEATURES RESULT: 86.9%

    Confusion matrix (rows: actual numeral; columns: recognized as)

    99 0 0 0 0 0 0 1 0 0

    0 93 0 0 3 0 4 0 0 0

    0 0 86 2 2 2 0 2 1 5

    0 0 1 89 4 1 1 0 1 3

    0 3 1 2 70 7 0 7 1 12

    2 1 1 5 6 80 0 0 0 5

    0 0 0 0 0 0 89 0 1 10

    3 1 0 1 4 1 5 80 1 4

    0 1 1 1 2 0 0 0 93 2

    0 0 1 1 2 0 0 6 0 90

    SUMMARY OF THE RESULTS USING VARIOUS FEATURES

    Name of Features                 Size of Feature Vector   Results
    CONCAVITY + DENSITY              90                       92.6%
    CONCAVITY + CONTOUR              144                      88.6%
    CONTOUR + DENSITY                180                      88.6%
    CONCAVITY                        54                       87%
    DENSITY                          36                       86.9%
    CONTOUR
    CONTOUR + CONCAVITY + DENSITY    234                      88.7%

    GENETIC ALGORITHM

    The GA is a search process based on the laws of natural selection and genetics. Usually, a simple GA consists of three operations: Selection, Genetic Operation, and Replacement.

    GENETIC OPERATION

    Crossover is a recombination operator that combines subparts of two parent chromosomes to produce offspring that contain some parts of both parents' genetic material.
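As an illustration, the sketch below uses single-point crossover on bit lists. The slides do not say which crossover variant was used, so the single cut point, like the function name, is an assumption.

```python
import random


def single_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    point = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the string
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


rng = random.Random(0)  # fixed seed so the example is repeatable
zeros, ones = [0] * 10, [1] * 10
child_a, child_b = single_point_crossover(zeros, ones, rng)
```

Each child starts with one parent's bits and ends with the other's, so both children contain material from both parents.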

    Mutation is an operator that introduces variations into the chromosome. It randomly alters the value of a string position. Each bit of a bitstring is replaced by a randomly generated bit.
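A minimal sketch of this operator is shown below. It assumes that each bit is replaced only with a small per-bit probability, which is how a mutation probability is normally applied; the function name and the default value are illustrative.

```python
import random


def mutate(chromosome, probability=0.01, rng=random):
    """With the given per-bit probability, replace a bit by a random bit."""
    return [
        rng.randint(0, 1) if rng.random() < probability else bit
        for bit in chromosome
    ]


rng = random.Random(0)
original = [1, 1, 0, 0, 0, 0, 1, 1, 0, 1]
unchanged = mutate(original, probability=0.0, rng=rng)
scrambled = mutate(original, probability=1.0, rng=rng)
```

Because the replacement bit is itself random, a position selected for mutation keeps its old value about half of the time.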

    FEATURES SUBSET SELECTION USING GA

    Representation of chromosome:

    A string of 90 binary digits is taken as the representation of the selected subset of features.

    Position: 1 2 3 4 ... 89 90

    Bits: 1 1 0 0 0 0 1 1 0 1

    If a 1 appears in the string at position i, the feature corresponding to position i is selected for the subset being formed; if it is a 0, the corresponding feature is not selected.
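Applying such a chromosome to a feature vector amounts to a simple mask, as in the sketch below. The function name and the example values are illustrative only.

```python
def select_features(feature_vector, chromosome):
    """Keep the features whose chromosome bit is 1."""
    return [value for value, bit in zip(feature_vector, chromosome) if bit == 1]


# Example with 90 dummy feature values 0..89 and three selected positions.
features = list(range(90))
chromosome = [0] * 90
for position in (1, 2, 90):           # 1-based positions, as on the slide
    chromosome[position - 1] = 1
subset = select_features(features, chromosome)
```

The number of 1 bits in the chromosome is the size of the reduced feature vector.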

    SELECTION OF PARAMETERS

    Population Size: 10

    Number of generations: 1000

    Probability of crossover: 0.9

    Probability of mutation: 0.01
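With these parameters, a simple GA loop could look like the sketch below. Several parts are assumptions, because the slides do not specify them: binary tournament selection, single-point crossover, keeping the best chromosome found so far, and above all the fitness function. In the actual work the fitness would be the recognition accuracy obtained with the selected features; here a hypothetical toy fitness stands in for it.

```python
import random

CHROMOSOME_LENGTH = 90
POPULATION_SIZE = 10
GENERATIONS = 1000
P_CROSSOVER = 0.9
P_MUTATION = 0.01


def run_ga(fitness, generations=GENERATIONS, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(CHROMOSOME_LENGTH)]
        for _ in range(POPULATION_SIZE)
    ]
    best = max(population, key=fitness)

    def pick():
        # Selection: binary tournament (assumed).
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        offspring = []
        while len(offspring) < POPULATION_SIZE:
            mother, father = pick(), pick()
            # Genetic operation 1: single-point crossover (assumed variant).
            if rng.random() < P_CROSSOVER:
                point = rng.randint(1, CHROMOSOME_LENGTH - 1)
                mother, father = (mother[:point] + father[point:],
                                  father[:point] + mother[point:])
            # Genetic operation 2: per-bit mutation.
            for child in (mother, father):
                offspring.append([
                    rng.randint(0, 1) if rng.random() < P_MUTATION else bit
                    for bit in child
                ])
        # Replacement: offspring replace the parents; the best chromosome
        # found so far is kept in the population (elitism, assumed).
        population = offspring[:POPULATION_SIZE]
        champion = max(population, key=fitness)
        if fitness(champion) > fitness(best):
            best = champion
        else:
            population[0] = best
    return best


# Hypothetical toy fitness: reward even-indexed features, penalize the rest.
USEFUL = set(range(0, CHROMOSOME_LENGTH, 2))


def toy_fitness(chromosome):
    selected = {i for i, bit in enumerate(chromosome) if bit}
    return len(selected & USEFUL) - len(selected - USEFUL)


best = run_ga(toy_fitness, generations=50, seed=1)
```

Keeping the best chromosome guarantees that the returned solution is never worse than the best member of the initial population.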

    COMPARISON OF RESULTS

    Each row corresponds to one numeral.

    Result (90 Features)   Best String out of 90
                           No. of Features   % Result
    98%                    44                100
    96%                    50                99
    92%                    53                91
    96%                    49                97
    87%                    48                93
    92%                    53                93
    98%                    47                100
    83%                    40                93
    97%                    48                99
    87%                    52                95
    Average: 92.6%                           96%

    CONCLUSION

    To improve the recognition performance for Urdu script numerals, we developed a number of features, namely Density, Concavity, and Contour. We applied these features, and combinations of them, to the sample data. We also used a genetic algorithm for the same purpose and calculated the difference in accuracy between the genetic algorithm and the other features developed. The work presented a GA-based method for the optimal selection of a subset of features, to increase the recognition accuracy and speed of recognition of Urdu script numerals.

    Thanks