Post on 27-Oct-2019
Character Segmentation and Ground-truth Preparation for Handwritten
Bangla Word Images
Submitted by
SANCHITA MAITY
Exam. Roll No. : MCA-3212027 of 2011-12
University Regn. No. : 108560 of 2009-10
Under the guidance of
Mr. Ram Sarkar
Department of Computer Science and Engineering,
Jadavpur University.
A dissertation submitted in partial fulfillment of the requirements for the award of Master
of Computer Application (MCA)
Department of Computer Science and Engineering
Faculty of Engineering and Technology
Jadavpur University
Kolkata - 700 032
2011-2012
CONTENTS
Chapter 1: Introduction
1.1 An overview on Optical Character Recognition (OCR)
1.1.1 Description
1.1.2 History of OCR
1.1.3 Problem of OCR
1.1.4 Recent Trends in OCR research
1.2 Characteristics of Bangla script
1.3 Character Segmentation and Ground-truthing
1.3.1 What is character segmentation?
1.3.2 What is ground-truthing?
1.3.3 Importance of handwritten Bangla Word
Chapter 2: Review of existing work
2.1 Problems of Character Segmentation from handwritten Bangla word images
2.2 Some recent character segmentation and ground-truthing methodologies
2.2.1 A fuzzy technique for character segmentation
2.2.2 A two stage approach for segmentation
2.2.3 A database for unconstrained handwritten Bangla word images
2.2.4 A complete handwritten numeral database
2.3 Motivation
Chapter 3: Present Work
3.1 Data collection methodologies
3.2 Segmentation
3.2.1 Selection of SF and DNS Components
3.2.1.1 Initial Selection of Obvious SF and DNS Class Components
3.2.1.2 Classification of SF/DNS Components using MLP
3.2.2 Determination of Matra Pixels using a Fuzzy Membership Function and Horizontalness Feature for SF components
3.2.3 Determination of Potential Segmentation Points using Two Fuzzy Membership Functions for SF components
3.2.4 Identification of Actual Segmentation Points in the SF Components
3.3 Preparation of Ground-truthed images
Chapter 4: Conclusion
References
Chapter 1
Introduction
1.1 An Overview on Optical Character Recognition
(OCR)
1.1.1 Description
Optical character recognition, usually abbreviated to OCR, is the mechanical or
electronic translation of images of handwritten, typewritten or printed text (usually
captured by a scanner) into machine-editable text.
Broadly speaking, an OCR system lowers the barrier of the keyboard interface
between man and machine to a great extent, and helps in the advancement of office
automation. By doing so, OCR systems facilitate large-scale document transcription with
a huge saving of time and human effort. Such systems have potential applications in
reading amounts from bank checks, extracting data from filled-in forms, interpreting
handwritten addresses on mail pieces for automatic routing, and so on.
OCR is a field of research in pattern recognition, artificial intelligence and machine
vision. Though academic research in the field continues, the focus of OCR has shifted to
implementation of proven techniques. Optical character recognition (using optical
techniques such as mirrors and lenses) and digital character recognition (using
scanners and computer algorithms) were originally considered separate fields. Because
very few applications survive that use true optical techniques, the term OCR has now
been broadened to include digital image processing as well.
Early systems required training (the provision of known samples of each
character) to read a specific font. "Intelligent" systems with a high degree of recognition
accuracy for most fonts are now common. Some systems are even capable of
reproducing output that closely approximates the original scanned page, including
images, columns and other non-textual components.
1.1.2 History of OCR
In 1929 Gustav Tauschek obtained a patent on OCR in Germany, followed by
Handel, who obtained a US patent on OCR in 1933 (U.S. Patent 1,915,993). In
1935 Tauschek was also granted a US patent on his method (U.S. Patent 2,026,329).
Tauschek's machine was a mechanical device that used templates. A photo
detector was placed so that when the template and the character to be recognized were
lined up for an exact match and a light was directed towards them, no light would reach
the photo detector.
In 1950, David H. Shepard, a cryptanalyst at the Armed Forces Security Agency
in the United States, was asked by Frank Rowlett, who had broken the Japanese PURPLE
diplomatic code, to work with Dr. Louis Tordella to recommend data automation
procedures for the Agency. This included the problem of converting printed messages
into machine language for computer processing. Shepard decided it must be possible to
build a machine to do this, and, with the help of Harvey Cook, a friend, built "Gismo" in
his attic during evenings and weekends. This was reported in the Washington Daily
News on 27 April 1951 and in the New York Times on 26 December 1953 after his U.S.
Patent 2,663,758 was issued. Shepard then founded Intelligent Machines Research
Corporation (IMR), which went on to deliver the world's first several OCR systems used
in commercial operation. While both Gismo and the later IMR systems used
image analysis, as opposed to character matching, and could accept some font variation,
Gismo was limited to reasonably close vertical registration, whereas the following
commercial IMR scanners analyzed characters anywhere in the scanned field, a practical
necessity on real world documents.
The first commercial system was installed at the Reader's Digest in 1955, which,
many years later, was donated by Reader's Digest to the Smithsonian, where it was put
on display. The second system was sold to the Standard Oil Company of California for
reading credit card imprints for billing purposes, with many more systems sold to other
oil companies. Other systems sold by IMR during the late 1950s included a bill stub
reader to the Ohio Bell Telephone Company and a page scanner to the United States Air
Force for reading and transmitting typewritten messages by teletype. IBM and others
were later licensed on Shepard's OCR patents.
In about 1965 Reader's Digest and RCA collaborated to build an OCR document
reader designed to digitize the serial numbers on Reader's Digest coupons returned from
advertisements. The documents were printed by an RCA drum printer
using the OCR-A font. The reader was connected directly to an RCA 301 computer (one
of the first solid state computers). This reader was followed by a specialized document
reader installed at TWA, where the reader processed airline ticket stock (a task made
more difficult by the carbonized backing on the ticket stock). The readers processed
documents at a rate of 1500 documents per minute and checked each document, rejecting
those they were not able to process correctly. The product became part of the RCA product
line as a reader designed to process "Turn around Documents" such as utility and
insurance bills returned with payments.
The United States Postal Service has been using OCR machines to sort mail
since 1965, based on technology devised primarily by the prolific inventor Jacob
Rabinow. The first use of OCR in Europe was by the British General Post Office or
GPO. In 1965 it began planning an entire banking system, the National Giro, using OCR
technology, a process that revolutionized bill payment systems in the UK. Canada Post
has been using OCR systems since 1971. OCR systems read the name and address of the
addressee at the first mechanized sorting center, and print a routing bar code on the
envelope based on the postal code. After that the letters need only be sorted at later
centers by less expensive sorters which need only read the bar code. To avoid
interference with the human-readable address field, which can be located anywhere on
the letter, a special ink is used that is clearly visible under ultraviolet light. This ink looks
orange in normal lighting conditions. Envelopes marked with the machine readable bar
code may then be processed.
In 1974, Ray Kurzweil started the company Kurzweil Computer Products, Inc.
and led development of the first omni-font optical character recognition system, a
computer program capable of recognizing text printed in any normal font. He decided
that the best application of the technology would be to create a reading machine for the
blind, which would allow blind people to understand written text by having a computer
read it to them out loud. However, this device required the invention of two enabling
technologies: the CCD flatbed scanner and the text-to-speech synthesizer. On January
13, 1976, the finished product was unveiled during a widely reported news conference
headed by Kurzweil and the leaders of the National Federation of the Blind. Called the
Kurzweil Reading Machine, the device covered an entire tabletop, but functioned
exactly as intended. On the day of the machine's unveiling, Walter Cronkite used the
machine to give his signature sign-off, "And that's the way it was, January 13, 1976."
While listening to The Today Show, musician Stevie Wonder heard a demonstration of
the device and personally purchased the first production version of the Kurzweil
Reading Machine.
In 1978 Kurzweil Computer Products began selling a commercial version of the
optical character recognition computer program. LexisNexis was one of the first
customers, and bought the program to upload paper legal and news documents onto its
nascent online databases. Two years later, Kurzweil sold his company to Xerox, which
had an interest in further commercializing paper-to-computer text conversion. Kurzweil
Computer Products thus became a subsidiary of Xerox known as ScanSoft (now
Nuance).
1.1.3 Problem of OCR
OCR of textual documents in general involves the following problems.
i) Image acquisition
ii) Text line extraction from document images
iii) Word segmentation and character segmentation
iv) Character recognition and word recognition
Optical scanners attached to PCs are mostly used for capturing digital images
of documents. Extraction of text lines from document images is a trivial problem
provided that the document image is unskewed. Text lines from such document images
can be easily extracted by identifying valleys of the horizontal pixel density histograms of
these images. But in all practical situations, document images are skewed at least to
some extent, and this technique fails to work for such images. Many text lines may
touch each other. Skewness is inherent in handwritten text. So, special techniques are
required for character segmentation of handwritten Bangla word images.
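The histogram-based line extraction mentioned above can be illustrated with a minimal Python sketch. It assumes an unskewed page that has already been binarized into a list of rows with 1 for a black pixel and 0 for background; the function name extract_text_lines and the min_black threshold are illustrative choices, not part of the present work.

```python
def extract_text_lines(page, min_black=1):
    """Return (top, bottom) row ranges of text lines in an unskewed binary page.

    page: list of rows, each a list of 0/1 values (1 = black pixel).
    Rows with fewer than min_black black pixels are treated as valleys
    of the horizontal pixel density histogram.
    """
    profile = [sum(row) for row in page]  # black pixels per row
    lines, start = [], None
    for row_index, count in enumerate(profile):
        if count >= min_black and start is None:
            start = row_index                      # a text line begins
        elif count < min_black and start is not None:
            lines.append((start, row_index - 1))   # a valley ends the line
            start = None
    if start is not None:                          # line touching the bottom edge
        lines.append((start, len(profile) - 1))
    return lines


# Toy page: two "text lines" separated by blank rows.
page = [[0] * 8 for _ in range(10)]
for r in (1, 2, 6, 7, 8):
    page[r][2:6] = [1] * 4

print(extract_text_lines(page))
```

On a skewed or touching-line page the valleys disappear, which is exactly the failure case described in the paragraph above.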
Segmentation of isolated word images, extracted from optically scanned
document images of handwritten text, is one of the major problems of optical character
recognition (OCR). If we can find a better method for segmenting handwritten words
into characters, then recognition of the characters improves as well. So segmentation
of words into characters, like character recognition itself, makes a large contribution
towards the overall performance of an OCR system.
Characters segmented from a document image are to be recognized for coding
them in ASCII or some other standard character code. For any of the widely used non-holistic
optical character recognition (OCR) approaches, the success of a specific technique
depends on how well a word can be segmented into pieces, which are subsequently
considered as candidates for its constituent characters. The better the segmentation,
the less ambiguity is encountered in recognition of candidate characters or word
pieces. To recognize a candidate character, its context also requires due consideration.
Because of variation in shapes and sizes, character segmentation of handwritten Bangla
word images requires more sophisticated techniques than that of printed characters.
1.1.4 Recent Trends in OCR Research
Research on OCR has mostly concentrated on text of European
languages based on the Roman alphabet. Possibly the popularity of European languages in
the industrialized West has interested both researchers and entrepreneurs in OCR of
text of European languages, including English text. Scripts relating to Asian languages
like Chinese, Korean, Japanese and Arabic have also received considerable attention
from researchers working in the field of OCR. Other than these, a number of Indian
scripts, viz., Devnagari, Oriya and Bangla, have started to receive attention for OCR
related research in recent years. Out of these, Bangla is the second most popular
script and language in the Indian subcontinent. As a script, it is used for the Bangla,
Assamese and Manipuri languages. Bangla, which is also the national language of
Bangladesh, is the fifth most popular language in the world. Such is the importance of
Bangla both as a script and as a language. But evidence of research on OCR of
handwritten Bangla characters, as observed in the literature, is scarce.
1.2 Characteristics of Bangla script
Characters of Bangla script can be grouped into five categories,
viz., vowel, consonant, modified shape, compound character, and punctuation symbol.
Out of these, vowels and consonants, which constitute the Bangla alphabet, are
called basic characters. There are 11 vowels and 39 consonants in the Bangla alphabet.
There is no concept of upper and lower case characters in Bangla script. Characters in
Bangla script are written from left to right. A vowel following a consonant in a word
takes a modified shape in Bangla script. Such shapes of all vowels are termed
modified shapes. It is noteworthy that some modified shapes attached to a consonant
have two isolated parts appearing on two opposite sides of the consonant. Some modified
shapes may appear just below the consonant, and some may reach its top from one of its
sides with a curved or partly curved segment. So, characters in Bangla script may not
always appear in non-overlapping consecutive positions. Depending on the mode of
pronunciation, a Bangla consonant followed by one or two consonants takes a complex
shape, which is called a compound character. There are in all 280 compound characters
in Bangla script. Apart from the basic characters, the modified shapes, and the
compound characters, Bangla script also includes 10 digit patterns. An important
feature of Bangla characters is the Matra or head line. Excluding a few, all basic and
compound characters of Bangla script have this feature. The width of a Matra is nearly
the same as the width of the character it touches. All the Matras of consecutive characters
appearing in a Bangla word are joined to form a common Matra of the characters
appearing in the word.
Fig. 1 Bangla alphabet basic shapes (the first 11 characters are vowels while the others are consonants)
Fig. 2 Examples of vowel and consonant modifiers: (a) vowel modifiers, (b) exceptional cases of vowel modifiers and (c) consonant modifiers
1.3 Character segmentation and ground truthing
1.3.1 What is character segmentation?
Character segmentation is a necessary preprocessing step for character
recognition in many handwritten word recognition systems. The most difficult case in
character segmentation is cursive script. The fully cursive nature of Bangla handwriting
poses considerable challenges for automatic character segmentation. Character
segmentation techniques are mostly script dependent. This is not only because of variations
of character shapes from one alphabet to another but also because of certain script specific
features of the text document. Segmentation of isolated words into constituent characters is a
challenging problem for Bangla script. The appearance of consecutive characters in
overlapping column positions over a text line makes the problem of Bangla word
segmentation more complex than segmentation of English words. The problem
is compounded for handwritten Bangla words because of variation in sizes and
shapes of handwritten characters. Considering all this, a novel technique for segmenting
images of handwritten Bangla words is presented in this work.
Before segmenting individual characters of each Bangla word in the text image,
the word is horizontally partitioned into three adjacent zones as shown in Fig. 3. The
portion of each word on and above the Matra constitutes the 'upper zone', the main body of
the characters in a word lies within the 'middle zone', and the portion of the word
below the main body, which mainly contains modified shapes and period-like isolated
character components, forms the 'lower zone'. The technique of word segmentation is
based on detection of the Matra.
Fig. 3 Illustration of three zones and region boundaries of a Bangla word
A Matra is a horizontal line that touches the upper part of many
characters of Bangla script as shown in Fig. 4(a). Depending on the character, it covers
at most the entire character width. The consecutive characters in a Bangla word which
have Matras are joined through a common Matra formed by joining the Matras of
individual characters as shown in Fig. 4(b). This line may have some discontinuity over
the positions where the characters in the word appear without Matras.
In handwritten Bangla words, the Matras are not as strictly horizontal as they
are in printed words. So the technique of removing the Matra of a word for segmenting
its constituent characters may leave many characters joined with each other. Such
under-segmentation may complicate classification decisions in the subsequent stage.
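To make the Matra-removal idea concrete, the following minimal Python sketch shows the naive variant for a clean, printed-style word: the Matra is taken to be the row with the most black pixels in the upper part of the word and is then blanked out. This is only an illustration of the baseline criticized above, not the technique of the present work; the function names and the search_fraction and thickness parameters are assumptions made for the example.

```python
def find_matra_row(word, search_fraction=0.5):
    """Return the index of the row with the most black pixels in the upper part.

    word: list of rows of 0/1 values (1 = black pixel).
    search_fraction: assumed fraction of the word height searched from the top.
    """
    limit = max(1, int(len(word) * search_fraction))
    counts = [sum(row) for row in word[:limit]]
    return counts.index(max(counts))


def remove_matra(word, matra_row, thickness=1):
    """Return a copy of the word with the Matra rows set to background."""
    return [
        [0] * len(row) if matra_row <= y < matra_row + thickness else list(row)
        for y, row in enumerate(word)
    ]


# Toy word: a full-width head line in row 1 joining two character bodies.
word = [
    [0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
]

row = find_matra_row(word)
without_matra = remove_matra(word, row)
print(row)
```

When the head line is slanted or wavy, as in handwriting, no single row captures it, so characters stay joined after removal.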
How to segment handwritten Bangla words into constituent characters efficiently
is still a challenging problem of OCR related research. This is a major point of
motivation behind the present work, which deals with the problem of segmenting
handwritten Bangla words into constituent characters.
Fig. 4(a) An illustration of the common Matra of a word
Fig. 4(b) An illustration of Matra of individual characters in a word
In image analysis, testing any algorithm manually consumes time and manpower,
yet manual testing is still widely used around the world. Even the
testing scheme is not standard: different organizations use different testing schemes, so
the reported success rates vary. Standard databases are also scarce. So generating
results for a particular technique becomes laborious for researchers, as they need to collect
or prepare a database for their own work.
1.3.2 What is ground truthing?
Generation of appropriate ground truth data has always been a challenging and
time-consuming task for the kind of problem under consideration. Availability of ground truth
information, however, makes any database more useful, enabling proper evaluation of
a technique by comparing its output with the ground truth of the same data. In the
present work, we have prepared ground truth images for a subset of our database, viz.,
CMATERg1.1.1 and CMATERg1.2.1 respectively. We have prepared these ground
truth images in a semi-automatic way. More specifically, we have
employed our previously developed technique [9] to identify individual character
segments from any document image. Any error generated
in the automated character segmentation is corrected using a software tool called GT
Gen version 1.1, which we have developed for this project. Basically, we have used GT
Gen to recolor the characters which were erroneously labeled by our previously
developed technique [9]. It may be noted that all the ground truth images are stored in
bitmap (bmp) file format, where the background is labeled in white and individual
characters are marked in different colors.
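The colour-coded storage format described above can be read back into per-character labels. The minimal Python sketch below assumes the bitmap has already been decoded into rows of (R, G, B) tuples (decoding the bmp file itself is outside the sketch); the function name colors_to_labels is hypothetical and is not part of GT Gen.

```python
WHITE = (255, 255, 255)


def colors_to_labels(rgb_rows, background=WHITE):
    """Map each distinct non-background colour to an integer label 1, 2, ...

    rgb_rows: list of rows, each a list of (R, G, B) tuples.
    Returns (label_rows, color_to_label); background pixels get label 0.
    """
    color_to_label = {}
    label_rows = []
    for row in rgb_rows:
        out = []
        for pixel in row:
            if pixel == background:
                out.append(0)
            else:
                if pixel not in color_to_label:
                    # First time this colour is seen: assign the next label.
                    color_to_label[pixel] = len(color_to_label) + 1
                out.append(color_to_label[pixel])
        label_rows.append(out)
    return label_rows, color_to_label


# Toy ground truth image: one red character and one blue character.
RED, BLUE = (255, 0, 0), (0, 0, 255)
image = [
    [WHITE, RED, RED, WHITE, BLUE],
    [WHITE, RED, WHITE, WHITE, BLUE],
]

labels, mapping = colors_to_labels(image)
print(labels)
```

Labels are assigned in scan order, so they identify characters within one image only and carry no meaning across images.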
1.3.3 Importance of handwritten Bangla word
Bangla is an important South Asian script widely used in India and Bangladesh.
In terms of popularity, Bangla ranks 5th in the world and 2nd in India, both as a script
and as a language. It is also the national language of Bangladesh.
Handwritten Bangla words are cursive in nature in most cases. So
identification of each character is difficult for any segmentation algorithm. In handwritten
Bangla words, the Matras are not as strictly horizontal as they are in printed words. So
the technique of removing the Matra of a word for segmenting its constituent characters
may leave many characters joined with each other. Such under-segmentation may
complicate classification decisions in the subsequent stage.
Chapter 2
Review of Existing Work
This chapter reviews previous work on character segmentation from
handwritten Bangla word images, along with its drawbacks.
2.1 Problems of Character Segmentation from
handwritten Bangla word images
Character identification is the first and most important step in the process of
OCR of document images. If the characters are not identified accurately and, for example,
two or more characters are connected by a common Matra line, then none of the
characters of the word can be recognized correctly. The same problem occurs if a
character is accidentally split into two or more parts. The characters might be written so
close to one another that accents and similar features become difficult to assign
to the correct character. Adjacent characters might even touch one another at some
points, and in those cases it becomes very difficult to identify the constituent characters
which have joined to form a single component. Character segmentation of handwritten
Bangla word images faces many challenges that depend on the writing style of the
individual. In image analysis, testing any algorithm manually consumes time and manpower,
yet manual testing is still widely used around the world. Even
the testing scheme is not standard: different organizations use different testing schemes,
so the reported success rates vary. The present work suggests a method based on comparison of
neighboring connected or disconnected components to determine whether they belong
to the same character.
2.2 Some recent character segmentation and ground-
truthing methodologies
A wide variety of methods for handwritten Bangla word
images have been reported in the literature. Four of these works are reviewed here,
namely (i) a fuzzy technique for segmentation; (ii) a two stage approach for
segmentation; (iii) a database for unconstrained handwritten Bangla word images; and (iv) a
complete handwritten numeral database. They cannot be grouped into a single category
since they do not share a common guideline.
2.2.1 A fuzzy technique for character segmentation
A fuzzy technique for segmentation of handwritten Bangla word images has
been presented in the work [1]. It works in two steps. In the first step, the black pixels
constituting the Matra (i.e., the longest horizontal line joining the tops of individual
characters of a Bangla word) in the target word image are identified by using a fuzzy
feature. In the second step, some of the black pixels on the Matra are identified as segment
points (i.e., the points through which the word is to be segmented) by using three fuzzy
features. On experimentation with a set of 210 samples of handwritten Bangla words,
collected from different sources, the average success rate of the technique is shown to be
95.32%. Apart from certain limitations, the technique can be considered a significant
step towards the development of a full-fledged Bangla OCR system, especially for
handwritten documents.
2.2.2 A two stage approach for segmentation
Segmentation of handwritten Bangla word images is a challenging problem for
researchers. Discontinuity or absence of the Matra, an important feature of Bangla
script, may lead to inherent segmentation within the word images. Around 55% of these
inherently segmented connected sub-images do not require further segmentation. The
authors of [2] have designed a novel two-stage approach for segmentation of isolated Bangla word
images. In the first stage, a feature based approach is designed to classify the connected
word segments into either of two classes, 'Segment further' and 'Do not Segment',
using a multi-layer perceptron (MLP) based classifier. In the second stage, fuzzy
segmentation features are designed to identify the Matra region and the potential
segmentation points on the Matra of the connected word segments that belong to the
'Segment further' class. Using this technique, the overall segmentation
accuracy achieved after the two stages is 95.87% in the work [2].
2.2.3 A database for unconstrained handwritten Bangla word
images
In the work [7], the preparation of a benchmark database for research on off-line
Optical Character Recognition (OCR) of document images of handwritten Bangla text
and Bangla text mixed with English words has been described. This is the first
handwritten database in this area, as mentioned above, available as an open source
document. As India is a multi-lingual country with a colonial past, multi-script
document pages are very common. The database contains 150 handwritten
document pages, among which 100 pages are written purely in Bangla script and the remaining
50 pages are written in Bangla text mixed with English words. This database of
off-line handwritten scripts is collected from different data sources. After collecting the
document pages, all the documents have been preprocessed and distributed into two
groups, i.e., CMATERdb1.1.1, containing document pages written in Bangla script only,
and CMATERdb1.2.1, containing document pages written in Bangla text mixed with
English words. Finally, we have also provided useful ground truth images for
line segmentation purposes. To generate the ground truth images, we have first labeled
each line in a document page automatically by applying one of our previously developed
line extraction techniques and then corrected any possible error by using our developed
tool GT Gen 1.1. Line extraction accuracies of 90.6% and 92.38% are achieved on the two
databases, respectively, using our algorithm. Both the databases along with the ground
truth annotations and the ground truth generating tool are available freely at
http://code.google.com/p/cmaterdb.
2.2.4 A complete handwritten numeral database
The paper [16] describes the ISI database of handwritten Bangla numerals.
Bangla is the second most popular language and script of the Indian subcontinent and it
is used by more than 200 million people all over the globe. The database has
several components which include both on-line and off-line handwritten numerals.
Samples of numeral strings and isolated numerals have been collected under both modes
of writing. This database has been developed at the Computer Vision and Pattern
Recognition Unit laboratory of the Indian Statistical Institute, Kolkata. Samples of the
database are properly ground-truthed and subdivided into respective training
and test sets. The off-line sample images are stored in TIFF image format and the
on-line samples are stored along with various information as a header in ASCII file format.
This database will facilitate fruitful research on handwriting recognition of Bangla
through free access for researchers.
Other methodologies are included in the works described in [12-14]. In [5], the
character segmentation problem is seen from an artificial intelligence perspective.
2.3 Motivation
Considering all the problems discussed above, we need an
automated evaluation tool for OCR systems that compares the segmented results
of a technique or algorithm with ground-truthed images. The evaluation procedure
consists of the following steps. First, a database of word images is prepared. Then
a segmentation technique is applied to that database. The characters of the word images are
not always segmented properly, so the errors need to be detected and corrected manually.
After manual correction, the word images are stored in the database as ground-truthed
images. Our aim is to compare the segmented word images with the corresponding
ground-truthed images automatically and to report the success rate. For this purpose
we want to create a tool in future. It will save time and manpower and will minimize
analytical errors. Ground-truth preparation plays an important role in image analysis, as
mentioned above. It is also found that there is no standard database or automatic
evaluation tool for handwritten Bangla word images for handwritten Bangla OCR
systems. So, in the present work, ground-truthing of handwritten Bangla word images at
two levels is introduced.
Ground-truthed images are generated for the said database for evaluation of
character segmentation algorithms. Character segmentation accuracy on these
handwritten word images is also reported in the current work. Ground-truthed images
are prepared at component level and character level so that the database would also be
very useful for the performance evaluation of character recognition systems.
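The automatic comparison described above is left as future work, but its core can be sketched in Python. The sketch assumes both the segmentation output and the ground truth are available as integer label maps of equal size (0 for background). The matching rule, a ground-truth character counted as correctly segmented when some predicted segment overlaps it with intersection-over-union of at least min_iou, and the 0.9 default are assumptions chosen for illustration, not the evaluation criterion of this work.

```python
def pixel_sets(label_rows):
    """Group pixel coordinates by non-zero label."""
    sets = {}
    for y, row in enumerate(label_rows):
        for x, label in enumerate(row):
            if label != 0:
                sets.setdefault(label, set()).add((y, x))
    return sets


def success_rate(predicted, ground_truth, min_iou=0.9):
    """Fraction of ground-truth characters matched by a predicted segment."""
    pred_sets = pixel_sets(predicted)
    gt_sets = pixel_sets(ground_truth)
    if not gt_sets:
        return 0.0
    matched = 0
    for gt_pixels in gt_sets.values():
        # Best intersection-over-union against any predicted segment.
        best = max(
            (len(gt_pixels & p) / len(gt_pixels | p) for p in pred_sets.values()),
            default=0.0,
        )
        if best >= min_iou:
            matched += 1
    return matched / len(gt_sets)


# Toy example: the first character is segmented exactly, the second is split.
gt = [
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
]
pred = [
    [5, 5, 0, 7, 7],
    [5, 5, 0, 7, 8],
]

print(success_rate(pred, gt))
```

Label values need not agree between the two maps, since segments are matched by pixel overlap rather than by label identity.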
Chapter 3
Present Work
The present work on character segmentation and ground-truth
preparation for handwritten Bangla word images is described below. A typical OCR
system consists of scanning, preprocessing, word and character separation, recognition
and post processing stages. Each stage has an impact on overall performance. As India is
a multi-lingual country with a colonial past, multi-script document pages are
very common. The database contains 5000 handwritten word images written
purely in Bangla script.
The off-line handwritten documents are collected from different data
sources. After collecting the document pages, all the documents have been
pre-processed. Each document page contains 180-200 words on average.
Finally, we have also provided useful ground-truthed images for character
segmentation purposes. To generate the ground-truthed images, we have first labeled
each component in a word image with a unique color by applying our previously developed
technique [ ] and then corrected any possible error manually using a tool developed for this project.
The database would be very useful for handwritten OCR research in the area of Bangla,
especially for the performance evaluation of character segmentation methodologies, as
there is hardly any standard database available for handwritten Bangla word images.
Currently the database is available at www.cmaterju.org. Our aim is to provide
ground-truthed images for component level and character level segmentation in a Bangla
handwritten character recognition system.
The name of the prepared database is CMATERdb1.1.1, where CMATER stands
for Center of Microprocessor Application for Training Education and Research, a
research laboratory at Computer Science and Engineering department of Jadavpur
University, India, where the current research activity took place. Db stands for database
and the numeric value.
The overall work flow is shown in Fig. 5. The implementation details of
Fig. 5 are discussed in the following sections. The present work is highlighted in this
flow diagram.
Fig.5 The basic flow diagram of the overall project
3.1 Data Collection Methodologies
The handwritten document pages for the proposed database have
been collected from different types of sources, viz., class notes of students of different
age groups, handwritten manuscripts of a popular Bangla monthly magazine
"Computer Jagat", and document pages written by different persons, on request,
under our supervision.
The document pages written under our supervision were collected from various
persons, with subjects ranging from newspaper articles to Bangla text books containing
both Bangla and English vocabulary. The writers were asked to use a black or blue ink
pen and write inside A-4 size pages. No other restrictions were imposed regarding
the kind of pen used or the style of writing chosen. Special attention was paid to
ensure data collection from writers of different age groups and educational levels.
Moreover, the pages were collected from different places (home, office, school etc.) in
order to include different styles of writing. In total 25 men and 15 women
participated in the data collection drive. The main characteristics of our database are as
follows.
- 95% of the writers were native Bengali.
- Places of data collection: in school/colleges, 40% in writers’ homes, and 20% in public places.
- Educational level of the writers: 20% 10th standard school, 40% general high school and 40% college and university.
- Writers’ age: 40% between 15-25 years, 30% between 25-35, and 30% between 35-55 years.
3.2 Segmentation
The current work designs a novel technique for identifying potential
segmentation points on the Matra in order to isolate the constituent characters from a
word image of Bangla script. In the first stage, a component-labeling algorithm is applied
to identify connected sub-parts of the word images. The second stage classifies the
connected sub-parts into either the SF or the DNS class, using a rule-based prior
selection and a well-known MLP based classifier with a set of features extracted from
these components. Finally, fuzzy segmentation features are used to identify potential
segmentation points on the detected fuzzy Matra region, for subsequent extraction of
isolated characters or character sub-parts from the overall word image. The basic steps
involved in segmenting handwritten word images of Bangla script are depicted in Fig. 6.
Constituent characters of words, or their sub-parts, often extend above the
common Matra or appear below the main character body. In the current work, we have
identified three adjacent, horizontally partitioned zones (viz., upper, middle and lower)
in each word image, as shown in Fig. 1. More specifically, the top row of the upper
zone (R1), the top row of the middle zone (R2), the middle row of the middle zone (R3),
the bottom row of the middle zone (R4) and the bottom row of the lower zone (R5) are
identified from the word image. A horizontal pixel scan of the word image from top
to bottom identifies the first row with any black pixel as the top row of the upper
zone, i.e. R1. Similarly, a horizontal scan from bottom to top identifies the first
row with any black pixel as the bottom row of the lower zone, i.e. R5. Identifying
the top and bottom row boundaries of the middle zone (a key decision for subsequent
feature extraction) is a challenging task in handwritten word segmentation.
In the earlier work [2], the authors scanned the whole image to calculate, for each
row, the sum of the longest horizontal run lengths and then estimated R2 from those
values. This may sometimes give misleading information: depending on the handwriting
style of the individual, the row with the maximum sum of longest run lengths may
appear anywhere in the word image, so that R2 is not estimated correctly, as shown in
Fig. 7. Therefore we have modified the technique for determination of R2.
Fig.6 Block diagram of basic steps of present work.
The Matra of a handwritten word image generally does not appear in the
lower half of the image. So, in the present work, to identify the common headline of the
word, the horizontalness of each row is computed from the top to the middle of the word
image, i.e. from R1 to R1+(R5-R1)/2. Each black pixel of the word image in this region is
replaced by the length of the longest horizontal run of black pixels passing through it.
The sum of these longest-run values over all the pixels in a row is computed for each
row of the region. The row with the highest sum is the row with maximum
horizontalness. This row signifies the possible upper boundary of the middle zone, and
we call it the 1st approximation of the upper boundary of the middle zone (R21).
Then, from the vertical feature, we have estimated the 2nd approximation of R2 and
called it R22.
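The computation of R21 described above can be sketched as follows. This is a minimal illustration, assuming the word image is a list of rows with 1 for a black (text) pixel and 0 for background; the function names are hypothetical and not taken from the original implementation.

```python
def horizontalness_map(img):
    """Replace every black pixel by the length of the horizontal run it lies on."""
    out = []
    for row in img:
        h = [0] * len(row)
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                for k in range(start, x):
                    h[k] = x - start
            else:
                x += 1
        out.append(h)
    return out


def first_and_last_text_rows(img):
    """R1 and R5: first and last rows that contain any black pixel."""
    rows = [y for y, row in enumerate(img) if any(row)]
    return rows[0], rows[-1]


def estimate_r21(img):
    """Row of maximum horizontalness between R1 and R1 + (R5 - R1) / 2."""
    r1, r5 = first_and_last_text_rows(img)
    half = r1 + (r5 - r1) // 2  # integer division is an assumption of this sketch
    hmap = horizontalness_map(img)
    sums = {y: sum(hmap[y]) for y in range(r1, half + 1)}
    return max(sums, key=sums.get)  # ties resolve to the top-most row
```

For a small word image whose third row is a full-width headline, `estimate_r21` returns the index of that row, because every pixel on it carries the full run length.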
Fig.7 Wrong estimation of R2 using technique described in [2]
Even after estimating R21 and R22, we have observed that in a few cases the
Matra regions are not estimated accurately (as shown in Fig. 6). To address this issue,
we have determined another estimate of R2 as the row containing the longest single run
of black pixels and called it R23. Finally, we have taken the average of the three R2
approximations and called it R2final, such that R2final = (R21 + R22 + R23)/3. This
new estimate of R2 (involving three approximations) is observed to be more accurate
than that of our prior works, which involved two such approximations. We have taken
R2final as the final upper boundary (R2) of the middle zone of a handwritten word image.
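A small sketch of the third estimate and of the averaging step is given below. It assumes the same 0/1 image representation as before; R21 and R22 are taken as given inputs, and rounding the average to the nearest row index is an assumption, since the text does not say how a fractional value is handled.

```python
def longest_run_in_row(row):
    """Length of the longest run of consecutive black pixels in one row."""
    best = cur = 0
    for p in row:
        cur = cur + 1 if p else 0
        best = max(best, cur)
    return best


def estimate_r23(img):
    """Row containing the longest single run of black pixels (top-most on ties)."""
    runs = [longest_run_in_row(r) for r in img]
    return runs.index(max(runs))


def r2_final(r21, r22, r23):
    """Average of the three approximations, rounded to a row index (assumed)."""
    return round((r21 + r22 + r23) / 3)
```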
Similarly, the bottom row of the middle zone (i.e. R4) of a handwritten word image
generally does not appear in the upper zone. So, in the current work, to identify R4,
horizontal transition points between text and background pixels are computed
from the middle to the bottom of the word image, i.e. from R1+(R5-R1)/2 to R5. In each
row, from the middle row to the bottom row of the word image, the number of
transitions from text pixel to background pixel and vice versa is counted.
The average number of transition points per row in the lower half of the image is
denoted eta (η). Scanning upwards from the bottom row of the lower zone to the middle
of the word image, the 1st row with more transition points than η is identified as the
bottom row of the middle zone (say, R41). Again, as in the case of R2, we have estimated
R4 from the vertical feature and called it R42. We have then taken the average of R41
and R42 as the final R4, i.e. R4final = (R41+R42)/2, and used R4final as the bottom row
of the middle zone, i.e. R4. Finally, the middle row of the middle zone is taken as R3,
i.e. R3 = (R2+R4)/2.
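The transition-count estimate R41 can be sketched as below, again for a 0/1 image. The function names are hypothetical, and returning the middle row when no row exceeds η is an assumption added only so that the sketch always returns a value.

```python
def transitions(row):
    """Number of text/background changes between horizontally adjacent pixels."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def estimate_r41(img, r1, r5):
    """First row, scanning upwards from R5, with more transitions than eta."""
    mid = r1 + (r5 - r1) // 2
    counts = {y: transitions(img[y]) for y in range(mid, r5 + 1)}
    eta = sum(counts.values()) / len(counts)  # mean transitions per row
    for y in range(r5, mid - 1, -1):
        if counts[y] > eta:
            return y
    return mid  # assumed fallback, not specified in the text
```

In the example used for testing, the two lowest rows hold a single descending stroke (two transitions each), so the scan skips them and stops at the first row that crosses several strokes.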
In the present work we have used a simple, yet popular, technique for identifying
the connected components within the word image. Identifying connected word
components requires identifying the connected pixels therein and marking them
with identical labels. For this, the CCL algorithm [14] scans the word image pixel by
pixel, from left to right and from top to bottom. During scanning, it considers all 8
neighbors of each pixel. For each connected component, all its member pixels
appearing in the sub-image are replaced by a single distinct symbol. This completes
the labeling of the connected pixels in the image and generates uniquely coded
connected components, as described in Fig. 8. Each such connected component is
subsequently extracted for analysis.
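The labeling step can be illustrated with a breadth-first flood fill over 8-connected neighbors. This is a stand-in chosen for brevity and is not claimed to be the exact algorithm of [14]; it produces the same kind of output, one distinct label per connected component, in raster-scan order.

```python
from collections import deque


def label_components(img):
    """Label 8-connected components of a 0/1 image; returns (labels, count)."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):            # top to bottom
        for x in range(w):        # left to right
            if img[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count
```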
3.2.1 Selection of SF and DNS Components
Among all the digitized word sub-parts generated by CCL, a decision is often
required to identify only the components that need further segmentation, because word
images contain many inherently segmented characters or character sub-parts (as shown
in Fig. 9). Thus, not all word components require further segmentation. These
components are classified into SF and DNS classes as shown in Fig. 7. Segmentation of
DNS components is an overhead and causes over-segmentation of word components. So,
selection of SF and DNS components not only minimizes the character isolation overhead
but also minimizes the probability of over-segmentation. For this, we have developed a
two-stage selection for SF and DNS class components. These stages are described in
subsections 3.2.1.1 and 3.2.1.2.
Fig. 8 A sample word image and three of its connected components
3.2.1.1 Initial Selection of Obvious SF and DNS Class Components
In the work [3], MLP based schemes were used for this classification problem.
However, considering all word sub-parts in the classification algorithm not only
increases the computational overhead but also leads to ambiguities in the selection and
hence to erroneous classification. To solve this problem, a pre-selection step is
developed in the present work that identifies obvious SF and DNS class components. In
the designed methodology, two scale-invariant thresholds are used for this pre-selection
of obvious SF and DNS components prior to the MLP-based classification scheme.
In the current approach, every word component is divided hypothetically into
pieces by a horizontal separating line along the middle line of the region R2 to
R4. The row along which this separating line passes is selected experimentally
from the sample word images of the database.
After this hypothetical separation, the number of connected sub-components, or
pieces, generated by the division is counted. This number acts as the decision maker:
based on the number of generated sub-components, the original component is categorized
into one of the two classes, viz., DNS or SF. If the number of sub-components in a
component is less than a threshold value T1, the component is considered a member of
the DNS class. On the other hand, if it is greater than another threshold value T2, the
component is considered as belonging to the SF class. Some sample components
classified successfully using this thresholding technique are shown in Fig. 9.
The components whose number of sub-components (n) lies between T1 and T2, i.e.
T1<n<T2, are sent to a previously trained MLP classifier for accurate classification,
as a decision on these components is not possible using either T1 or T2.
Experimentally, we have found suitable values of T1 and T2 to be 2 and 6 respectively.
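The three-way decision can be written as a small function. The text does not say what happens when n equals T1 or T2; this sketch treats n equal to T1 as DNS, in line with the remark on Fig. 9 that single characters cut into two pieces are DNS, and sends n equal to T2 to the MLP. Both boundary choices, like the function name, are assumptions.

```python
def preselect(n_pieces, t1=2, t2=6):
    """Pre-selection on the number of pieces produced by the separating line.

    Returns "DNS", "SF", or "MLP" (undecided, to be passed to the classifier).
    """
    if n_pieces <= t1:   # boundary n == T1 treated as DNS (assumption)
        return "DNS"
    if n_pieces > t2:    # strictly more pieces than T2
        return "SF"
    return "MLP"
```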
Fig. 9 Sample images of Bangla script which are pre-classified as Obvious DNS components,
pre-classified as Obvious SF components and sent for MLP based classification (for subsequent
SF/DNS class identification)
From the images of Fig. 9, it is evident that the choice of T1 is suitable for
single-character components, which are partitioned into two pieces along the
hypothetical separating line and subsequently classified as DNS segments. Also, T2 is
chosen in such a way that multiple touching characters or their sub-parts generate more
than T2 components after being hypothetically partitioned along the separating line.
These components are classified as SF components. In all remaining cases ambiguities
may exist, and these need more sophisticated techniques such as MLP based classifiers
and associated feature vectors.
3.2.1.2 Classification of SF/DNS Components using MLP
In the present work, an MLP based classifier is used to classify the connected
word components that are not classified in the pre-selection stage into one of the two
classes, i.e. to decide whether a given component needs to be segmented further or not,
using the feature set listed in Table 1. The MLP based classifier designed for this work
is trained with the Back Propagation (BP) algorithm, which minimizes the sum of the
squared errors for the training samples by conducting a gradient descent search in the
weight space. The number of neurons in the hidden layer is also adjusted during
training. In the current methodology we have designed a new feature set containing 11
statistical features, as described in Table 1. The following discussion justifies the
choice of the respective feature descriptors.
A higher value of feature F1 signifies that the component may belong to the DNS
class, as the component may have some part(s) in the upper zone of the word, as shown
in Fig. 10(a). A similar explanation applies to features F2 and F4 for components in
the middle zone and the middle-lower zone respectively, as illustrated in Fig. 10(b).
Feature F3 is used to identify noise segments (i.e. broken part(s) of the Matra), which
almost certainly appear partially in the upper and/or middle zone, as shown in
Fig. 10(c).
Table 1: Feature vector and their description
Feature ID   Feature Description
F1           Percentage height (w.r.t. the overall component height) of the component that appears in the upper zone of the word image
F2           Percentage height (w.r.t. the overall component height) of the component that appears in the middle zone of the word image
F3           Percentage height (w.r.t. the overall component height) of the component that appears in the lower part of the middle zone of the word image
F4           Percentage height (w.r.t. the overall component height) of the component that appears in the lower zone of the word image
F5           Maximum horizontalness of the component within the region R2 to R4
F6           Area of the component within the region R2 to R4
F7           Number of data pixels of the component within the region R2 to R4
F8           Number of data pixels of the component on R2
F9           Maximum width of the component within the region R2 to R4
F10          Width of the component along R2
F11          Number of segmentation-point clusters on the Matra region of the component
Feature F5, i.e. the maximum horizontalness feature, has been used in the work [3].
However, owing to individual writing styles, this feature value may be higher in the
upper zone, the lower zone or the lower half of the middle zone if the ascendant
(character sub-parts in the upper zone of the word image) or the descendant (character
sub-parts in the lower zone of the word image) is extended unnecessarily, as shown in
Fig. 11(a). Because of this, in the present work we have additionally used feature F9,
i.e. the maximum width of the component within the region R2 to R4, as shown in
Fig. 11(b). A lower value of this feature implies that the component may be categorized
as a DNS class component.
In feature F6, as used in the work [3], the whole component was considered for area
calculation. But this feature value may be high for a DNS class component because of an
extended ascendant and/or descendant, as shown in Fig. 11(c). For this reason, we have
modified feature F6 to consider only the area of interest, i.e. the area within the
region R2 to R4. For the same reason, feature F7, i.e. the number of data pixels, is
also calculated only within the region R2 to R4. The higher the value of feature F7,
the greater the possibility that the component belongs to the SF class. Similarly, a
high value of feature F8 implies a more prominent and continuous Matra, i.e. the
component will be a member of the SF class.
Again, owing to cursive handwriting or discontinuity of the Matra, the value of
feature F9 may be low for components that need to be segmented further. That is why we
have also taken feature F10, which gives the width of the component in the middle zone,
i.e. the vertical projection of the component within R2 to R4 onto R2 (the central
Matra row).
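Features F7 to F10 can be sketched for a single component as below. The exact definitions are not spelled out in the text, so this sketch assumes that F9 is the largest left-to-right extent of any row in the band R2 to R4 and that F10 is the number of columns containing at least one black pixel in that band. The component is a 0/1 image in word coordinates, and the function name is hypothetical.

```python
def zone_features(comp, r2, r4):
    """Illustrative F7-F10 for one component, restricted to rows R2..R4."""
    band = comp[r2:r4 + 1]
    f7 = sum(sum(row) for row in band)   # data pixels inside the band
    f8 = sum(comp[r2])                   # data pixels on row R2
    extents = []
    for row in band:
        cols = [x for x, p in enumerate(row) if p]
        extents.append(cols[-1] - cols[0] + 1 if cols else 0)
    f9 = max(extents)                    # widest row extent (assumed definition)
    width = len(comp[0])
    # columns hit by the vertical projection of the band (assumed definition)
    f10 = sum(1 for x in range(width) if any(row[x] for row in band))
    return {"F7": f7, "F8": f8, "F9": f9, "F10": f10}
```

With these definitions F10 can exceed F9 when the strokes in the band are broken or slanted, which matches the motivation given above for adding F10.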
(a) Word component in upper zone inside color box
(b) Word components in middle and lower zone inside color boxes
(c) Noise component inside color box
Fig. 10(a-c) Illustration of features F1, F2, F3 and F4
Feature F11 is the number of segmentation-point clusters. Often a component
generates multiple, closely spaced segmentation points on the Matra region
(selection of Matra and segmentation points is discussed in sections 3.2.2 and 3.2.3).
Using 8-way connectivity, we have identified clusters of such segmentation points, and
the number of such clusters is taken as the feature value. The greater the number of
clusters, the higher the chance that the component belongs to the SF class. In the
previous work [3], the number of segmentation points was used as the feature. But a
larger number of segmentation points does not always imply that the component needs to
be segmented further. This is illustrated in Fig. 12(a-b).
To compute feature F11, the potential segmentation points in the region R1-R3 of
the connected components have to be determined first. The technique for finding
potential segmentation points is discussed in sections 3.2.2 and 3.2.3.
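The difference between counting segmentation points and counting their clusters can be shown with a short sketch. It assumes the potential segmentation points are given as a set of (row, column) pairs; the function name is hypothetical.

```python
def count_clusters(points):
    """Number of 8-connected clusters in a set of (row, col) points."""
    remaining = set(points)
    clusters = 0
    while remaining:
        stack = [remaining.pop()]   # seed of a new cluster
        clusters += 1
        while stack:
            y, x = stack.pop()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    n = (y + dy, x + dx)
                    if n in remaining:
                        remaining.remove(n)
                        stack.append(n)
    return clusters
```

Five points that form two groups give a cluster count of 2, whereas the point count used in [3] would be 5.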
Fig. 11(a-c) Illustration of feature F5, F6 and F7
3.2.2 Determination of Matra Pixels using a Fuzzy Membership Function and Horizontalness Feature for SF components
The boundary between the sets of Matra pixels and non-Matra pixels in the
region R1-R3 is not distinct in practice. The black pixels lying on the line R2 have the
strongest membership in the set of Matra pixels. As pixels lie farther away from the
line R2 on either side, their degree of belongingness to the set diminishes, as shown in
Fig. 13, through a membership function μ_MATRA(x). The exact expression of μ_MATRA(x)
is shown below.
where c=R2 denotes the center of the function, shown in Fig. 13, and ‘a’ and ‘b’ are
parameters of the equation. The value of ‘a’ is chosen as |R1-R2|/2 on the upper side of
R2 and as |R2-R3|/2 on the lower side of R2. The value of ‘b’ is chosen as 1.
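The expression itself is not reproduced in this copy. A membership function with a center c and two shape parameters a and b, which peaks on R2 and falls off on both sides, is commonly written as the generalized bell function. The form below is an assumption of that kind and is not necessarily the exact expression used in this work; the parameter values are those stated above.

```latex
\mu_{\mathrm{MATRA}}(x) = \frac{1}{1 + \left|\dfrac{x - c}{a}\right|^{2b}},
\qquad c = R2,\quad b = 1,\quad
a = \begin{cases}
  |R1 - R2|/2, & x < R2 \\
  |R2 - R3|/2, & x > R2
\end{cases}
```

Under this assumed form the membership is 1 on the row R2 and drops to 0.5 at a distance a from it, i.e. halfway between R2 and R1 on the upper side and halfway between R2 and R3 on the lower side.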
To identify Matra pixels in the region R1-R3, all run lengths of black pixels
along each row in the region are computed. The mean run length of black pixels in the
region is obtained by averaging all these run lengths. This can be considered the mean
horizontalness of all black horizontal line segments in the region. Fig. 14(a-b) shows
three word images and the continuous horizontal line segments of black pixels whose
lengths exceed the mean horizontalness of the respective words. Candidates for Matra
pixels are selected from such line segments.
To finally determine whether a black pixel in the region R1-R3 is a Matra pixel, the
product of the horizontalness [2] of the black line segment on which the pixel lies and
its μ_MATRA value is computed. If the product exceeds the average of all such
products over all black pixels in the region R1-R3, the pixel is finally considered a
Matra pixel. All such Matra pixels constitute the Matra region.
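The final selection rule can be sketched as follows. The membership function is assumed here to be a generalized bell function with center R2, since its expression is not available in this copy; the image is a 0/1 list of rows and all function names are hypothetical.

```python
def bell(x, a, b, c):
    """Generalized bell membership (an assumed form of the membership function)."""
    if x == c:
        return 1.0
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def run_lengths(row):
    """For each black pixel, the length of the horizontal run it lies on."""
    out = [0] * len(row)
    x = 0
    while x < len(row):
        if row[x]:
            start = x
            while x < len(row) and row[x]:
                x += 1
            out[start:x] = [x - start] * (x - start)
        else:
            x += 1
    return out


def matra_pixels(img, r1, r2, r3):
    """Black pixels in rows R1..R3 whose horizontalness * membership exceeds the mean."""
    scores = {}
    for y in range(r1, r3 + 1):
        a = abs(r1 - r2) / 2 if y < r2 else abs(r2 - r3) / 2
        h = run_lengths(img[y])
        for x, p in enumerate(img[y]):
            if p:
                scores[(y, x)] = h[x] * bell(y, a, 1, r2)
    mean = sum(scores.values()) / len(scores)
    return {pos for pos, s in scores.items() if s > mean}
```

In the test image, only the full-width headline row survives the threshold: short vertical strokes above and below it have both a small run length and a lower membership value.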
Fig. 12(a-b) More segmentation clusters in (a) but only one segmentation cluster in (b)
Fig. 13 The membership function for the set of Matra pixels
(a) Three sample word images
(b) Consecutive black pixels, in the sample images, whose horizontalness exceeds the mean horizontalness form continuous lines highlighted with darker shading
Fig. 14 Illustration of horizontalness (h) feature
3.2.3 Determination of Potential Segmentation Points using Two Fuzzy Membership Functions for SF components
Potential segmentation points are Matra pixels across which the segment is to be
fragmented if it falls in the SF category. They remain candidates for segmentation
points until the classification of the segment into the SF class is completed.
Potential segmentation points usually lie at column positions along which
the black pixel count is low. The lower the black pixel count along the column
position of a Matra pixel, the higher the degree of belongingness of the pixel to the
class of potential segmentation points. To model this, a membership function, μ1, as
shown in Fig. 15(a), is introduced here. The function is defined for x≥0, and the
values of its parameters a, b, c are chosen as follows: c=0, b=1, a=WM, where WM is
the maximum vertical width of the Matra region, defined in Section 3.2.2.
To ascertain whether a Matra pixel is a potential segmentation point, its distance
from the line R2 is also considered. The smaller the distance, the higher its degree of
belongingness to the set of potential segmentation points. Ideally, it would lie on R2.
On this basis, another membership function, μ2, is introduced here. The function is
shown in Fig. 15(b). The values of the parameters of μ2 are the same as those of
μ_MATRA. To decide whether a Matra pixel in the region R1-R3 is a potential
segmentation point, the average of its μ1 and μ2 values is computed. If the average
exceeds the mean of the averages for all the Matra pixels in the region R1-R3, the
pixel is considered a potential segmentation point. Feature F11 here represents the
total number of all such pixels which are identified as potential segmentation points
in the region R1-R3.
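The selection of potential segmentation points can be sketched as below. Both membership functions are assumed to be generalized bell functions with the stated parameters, the column count is taken over the whole component, and WM is taken as the largest number of Matra pixels in any column; these are assumptions of the sketch, as are the function names.

```python
def bell(x, a, b, c):
    """Generalized bell membership (an assumed form)."""
    if x == c:
        return 1.0
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def potential_points(img, matra, r1, r2, r3):
    """Matra pixels whose mean of mu1 and mu2 exceeds the mean over all Matra pixels."""
    width = len(img[0])
    col_count = [sum(row[x] for row in img) for x in range(width)]
    per_col = {}
    for _, x in matra:
        per_col[x] = per_col.get(x, 0) + 1
    wm = max(per_col.values())           # maximum vertical width of the Matra
    avg = {}
    for y, x in matra:
        a = abs(r1 - r2) / 2 if y < r2 else abs(r2 - r3) / 2
        mu1 = bell(col_count[x], wm, 1, 0)   # fewer black pixels in the column, higher value
        mu2 = bell(y, a, 1, r2)              # closer to R2, higher value
        avg[(y, x)] = (mu1 + mu2) / 2
    mean = sum(avg.values()) / len(avg)
    return {p for p, v in avg.items() if v > mean}
```

In the test image the headline pixels that sit above a vertical stroke are rejected, and only the headline pixels over empty columns remain as candidates.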
Fig. 15 (a-b) The membership functions for determination of potential segmentation points
3.2.4 Identification of Actual Segmentation Points in the SF Components
In determining the actual segmentation points for SF components, there is
always a trade-off between under-segmentation and over-segmentation of word images.
In the current work, we have attempted to balance the two with minimum loss of
information. Segmentation also becomes difficult in the presence of ascendants in the
upper zone of the word component. For this reason, we have designed further
algorithmic steps to identify a single column for segmentation on the Matra
region. The methodology is described as follows.
Often the fuzzy features generate multiple, neighboring segmentation points
along the Matra region of an SF component. We have identified clusters of segmentation
points using the 8-neighbor CCL algorithm, as described for feature F11 in section
3.2.1.2. It may be observed from Fig. 12(a-b) that actual segmentation should not
involve all the potential segmentation points in a cluster, but should focus only on
the pixels that optimally separate the connected parts into different characters (or
their sub-parts) of the word image.
Selecting points that accurately segment the word components (sub-parts)
into their constituent characters or sub-parts is a challenging issue. If such points
are selected poorly, over-segmentation may occur during the segmentation process.
As a result, characters or their sub-parts may be internally broken/segmented,
leading to loss of information.
In the light of the above facts, in the current work we have selected the actual
(more accurate) segmentation points from the segmentation points generated in each
segmentation cluster. Two primary decisions have to be taken for this purpose:
firstly, selection of the row boundaries for segmentation along specific columns on the
Matra region and, secondly, identification of the segmentation column in each
segmentation cluster. The algorithmic steps involved in this process are given below:
1. Check whether there is any ascendant in the word component under
consideration. This is done by estimating the height of the upper zone of the
component. If the height of this zone exceeds an adaptively tuned threshold
value (0.2*(R4 – R2)), the component is said to have an ascendant part in the
upper zone.
2. In either case, the generated potential segmentation points are labeled
using the 8-way CCL algorithm, so that each cluster of segmentation points is
labeled uniquely. For each cluster, the following technique is applied to
determine the segmentation column along which the word component under
consideration can be segmented:
A) If there is no ascendant in the word component under
consideration, calculate the sum of the number of data pixels, Matra
pixels and segmentation-point pixels for each column in the
region from R1 to (R3- R2)/2. Otherwise, calculate the same for
each column in the region from (R2- R1)/2 to (R2 + (R3- R2)/2).
B) Within the estimated region (row boundaries), take the column with
the minimum sum, as calculated in step A, as the segmentation column.
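Steps A and B for one cluster can be sketched as below. The row boundaries are passed in as `top` and `bottom` rather than derived inside the function, so the sketch does not commit to a particular reading of the row expressions in step A; the set-based representation of Matra and segmentation-point pixels and the function name are assumptions.

```python
def pick_segmentation_column(img, matra, seg_points, cluster_cols, top, bottom):
    """Column of a cluster with the smallest combined pixel count in rows top..bottom.

    img          0/1 component image
    matra        set of (row, col) Matra pixels
    seg_points   set of (row, col) potential segmentation points
    cluster_cols columns covered by one segmentation-point cluster
    """
    def column_sum(x):
        total = 0
        for y in range(top, bottom + 1):
            total += img[y][x]                # data pixel
            total += (y, x) in matra          # Matra pixel
            total += (y, x) in seg_points     # segmentation-point pixel
        return total

    return min(cluster_cols, key=column_sum)  # left-most column on ties
```

In the test image the cluster spans two neighboring columns; the one that lies under the ascending stroke has the larger sum, so the other column is chosen.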
Once the word components are segmented into constituent characters or their
sub-parts, the 8-way CCL algorithm is applied again to separate each such word
component. Finally, the segmented components are considered for recognition as
meaningful character codes.
3.3 Preparation of Ground-truthed Images
After scanning, the documents were binarized by a global thresholding technique,
where the threshold was chosen as the mean of the maximum and minimum gray level
values in each document image. All the binarized images were archived in DAT
format, where the foreground and background pixels were represented as 0 and
1 respectively. The documents were then preprocessed to remove the remaining noise,
such as salt-and-pepper noise and long lines in the border zone(s). Next, the
segmentation technique described in section 3.2 is applied to the word images, which
yields the colored images. After obtaining the segmented components, error
detection and correction is required, as not all characters are segmented properly by
the segmentation technique. The errors are detected and corrected manually to prepare
the character-level and component-level ground-truthed database. The basic steps of
this work are represented in Fig. 16.
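The binarization rule can be sketched in a few lines. The gray image is assumed to be a list of rows of integer gray levels with dark ink on a light background, and treating pixels at or below the threshold as foreground is an assumption; the output follows the stated convention of 0 for foreground and 1 for background.

```python
def binarize(gray):
    """Global threshold at the midpoint of the minimum and maximum gray levels."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    threshold = (lo + hi) / 2
    # 0 = foreground (dark ink), 1 = background
    return [[0 if p <= threshold else 1 for p in row] for row in gray]
```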
Using the segmentation technique described in section 3.2, we obtain isolated
characters or their sub-parts. In this stage, the components are reassembled into a
word image according to their positions in the original word image. Each distinct
component in these word images is assigned a distinct color, and consequently we
obtain the colored image showing the segmentation effect, as shown in Table 2.
However, these word images are not always segmented properly. So the errors need to be
detected and corrected manually, and the ground-truth images prepared from the result.
Fig. 16 Basic steps of generating Ground-truthed images
Preparation of ground-truthed images works at two levels. At the first level, each
character in the word image is identified and separated from the others, and each
component of a character in the word image, whether connected or disconnected, is
given a different color. So the first level’s work is called component level
segmentation. At the second level, a character may have two or more sub-parts,
connected or disconnected, but they carry the same color as they are components of a
single character. So the second level’s work is called character level segmentation.
For error detection and correction, the tool ‘Paint’ is used. ‘Paint’ reads word
images with a white background. We can select any color from the color box and use it
to recolor the characters that are not segmented properly in the word image, by
marking the intended segmentation point with the pencil. Using this technique, we can
easily correct the errors of our segmentation algorithm to generate ground-truth data.
A screenshot of the tool Paint with a word image is shown in Fig. 17. The steps of this
method are given below:
Steps:
1. Open a word image with the tool Paint.
2. Pick any color from the color box and then mark the intended segmentation point
with the pencil.
3. Fill the character that is to be segmented with the color.
4. Repeat until all the characters are segmented.
5. Close the window and save the image.
Fig. 17 A screenshot of the tool Paint with a word image
Table 2: Some results after segmentation
In Table 2, the word image in Fig. 01 is segmented correctly by the work [2], so no
change is required. In Fig. 02, however, two consecutive characters remain connected to
each other after segmentation. This is called under-segmentation. These characters have
to be separated by giving them two distinct colors, which is done by identifying the
intended segmentation point. The character that has two parts, one in the middle zone
and another in the upper zone, also requires segmentation. This is shown step by step
in Fig. 18(a). At the second level, two sub-parts of a character, whether connected or
disconnected, are given the same color as they are parts of the same character. In
Table 2, Fig. 03 has a character containing two or more colors after segmentation.
Such word images are called over-segmented. The character level and component
level segmentation of the over-segmented word image is shown in Fig. 18(b).
Collecting the component level and character level ground-truthed word images, we
have created a database, which is shown in Table 3. The results of segmentation are
compared automatically with the corresponding ground-truthed images by the proposed
tool to obtain the success rate.
Fig. 18 (a) Illustration of component level and character level segmentation of the
under-segmented word image (panels: component level segmentation, character level
segmentation)
Fig. 18 (b) Illustration of component level and character level segmentation of the
over-segmented word image
Table 3: Character and component level ground-truthed database
Columns: Sl. # | Original gray level Bangla word images | Corresponding character level ground-truthed images | Corresponding component level ground-truthed images
(Rows 01 to 44: word images.)
Chapter 4
Conclusion
Character segmentation is a vital step in an OCR system, because the higher the
accuracy of segmentation, the lower the error in recognition. The work presented
here provides a practical solution to the problem of how best word images of
handwritten Bangla characters can be segmented into constituent characters. Moreover,
the technique can segment words having a discontinuous or cursive Matra. It also
balances the trade-off between under-segmentation and over-segmentation, as the Matra
and the segmentation-point clusters are estimated correctly. As a result, better word
segmentation accuracy is achieved with minimal data loss.
This character segmentation methodology could be successfully applied to
other Matra-based scripts, viz., Devanagri, Gurmukhi etc. However, there is further
scope for improvement of the present technique. An iterative implementation of the
present technique, along with the existing segmentation algorithm, or the design of a
more precise feature set for the MLP may further improve the overall segmentation
performance on handwritten Bangla word images in future. Improvement is also possible
by varying the classifier or by combining the results of different classifiers. The
work as a whole can be considered a significant contribution towards the development
of a future Optical Character Recognition (OCR) system for handwritten Bangla text
documents.
In future, our aim is to increase the size of the database. The present technique
may also significantly reduce the cases of under-segmentation.
References
[1] “A fuzzy technique for segmentation of handwritten Bangla word images”,
Subhadip Basu, Ram Sarkar, Nibaran Das, Mahantapas Kundu, Mita Nasipuri,
Dipak Kumar Basu.
[2] “A two stage approach for segmentation of handwritten Bangla word images”,
Ram Sarkar, Nibaran Das, Subhadip Basu, Mahantapas Kundu, Mita Nasipuri, Dipak
Kumar Basu.
[3] “An improved offline handwritten character segmentation algorithm for Bangla
script”, Subhadip Basu, Nibaran Das, Mahantapas Kundu, Mita Nasipuri, Dipak Kumar
Basu.
[4] “Development of a recognizer for Bangla text: present status and future
challenges”, Saima Hossain, Nasreen Akter, Hasan Sarwar and Chowdhury Mofizur
Rahman.
[5] “Character segmentation for handwritten Bangla words using artificial network”,
T. K. Bhowmik, A. Roy, U. Roy.
[6] “Individual character segmentation from single stroke of Bangla online handwritten
text”, Nilanjana Bhattacharya, Umapada Pal, Koushik Roy.
[7] “A database of unconstrained handwritten Bangla-English mixed script document
image”, Ram Sarkar, Nibaran Das, Subhadip Basu, Mahaantapas Kundu, Mita Nasipuri,
Dipak Kumar Basu.
[8] “A script independent technique for extraction of characters from handwritten word
images”, Ram Sarkar, Samir Malakar, Nibaran Das, Shubhadip Basu, Mita Nasipuri.
[9] Digital Image Processing, http://en.wikipedia.org/wiki/Pixel
[10] Digital Image Processing, http://en.wikipedia.org/wiki/Grayscale
[11] http://www.ijcaonline.org/journal/number23/pxc387693.pdf
[12] http://www.mathworks.in/help/toolbox/images/f18-12508.html
[13] M. Maragoudisakis et al., “Improving handwritten character segmentation by
incorporating Bayesian knowledge with support vector machines,” in Proc.
ICASSP 2002, vol. 4, pp. IV-4174.
[14] R. M. Bozinovic et al., “Off-line Cursive Script Word Recognition”, IEEE Trans.
Pattern Analysis and Machine Intelligence, vol. 11, pp. 68-83, 1989.
[15] “A complete handwritten numeral database of Bangla - a major Indic script”,
B. B. Chaudhuri.