COMPUTER DRAFTING OF HAND-DRAWN LINE SKETCHES
A THESIS
submitted for the award of the degree
of MASTER OF SCIENCE
in
COMPUTER SCIENCE AND ENGINEERING (by Research)
SHRIRAM REVANKAR
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
INDIAN INSTITUTE OF TECHNOLOGY, MADRAS - 600 036
AUGUST 1987
TO Anna and Aayi
ACKNOWLEDGEMENT
With a deep sense of gratitude I acknowledge the
encouragement and guidance of Dr.B.Yegnanarayana, without which
this work would not have taken the present form. It is my
pleasure to thank Pramod Saini and Swaminathan who spent their
precious time in helping me during many stages of the thesis
work. I also thank Maria Dassou, Jai Kumar and my fellow research
scholars for their enthusiastic support.
I thankfully acknowledge the facilities provided by the
Information Sciences Laboratory. I thank Dr.Manohar and other
faculty for their help and guidance.
Shriram Revankar
CERTIFICATE
This is to certify that the dissertation entitled
"COMPUTER DRAFTING OF HAND-DRAWN LINE SKETCHES" is a
bonafide work of Revankar Shriram Venkatesh Shet,
carried out at the Department of Computer Science and
Engineering, Indian Institute of Technology, Madras,
for the award of degree of MASTER OF SCIENCE in
Computer Science and Engineering.
Research Guide, Professor and Head, Department of Computer Science and Engineering, Indian Institute of Technology, Madras, 600 036, INDIA.
ABSTRACT
This is an attempt to explore the issues involved in
computer drafting of hand-drawn line sketches. In the envisaged
system, a line sketch is input through a video camera interface.
The resultant graytone image is binarized and from the binary
image, sketch features are extracted. These features are then
recognized and corrected. The output of the system is in the form
of coordinates of the feature points and their connectivity
information. This form is compact and can be easily stored,
updated, duplicated or communicated over a wide area network.
The local thresholds estimated in the proposed binarization
scheme exhibit good noise rejection properties. The scheme also
extracts local line vectors at all edge points. Run tracing is
used for extraction of lines from a binary image. The run length
and line width information is used for efficient extraction of
the lines. To recognize and correct the input sketch, the
extracted features are represented in the form of a graph. Some
techniques for graph reduction and classification are developed.
The recognition process views this graph as a relational
database, which answers queries pertaining to the stored
geometric models. The geometric models are defined in the form of
rules with adaptive thresholds. These models allow approximate
matching. The correction process updates some nodes of the graph
in accordance with the contextually consistent interpretations.
CONTENTS
1. INTRODUCTION
1.1 Problem of computer drafting of hand-drawn line sketches
1.1.1 Machine representation of an input line sketch
1.1.2 Recognition and correction of line features
1.2 Description of proposed drafting system
1.3 Approach to obtain the machine representation of line sketches
1.3.1 Binarization
1.3.2 Extraction of line features from a binary pattern
1.4 Approach to recognize and correct a machine represented line sketch
1.4.1 Representation of features of line sketches
1.4.2 Recognition and correction
1.5 Work environment
1.6 Input and output specifications
1.7 Organization of the thesis
2. COMPUTER PROCESSING OF LINE SKETCHES: A REVIEW
2.1 Conventional computer aided drafting
2.2 Binarization of graytone images
2.3 Feature extraction from binary line patterns
2.4 Recognition of line patterns
2.5 Contributions of the present work
3. MACHINE REPRESENTATION OF LINE SKETCHES
3.1 Input and output specifications of the preprocessing stage
3.2 Binarization of graytone images
3.2.1 Proposed binarization scheme
3.2.2 Analysis of the proposed binarization scheme
3.3 Extraction of lines from a binary image
3.3.1 Assumed image model for line feature extraction from a binary image
3.3.2 Proposed feature extraction scheme
3.3.3 Results of the line extraction process
4. RECOGNITION AND CORRECTION OF LINE SKETCHES
4.1 Graph representation of features of line sketch
4.1.1 Need for representation of features of a line sketch
4.1.2 Suitability of graph representation
4.1.3 Reduction of graph representing a line sketch
4.1.3.1 Deletion of noisy segments
4.1.3.2 Restoration of intersection points
4.1.3.3 Deviation smoothing
4.1.4 Graph classification
4.2 Approximate geometric models
4.2.1 Rules describing standard geometric models
4.2.2 Thresholds for approximation
4.2.3 Priority ordering of the rules
4.3 Context module for line sketches
4.3.1 Structure of the context module
4.4 Correction of hand-drawn line sketches
4.4.1 Selection of reference line segment
4.4.2 Correction of individual geometric figures
4.5 Control strategy for recognition and correction stage
4.6 Implementation of recognition and correction stage
4.7 Results
5. CONCLUSION
6. REFERENCES
CHAPTER 1
INTRODUCTION
This research aims at making a machine intelligent enough to
accept and interpret a natural input in the form of hand-drawn
line sketches. The output of the envisaged system is a drafted
version of the input. In the process the sketch is encoded in a
compressed form which can be easily stored, updated or
communicated over a computer network. The thesis explores various
issues involved in this problem of computer drafting of
hand-drawn line sketches. In this chapter we give a brief account of
the problem, its background and our approach to solve it. The
organization of the thesis is given at the end of the chapter.
1.1 PROBLEM OF COMPUTER DRAFTING OF HAND-DRAWN LINE SKETCHES
A human being effortlessly recognizes a hand-drawn line
sketch, because that mode of representation is compatible with
his understanding of line sketches. But the same representation
may not make any sense to a machine which has absolutely no
knowledge about line sketches. Secondly, the machine has no
information about various symbols and figures which make sense to
a human being. Hence, if a machine has to draft an input line
sketch, it must be equipped with the necessary knowledge to
perceive and interpret the input. Thus the problem of computer
drafting comprises the following two major issues:
1. to transform the input into a machine representation.
2. to provide necessary knowledge and control strategy to
interpret the encoded input.
1.1.1 Machine Representation of an Input Line Sketch
A line sketch drawn on a plane sheet of paper can either be
input automatically through devices like a video camera or can be
encoded manually. The process of distinguishing various features
like line intersections and deviations, which a human being does
while encoding manually, has to be done by the machine if a
sketch is input automatically. When a hand-drawn line sketch is
input through a video camera, a graytone image is obtained rather
than a two-tone one. This modification of the input can be
attributed to spatial quantization, gray level quantization,
noise and nonlinearity of the transducer. The
absence of a perfectly white background or a perfectly dark
object further reduces the contrast between the object and the
background. Hence an attempt to obtain a machine representation
of the input line sketch must involve a scheme to distinguish
object pixels from background pixels unambiguously. Such a scheme
is called a binarization scheme. To obtain a machine
representation from a binarized image, a set of features is to
be extracted such that the set uniquely characterizes the input.
These features include lines, deviation points, intersection
points, etc.
1.1.2 Recognition and Correction of Line Features
Recognition of line features by a machine is achieved by
parametric matching. If the input sketches were perfect, there
would be an exact match of the features of the input
with the stored object models. But hand-drawn sketches exhibit
deviations from the true objects represented by them. Such
sketches can only be recognized through approximate matching. To
obtain their drafted version, they must be corrected in
accordance with the recognition made. Hence a machine to draft
such inputs must be equipped with geometric models for
approximate matching and the corresponding correction procedures.
The input sketches are normally a combination of various
geometric figures. So recognition of any geometric figure
invariably involves search. This necessitates proper
representation of the image data. The representation chosen must
preserve the invariant properties of the image, which may
otherwise be destroyed during correction.
In some sketches, context also plays an important role. For
example the sketch in Fig.1.1 can be viewed as either a pattern
of one quadrilateral within the other, or four triangles
connected at their vertices. Under such conditions, the user can give
the context in which he wants the image to be corrected. So there
should be provision to communicate contextual information to the
machine. This involves a considerable extent of user interaction.
Fig.1.1 Illustration of need for contextual information (The figure can be viewed as one quadrilateral within another or four triangles connected at their vertices)
Fig.1.2 An automatic drafting system
1.2 DESCRIPTION OF THE PROPOSED DRAFTING SYSTEM
A schematic representation of the proposed drafting system
is shown in Fig.1.2. Input to the system is a hand-drawn line
sketch whose dimensions are not specified (undimensioned), as
shown in Fig.1.3a. The video camera and an analog to digital
converter module convert the visual data into the corresponding
digital representation, which is a graytone image. The
preprocessing stage includes binarization and feature extraction
processes. The set of extracted features describes the input
image uniquely. Before these features are recognized and
corrected, they are represented in a suitable form, so that all
relations among the features are explicit. The recognition and
correction processes view this feature representation as a
relational data base, which answers queries pertaining to various
geometric models during recognition and gets updated during
correction. Knowledge about various geometric models and the
corresponding correction procedures is provided at the
recognition and correction stage. A context module is also
provided so that the user may provide the context in which the
drafted version has to be consistent.
Sample outputs at various stages of drafting are shown in
Fig.1.3b - 1.3d. Fig.1.3a is the input hand-drawn sketch.
1.3 APPROACH TO OBTAIN MACHINE REPRESENTATION OF LINE SKETCHES
1.3.1 Binarization
Binarization is basically a scheme to classify image pixels
into object or background pixels depending on whether or not the
gray values of the pixels are greater than a selected threshold.
Some of the earlier methods select global thresholds based on
overall statistical properties of an image[1]. These methods
produce poor results if the input image is noisy or not uniformly
illuminated, or if the object area is small. Local thresholding
schemes overcome some of these drawbacks[2]. Some binarization
schemes use both local and global thresholds[3],[4].
a. INPUT HAND-DRAWN LINE SKETCH WITH AN INDICATOR LINE AT THE TOP LEFT CORNER.
b. BINARIZED IMAGE
c. PLOT OF THE FEATURES EXTRACTED FROM THE BINARY IMAGE, IN THE PREPROCESSING STAGE.
Fig.1.3 A LINE SKETCH AT VARIOUS STAGES OF DRAFTING.
A new local thresholding scheme is proposed which is
adaptive in nature and has a noise suppression property. Here we
assume that the darker the pixel, the higher is its gray value. The
darkest point has a gray value of 255 and the brightest a value
of 0. In this scheme a 3x3 window is considered around a pixel to
be thresholded. It is assumed that any line in the sketch is at
least three pixels wide. Therefore any line edge has one of the
eight 5-pixel neighborhood patterns in a 3x3 window as shown in
Fig.1.4. The minimum of the average gray values of the 5-pixel
neighbors is used for the calculation of the threshold for the
central pixel. The edge pattern corresponding to the
minimum-average indicates the most likely background pattern, if the
pixel were to be on the edge of the line. The central pixel
belongs to the object if it has a gray value higher than the
minimum-average by a certain value.
If 'm' is the minimum of the average gray values of the set
of 5-pixel neighbors, the threshold 'T' is given by equation (1),
where K1 and K2 are positive nonzero constants, estimated either
experimentally or by image histogram analysis.
a. HORIZONTAL EDGES, [H], WEIGHTAGE = 1
b. POSITIVELY INCLINED EDGES, WEIGHTAGE = 2
c. VERTICAL EDGES, [V], WEIGHTAGE = 3
d. NEGATIVELY INCLINED EDGES, [-], WEIGHTAGE = 4
Fig.1.4 Eight 5-pixel-neighbor patterns of the pixel P, with their vector weightages.
As can be seen from the threshold equation (1), the
threshold 'T' is high when the window is on the background (low
'm') and is low when the window is on the object (high 'm').
This property of the threshold suppresses noise specks on the
background and holes on the object. In addition, the scheme
extracts the local line vectors at all edge points.
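The window computation described above can be sketched in Python as follows. This is only an illustration under stated assumptions. The eight 5-pixel patterns of Fig.1.4 are not reproduced in this text, so the sketch assumes each pattern is a run of five contiguous neighbors on the ring around the central pixel. Equation (1) is likewise unavailable, so the threshold is passed in as a caller-supplied function of 'm'. The function `demo_threshold` is a hypothetical stand-in that merely decreases with 'm'; it is not the thesis's equation.

```python
# Eight neighbors of the central pixel, listed clockwise as (row, col) offsets.
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

# ASSUMPTION: each 5-pixel pattern is five contiguous ring positions.
# This gives eight patterns; the real ones are defined in Fig.1.4.
PATTERNS = [[RING[(k + d) % 8] for d in (-2, -1, 0, 1, 2)] for k in range(8)]


def min_pattern_average(image, r, c):
    """Return (m, k): the smallest 5-pixel average around (r, c) and its pattern index."""
    best_m, best_k = None, None
    for k, pattern in enumerate(PATTERNS):
        avg = sum(image[r + dr][c + dc] for dr, dc in pattern) / 5.0
        if best_m is None or avg < best_m:
            best_m, best_k = avg, k
    return best_m, best_k


def binarize(image, threshold_fn):
    """Mark a pixel as object (1) when its gray value exceeds threshold_fn(m).

    Darker pixels have higher gray values (0 = brightest, 255 = darkest).
    Border pixels have no full 3x3 window and are left as background (0).
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            m, _ = min_pattern_average(image, r, c)
            if image[r][c] > threshold_fn(m):
                out[r][c] = 1
    return out


def demo_threshold(m):
    """HYPOTHETICAL threshold, high for low m and low for high m. Not equation (1)."""
    return 160.0 - 0.5 * m


# A single dark speck on a bright background is rejected.
speck = [[10] * 5 for _ in range(5)]
speck[2][2] = 150
speck_result = binarize(speck, demo_threshold)

# A lighter hole inside a dark object is filled.
hole = [[200] * 5 for _ in range(5)]
hole[2][2] = 100
hole_result = binarize(hole, demo_threshold)
```

The pattern index returned by `min_pattern_average` is the piece of information from which a local line vector could be derived; the mapping from patterns to the weightages of Fig.1.4 is left out because it cannot be recovered from the text.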
1.3.2 Extraction of Line Features from a Binary Pattern
Most of the conventional methods use thinning to obtain line
and intersection information from the binary pattern of a line
image[5]. These methods are time consuming because thinning is a
computationally expensive iterative process. To reduce the time
complexity, recent methods directly vectorize the binary
patterns[6]. But this approach complicates the detection of
intersection points. A method to detect intersection points in
unthinned binary patterns is proposed by Sebock et al.[7].
A new method for line extraction, which exploits the line
width constraint, is proposed. Initially, the average line width
is found out by processing the indicator line segment drawn on
the top left corner of an input sketch. Using this width
information, lines are grouped into two classes. The first is a
vertical or nearly vertical class of lines and the second is a
horizontal or nearly horizontal class of lines. All the binary
patterns representing vertical lines are traced by raster scan
and the rest are traced by non-raster scan.
Lines are viewed as overlapping run patterns[7]. To extract
lines from these patterns, we use run tracing. After every trace
of a specific number of overlapping runs, a vector is placed on the
core line represented by the runs. If the vector deviates from
the line extracted so far by more than a prescribed threshold,
the exact point of deviation is found out using a binary search
technique. The line extracted up to the point of deviation is stored
as a line segment and the remaining part of the vector is taken
as the new line extracted so far. On the other hand, if the
deviation of the new vector is less than or equal to the
prescribed threshold, the vector is assumed to be a continuation
of the line extracted so far. This process continues until all
lines are traced. The points where raster and non-raster traces
meet are points of intersection. These extracted features are
used for machine representation of the input line sketch.
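The idea of viewing a line as overlapping runs can be illustrated with a short Python sketch. It covers only the raster-scan part: runs of object pixels are found row by row, and runs in consecutive rows that overlap one-to-one are chained together. The vector placement, deviation threshold and binary search described above are not attempted. The function names and the rule that a split or merge starts a new chain are assumptions made for this illustration.

```python
def row_runs(row):
    """Return the runs of object pixels (value 1) in a row as (start, end), end inclusive."""
    runs, start = [], None
    for c, v in enumerate(row):
        if v and start is None:
            start = c
        elif not v and start is not None:
            runs.append((start, c - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs


def overlaps(a, b):
    """True when the column intervals a and b share at least one column."""
    return a[0] <= b[1] and b[0] <= a[1]


def trace_runs(image):
    """Chain overlapping runs of consecutive rows.

    Each chain is a list of (row, start, end). A chain is extended only when
    exactly one run overlaps exactly one run of the previous row. A split or
    a merge opens a new chain (ASSUMPTION: such places are treated as
    candidate intersection regions).
    """
    chains, active = [], []
    for r, row in enumerate(image):
        runs = row_runs(row)
        next_active = []
        for run in runs:
            parents = [ch for ch in active if overlaps(ch[-1][1:], run)]
            children = []
            if len(parents) == 1:
                children = [q for q in runs if overlaps(parents[0][-1][1:], q)]
            if len(parents) == 1 and len(children) == 1:
                chain = parents[0]      # simple continuation of one line
            else:
                chain = []              # start, split or merge
                chains.append(chain)
            chain.append((r, run[0], run[1]))
            next_active.append(chain)
        active = next_active
    return chains


def core_points(chain):
    """Midpoints of the runs in a chain, an approximation of the core line."""
    return [(r, (s + e) / 2.0) for r, s, e in chain]


bar = [[0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0]]
bar_chains = trace_runs(bar)

fork = [[0, 0, 1, 1, 1, 0],
        [0, 1, 1, 0, 1, 1]]
fork_chains = trace_runs(fork)
```

For the three-pixel-wide vertical bar a single chain results, and its core points lie on the middle column. The fork produces three chains because the run in the first row overlaps two runs in the second.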
1.4 APPROACH TO RECOGNIZE AND CORRECT A MACHINE REPRESENTED
LINE SKETCH
1.4.1 Representation of Features of Line Sketches
The extracted features of an input line sketch are to be
recognized and corrected. As the recognition process involves
search and pattern matching, the features must be suitably
represented. Wojcik[8] proposed one such representation in the
form of a graph. Nodes in the graph represent features of the
image and the arcs give the relation between the nodes they
connect. An example of such a representation of a triangle is
shown in Fig.1.5.
Fig.1.5 Graph representation of a triangle.
This graph can be easily reduced so that various noise
segments are eliminated and the image is smoothened wherever
possible. A new technique of node coloring is used to avoid
unnecessary search during recognition and correction. In general
the graph can be viewed as a three colored structure. The 'open'
color indicates that the nodes and arcs are parts of the open
loops and the search for a geometric pattern in such regions is
avoided; the 'soft' color indicates the regions of closed loops
where the search for geometric models is carried out; and the
'hard' color indicates the corrected regions and these nodes are
not modified. A graph representation depicting this
classification is shown in Fig.1.6. In this figure nodes S1,
S2, ..., Sm represent line segments of lengths L1, L2, ..., Lm
respectively. The line segments connect their end points
P1, P2, ..., Pn through the arc 'connects'. The nodes a1,
a2, ..., aL indicate the relative orientation between various
connected line segments. During correction, only the node values
of the graph are modified, keeping the structure intact. This
ensures preservation of various relations existing among the
features of the input sketch.
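A minimal Python sketch of the 'open' and 'soft' coloring is given below. It assumes the sketch is held as a mapping from segment names to pairs of end point names, which is a simplification of the graph of Fig.1.6. Segments on dangling chains are colored 'open' by repeatedly removing segments that have an end point of degree one; whatever remains is colored 'soft'. This pruning rule is an assumption of the illustration, not the thesis's classification procedure. In particular, a segment that bridges two closed loops stays 'soft' here. The 'hard' color would be assigned later, after correction.

```python
from collections import defaultdict


def color_segments(segments):
    """Color each segment 'open' or 'soft'.

    segments: dict mapping a segment name to its two end point names.
    A segment is 'open' when it lies on a dangling chain, found by
    repeatedly pruning segments with an end point used by no other
    remaining segment. All other segments are 'soft'.
    """
    remaining = dict(segments)
    color = {}
    changed = True
    while changed:
        changed = False
        degree = defaultdict(int)
        for a, b in remaining.values():
            degree[a] += 1
            degree[b] += 1
        for name, (a, b) in list(remaining.items()):
            if degree[a] == 1 or degree[b] == 1:
                color[name] = "open"
                del remaining[name]
                changed = True
    for name in remaining:
        color[name] = "soft"
    return color


# A triangle P1-P2-P3 with a two-segment tail hanging from P3.
sketch = {
    "S1": ("P1", "P2"),
    "S2": ("P2", "P3"),
    "S3": ("P3", "P1"),
    "S4": ("P3", "P4"),
    "S5": ("P4", "P5"),
}
colors = color_segments(sketch)
```

With this coloring, a search for geometric models would visit only S1, S2 and S3, and would skip the tail S4, S5.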
1.4.2 Recognition and Correction
A machine is said to recognize a particular pattern, if it
can generate a unique symbol whenever it receives that pattern.
Some of the earlier methods of recognition use template
matching[9]. This kind of matching is not suitable for hand-drawn
line sketches, because an infinite number of variations exist for
the representation of a single object. To cope with the
recognition of such inputs, the concept of approximate matching
is proposed. Huntsberger et al.[10] have proposed fuzzy
geometric models to recognize approximate figures.
P1, P2, ... are segment end points; S1, S2, ... are line segments; L1, L2, ... are lengths of line segments; a1, a2, ... are relative orientations of connected segments.
Fig.1.6 The graph representation of features of a line sketch with coloring.
We have defined geometric models in terms of rules. An input
pattern is said to be recognized, if it satisfies these rules.
The rules contain various adaptive thresholds for approximate
matching. These thresholds maintain consistency in matching,
irrespective of the sizes of sides and/or angles of the input
sketch. The rules are given priorities to resolve the conflicts
that may arise when an input pattern satisfies more than one
geometric model.
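The combination of rules, size-independent thresholds and priorities can be sketched in Python for the simple case of triangles. Everything specific here is an assumption for illustration: the three models, the use of side lengths only, and the relative tolerance of 10 percent that stands in for an adaptive threshold. The actual rules and thresholds are developed in chapter IV.

```python
import math


def side_lengths(p, q, r):
    """Lengths of the three sides of the triangle with vertices p, q, r."""
    return [math.dist(p, q), math.dist(q, r), math.dist(r, p)]


def approx_equal(a, b, rel_tol=0.1):
    """Approximate equality with a tolerance that scales with the larger value.

    ASSUMPTION: a relative tolerance plays the role of an adaptive threshold,
    so the same rule applies to small and large figures.
    """
    return abs(a - b) <= rel_tol * max(a, b)


# Rules in priority order: the first satisfied rule wins, which resolves the
# conflict that every near-equilateral triangle is also near-isosceles.
RULES = [
    ("equilateral", lambda s: approx_equal(s[0], s[1])
                              and approx_equal(s[1], s[2])
                              and approx_equal(s[0], s[2])),
    ("isosceles", lambda s: approx_equal(s[0], s[1])
                            or approx_equal(s[1], s[2])
                            or approx_equal(s[0], s[2])),
    ("scalene", lambda s: True),
]


def recognize_triangle(p, q, r):
    """Return the name of the highest-priority model the triangle satisfies."""
    sides = side_lengths(p, q, r)
    for name, rule in RULES:
        if rule(sides):
            return name
    return None


small = recognize_triangle((0, 0), (10, 0), (5, 8.5))
large = recognize_triangle((0, 0), (1000, 0), (500, 850))
tall = recognize_triangle((0, 0), (10, 0), (5, 20))
right = recognize_triangle((0, 0), (30, 0), (0, 40))
```

The first two calls differ only in scale and receive the same interpretation, which is the consistency that size-independent thresholds are meant to provide.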
Once a pattern is recognized, a check for contextual
consistency of the recognition is carried out. Contextually
inconsistent recognition is revoked and an alternative
interpretation is sought. At this stage, provision for user
interaction is made to resolve the cases where all
interpretations of the pattern fail. Context is represented by a
structure similar to frames[11]. The default context is
automatically selected whenever the present context does not
provide any information about the pattern queried.
Patterns are corrected in accordance with their contextually
consistent interpretations. We have proposed some simple
geometric transformations for correction. Care is taken to ensure
that the area covered by the corrected sketch is
approximately equal to that of the input sketch. All irregular
patterns are corrected with reference to the orientation
information of their corrected neighbors. All open line segments
are corrected after correcting closed loops, because any
corrective shift given to an open line segment does not propagate
over to other parts of the sketch.
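The area constraint on correction can be illustrated with a small Python sketch. It assumes a quadrilateral that has already been recognized as a square and replaces it with an exact square of the same area, centered on the mean of the input vertices. The axis-aligned orientation and the choice of center are simplifying assumptions of this illustration; the transformations actually used, which take a reference line segment into account, are described in chapter IV.

```python
import math


def polygon_area(points):
    """Unsigned area of a simple polygon, by the shoelace formula."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def correct_to_square(points):
    """Replace a roughly square quadrilateral by an exact square of equal area.

    ASSUMPTIONS: the result is axis-aligned and centered on the mean of the
    input vertices; points is a list of four (x, y) tuples.
    """
    area = polygon_area(points)
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    half = math.sqrt(area) / 2.0
    return [(cx - half, cy - half), (cx + half, cy - half),
            (cx + half, cy + half), (cx - half, cy + half)]


hand_drawn = [(0.0, 0.0), (4.2, 0.1), (4.0, 3.9), (-0.1, 4.0)]
corrected = correct_to_square(hand_drawn)
```

Because the side of the corrected square is the square root of the input area, the two areas agree up to rounding error.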
1.5 WORK ENVIRONMENT
The input equipment consists of a video camera and a
digitizer. The input line sketch is illuminated by incandescent
lamps. The data is processed on a sequential processing machine.
A bit-mapped frame storage CRT terminal is used for assessing
inputs and outputs at various stages of drafting.
The preprocessing stage is implemented in C language, as it
involves large array handling and mathematical calculations. But
once the features are extracted, the recognition involves
extensive symbol manipulation, relation matching, and search.
This implementation is straightforward and explicit in a
declarative language like PROLOG. PROLOG has the inherent
property of search and backtracking, which can be used for rule
matching and search for alternate solutions.
1.6 INPUT AND OUTPUT SPECIFICATIONS
Input consists of undimensioned line sketches, depicting
various geometric models and their combinations. The sketches may
be hand-drawn flow charts, simple house plans, block diagrams or
engineering design drawings. The hand-drawn sketches have
discrepancies like ragged straight lines, rounded corners, and
extended line segments. Various line lengths and angles
represented are only approximate measures. A pair of angles which
are supposed to be equal are only approximately equal. Similarly
a pair of sides which are supposed to be of equal length are only
approximately equal. The sketches are fairly well drawn and are
free from overwriting and scratching. The line width variations
do not change the information content. Fig.1.7a gives an example
of the input line sketch.
The output is a perfect geometric representation, which
closely approximates the input line sketch. Various points and
segments of the input sketch are modified, reoriented or shifted,
within a range specified by the thresholds. This necessarily
means that all the rounded corners are sharpened, and that the
extended and spurious line segments are eliminated. Fig.1.7b
shows a drafted version of the input sketch of Fig.1.7a.
1.7 ORGANIZATION OF THE THESIS
In the presentation of this thesis, we assume the reader to
be familiar with two dimensional signal processing and logic
programming. We feel that the development of a full fledged
system for automatic drafting cannot be just a cascade of various
modules explained in this thesis. As the objective of this work
is to explore the issues in computer drafting of line sketches
and to suggest viable solutions, no implementation effort is made
to process large sketches which occupy memory area greater than
the allowable array size in the machine. Fig.1.7 gives an idea
of the size and type of sketches on which various proposed
algorithms were evaluated. We would also like to clarify that
many aspects of the algorithms proposed cannot be explained
mathematically, but an effort is made to justify such aspects
intuitively. Supporting results and illustrations are given
wherever appropriate.
a. INPUT GRAYTONE IMAGE
b. DRAFTED VERSION OF THE INPUT
Fig.1.7 SAMPLE INPUT AND OUTPUT OF THE DRAFTING SYSTEM
A review of the state of the art in the field of computer
processing of line sketches is given in chapter II. The chapter
concludes with a note on specific contributions of this thesis.
Chapter III deals with extraction of line features from the
graytone image of an input line sketch. New algorithms for
binarization and line extraction are proposed. In the
binarization scheme, edge patterns of the lines in a window are
taken as the reference for threshold selection. It has good
noise suppression properties and provides local line vectors at
all edge points. The algorithm for line extraction and
segmentation does not involve the computationally expensive process
of thinning. Here, a binary pattern is viewed as a structure of
overlapping runs of object pixels. A trace of overlapping runs is
directly vectorized to get lines and their deviation points. Line
intersection points are extracted through analysis of run length
information.
Recognition and correction processes are discussed in
chapter IV. The recognition process is basically a search for a
pattern in the input which approximately matches with standard
geometric models. The models are defined in the form of rules.
Emphasis is also given to the structural representation of input
sketch features. Correction involves reorientation and
modification of input sketch parameters, in accordance with the
recognized model and the relations existing between the figure to
be corrected and its neighbors.
Chapter V concludes the presentation with a brief summary of
the thesis and a few samples of outputs at various stages of
processing.
CHAPTER II
COMPUTER PROCESSING OF LINE SKETCHES: A REVIEW
In the improvement of man-machine communication,
technology has made tremendous advances in the last four decades, but
the goal envisaged still looks as distant as it was in the
beginning. In most of the cases the source and the size of
information necessary for processing a natural input are quite
obscure. In the absence of a clear view of the problem, many of
the methods proposed towards providing natural inputs to a
machine become useless as soon as the input crosses the
protective barrier (or constraints) put by their creator. To
overcome this problem, many researchers have tried to study and
emulate the details of biological processes involving
functions like vision, problem solving, hearing, etc. One of the
vision problems is computer processing of line sketches.
Computers are conventionally used as an aid to draft
various kinds of sketches. Two different approaches are observed
in this kind of machine usage. Section 2.1 gives the details of
these approaches and their drawbacks.
Recent developments are directed towards automating the process of
drafting. Some of these systems receive on-line input and some
others off-line. Off-line input devices invariably generate a
graytone image, irrespective of the type of input. Sections 2.2
and 2.3 describe various existing schemes for binarization of
graytone images and line extraction from binary images,
respectively.
Recognition of extracted features of a line sketch has been
studied extensively. Some of the early methods use template
matching for recognition of input data. Machine recognition of
natural inputs involves approximate matching. The recent trend is to
develop fuzzy models which match with a set of approximate
patterns rather than a single pattern. A review of these schemes
is given in section 2.4.
Section 2.5 carries a note on specific contributions of the
present thesis in the light of various published methods.
2.1 CONVENTIONAL COMPUTER AIDED DRAFTING
Commercially available computer aided drafting systems still
act as just passive receivers with little or no knowledge of the
task domain. They rely heavily on the encoding capacity of a skilled
draftsman[12]. Though these systems surpass conventional
drafting both in time efficiency and in output quality, the
draftsman's job is still tedious and his communication with the
machine is highly unnatural.
There are mainly two approaches in human encoding of a line
drawing into a computer[12],[13]. The first approach uses an
interactive method in which a graphic work station with a
'computer aided drafting' (CAD) software package is used
extensively. Here a draftsman's job is complicated, because to
make use of the environment effectively, he has to detect
various regular structures present in a drawing to be drafted and
encode that information into the machine with proper size,
location and orientation information. This mode of thinking is
different from the thinking involved in human drafting and it compels
a draftsman to have a panoramic view of the encoded drawing. But
if the resolution of the screen is low or the screen size
itself is small, large drawings may not fit into a single
display. A partial display interferes with the draftsman's thought
process. Secondly, editing of the encoded image must also be done
on a chunk of information which was encoded at a time, rather
than on individual lines and angles. This may lead to a chain of
corrections and is time consuming.
The second approach is closer to manual drafting. Here a
draftsman encodes coordinates of various feature points like
deviation and intersection points of the drawing through a
coordinate sensitive yoke and connects these points as in the
drawing. This necessitates the drawing to be within the limits of
the coordinates of the pointer. Although a draftsman may find
this easily maneuverable, he must endure the lengthy and monotonous
task of inputting each intersection and deviation point of every
line of the drawing. The job becomes more tedious if curves are
present in the drawing. Therefore the automation of encoding a line
sketch into a computer has been an important issue. Many efforts
have been made to make a machine encode a line sketch by
itself[13], [15]-[19].
2.2 BINARIZATION OF GRAYTONE IMAGES
All off-line input devices like video camera, facsimile,
etc., generate a graytone image, when they digitize an input. To
extract the input information from the graytone image, the object
pixels must be distinguished from the background pixels. This
classification is viewed as a binarization (or binary
thresholding) problem, in case of images of binary pictures.
Binarization is carried out by classifying every pixel in the
image as black (or object) or white (or background), depending on
whether or not the gray value of the pixel under classification
is greater than a suitably selected threshold.
Thresholding schemes can be broadly classified into
conventional schemes and information integrating schemes[26].
Conventional techniques look for either maximal regions
satisfying some homogeneity criterion or edge information between
the regions, whereas the information integrating techniques try
to emulate the biological system approach for processing visual
data. In the information integrating techniques, information from
various related sources is used to put constraints on the data
to be processed, to obtain an unambiguous result.
In conventional schemes a variety of threshold selection
techniques have been proposed. Each technique assumes a certain
image model. Most of them perform satisfactorily on the images
which satisfy the assumed model. Thresholding schemes fall into
three main categories.
1. Global thresholding
2. Local thresholding
3. Dynamic thresholding
In general, a threshold at any point can be expressed as a
function of the gray level at that point, its neighbors, overall
gray level distribution of the image, and the spatial location of
the point under consideration. If the image is of 'L' levels,
then the threshold at the 'ith' level (0 < i < L) is defined as
T_i(x,y) = f_i[ g(x,y), N_i(x,y), I_n, x, y ]
where f_i is the thresholding function for the 'ith' level,
g(x,y) is the gray level of the pixel at (x,y),
N_i(x,y) is a set of neighbors of pixel (x,y),
I_n is the gray level distribution of the complete image.
If T_i(x,y) is a function of I_n only, then it is called a global
threshold. The global threshold for any level is computed only
once for the entire image. If the threshold T_i(x,y) is a function
of g(x,y) and N_i(x,y), it is called a local threshold. If T_i(x,y)
is a function of the coordinates (x,y) of the pixel under
consideration, it is called a dynamic threshold. Most of the
recent methods use a combination of these techniques.
A global technique based on prior information of object area
of the image is proposed by Doyle[21]. It was developed for
binarization of similarity invariant patterns. Here a threshold
is selected such that only a certain number of pixels equivalent
to the known object area, have gray level greater than the
threshold. This method is naturally not applicable, if the object
area is unknown or varies from picture to picture.
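The area-based selection described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Doyle's original formulation: the function name `ptile_threshold` is hypothetical, gray levels are taken to be integers in 0..255, and object pixels are assumed to have the higher gray values.

```python
def ptile_threshold(image, object_area):
    """Pick a threshold so that about `object_area` pixels lie above it.

    `image` is a list of rows of integer gray levels in 0..255.
    Assumption: object pixels have higher gray values than the background.
    """
    # Build the gray level histogram.
    hist = [0] * 256
    for row in image:
        for g in row:
            hist[g] += 1
    # Walk down from the highest level until the known area is covered.
    count = 0
    for level in range(255, -1, -1):
        count += hist[level]
        if count >= object_area:
            # Pixels with gray level > level - 1 form the object.
            return level - 1
    # The requested area exceeds the image size: every pixel is object.
    return -1


image = [[10, 10, 200],
         [10, 220, 10],
         [240, 10, 10]]
threshold = ptile_threshold(image, object_area=3)
```

With the known object area of 3 pixels, exactly the three darkest pixels exceed the returned threshold. The sketch also makes the limitation visible: the result is only as good as the area estimate passed in.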
For the purpose of segmenting white blood corpuscles, Prewitt
and Mendelson[22] chose the threshold at the valleys of the image
histogram. This technique is called the mode method. This
technique involved smoothing of the histogram to remove spurious
modes and valleys of the histogram. The smoothened histogram was
then searched to find out the local maxima (or modes) of the
histogram. Then the threshold was selected at a value between the
two modes. One of the recent methods proposed on these lines is
an automatic binarizing scheme which can be extended for
multilevel thresholding[1]. Here an optimal threshold is selected
by using the discriminant analysis of Fukunaga[23]. The
discriminant measures maximize the separability of the resultant
classes of pixels. The procedure uses the 0th and the 1st order
cumulative moments of the image histogram. Hence the computation
time is a linear function of the size of the image.
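A minimal sketch of threshold selection from the 0th and 1st order cumulative moments of the histogram is given below. It maximizes the between-class variance, which is one common discriminant measure; the function name `otsu_threshold` and the use of integer pixel counts instead of normalized probabilities are choices made for this illustration only.

```python
def otsu_threshold(hist):
    """Return the level t that maximizes the between-class variance.

    `hist[g]` is the number of pixels with gray level g.
    Pixels with gray level <= t form one class, the rest the other.
    """
    total = sum(hist)                                     # number of pixels
    total_sum = sum(g * h for g, h in enumerate(hist))    # sum of gray levels
    best_t, best_var = 0, -1.0
    n0 = 0   # 0th order cumulative moment (pixel count up to t)
    s0 = 0   # 1st order cumulative moment (gray level sum up to t)
    for t, h in enumerate(hist):
        n0 += h
        s0 += t * h
        if n0 == 0 or n0 == total:
            continue  # one class is empty, separability is undefined
        # Between-class variance written with counts instead of probabilities.
        numerator = (total_sum * n0 - s0 * total) ** 2
        variance = numerator / (total * total * n0 * (total - n0))
        if variance > best_var:
            best_var, best_t = variance, t
    return best_t


hist = [0] * 256
hist[10], hist[20], hist[200] = 40, 10, 50
t = otsu_threshold(hist)
```

Building the histogram takes one pass over the image, and the search itself is a single pass over the gray levels, which is consistent with the linear-time remark above.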
In cases of out of focus images, or images with object areas
very small compared to the background area, bimodality of the
image is not distinct. It is difficult to locate histogram
valleys in such images. To overcome this difficulty, some of the
global methods study local properties of the image[24]-[26].
These methods basically transform the image histogram so that the
valley gets enhanced. In determining how each point of the image
should contribute to the transformed histograms, the rate of
change of gray levels around that point as well as the gray level
at that point, are considered. Normally changes in gray levels
occur at the edges common to both the object and the background.
So the rate of change of gray level is also termed as 'edge
value'. Edge values are found out through edge operators like
Laplacian, Robert Cross, DIFl(maximum difference of average gray
level in pairs of horizontally and vertically adjacent 2-by-2
neighborhoods), etc.[4]. The points at the interior of the object
or background generally have low edge values because of
uniformity of the surroundings, while those on the
object/background boundaries have high edge values. Thus if a
histogram is obtained for only low edge value pixels, histogram
peaks remain the same while the valley becomes prominent[25]. On
the other hand, if a histogram of pixels having only high edge
values is obtained, the histogram should have a single peak at
the valley point of the image histogram. Alternatively, a
weighted histogram is obtained by counting the higher edge values
more heavily[27],[28].
Weszka et al.[26] used a pure Laplacian operator to define
the edge value. Since these are second derivative operators, they
have a zero value on a linear ramp formed at the
object/background boundaries, but high edge value on the
shoulders on either side of the ramp. Thus the points having high
Laplacian values will be adjacent to, but not on, the boundaries.
The histogram of high edge value pixels should now have two peaks
representing the shoulders of the boundary and a deep valley
representing the boundary.
All the above described global methods assume that the
object and the background gray levels are uniform in their
respective regions. This is not true in practice, especially in
the images of hand-written characters or hand-drawn line
sketches, because of uneven stress put by hand. Global
thresholding of such images may lead to generation of
disconnected binary patterns. Moreover, all the global methods
are sensitive to noise, because irrespective of its surroundings,
a pixel is classified purely on the basis of its gray value. To
handle this situation, local thresholding schemes were proposed.
Here a pixel is classified by comparing its gray value with the
gray value obtained from a set of neighbors. The spatial
distribution of this set is called a window. The size and shape
of the window varies from method to method. Some methods assume
minimum width of the object pattern and some are selected by
experimental studies. A review of such schemes is given by
Weszka[24] and Ullmann[2]. Wo15[29] assumed that in textual
images, width of a limb does not exceed a known limit. He
proposed that a pixel is an object point, if it has a gray value
higher than the gray values of a certain pair of neighboring
pixels, by more than a specified constant. Ogawa et al.[30], in
their OCR system proposed an algorithm, where a set of neighbors
is chosen depending on the a priori line width information. A
pixel is classified as an object point, if it is darker than the
average gray value of the neighbors, by more than an
experimentally selected constant. On the other hand, Ullmann[64]
selected a neighborhood pattern without assuming any width
information.
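The neighbor-average rule mentioned above (a pixel is an object point if it is darker than the average of a chosen set of neighbors by more than a constant) can be illustrated as follows. This is a generic sketch and not the specific algorithm of any of the cited authors: the neighbor set is assumed here to be the border of a square window whose half-size `radius` is chosen larger than the line width, darker pixels are assumed to have higher gray values, and the names `ring_offsets`, `local_binarize` and `margin` are hypothetical.

```python
def ring_offsets(radius):
    """Offsets (dy, dx) of the pixels on the border of a (2r+1) x (2r+1) square."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if max(abs(dy), abs(dx)) == radius]


def local_binarize(image, radius, margin):
    """Mark a pixel as object (1) if it exceeds its ring average by `margin`.

    Pixels closer than `radius` to the image border are left as background.
    """
    rows, cols = len(image), len(image[0])
    offsets = ring_offsets(radius)
    binary = [[0] * cols for _ in range(rows)]
    for y in range(radius, rows - radius):
        for x in range(radius, cols - radius):
            average = sum(image[y + dy][x + dx] for dy, dx in offsets) / len(offsets)
            if image[y][x] > average + margin:
                binary[y][x] = 1
    return binary


# A one-pixel-wide dark vertical line (200) on a bright background (20).
image = [[20, 20, 20, 200, 20, 20, 20] for _ in range(7)]
binary = local_binarize(image, radius=2, margin=30)
```

Because the comparison is against the surroundings rather than a fixed level, a slow brightness drift across the picture shifts both sides of the inequality together, which is the motivation for local schemes given above.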
The loss of information due to digitization of a picture is
aggravated by binarization. This leads to the formation of tailed
and jagged patterns during thinning of the binarized pattern. To
reduce this degradation, a method of double adaptive thresholding
is proposed in[3]. One more local method which has more of global
flavor is proposed by Ridler and Calvard[6], where an iterative
technique is used for the selection of a threshold. Wojcik[8],
proposed a new thresholding scheme, where the existing gray value
of a pixel is replaced by the maximum homogeneous gray value,
which is found by a rotating window technique. The method tends
to blunt corners and sharp curvatures. The performance of the
method can be improved by giving weightage to the pixels of the
window, depending on a proximity measure.
In dynamic thresholding schemes, the threshold at a point
depends on coordinates of the point, in addition to its own gray
value and the gray values of a set of neighbors. Chow and
Kaneko[32] use a scheme of dynamic thresholding where the value
of the threshold at a point depends on its proximity to boundary
points. The threshold for these boundary points is determined by
local histogram analysis.
A recent trend in image classification is to use information
integration, which is how the biological vision is assumed to be
working. One of the approaches in that direction is proposed by
Ahuja et al.[3]. Here features of neighborhood gray level
patterns are used for pixel classification. The feature vectors
are obtained from a known set of neighbors. In addition to the
image information, external information like context and texture
information are also used for image segmentation[34]-[36]. These
methods have proved to be useful in case of interpretation of
complex images like satellite imagery, 3-D imagery, etc.
2.3 FEATURE EXTRACTION FROM BINARY LINE PATTERNS
It has been observed that in almost all practical cases of
natural communication with machines, the input is either too
large to match directly with a stored pattern or too many pattern
variations exist in the representation of a single object. Hence
various methods have been proposed to compress the input data or
to extract some invariant features, which uniquely characterize
the input. This in general is called feature extraction. In the case
of line images, line segments, their relative orientations,
connectivity and intersections form a set of features which
uniquely characterize a line sketch. Several methods have been
proposed to extract these features from the binary image of a
line sketch.
We find two distinct trends in the algorithms for line
extraction from binary patterns. The first relies on obtaining a
singly connected pixel pattern of the object by thinning and then
extracting various line segments. The second deals with
extraction of lines without thinning. Methods which extract lines
from binary patterns without thinning are further classified into
two categories. One category extracts lines by direct
vectorization and the other by core line tracing. These methods
invariably assume a group of object pixels, rather than a single
pixel as the basic unit constituting the object.
Most of the published work in line extraction proposes
thinning as an inseparable process. The algorithm proposed by
Hilditch[37] uses an iterative edge erosion technique. Here a
3 X 3 window is traversed over the image and a set of rules are
applied to the contents of the window. The rules specify the
pixels to be marked for deletion at the end of each iteration.
The iterative scan completes when no more points can be deleted
or marked. The rules can be summarized as: delete an object
point if 1) it is an edge point; 2) it is not an end point; 3)
its neighborhood pattern does not match with any of the seven
predefined window patterns. These window patterns basically
include various tests for break points and successive erosions.
Naccache and Shinghal[l8] gave a formal description of this
algorithm. In the same paper the authors have proposed a new
thinning algorithm called the 'safe point thinning algorithm'
(SPTA). Here the method is similar to the one proposed in [37],
but the rules are optimized into a set of Boolean expressions,
which can be evaluated using the neighbors of each point. The
algorithm is twice as fast as similar methods but it generates
ragged lines at 'T' junctions. Many other thinning algorithms are
proposed on similar lines[39],[40]. Udupa and Murthy[30]
suggested a method to obtain a piece-wise-linear approximation of
the skeleton of an unthinned binary image. It is an iterative
algorithm, where a set of window operators is passed over the
image to detect the 'turning points' and 'end points' of the
lines. But the algorithm is not very robust as it assumes that
each line has a constant width over at least six pixels along the
line.
Most of the above algorithms assume an environment of low
resolution data. A linear increase in resolution introduces a
quadratic increase in data to be processed. Since every iteration
is followed by stripping of the marked pixels, the computation
time increases in cubic proportion to the resolution. This
drawback of the iterative techniques makes them unsuitable for
practical situations such as real time processing and high
resolution image processing. Added to this, line information is
obtained only after additional processing of the thinned
patterns.
This drawback is overcome by direct vectorization of binary
patterns. Here, instead of every pixel being treated
independently, a group of pixels is taken as the basic unit of a
binary pattern.
Sebock et al.[7] assumed a run of object pixels as the unit
of line, and the manner in which the runs overlap determines the
points of intersection of lines. To make the algorithm
independent of width, run length variation is ignored. But
because of this, the algorithm fails to recognize the horizontal
'T' intersection. Ramachandran[6] proposed a method to encode
engineering drawings. The method gives importance to exact
reproduction of the input image. Here all edges of an input image
are marked, and a set of constant length vectors are placed in
between the edges of a line. The average width between the edges
along the horizontal scan gives the line width information. The
direction of the vector and the line width information is then
encoded in a compressed form. As the method uses only vertical
trace, it is inefficient in encoding nearly horizontal lines and
code developed for such lines is large.
Some methods use core-line tracing for line extraction. In
the method proposed by Wakayama[42], a maximal square window of
object pixels is taken as the basic unit of line and the central
pixel of the window is taken as the pixel on the core line. In
addition to obtaining the thinned version of the binary pattern,
this method also provides a capability of exact reproduction of
the binary pattern, when needed. In the method proposed by Arvind
et al.[3], the binary image is initially blurred using a Gaussian
filter to generate peaks of gray values at the core of a line
pattern. Then an adaptive thresholding scheme is used to detect
these peaks from rest of the data. But this algorithm suffers
from the width dependency. If the width of the Gaussian filter is
too large, the lines with small widths get blurred beyond
recognition and if the filter is adjusted for thin lines, wider
lines may get discarded as regions.
To extract line features, thinning and core line tracing is
followed by vectorization. Thinned patterns are viewed as chain
connected curves. Deviation points in these curves are found out
by calculating the angles subtended at each point by a pair of
fixed length line vectors, and selecting the points where local
maxima of angles occur[43]-[46]. The point at which a pixel has
more than two neighbors is an intersection point.
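The intersection test on a thinned pattern stated above can be sketched directly. The function name `intersection_points` is hypothetical, and 8-connectivity is assumed for the neighbor count, which the text does not state explicitly.

```python
def intersection_points(skeleton):
    """Return (x, y) of every object pixel that has more than two object neighbors.

    `skeleton` is a list of rows holding 1 for object pixels and 0 otherwise.
    Assumption: neighbors are counted in the 8-neighborhood.
    """
    rows, cols = len(skeleton), len(skeleton[0])
    points = []
    for y in range(rows):
        for x in range(cols):
            if not skeleton[y][x]:
                continue
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if (dy or dx) and 0 <= y + dy < rows and 0 <= x + dx < cols:
                        neighbors += skeleton[y + dy][x + dx]
            if neighbors > 2:
                points.append((x, y))
    return points


# Two diagonal strokes crossing at the centre of a 5 x 5 grid.
x_shape = [[1 if (c == r or c == 4 - r) else 0 for c in range(5)]
           for r in range(5)]
crossings = intersection_points(x_shape)
```

For the diagonal cross only the centre pixel qualifies. On a cross made of a horizontal and a vertical stroke, the four pixels next to the centre also have more than two 8-neighbors, so in practice adjacent candidates would probably need to be merged into one intersection point.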
In direct vectorization schemes, though finding the
deviation points reduces to the same procedure as in the chain
coded curves, finding intersection points becomes difficult.
Sebock et al.[7] and Pavlidis[47] proposed two different methods
for intersection detection. In the former method, a linked list
of runs of pixels is formed with a known number of in-pointers
and out-pointers at each run. Wherever a pointer conflict occurs,
an intersection point is marked. In [47], run length coding is
used for grouping the object pixels. These groups are called
nodes. The nodes are traced in a predefined manner. Whenever a
node traces back to the already traced node, an intersection
point is marked. Filtering is suggested to avoid detection of
innumerable number of intersection points, due to stray holes in
the object area or in case of processing of a checkered board
pattern.
2.4 RECOGNITION OF LINE PATTERNS
Various schemes have been proposed to represent object
features. To recognize an object, the representation chosen must
make the relations existing among the features explicit. One way
of encoding these relations is proposed by Wojcik[8], which is a
graph representation of the line sketch. Here the arcs represent
the relations and the nodes represent the extracted features.
Pattern matching is extensively used for recognition. If the test
pattern matches with a pattern which the machine has already
associated with a name tag, the test pattern is assumed to be
recognized as an object with the associated name. Several
matching techniques have been proposed. Some methods involve
exact matching and some involve approximate or fuzzy matching.
In exact matching techniques, various sets of object models
are stored. Every stored element is called a template and it is
attached to the name of the object it represents. Only if a test
pattern matches with any one of the templates, the pattern is
said to be recognized as an object corresponding to the template
it matched. The field of character recognition makes extensive
use of matching techniques[48],[49]. A simple matching strategy
is to check for one-to-one correspondence between the line
segments of the figure to be recognized and those of the known
template. Some on-line recognition systems make use of the
temporal order of strokes also[l8]. If the image is chain coded,
pattern of chain is directly matched with a template, as in
string matching. These methods become impractical if the number
of object patterns is very large, as in the case of patterns of
hand drawn sketches. For such inputs fuzzy or approximate
matching is inevitable.
In approximate matching, object models are devised such that
a set of approximate object patterns match with each model,
unlike a unique pattern matching with a model as in the case of
template matching. In the methods proposed by Huntsberger et
al.[23], geometric models are defined in the form of fuzzy sets.
If an input pattern is a member of any one of these sets, the
pattern is recognized as the geometric model associated with the
host fuzzy set. If the pattern is a member of more than one fuzzy
set, some definite priority is given to the models to resolve the
conflict.
2.5 CONTRIBUTIONS OF THE PRESENT WORK
A new local thresholding scheme is proposed[54]. A 3 X 3
window is used for pixel classification. The method highlights a
new idea of noise suppression. Constants of the threshold
function can be found either experimentally or automatically
using the discriminant analysis described in Otsu[1]. The method
also extracts local line vectors at all edge points. These line
vectors can be used for curve detection.
The line extraction algorithm proposed is similar to the one
proposed by Pavlidis[47], but it uses the prior line width
information to infer the existence of an intersection point.
Lines are viewed as overlapping run structures as proposed in
Sebock et al.[7]. To provide more uniform treatment to runs of a
line, raster and non-raster scans are used for the extraction of
vertical class of lines and horizontal class of lines
respectively. This is an improvement over the engineering drawing
encoding scheme proposed by Ramachandran[6], where the lines of
horizontal class are extracted as an array of vertical line
vectors.
The proposed representation scheme is an improvement over
the graph representation of Wojcik[8]. Various reduction and
classification techniques are proposed. This representation
serves as a relational database, which answers queries pertaining
to various geometric models and gets updated during correction.
For the recognition of approximate geometric figures from an
input sketch, fuzzy geometric models are defined in the form of
rules. These rules have dynamic thresholds which allow
approximate matching[55]. These geometric models are more
flexible than the ones proposed in Huntsberger et al.[10], where
the threshold of approximation is not explicit.
Techniques for correction of undimensioned line sketches are
developed. This correction is in accordance with the relations
governing the matched geometric model. A corrected sketch is an
aesthetically improved version of the input. Correction can also
be controlled by user specified context through a context module.
CHAPTER III
MACHINE REPRESENTATION OF LINE SKETCHES
Even though a line sketch is binary, its digitized image
obtained through a video camera interface is graytone. In such an
image, the sketch information is obscure because the object
pixels are not distinct from the background pixels. Hence the
image is binarized to separate the object pattern from the
background. This object pattern is then processed to extract
characteristic features of the sketch. The output of this process
is in the form of coordinates of feature points and their
connectivity information. This output structure forms a
convenient machine representation for recognition and correction
of the input sketch.
A brief note on the input and output specifications of the
preprocessing stage is given in section 3.1. In section 3.2, a
new binarization scheme to separate object pixels from the
background is proposed. A scheme to extract line features from a
binary image is discussed in section 3.3.
3.1 INPUT AND OUTPUT SPECIFICATIONS OF THE PREPROCESSING STAGE
The sketch to be processed is drawn on a plane sheet of
paper (Fig. 1.3a). The sketch contains an isolated vertical
straight line segment, which is drawn at the top left corner of
the sketch, with the same instrument with which the sketch is
drawn. This line segment acts as an indicator line, which
provides the line width information. The input to the system is a
digitized version of such a sketch, which is obtained through a
video camera and an analog to digital converter. The digitized
version is a graytone image with gray levels of the pixel varying
from 0 (indicating the brightest region) to 255 (indicating the
darkest region). Object pixels in the image are darker than the
background pixels. In a graytone image, sharp variations in gray
levels of the input sketch are eliminated, i.e. object/background
boundaries are smeared. In addition to this gray level
modification, noise may also introduce random variations in the
pixel gray values.
The output of the preprocessing stage is a set of features
which uniquely characterizes the input sketch. We observe that a
set containing all line segments, deviation points and
intersection points, uniquely characterizes a line sketch. Hence
the output specifies coordinates of various feature points and
their connectivity information.
3.2 BINARIZATION OF GRAYTONE IMAGES
Graytone images contain smeared boundaries and gray level
variations due to noise. So a graytone image, when binarized, may
generate an object pattern which is either enlarged or eroded.
Further, the pixels which are modified by noise may form dark
specks (spurious object points formed on the background) or holes
(spurious background points formed on the object). Ideally, a
binarization scheme should eliminate the following
discrepancies:
i. noise specks should not be formed on the background,
ii. no holes should be formed on the object, and
iii. smearing of the boundaries must be eliminated.
The proposed binarization scheme estimates a local threshold
at each pixel of the image and classifies the pixel as an object
pixel if it has a gray value higher than the estimated threshold.
The scheme also provides local line vectors at all edge points of
the object.
3.2.1 Proposed Binarization Scheme
A local threshold is estimated based on the image features
around the pixel to be thresholded. The equation governing a
local threshold is represented as follows:
$T(x,y) = f(N(x,y))$ (3.1)
where T(x,y) is the threshold for the pixel at the point (x,y),
N(x,y) is a set of neighbors of the pixel at (x,y), and
f( ) is a threshold function.
If g(x,y) is the gray value of the pixel p(x,y), then
binarization is defined by the following equation.
If g(x,y) > T(x,y)
Then p(x,y) is an object point.
Else p(x,y) is a background point. (3.2)
For the purpose of estimation of the threshold, a 3 X 3
window centered around the pixel to be thresholded is considered.
In other words, the set
represents the pixels in the window around the pixel p(x,y).
In a window, 8 sets of 5-pixel neighbors are defined as
shown by the shaded area in Fig.1.4. These sets of
5-pixel-neighbors represent line edge patterns in a 3 X 3 window.
If 'm' is the minimum of the average gray values of the 5-
pixel-neighbor set corresponding to the pixel p(x,y), the
proposed threshold function is given by
where K1 and K2 are positive nonzero constants.
The values of the constants K1 and K2 depend on the contrast
of the input graytone image and the overall picture brightness.
These can be found experimentally. Automatic determination of K1
and K2 can also be done using discriminant analysis of the gray
level distribution of the image[23]. If K is the point of optimal
classification[1], then assuming that the image is uniform in a
region of 3 X 3 pixel window, we have both 'm' and T(x,y) of
equation (3.3) equal to K. On substituting this condition in the
equation we have
Typically the value of K2 is 1.
Once the threshold is calculated at a pixel, the pixel is
classified either as a background pixel or as an object pixel, in
accordance with the equation (3.2). For convenience, the gray
values of all the pixels which are classified as object are set
to '1' and those of background pixels are set to '0'. The image
thus obtained is called a binary image.
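The classification step of this section can be sketched as below, with two assumptions that should be read as placeholders rather than as the thesis's definitions. First, Fig.1.4 is not reproduced in this text, so the eight 5-pixel-neighbor sets are assumed to be the eight runs of five consecutive positions around the ring of 8 neighbors. Second, the threshold function of equation (3.3) is not legible here, so it is passed in by the caller as `threshold_fn`; the linear form `220 - m` in the example is purely hypothetical and was chosen only because it decreases as 'm' increases. Darker pixels are assumed to have higher gray values, as in section 3.1.

```python
# Ring of the 8 neighbors as (dy, dx) offsets, clockwise from the top-left.
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

# ASSUMPTION: each 5-pixel-neighbor set is five consecutive ring positions.
NEIGHBOR_SETS = [[RING[(start + k) % 8] for k in range(5)] for start in range(8)]


def min_set_average(image, y, x):
    """Return (m, index): the minimum of the 8 set averages and the set giving it."""
    averages = [sum(image[y + dy][x + dx] for dy, dx in offsets) / 5.0
                for offsets in NEIGHBOR_SETS]
    m = min(averages)
    return m, averages.index(m)


def binarize(image, threshold_fn):
    """Apply rule (3.2): object (1) if g(x,y) > T(x,y), else background (0).

    `threshold_fn(m)` stands in for the threshold function of the scheme.
    Border pixels, where the 3 X 3 window does not fit, stay background.
    """
    rows, cols = len(image), len(image[0])
    binary = [[0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            m, _ = min_set_average(image, y, x)
            if image[y][x] > threshold_fn(m):
                binary[y][x] = 1
    return binary


# A 3-pixel-wide dark line (200) on a bright background (20).
image = [[20, 20, 200, 200, 200, 20, 20] for _ in range(5)]
binary = binarize(image, threshold_fn=lambda m: 220 - m)  # hypothetical T(m)
```

In this example the three line columns of the middle rows come out as '1' and the background columns as '0'. Any other decreasing function of 'm' can be substituted for the placeholder without changing the rest of the sketch.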
3.2.2 Analysis of the Proposed Binarization Scheme
The proposed method is based on the assumption that except
at the edges, the picture is uniform in an area of 3 X 3 window
(uniformity assumption). This necessarily means that a hand-drawn
line is at least 3 pixels wide[12,16]. The edge patterns defined
by the 5-pixel-neighbors exclude a completely dark window and the
patterns formed by corners of a sketch. This exclusion is
justified because of the following reasons:
1. Because the image distribution is not known a priori, all
pixels in the image are to be treated uniformly. So, even
though corners form a small percentage of the object points,
corner pattern checking should be done at all points. This
makes the process computationally expensive.
2. Because of the involvement of a smaller set of neighboring
pixels in determining the threshold, the noise sensitivity of
the threshold increases.
3. Since the minimum line width is of 3 pixels, the loss in
[...] information that may occur due to blunting of corners
[...].
4. Because of the uniformity assumption, the
information that can be obtained from a completely dark
window can be obtained from any one of the 5-pixel-neighbors.
As the background has a gray value lower than that of object,
the 5-pixel-neighbor pattern corresponding to the minimum-average
'm' in equation (3.3) gives the most likely background pattern
along an edge of a line. The pattern also gives the direction of
the edge of the line at the point under consideration. Hence the
5-pixel-neighbor pattern corresponding to the minimum-average is
called an Edge-Vector and the minimum-average is called the edge-
vector-gray-value. Only if the pixel gray value is higher than the
edge-vector-gray-value by a certain extent, it can be treated as
an object pixel.
Dark specks on the background are formed when the noise
level is high enough to make a few random pixels in the
background area as dark as the object. Similarly holes are formed
in the object when the noise level is high enough to make a few
random pixels in the object area as bright as the background.
Formation of specks or holes is not easily controlled by either
global or conventional local thresholding schemes, unless they
are preceded by some filtering or local averaging processes. But
these preprocesses with the exception of median filtering tend to
increase smearing and sometimes may introduce disconnection in
object patterns.
A thresholding scheme to suppress specks and holes must
generate large thresholds on the background, so that the
formation of specks is suppressed. It must also generate low
thresholds on the object, so that the holes are filled up. It can
be observed that, if the window is on the object the edge-vector-
gray-value is high, because it gives the average of the object
pixels. Similarly on the background region, the edge-vector-gray-
value is low. Using the edge-vector-gray-value 'm', the threshold
could be selected as
where C1 and C2 are chosen constants.
In both equations (3.5) and (3.6), it can be observed that
the threshold is low on the background because 'm' is low, and
thereby assisting the speck formation. The threshold is high on
the object because 'm' is high, and thereby assisting the hole
formation. Most of the conventional schemes suffer from this
drawback. But in equation (3.3), the threshold is high on the
background and is low on the object area. This property
suppresses the formation of specks and holes. In Fig.3.1,
binarized images obtained from a global threshold (Otsu[1]) and
from the proposed scheme are compared. Fig.3.1a is the input
graytone image which is sprinkled by random additive noise of
amplitude 30% of the average gray value of the image. Fig.3.1b is
the binarized version obtained by the Otsu[1] algorithm. Fig.3.1c and
Fig.3.1d show the outputs of the proposed algorithm with and
without automatically selected constants. From the figure it can
be observed that the proposed algorithm has markedly better
performance in the presence of noise.
Fig.3.1 ILLUSTRATION OF THE NOISE SUPPRESSION PROPERTY OF THE PROPOSED ALGORITHM.
a. NOISY GRAYTONE IMAGE.
b. BINARY IMAGE OBTAINED BY OTSU[1] ALGORITHM.
c. BINARY IMAGE OBTAINED BY THE PROPOSED ALGORITHM WITH AUTOMATICALLY ESTIMATED CONSTANTS.
d. BINARY IMAGE OBTAINED BY THE PROPOSED ALGORITHM WITH EXPERIMENTALLY SELECTED CONSTANTS.
It is observed that the threshold is high if the edge-
vector-gray-value is low. Hence, the pixels in the smeared region
experience a higher threshold than the pixels in the object
region. Thus the misclassification of the pixels of smeared
boundary as the object pixels is suppressed.
The scheme is sensitive to noise present in the smeared
region. This is because the assumed uniformity condition is not
applicable on the smeared region, where the gray values transit
from the object level to the background level. But fortunately
the specks or holes formed at the boundary of a line do not
distort the line information significantly.
To extract local line vectors, the 5-pixel-neighborhood
patterns are given weightages depending on the direction of the
edge they represent, as shown in Fig.3.2. If the lines in an
input image are of uniform thickness, the direction of the edge
of a line at any point is the same as the direction of the line
at that point. Fig.3.3 shows a binarized image with its edge
vectors specified. The numbers in the image are the 'edge-vector'
weightages. It can be seen that vertical lines in the image
predominantly have the number '3' on their edges, indicating
that the lines are inclined at 90° to the horizontal axis.
Similarly, the horizontal lines have predominantly the number '1'
on their edges, indicating that they are horizontal. It can also
be seen that inclined lines have a combination of numbers on
their edges. For example, the positively inclined lines have a
combination of 3s and 2s on their edges, indicating that they are
inclined to the X-axis by an angle between 90° (represented by
the weightage 3) and 45° (represented by the weightage 2).
Fig.3.2 Weightages of the edge vectors. (H, V, I+ and I- sets refer to the edge patterns of Fig.1.4.)
Fig.3.3 A BINARIZED IMAGE WITH LOCAL LINE VECTORS
3.3 EXTRACTION OF LINES FROM A BINARY IMAGE
Lines are extracted from a binary image of a line sketch
without thinning. Towards the development of the algorithm, an
image model is assumed. The algorithm proposed is basically a
direct vectorization scheme. The lines extracted are output in
the form of coordinates of their end points. The output also
specifies their connectivity.
3.3.1 Assumed Image Model for the Line Feature Extraction from a
Binary Image
A binary image is viewed as a matrix where the object pixels
are represented by 1's and the background pixels by
0's. Usually all images have an indicator line as specified in
section 3.1. All the lines in the image are of uniform
thickness. These assumptions constitute the model of the image to
be processed.
A contiguous sequence of 1's (object pixels) in any row of
the image is called a run. The number of pixels in a run gives
the run length. The manner in which runs overlap determines the
line features like deviations and intersections of lines.
A run in the (i+1)th row of the image matrix is said to
overlap a run in the ith row, only if
$B(i) \le E(i+1) + 1$ and $B(i+1) \le E(i) + 1$
where B(i) is the beginning column of a run in the ith row and
E(i) is the ending column of the run in the ith row. Similarly B(i+1)
and E(i+1) are the beginning and ending of a run in the (i+1)th
row. In other words, a run on the ith row has an overlapping run
on the (i+1)th row if at least one pixel in the run of the (i+1)th
row is in the 8-neighborhood of a pixel in the run of the ith row.
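The run and overlap definitions can be sketched as follows. The function names `find_runs` and `runs_overlap` are hypothetical, and each run is represented here as a pair (B, E) of inclusive beginning and ending columns.

```python
def find_runs(row):
    """Return the runs of 1's in one image row as (begin, end) column pairs."""
    runs = []
    start = None
    for col, value in enumerate(row):
        if value and start is None:
            start = col                      # a run begins
        elif not value and start is not None:
            runs.append((start, col - 1))    # the run ended at the previous column
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))   # run reaching the end of the row
    return runs


def runs_overlap(run_i, run_next):
    """True if a run of row i+1 touches a run of row i in the 8-neighborhood.

    Implements B(i) <= E(i+1) + 1 and B(i+1) <= E(i) + 1.
    """
    b_i, e_i = run_i
    b_next, e_next = run_next
    return b_i <= e_next + 1 and b_next <= e_i + 1


upper = find_runs([0, 1, 1, 0, 0, 0])   # one run covering columns 1 to 2
lower = find_runs([0, 0, 0, 1, 1, 0])   # one run covering columns 3 to 4
touching = runs_overlap(upper[0], lower[0])
```

In the example the two runs share no column, but the pixel at column 3 of the lower row is a diagonal neighbor of the pixel at column 2 of the upper row, so the runs overlap in the sense defined above.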
As long as an overlap exists, the line continues. When a
line is vertical or nearly vertical, we observe that a line
obtained by the trace of the central pixels of the runs is the
line represented by the corresponding binary pattern as shown in
Fig.3.4a. But the line represented by the central pixels of the
run ceases to be the line represented by the binary pattern, if
the line is horizontal or nearly horizontal, as shown in
Fig.3.4b. Under such conditions, if the runs were viewed along
the columns we could have extracted the correct line by jo.ining
the central pixe1.s of the runs. This necessitates classification
of the binary patterns into vertical and horizontal classes
before the lines are extracted. The following discussion shows
that the 'run length' and the 'line width' illformation can be
used to carry out this classification.
When a line is along the column of the image matrix, the
runs traced by raster scan (row-by-row) are perpendicular to the
line they represent. Hence the run length can be taken as the
measure of the width of the line at that location. If a line is
not exactly vertical, the run length observed in the raster scan
is greater than the actual width of the line as can be seen in
Fig.3.5. We select a threshold of inclination of 60° with respect
to the vertical axis, above which the lines are considered as
the horizontal class of lines. This classification can be carried out
by the run length and the line-width information. From Fig.3.5,
we observe that

$(\text{run length}) \cdot \cos\theta = \text{width}$.

Hence for $\theta = 60^\circ$, the run length = 2 * width.

Fig.3.4 a. Vertical trace of a vertical line pattern.
b. Vertical trace of a horizontal line pattern.

Fig.3.5 Run-length variations with line inclination and at line intersection.
($w$ = width of the line; $w / \cos\theta$ = run length of the inclined line;
$w + w / \cos\theta$ = the run length at the intersection; $\theta$ = angle of inclination.)
Therefore, if the run length observed from the vertical
direction is greater than twice the line width, the corresponding
binary pattern is classified into the horizontal class. The run
length and the line width information can be further used for
intersection detection, because the run lengths observed at all
intersection points are greater than twice the line width,
irrespective of the angle of inclination of the intersecting
lines as shown in Fig.3.5.
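As a rough illustration of this classification, the following sketch computes the run length expected for a given inclination and applies the 'twice the line width' test. The function names and the returned labels are hypothetical, and a run of exactly twice the line width is arbitrarily treated as belonging to the vertical class.

```python
import math


def expected_run_length(width: float, theta_deg: float) -> float:
    """Run length seen in a raster scan for a line of the given width that is
    inclined at theta degrees to the vertical axis: width / cos(theta)."""
    return width / math.cos(math.radians(theta_deg))


def classify_run(length: float, width: float) -> str:
    """Runs longer than twice the line width are left to the column-by-column
    scan: they belong either to a horizontal-class line or to an intersection."""
    if length > 2 * width:
        return "horizontal_or_intersection"
    return "vertical"
```

For a line width of 3 pixels, a line inclined at 60 degrees gives a run of about 6 pixels, which is the boundary between the two classes.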
3.3.2 Proposed Feature Extraction Scheme
We assume that the line traversing the core of the binary
pattern is a good approximation of the input line. We have
already discussed in the previous section that the central pixel
of the run is on the object line, only if the runs are nearly
perpendicular to the direction of lines. But in raster scan,
horizontal lines have the runs parallel to the direction of the
lines and a trace of central pixels of the runs represents a line
other than the object line. Hence non-raster scan (column-by-column)
is used to trace the horizontal class of lines. In either
scan, only one class of lines is processed. If lines of the
other class are observed, the corresponding runs are skipped.
Initially, the runs of the indicator line are traced through
raster scan. Because the indicator line is a vertical line
segment, the average line width information of the sketch is
obtained by calculating the average run length of this indicator
line. Once the line width information is extracted, the indicator
line is not processed.
In raster scan, all runs with run length less than twice the
average line width are traced. At every stage of tracing, only a
constant (smoothing factor) number of overlapping runs are traced
and a vector is placed such that it joins the central pixels of
the starting and ending runs of that stage. If this vector
deviates from the vector representing the line extracted so far
by more than a prescribed threshold, a deviation is indicated
within the span of this new vector. Then the exact point of
deviation is found by a simple binary search. Once the exact
point of deviation is found, the line extracted so far up to the
point of deviation is stored as a line segment and the rest of
the new vector is taken as a part of a new line. Then the next
stage of line tracing begins. On the other hand, if the deviation
of the new vector is less than or equal to the prescribed
threshold, the vector is considered as a continuation of the
line.
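The deviation test between the line extracted so far and the new stage vector might be sketched as follows. This is a minimal illustration and not the thesis implementation: points are assumed to be (column, row) pairs, the deviation is measured as the angle between the two vectors, and the threshold value passed to `continues_line` is hypothetical since no value is prescribed in the text.

```python
import math
from typing import Tuple

Point = Tuple[float, float]          # (column, row) of a central pixel
Vector = Tuple[Point, Point]         # (start point, end point)


def direction_deg(start: Point, end: Point) -> float:
    """Direction of the vector start -> end, in degrees."""
    return math.degrees(math.atan2(end[1] - start[1], end[0] - start[0]))


def deviation_deg(line: Vector, vector: Vector) -> float:
    """Absolute angle between the line so far and the new vector, in [0, 180]."""
    diff = abs(direction_deg(*line) - direction_deg(*vector)) % 360.0
    return min(diff, 360.0 - diff)


def continues_line(line: Vector, vector: Vector, threshold_deg: float) -> bool:
    """True if the new vector is taken as a continuation of the line."""
    return deviation_deg(line, vector) <= threshold_deg
```

When `continues_line` returns False, the exact deviation point would still have to be located inside the span of the new vector, for example by the binary search mentioned above.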
This process of line tracing is interrupted either by the
absence of an overlapping run or by the presence of an
overlapping run of length greater than twice the line width.
Whenever an interruption occurs, the line traced until that point
is stored. If the interruption is due to the absence of an
overlapping run, a search for extraction of lines from untraced
patterns begins. But if the interruption is due to the run length
constraint, the tracing continues but the line trace is skipped
as long as the run length is greater than twice the line width.
The value of the 'smoothing factor' is heuristically
selected as thrice the line width. This constant should not be
too large, because it may then suppress the detection of prominent
deviation points; nor can it be too small, for it may then
lead to the detection of a large number of spurious deviation
points.
Line extraction through non-raster scan is carried out on
similar lines. But here all the vertical lines are skipped using
the same run length and line width constraint. In either of the
scans, the patterns from which lines are already extracted are
skipped. This avoids duplication of lines in the range of
inclination of 30° to 60° with respect to the coordinate axes,
where horizontal and vertical classes of lines overlap.
The Line Extraction Algorithm
1. IF there is an unprocessed run, start the line trace.
   ELSE
      stop.
2. WHILE there is an overlap and the run is shorter than twice
   the line width and the number of runs stored is less than the
   smoothing factor,
      Trace the runs as the constituents of the line extracted
      so far.
3. WHILE there is overlap and the run is shorter than twice the
   line width and number of runs stored is less than the
   smoothing factor,
      Trace the runs as the constituents of the new line
      vector.
4. IF the deviation between the new line vector and the line
   extracted so far is greater than the threshold,
      find the exact deviation point, output the line segment
      up to the deviation point, store the remaining new line
      as the line extracted so far.
      IF there is an overlap, and the run length is less than
      twice the line width,
         go to step 3,
      ELSE
         output the line extracted so far and
         IF run length is greater than twice the line width,
            skip the longer runs and go to step 2,
         ELSE
            go to step 1.
5. IF the deviation between new line vector and the line
   extracted so far is less than or equal to the threshold,
      add new line vector to the line extracted so far.
      IF there is overlap and the run length is less than
      twice the line width,
         go to step 3.
      ELSE
         output the line extracted so far and
         IF the run length is greater than twice the line
         width,
            skip the longer runs and go to step 2,
         ELSE
            go to step 1.
After the extraction of lines, the end points of the line
segments separated by a distance less than twice the line width
are given common coordinate points. This gap in between two
segment end points occurs because rows and columns are used for
coordinate specification. This gap filling is based on the
assumption that for any two lines to remain separated on the
paper, they should have a minimum of 'one line width' gap between
them.
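One possible way to perform this gap filling is sketched below. The text does not say which of the two nearby end points supplies the common coordinate, so this sketch simply assumes that the first end point encountered is kept; the function name `merge_close_end_points` is hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def merge_close_end_points(segments: List[Segment], line_width: float) -> List[Segment]:
    """Give end points separated by less than twice the line width a common
    coordinate (assumption: the first end point seen is kept)."""
    limit = 2 * line_width
    representatives: List[Point] = []

    def snap(p: Point, exclude: Optional[Point] = None) -> Point:
        for q in representatives:
            # never merge a segment's second end point into its own first one
            if q != exclude and math.dist(p, q) < limit:
                return q
        representatives.append(p)
        return p

    merged = []
    for a, b in segments:
        a_common = snap(a)
        b_common = snap(b, exclude=a_common)
        merged.append((a_common, b_common))
    return merged
```

After this step, two segments whose end points were one or two pixels apart share a point, which makes their connectivity explicit.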
3.3.3 Results of the Line Extraction Process
An efficient method for vectorization of binary line
patterns is proposed. The vectorization is strictly based on the
number of curvatures and corners present in the line. Hence
vectorization of horizontal or nearly horizontal lines is
possible with equal ease as for the vertical class of lines. An
example of the input and output of the preprocessing stage is
shown in Fig.3.6. Fig.3.6b is a plot of the features extracted in
the preprocessing stage. The features are represented in the form
of coordinates of the segment end points and their connectivity,
as shown in Fig.3.7.
The segment information along with the line width
information makes the various features of the image explicit to
recognition and correction processes discussed in chapter IV. The
machine representation of Fig.3.7 can also be viewed as a
compressed code of the input line sketch.
a. INPUT GRAYTONE IMAGE
b. PLOT OF THE FEATURES EXTRACTED IN THE PREPROCESSING STAGE
Fig.3.6 SAMPLE INPUT AND OUTPUT OF THE PREPROCESSING STAGE.

coordinate(point1, ...).
coordinate(point2, ...).
coordinate(point3, ...).
coordinate(point4, ...).
coordinate(point5, ...).
coordinate(point6, ...).
coordinate(point7, ...).
coordinate(point8, ...).
connects(point1, point2).
connects(point2, point3).
connects(point4, point5).
connects(point5, point6).
connects(point7, point5).
connects(point4, point2).
connects(point2, point8).
connects(point5, point3).
connects(point6, point3).
connects(point3, point8).
connects(point7, point4).
connects(point4, point1).
Fig.3.7 Machine representation for the sketch of Fig.3.6.
CHAPTER IV
RECOGNITION AND CORRECTION OF LINE SKETCHES
The sketch features extracted in the preprocessing stage are
recognized and corrected to obtain a drafted version of the input
line sketch. To facilitate the search that is involved during
recognition, the input is represented in the form of a graph. As
the features input are from hand-drawn line sketches, only
approximate matching is possible in the recognition stage. The
recognized figures are then corrected in accordance with the
geometric model with which they match. If the input sketch is a
combination of geometric figures, while correcting an individual
figure, its relation with the connected neighbors is preserved.
The context in which a sketch is to be viewed plays an important
role in the recognition process. Hence provision to specify the
context is also made available.
Section 4.1 describes a graph representation of features of
a line sketch. Section 4.2 describes organization of standard
geometric models in the form of rules. Section 4.3 explains the
organization of the context module. A control strategy to
coordinate the processes like rule selection, pattern search,
check for contextual consistency, correction, etc., is described
in section 4.5. Implementation details of the recognition and
correction stages are given in section 4.6. In section 4.7
performance of the recognition and correction stages is
illustrated with typical examples.
4.1 GRAPH REPRESENTATION OF FEATURES OF A LINE SKETCH
The features extracted from an input line sketch are
represented in the form of a graph structure. The recognition and
correction processes view this representation as a relational
database, which provides the input sketch information. The
recognition process carries out search in the database to
recognize the geometric figures that are present, whereas the
correction process updates the database whenever necessary. The
final updated version of the graph represents the output of the
recognition and correction stage.
4.1.1 Need for Representation of Extracted Features
Suitable representation of the features of an input sketch
is needed to meet the following basic requirements:
1. Various features like line width, line segments, deviation
points, etc., make no more sense than a set of numbers to a
machine. If this set is to be interpreted the relations
existing among the elements of the set must be made explicit.
2. An input line sketch may be a combination of many geometric
figures. Therefore recognition of a pattern approximating a
geometric model invariably involves a search. This search
must be efficient.
3. The correction process, while correcting individual geometric
figures of a sketch, may modify the overall sketch
information. This modification is not acceptable, because the
aim of drafting is to present the same sketch information in
an aesthetically improved manner. Hence it is necessary to
preserve the input information.
4.1.2 Suitability of Graph Representation
A graph is a connected structure, where every node is
connected to at least one arc and every arc connects two nodes. A
node in the graph represents an extracted feature, whereas an arc
represents the relation that exists between the connected nodes.
A relation that exists between a pair of features can be made
explicit by connecting the pair of feature nodes with an arc
representing the relation. Fig.1.5 shows one such graph
structure, which represents a triangle. The graph structure can
be viewed as a relational database which answers the queries on
the relations existing among the represented features.
Recognition involves a search in the database. This search is
facilitated by a graph because once the search for a particular
pattern begins, the next node to be searched must be an adjoining
node. This adjoining node can be readily obtained from the graph.
Further, the graph representation facilitates reduction in search
time through techniques like node coloring, smoothing of line
deviations, etc. In a geometric line sketch, the sketch
information depends on connectivity of line segments, their
relative orientations and lengths. The graph structure aids in
preserving these invariant properties during correction (see
section 4.4.2).
4.1.3 Reduction of a Graph Representing a Line Sketch
Graph reduction reduces search time and also helps to avoid
erroneous interpretations. It involves deletion of noisy
segments, restoration of intersections and smoothing of spurious
deviations.
4.1.3.1 Deletion of noisy segments: By noisy segments we refer
to all the line segments which do not belong to a closed loop and
have lengths less than the line width. Here line width is a
feature extracted in the preprocessing stage, which is equal to
the average line width of the input line sketch. This width is
taken as a default threshold. The reduction process is
illustrated in Fig.4.1. Note that point p2 in the figure is open
(a point node connecting only one line segment). If both ends of
a line segment whose length is less than the threshold are open,
that line segment is deleted from the database. This reduction
can be carried out using any length as the threshold. Depending
on the coarseness of the input sketch, the user can override the
default threshold.
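The deletion rule can be illustrated with a small sketch. The data layout is an assumption made only for this example: `points` maps a point name to its coordinates and `segments` is a list of pairs of point names; neither is the representation used in the thesis.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def delete_noisy_segments(points: Dict[str, Point],
                          segments: List[Tuple[str, str]],
                          threshold: float) -> List[Tuple[str, str]]:
    """Drop every segment that is shorter than the threshold and whose two
    end points are both open (each connects only this one segment)."""
    degree = Counter(p for seg in segments for p in seg)
    kept = []
    for a, b in segments:
        length = math.dist(points[a], points[b])
        both_open = degree[a] == 1 and degree[b] == 1
        if not (both_open and length < threshold):
            kept.append((a, b))
    return kept
```

The `threshold` argument would default to the average line width, but any length can be passed, in line with the user override described above.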
4.1.3.2 Restoration of intersection points: Some intersection
points get distorted due to the combined effect of quantization
error and faulty preprocessing. Two examples of such
intersections are given in Fig.4.2a. These points of intersection
can be restored by shrinking unwanted line segments. The
reduction is carried out as shown in Fig.4.2b. The point p12 of
the reduced graph of Fig.4.2b is the mid point of the line
joining points pl and p2 of the original graph.
Fig.4.1 Deletion of the line segment 'segn', which is an open segment and has the length less than the threshold.
Fig.4.2a Two examples of distorted intersection.
Fig.4.2b Intersection restoration.
4.1.3.3 Deviation smoothing: This process joins two line
segments connected at a point with a relative orientation
approximately equal to 180 degrees, provided exactly two line
segments are connected at that point. This smoothing not only
reduces the database but also avoids erroneous recognitions. Such
erroneous recognition occurs if the recognition process involves
graph matching. An example of such a situation is shown in
Fig.4.3a, where an approximately triangular figure may be
recognized as an irregular quadrilateral because it consists of
four line segments. Fig.4.3b shows its smoothed version.
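A single smoothing step might look like the sketch below. The data layout (a dictionary of named points and a list of point-name pairs) and the tolerance of 10 degrees are assumptions made for this illustration; the thesis does not state a numerical tolerance here.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def angle_at(points: Dict[str, Point], shared: str, a: str, b: str) -> float:
    """Angle in degrees subtended at `shared` by the points `a` and `b`."""
    ux, uy = points[a][0] - points[shared][0], points[a][1] - points[shared][1]
    vx, vy = points[b][0] - points[shared][0], points[b][1] - points[shared][1]
    cos_angle = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def smooth_once(points: Dict[str, Point],
                segments: List[Tuple[str, str]],
                tolerance_deg: float = 10.0) -> List[Tuple[str, str]]:
    """Join the first pair of segments that meet at a point where exactly two
    segments are connected and subtend an angle of about 180 degrees."""
    for p in points:
        incident = [s for s in segments if p in s]
        if len(incident) != 2:
            continue
        (a,) = [q for q in incident[0] if q != p]
        (b,) = [q for q in incident[1] if q != p]
        if a != b and abs(180.0 - angle_at(points, p, a, b)) <= tolerance_deg:
            rest = [s for s in segments if p not in s]
            return rest + [(a, b)]
    return segments
```

Calling `smooth_once` repeatedly until the list of segments stops changing would remove all such spurious deviation points.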
4.1.4 Graph Classification
The graph classification effectively reduces the search
space. One can observe that each geometric model forms a closed
loop. Hence a search for a matching pattern for a particular
geometric model involves search for a closed loop of path length
equal to the number of sides constituting the model. By 'path
length' we mean the number of segment nodes traversed. For
example, a triangle is recognized in the database only if there
is a closed path constituting exactly three segment nodes. This
search can be reduced if the open paths are not traversed. To
enable this, all the nodes and arcs in the open loops are given a
color which is transparent to the search process. Hence such
paths are not traversed during the search for a geometric model.
The classification of open paths from rest of the graph can be
achieved by repeated application of the following rules till they
fail. An illustration of the classification is shown in Fig.4.4.
Fig.4.3 Segments 'seg1' and 'seg2' subtend an angle nearly equal to 180 degrees.
'seg12' is the smoothed segment of 'seg1' and 'seg2'.
Fig.4.4 Classification of open points
(.... indicates open colored connection)
Rule1: A point node is 'open', if it connects only one line
segment.
Rule2: If there is an open point, color the connecting arc so
that it becomes transparent for further search.
All open loops are colored in 'open' color and closed loops
in 'soft' color. The values of the nodes with only these two
colors can be modified during correction. Once a node is
corrected, its color is changed to 'hard' color and values of
hard colored nodes are not modified. A graph with this
classification is shown in Fig.1.6. In this figure nodes S1,
S2, ....., Sm represent line segments of lengths L1, L2, .....,
Lm, respectively. The line segments connect their end points
P1, P2, ....., Pn through the arc 'connects'. The nodes a1,
a2, ....., al indicate the relative orientation between various
connected line segments. Dotted lines with tag 'disconnects'
indicate open paths.
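The repeated application of Rule1 and Rule2 can be sketched as follows. This is an illustrative simplification: the graph is reduced to a list of segments given as pairs of point names, and the colors are kept in a dictionary indexed by segment position. These structures are assumptions for the example only.

```python
from collections import Counter
from typing import Dict, List, Tuple


def color_open_paths(segments: List[Tuple[str, str]]) -> Dict[int, str]:
    """Color each segment 'open' or 'soft' by applying Rule1 and Rule2
    repeatedly until no further segment can be colored 'open'."""
    color = {i: "soft" for i in range(len(segments))}
    changed = True
    while changed:
        changed = False
        # Count, for every point, the segments still visible to the search.
        degree = Counter()
        for i, (a, b) in enumerate(segments):
            if color[i] == "soft":
                degree[a] += 1
                degree[b] += 1
        for i, (a, b) in enumerate(segments):
            # Rule1: a point connecting only one visible segment is open.
            # Rule2: the segment at an open point becomes transparent.
            if color[i] == "soft" and (degree[a] == 1 or degree[b] == 1):
                color[i] = "open"
                changed = True
    return color
```

For a triangle with a two-segment tail attached to one vertex, the tail is colored 'open' in two passes and the three sides of the triangle stay 'soft'.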
4.2 APPROXIMATE GEOMETRIC MODELS
Geometric models are defined in the form of rules. These
rules allow approximate matching. An input line sketch matches a
model, if it satisfies all the conditions of a rule describing
the model. For size and orientation independent matching,
conditions in a rule must be governed by the relations that exist
among the line segments constituting the geometric model. To
every specified relation, a threshold for approximate matching is
prescribed. Rules are given priorities to resolve conflicts in
decision making.
4.2.1 Rules Describing Standard Geometric Models
A rule describing a geometric model assumes that the extracted
line features are represented in the form of a graph. A 'path' in
a graph is described as a continuous traversal of connected nodes
and arcs. A path is said to be a 'closed loop' if upon traversal,
the starting node is reached without traversing any of the nodes
on the path more than once. A 'path length' is the number of
segment nodes traversed in a path. Using these basic definitions,
rules for geometric models are described as follows.
Rule 1: TRIANGLE: If there is a closed loop of path length
three, then it is a triangle.
Rule 2 : QUADRILATERAL: If there is a closed loop of path length
four, then it is a quadrilateral.
Rule 3: POLYGON: If there is a closed loop of path length 'n',
then it is an 'n' sided polygon.
Rule 4: ISOSCELES TRIANGLE: If there is a closed loop of path
length three and two angles between pairs of
connected line segments are approximately
equal, then it is an isosceles triangle.
Rule 5: EQUILATERAL TRIANGLE: If there is a closed loop of path
length three and the angles between three pairs
of connected line segments are approximately
equal, then it is an equilateral triangle.
Rule 6: RIGHTANGLED TRIANGLE: If there is a closed loop of path
length three and the angle between a pair of
line segments is approximately equal to 90
degrees, then it is a rightangled triangle.
Rule 7: RIGHT-ISOSCELES TRIANGLE: If there is a closed loop of
path length three and two of its angles between
pairs of its connected line segments are
approximately equal and the angle between the
third pair of connected line segments is
approximately equal to 90 degrees, then it is a
right-isosceles triangle.
Rule 8: TRAPEZIUM: If there is a closed loop of path length four
and a pair of opposite sides are approximately
parallel, then it is a trapezium.
Rule 9: PARALLELOGRAM: If there is a closed loop of path length
four and both pairs of alternate line segments
are approximately parallel, then it is a
parallelogram.
Rule 10: RECTANGLE: If there is a closed loop of path length four
and both pairs of alternate line segments are
approximately parallel and an angle between a
pair of its connected line segments is
approximately equal to 90 degrees, then it is a
rectangle.
Rule 11: SQUARE : If there is a closed loop of path length four
and both pairs of alternate sides are
approximately parallel and consecutive sides
are approximately equal and subtend an angle
approximately equal to 90 degrees, then it is a
square.
Rule 12: RHOMBUS: If there is a closed loop of path length four
and both pairs of alternate sides are
approximately parallel and consecutive sides
are approximately equal, then it is a rhombus.
4.2.2 Thresholds for Approximation
The patterns extracted from the hand-drawn line sketches
match only approximately with the description of the stored
geometric models. The allowable ranges of discrepancies in
angles and sides are decided by the prescribed thresholds. The
thresholds are defined as follows.
1. Threshold for absolute orientation (Ao): A line is
approximately vertical or approximately horizontal if it
deviates from the vertical or horizontal axis by less than
a prescribed constant Ao.
2. Threshold for equality (Eo): Two quantities are
approximately equal if the difference between them is less
than a value Eo times the smaller of the two quantities.
The default value of Eo is 10%.
3. Threshold of parallelism (Po): Two unconnected sides are
approximately parallel if they subtend an angle smaller
than a threshold Po, which is defined in terms of Ao, H and L,
where Ao is the threshold for absolute orientation, H is the
average distance between the line segments under consideration
and L is the length of the shorter of the line segments.
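The first two thresholds can be illustrated with a short sketch. The function names are hypothetical, angles are assumed to be measured in degrees from the horizontal axis, and the value of Ao is passed explicitly because no default is stated for it. The parallelism threshold Po is not sketched here.

```python
from typing import Optional


def approx_equal(x: float, y: float, eo: float = 0.10) -> bool:
    """Two quantities are approximately equal if their difference is less
    than Eo times the smaller of the two (default Eo = 10%)."""
    return abs(x - y) < eo * min(x, y)


def axis_class(angle_deg: float, ao_deg: float) -> Optional[str]:
    """Classify a line direction as approximately 'horizontal' or 'vertical'
    if it deviates from that axis by less than Ao; otherwise return None."""
    a = angle_deg % 180.0
    if min(a, 180.0 - a) < ao_deg:
        return "horizontal"
    if abs(a - 90.0) < ao_deg:
        return "vertical"
    return None
```

Because the equality test is relative, lengths 100 and 105 count as approximately equal while lengths 10 and 15 do not, although the difference is 5 in both cases.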
These three thresholds are selected because of the following
observations:
(i) The hand-drawn lines representing vertical or horizontal
lines invariably deviate from the vertical or horizontal
axis. Such deviations must be tolerated. The range of
allowable deviation is specified by the threshold for
absolute orientation Ao.
(ii) A difference in line lengths looks obvious if the lines are
short. But the same difference becomes insignificant if the
lines are longer. Fig.4.5 shows an example of such a
situation. The same argument is true in case of angles.
(iii) The accuracy of representation of an angle subtended by two
unconnected lines tends to be low, if they are separated by
a large distance. This situation is clarified in Fig.4.6.
Secondly, if parallel lines to be drawn are long, one can
draw them with better accuracy than if they were short.
This is because a small angular deviation becomes visually
more and more obvious as the line lengths increase as shown
in Fig.4.7. Hence the threshold selected should be
proportional to the distance of separation and inversely
proportional to length of the lines.
Fig.4.5 Both sets of lines have the same difference in lengths.
(Panels: two lines of unequal lengths; two lines of nearly equal lengths.)
Fig.4.6 The same pair of lines at different distances of separation.
(Panels: two non parallel lines; two nearly parallel lines.)
Fig.4.7 Two pairs of lines of the same relative inclination.
(Panels: two nearly parallel lines; two non parallel lines.)
4.2.3 Priority Ordering of the Rules
Some geometric models are special cases of some other
geometric models. For example, an equilateral triangle is a
special case of an isosceles triangle, where the third side is
equal to the two equal sides. A square is a special case of a
quadrilateral, a parallelogram, a rectangle and a rhombus. This
situation leads to conflicts in decision making. For example, an
input pattern approximating a square (or an equilateral triangle)
satisfies more than one rule and a conflict arises as to which
model does it approximate. To avoid such conflicts, rules are
given priorities. If more than one rule is satisfied, the model
corresponding to the rule with the highest priority is selected
as the recognized model. These priorities cannot be given
arbitrarily. For example, if the model for a rectangle is given
priority over that for a square, none of the square patterns will
be recognized as squares.
Every time a rule succeeds, only the corresponding matched
pattern is corrected. To extract all similar patterns existing in
the database, the rule that matches a pattern has to undergo
repeated execution until it fails. Thus the priority relation
must be reflexive, which means a rule has priority over itself.
Secondly, if a particular model gets a priority over another
model, the latter model can never get a priority over the former.
Hence the priority relation is antisymmetric. Finally, if rule 'a'
gets priority over rule 'b', and rule 'b' gets priority over rule
'c', then rule 'a' has priority over rule 'c'. This suggests that
the priority relation is transitive. These reflexive, transitive
and antisymmetric properties make the priority relation a Partially
Ordered Relation.
For simplicity of representation, let each rule be
represented by its number. Then {1,2,3,4,5,6,7,8,9,10,11,12} is
the set of rules defining all the geometric models. A square
pattern satisfies the set {2,3,8,9,10,11,12}. Thus a square can
be viewed as partitioning the set of rules into
{ {2,3,8,9,10,11,12}, 1,4,5,6,7}. Similarly we have partitions
of other patterns as follows:
RECTANGLE: { {2,3,8,9,10}, 1,4,5,6,7,11,12}
RHOMBUS: { {2,3,8,9,12}, 1,4,5,6,7,10,11}
PARALLELOGRAM: { {2,3,8,9}, 1,4,5,6,7,10,11,12}
TRAPEZIUM: { {2,3,8}, 1,4,5,6,7,9,10,11,12}
QUADRILATERAL: { {2,3}, 1,4,5,6,7,8,9,10,11,12}
EQUILATERAL TRIANGLE: { {1,3,4,5}, 2,6,7,8,9,10,11,12}
RIGHT-ISOSCELES TRIANGLE: { {1,3,4,6,7}, 2,5,8,9,10,11,12}
ISOSCELES TRIANGLE: { {1,3,4}, 2,5,6,7,8,9,10,11,12}
RIGHTANGLED TRIANGLE: { {1,3,6}, 2,4,5,7,8,9,10,11,12}
TRIANGLE: { {1,3}, 2,4,5,6,7,8,9,10,11,12}
POLYGON: {1,2,3,4,5,6,7,8,9,10,11,12}
These partitions of the partially ordered rules can be
represented in the form of a Hasse diagram[53]. This is a diagram
where an arc represents a relation and a node represents a
partition. The larger the partition, the lower is the node level. The
Hasse diagram for the set of rules and partitions made by
geometric models is shown in Fig.4.8. One can observe that rules
with more stringent conditions make larger partitions. The rules
describing square, equilateral triangle, etc. have large
partitions, whereas the rule describing in general a polygon
makes no partition at all. Using the specificity ordering
resolution strategy[11], the rule which makes the biggest
partition is given the highest priority because under the given
circumstances it is the most specialized rule. A rule with zero
or the smallest partition is given the lowest priority. In
between these two bounds, priority is distributed in accordance
with the size of the partition made by a rule. From the Hasse
diagram, we observe that there is no unique lower bound, i.e.
there is no largest partition. The partitions of equilateral
triangle, right-isosceles triangle and square are the three lower
bounds of the diagram. Since these models are unrelated, no
matter what priority is forced over them, the decision process is
not hindered as long as the existing relations are maintained.
One such forced priority relation, which preserves the priorities
specified in the Hasse diagram is,
Square --> Rectangle --> Rhombus --> Parallelogram --> Trapezium
--> Quadrilateral --> Equilateral Triangle --> Right Isosceles
Triangle --> Isosceles Triangle --> Triangle --> Polygon.
Here the symbol '-->' is read as 'has the priority over'. This is
a totally ordered relation and can be easily implemented.
Fig.4.8 Hasse diagram showing rule priorities.
(The numbers at the nodes correspond to the rule numbers)
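A totally ordered priority list of this kind could be implemented along the following lines. The sketch assumes that the set of satisfied rule numbers has already been computed for a pattern; the names `PRIORITY`, `RULE_NUMBER` and `select_model` are hypothetical. The list reproduces the chain exactly as given above, so only the models named in that chain appear in it.

```python
from typing import Optional, Set

# Models in decreasing priority, copied from the forced priority relation.
PRIORITY = [
    "SQUARE", "RECTANGLE", "RHOMBUS", "PARALLELOGRAM", "TRAPEZIUM",
    "QUADRILATERAL", "EQUILATERAL TRIANGLE", "RIGHT-ISOSCELES TRIANGLE",
    "ISOSCELES TRIANGLE", "TRIANGLE", "POLYGON",
]

# Rule numbers as listed in section 4.2.1.
RULE_NUMBER = {
    "TRIANGLE": 1, "QUADRILATERAL": 2, "POLYGON": 3,
    "ISOSCELES TRIANGLE": 4, "EQUILATERAL TRIANGLE": 5,
    "RIGHT-ISOSCELES TRIANGLE": 7, "TRAPEZIUM": 8, "PARALLELOGRAM": 9,
    "RECTANGLE": 10, "SQUARE": 11, "RHOMBUS": 12,
}


def select_model(satisfied: Set[int]) -> Optional[str]:
    """Return the highest-priority model whose rule is satisfied, if any."""
    for name in PRIORITY:
        if RULE_NUMBER[name] in satisfied:
            return name
    return None
```

For a square pattern, which satisfies rules {2,3,8,9,10,11,12}, the first match in the list is SQUARE, so the more general quadrilateral rules are never selected for it.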
4.3 CONTEXT MODULE FOR LINE SKETCHES
An interpretation of an input figure is checked for its
contextual consistency. If the interpretation is found to be
inconsistent, a search for an alternative interpretation is
made. If no context is specified, all interpretations are
assumed to be contextually consistent.
The context module can also check for the user specified
conditions. This provision can be effectively used for
recognition and correction of highly distorted sketches. This may
provide higher freedom to the user for drawing input line
sketches. Consider the case of Fig.1.1. The machine can view
the pattern as four triangles connected at their vertices or as two
quadrilaterals one within the other. If the machine corrects the
sketch as a combination of triangles, the user may find the
output distorted, if he were to view the figure as two
quadrilaterals one within the other. To overcome such problems, a
user may specify his views in the context module.
4.3.1 Structure of the Context Module
The context module is represented in a form similar to
frames, with context as the frame name and various slots as
geometric models. Values of these slots decide whether a queried
model is consistent or not. Every new context, which is a frame
by itself fills a 'default' slot of the general context frame.
When the contextual consistency of an interpretation is to be
checked, the query starts at the present context frame. If the
frame contains no information about the pattern queried, the
default value is obtained from the frame next in the hierarchy.
Fig.4.9 depicts the structure of the context module.
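A frame-like lookup of this kind might be sketched as below. The class `ContextFrame`, its attributes and the use of True/False slot values are assumptions made for illustration; in particular, the link to the frame next in the hierarchy is modelled here as a simple `default` reference.

```python
from typing import Dict, Optional


class ContextFrame:
    """A context frame whose slots are geometric model names."""

    def __init__(self, name: str,
                 slots: Optional[Dict[str, bool]] = None,
                 default: Optional["ContextFrame"] = None) -> None:
        self.name = name
        self.slots = dict(slots or {})   # model name -> consistent or not
        self.default = default           # frame next in the hierarchy

    def is_consistent(self, model: str) -> bool:
        """Start at this frame and fall back along the hierarchy. If no
        frame says anything about the model, assume it is consistent."""
        frame: Optional[ContextFrame] = self
        while frame is not None:
            if model in frame.slots:
                return frame.slots[model]
            frame = frame.default
        return True
```

A user who views Fig.1.1 as two quadrilaterals could, for instance, define a context in which the triangle interpretation is marked inconsistent, so that the search continues for an alternative interpretation.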
4.4 CORRECTION OF HAND DRAWN LINE SKETCHES
Every figure, with contextually consistent interpretation,
is corrected in accordance with the relations governing the model
with which it matched. To preserve the overall sketch appearance,
the relations that exist between the figure to be corrected and
its connected neighbors are also considered in the correction
process. The correction process basically consists of two stages.
The first stage involves selection of a reference line segment and
the second stage involves correction of the recognized geometric
figure with respect to the selected reference.
Fig.4.9 Structure of the context module.
4.4.1 Selection of a Reference Line Segment
Various relations among the line segments constituting a
geometric model can be expressed in terms of relative
orientations of line segments and their lengths. Unless one
of the line segments is used as the reference, relative
orientation remains undefined. Hence the first step in the
correction process is to select a reference line segment. During
the selection of a reference line segment, we try to satisfy and
preserve the neighborhood relations of the pattern to be
corrected. The selection of the reference line segment is governed
by the following set of rules.
[[
i. Select a line segment which is already corrected as the
reference.
ii. Select a line segment which has a definite relation
with its neighbors, as the reference.
iii. Select a line segment which has a definite relation
with the co-ordinate axis, as the reference.
iv. Select a line segment as the reference.
]]
The symbol [[ ... ]] indicates that the rules within the
symbol are totally ordered and the decision made by the earliest
rule satisfied in the rule list is considered. If a rule with the
highest existing priority is satisfied by more than one line
segment of the recognized figure, any one of them is selected as
the reference. In rules ii and iii, the check for a definite
relation involves a check for the existence of approximately
parallel or approximately perpendicular relations.
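The totally ordered rule list can be sketched as a loop that returns on the first rule that any segment satisfies. This is a Python sketch, not the PROLOG of the thesis. The three predicate arguments, the function names and the 10 degree tolerance are hypothetical stand-ins for queries on the graph.

```python
def has_definite_relation(angle_deg, tolerance_deg=10.0):
    """True if the angle is approximately parallel or perpendicular.

    The tolerance value is an assumption made for this sketch.
    """
    a = angle_deg % 180.0
    near_parallel = min(a, 180.0 - a) <= tolerance_deg
    near_perpendicular = abs(a - 90.0) <= tolerance_deg
    return near_parallel or near_perpendicular


def select_reference(segments, is_corrected, relates_to_neighbor, relates_to_axis):
    """Apply rules i to iv in order; the earliest satisfied rule decides."""
    rules = (
        is_corrected,         # rule i
        relates_to_neighbor,  # rule ii
        relates_to_axis,      # rule iii
        lambda seg: True,     # rule iv: any segment will do
    )
    for rule in rules:
        for seg in segments:
            if rule(seg):
                return seg    # several matches: the first one is taken
    return None               # reached only when there are no segments
```

Because the outer loop runs over rules and the inner loop over segments, a segment satisfying rule ii is always preferred over one that satisfies only rule iii, whatever their order in the figure.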
4.4.2 Correction of Individual Geometric Figures
Correction is made with respect to the selected reference
segment and in accordance with the relations governing a matched
model. All the relations of the model are defined in terms of
connectivity, length and angles. The connectivity is never
destroyed because arcs defining connectivity are not updated.
Even if the coordinate value of a 'point' node is updated during
correction of any one of the line segments, all the other line
segments connecting that point also refer to the updated point.
This is because in a graph structure a relational arc refers to a
node and not to the value of the node. Relations like relative
orientation or line lengths are also preserved, because the
corrective shift given to any point is such that the modification
in line length and relative orientation is always within the
specified range of approximation.
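The point about arcs referring to nodes rather than to node values can be illustrated with a small sketch. It is written in Python rather than PROLOG, and the point names, coordinates and helper names are made up for the illustration.

```python
# Point nodes hold the coordinates; segments hold only the names of their
# end nodes, so no segment ever stores a coordinate value of its own.
coords = {"p1": (0.0, 0.0), "p2": (4.1, 0.2), "p3": (4.0, 3.0)}
segments = {"seg0": ("p1", "p2"), "seg1": ("p2", "p3")}


def endpoints(seg):
    """Look up the current coordinates of both ends of a segment."""
    a, b = segments[seg]
    return coords[a], coords[b]


def shift_point(node, new_xy):
    """Corrective shift: only the node value changes, never the arcs."""
    coords[node] = new_xy
```

After `shift_point("p2", (4.0, 0.0))`, both `seg0` and `seg1` see the corrected end point without any update to `segments`, so connectivity is untouched.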
The following geometric formulae are used for correction.
1. If a line segment (X1,Y1),(X2,Y2) is rotated by an angle 'θ'
as shown in Fig.4.10, then the rotated end point (X3,Y3) is given
by
Fig.4.10 Rotation of a straight line.
Fig.4.11 Location of a connected segment end point (X3,Y3).
2. If a line segment (X2,Y2),(X3,Y3) of length L2 is connected
to a reference line segment (X1,Y1),(X2,Y2) of length L1, and if
they subtend an angle 'α' as shown in Fig.4.11, the point (X3,Y3)
is given by
Up to this stage, care is taken to ensure that every
corrected individual figure of the input sketch is an
approximation of its original representation. But the cumulative
effect of this approximation may lead to increase or decrease of
the overall sketch size, and sometimes it may introduce
distortions in the overall appearance. To suppress this
cumulative effect, approximately equal sides and angles are
averaged out before correction. Considering the above described
formulations and constraints, the correction procedures for
various geometric models are described as follows:
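The averaging step can be sketched as follows, in Python rather than PROLOG. The grouping strategy (sort the lengths, then extend a group while a length stays within the threshold of the group's smallest member) and the function name are assumptions made for this sketch. The text states only that approximately equal sides and angles are averaged.

```python
def average_equal_lengths(lengths, threshold):
    """Replace each group of approximately equal lengths by the group mean.

    `threshold` is a fraction, e.g. 0.10 for a 10% threshold for equality.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    groups = []
    for i in order:
        # Compare against the smallest member of the current group.
        if groups and lengths[i] <= lengths[groups[-1][0]] * (1.0 + threshold):
            groups[-1].append(i)
        else:
            groups.append([i])
    result = list(lengths)
    for group in groups:
        mean = sum(lengths[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result
```

For example, sides of 72, 68, 100 and 97 units with a 10% threshold fall into two groups, so the corrected figure keeps the overall size of the drawn one instead of inheriting the size of whichever side was corrected first.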
Square:
1. Select a reference line segment from the line segments
constituting the square to be corrected.
2. [[
i. If the reference segment is already corrected, and no
other point is corrected, generate a square of the
size of the reference segment.
ii. If in addition to the reference segment one more
corner point is already corrected, shift the
uncorrected point, symmetric to the already corrected
point.
iii. Generate a square of the size equal to the average
side length of the recognized square figure.
]]
Given a reference line segment (X1,Y1),(X2,Y2) of a square,
the corner point (X3,Y3) opposite to (X2,Y2) is given by
Rectangle:
1. Select a reference line segment from the line segments
constituting the rectangle to be corrected.
2. [[
i. If the reference segment was already corrected and no
other point is corrected, average the uncorrected
opposite sides and generate a rectangle.
ii. If in addition to the reference segment one other
point is corrected, correct the remaining point
symmetric to the already corrected point.
iii. Generate a rectangle with sides equal to the average
of the opposite sides.
]]
Given a reference line segment (X1,Y1),(X2,Y2) of length L1
of a rectangle, the corner point (X3,Y3) opposite to (X2,Y2), is
given by
Parallelogram:
1. Select a reference line segment from the line segments
constituting the parallelogram to be corrected.
2. [[
i. If the reference line segment was already corrected
and no other point is corrected, generate a
parallelogram with sides equal to the average of the
uncorrected opposite sides and angles equal to the
average of the opposite angles.
ii. If in addition to the reference line segment one
other corner point is corrected, shift the
uncorrected point symmetric to the already corrected
point.
iii. Generate a parallelogram of sides equal to the
average of the opposite sides and angles equal to the
average of opposite angles.
]]
Rhombus:
1. Select a reference line segment from the line segments
constituting the rhombus to be corrected.
2. [[
i. If the reference line segment was already corrected
and no other point is corrected, generate a rhombus
of the size of the reference line segment and angles
equal to the average of the opposite angles.
ii. If in addition to the reference line segment one
other corner point is corrected, shift the
uncorrected point symmetric to the already corrected
point.
iii. Generate a rhombus with sides equal to the average of
the four sides and angles equal to the average of the
opposite angles.
]]
Given a reference line segment (X1,Y1),(X2,Y2) of length L1
and the segment to be corrected (X2,Y2),(X3,Y3) of length L2, the
corner point (X3,Y3) opposite to (X2,Y2) that constitutes a
parallelogram or a rhombus is given by
X3 = X2 - (L2/L1)((X2-X1)cos(θ) + (Y2-Y1)sin(θ))
Y3 = Y2 - (L2/L1)((Y2-Y1)cos(θ) - (X2-X1)sin(θ))
where 'θ' is the angle between the reference side and the side
to be corrected. In the case of a rhombus the ratio L2/L1 is 1.
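The corner point formula can be checked numerically with a short sketch. It is written in Python rather than PROLOG and follows the formula as printed above. The function names are hypothetical, and completing the fourth corner by vector addition is an assumption of this sketch, not a step stated in the text.

```python
import math


def connected_corner(p1, p2, l2, theta_deg):
    """Corner (X3,Y3) at distance l2 from p2, at angle theta to the reference."""
    x1, y1 = p1
    x2, y2 = p2
    l1 = math.hypot(x2 - x1, y2 - y1)   # length of the reference segment
    ratio = l2 / l1
    c = math.cos(math.radians(theta_deg))
    s = math.sin(math.radians(theta_deg))
    x3 = x2 - ratio * ((x2 - x1) * c + (y2 - y1) * s)
    y3 = y2 - ratio * ((y2 - y1) * c - (x2 - x1) * s)
    return (x3, y3)


def parallelogram(p1, p2, l2, theta_deg):
    """Return the four corners; the last one closes the figure (assumed step)."""
    p3 = connected_corner(p1, p2, l2, theta_deg)
    p4 = (p1[0] + p3[0] - p2[0], p1[1] + p3[1] - p2[1])
    return [p1, p2, p3, p4]
```

Calling `parallelogram((0, 0), (4, 0), 4, 60)` gives a rhombus, since the side length passed in equals the reference length and the ratio L2/L1 is 1.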
Trapezium:
1. Select one of the parallel sides as the reference
segment.
2. [[
i. If the non-parallel sides have approximately equal
inclination with respect to the reference segment,
make the trapezium symmetric so that the side
opposite to the reference segment is placed at the
average original height and parallel to the reference
segment.
ii. Correct the sides to the original angle, so that the
side opposite to the reference segment is strictly
parallel and placed at a distance equal to the average
original distance from the reference segment.
]]
Quadrilateral:
1. Select a reference line segment from the line segments
constituting the quadrilateral to be corrected.
2. [[
i. If the remaining sides have a definite relation with
the reference line segment, correct them accordingly.
ii. Correct the sides to their original angles and
lengths.
]]
Equilateral triangle:
1. Select a reference line segment from the line segments
constituting the equilateral triangle to be corrected.
2. [[
i. If the segment is already corrected, make an
equilateral triangle of side equal to the reference
segment.
ii. Generate an equilateral triangle of side equal to the
average of the three sides.
]]
Isosceles triangle:
1. Select a reference line segment from the line segments
constituting the isosceles triangle to be corrected.
2. [[
i. If the reference line segment is one of the equal
sides and is already corrected, make an isosceles
triangle with the original angle.
ii. If the reference segment is not corrected, make the
isosceles triangle with two equal sides of a length
equal to the average length of the approximately
equal sides.
iii. If the reference side is the side connecting the
equal sides, make the isosceles triangle with two
equal sides of a length equal to the average length
of the approximately equal sides.
]]
Right-isosceles triangle:
1. Select a reference line segment from the line segments
constituting the right isosceles triangle to be
corrected.
2. [[
i. If the reference side is one of the equal sides and
is already corrected, generate a perpendicular line
segment with length equal to the reference line
segment.
ii. If the reference line segment is not corrected,
generate perpendicular sides of length equal to the
average length of the approximately equal sides.
iii. If the corrected side is the side opposite to the
right angle, make a right-isosceles triangle of two
equal sides of length equal to the average length of
the approximately equal sides.
]]
Right angle triangle:
1. Select a reference line segment, which subtends the right
angle.
2. Find the third point as a rectangle point, with connected
segment length equal to the original length of the
segment to be corrected.
Triangle:
1. Select a reference line segment from the line segments
constituting the triangle to be corrected.
2. [[
i. Correct the remaining sides with respect to their
corrected neighbors, with which they have a definite
relation.
ii. Correct the sides with respect to the reference
segment.
]]
Polygon:
1. Select a reference line segment from the line segments
constituting the polygon to be corrected.
2. [[
i. If the polygon is a regular polygon, correct the
remaining line segments with reference to the
selected reference, so that all angles and sides are
equal.
ii. Correct the rest of the line segments to their
original angles.
]]
4.5 CONTROL STRATEGY FOR RECOGNITION AND CORRECTION STAGE
The overall control structure for the recognition and
correction stage is presented in Fig.4.12. The features extracted
in the preprocessing stage are initially represented in the form
of a graph. Then the graph is reduced to obtain an optimal set of
features. This reduction process is a sequence of operations like
deletion of the noise segments, removal of the duplicate
segments, smoothing of the bends within the user defined or
default threshold and classification of the feature nodes. The
reduced graph is then processed to obtain the relative
orientation information.
The graph representation forms the relational data base
which answers queries from various geometric models. At this
stage the rule base starts querying the data base for various
conditions. Starting from the highest priority rule, the rules
which are satisfied are assumed to be matched. A figure which
satisfies a rule is corrected, only if the model is contextually
consistent. After every correction, the rule base is reset and
queries resume. When none of the rules is satisfied, a search is
made for the uncorrected points. The uncorrected points in the
closed loop are corrected first and then the open points are
corrected. The open points are corrected only at the end because
any corrective shift given to the open points does not affect
other parts of the sketch.
Fig.4.12 Control structure of the recognition and correction stage.
[Blocks: PRIMARY GRAPH REPRESENTATION; GRAPH REDUCTION AND SMOOTHING; PRESERVATION OF ANGLES AND CLASSIFICATION OF GRAPH REPRESENTATION; RELATIONAL DATA BASE; RULE BASE; CONTROL STRATEGY; CONTEXT; CORRECTION PROCEDURES; OUTPUT]
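The reset-and-resume behaviour of the rule base can be sketched as a loop that restarts from the highest priority rule after every correction. The sketch is in Python, not PROLOG, and all names (`run_control`, the matcher callables, `is_consistent`, `correct`) are hypothetical. The final pass over uncorrected closed-loop and open points is left out.

```python
def run_control(rules, figures, is_consistent, correct):
    """Match rules in priority order; restart after every correction.

    rules: list of (model_name, matcher) with the highest priority first.
    Returns the corrected (figure, model) pairs and the figures left over.
    """
    corrected = []
    pending = list(figures)
    restart = True
    while restart:
        restart = False
        for model, matches in rules:
            for fig in pending:
                # A match is acted on only if the model fits the context.
                if matches(fig) and is_consistent(model):
                    correct(fig, model)
                    corrected.append((fig, model))
                    pending.remove(fig)
                    restart = True   # reset the rule base
                    break
            if restart:
                break
    return corrected, pending
```

Figures left in the second return value are the ones that no contextually consistent rule matched. In the described system their points would be handled by the final search for uncorrected points.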
4.6 IMPLEMENTATION OF RECOGNITION AND CORRECTION STAGE
The implementation of the recognition and correction stage
is done in PROLOG. PROLOG is a declarative language, where the
various relations existing between the line segments can be
declared as facts. As PROLOG uses resolution, all decisions can
be obtained through simple queries over the stored facts. As the
contextual consistency check involves search for an alternate
solution, a backtracking of the decision process is essential.
This backtracking is inherent in PROLOG.
In the implementation of the graph structure, all the feature
nodes are treated as arguments of the relations by which they are
bound. The relations in turn represent the arcs in the graph
structure. An example of the graph structure and its PROLOG
translation is shown in Fig.4.13. This PROLOG declaration also
acts as a relational database, which can answer the queries made
by the rules.
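For readers without a PROLOG system, the idea of arcs stored as relations and queried like a database can be imitated with tuples. The sketch below is in Python. It reuses the `connects` facts of Fig.4.13, while the helper names `connected` and `every_point_two_connected` are made up for the illustration.

```python
# Arcs of Fig.4.13 written as (segment, point, point) facts.
CONNECTS = [
    ("seg1", "p3", "p4"),
    ("seg2", "p4", "p2"),
    ("seg3", "p1", "p3"),
    ("seg0", "p1", "p2"),
]


def connected(point, facts=CONNECTS):
    """Yield (segment, other_point) for every arc touching `point`.

    The arc is matched in both directions, unlike the stored fact.
    """
    for seg, a, b in facts:
        if a == point:
            yield seg, b
        elif b == point:
            yield seg, a


def every_point_two_connected(facts=CONNECTS):
    """True when each point is touched by exactly two arcs."""
    points = {p for _, a, b in facts for p in (a, b)}
    return all(len(list(connected(p, facts))) == 2 for p in points)
```

A rule then becomes a conjunction of such queries, which is what the PROLOG rule base expresses directly.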
The reduction procedures explained in section 4.1 are
implemented in terms of a set of conditions, which query the graph
structure. An example of the PROLOG declaration of 'deviation
smoothing' is given in Fig.4.14. Node coloring is achieved by a
simple process of assertion of a fact. For example, an 'open'
color is given to a point 'p10' by asserting the fact
    open(p10).

    length(seg0, 72).
    length(seg1, 68).
    length(seg2, 182).
    length(seg3, 97).
    connects(seg1, p3, p4).
    connects(seg2, p4, p2).
    connects(seg3, p1, p3).
    connects(seg0, p1, p2).
    angle(seg0, seg2, 49).
    angle(seg0, seg3, 130).
    angle(seg3, seg1, 49).
    angle(seg1, seg2, 130).
Fig.4.13 Graph representation of a parallelogram and its PROLOG translation.

    /* THIS IS AN ITERATIVE GOAL WHICH SUCCEEDS WHEN
       ALL THE LINE SEGMENTS ARE CHECKED */
    deviation_smooth :-
        connected(S1, A, B),
        connected(S2, A, C),
        C \= B,
        not(connectedto(A, B, C)),
        find_angle(B, A, C, Tan, Cos, Sin),
        ang_threshold(Threshold),
        L =< Threshold,
        process(S1, S2, A, B, C),
        fail.
    /* SMOOTHING COMPLETED */
    deviation_smooth.

    /* OLD FACTS ARE RETRACTED AND MODIFIED FACTS ARE ASSERTED
       INTO THE GRAPH. SEGMENTS S1 OF LENGTH L1 AND S2 OF
       LENGTH L2 ARE SMOOTHENED TO S1 OF LENGTH L1+L2 */
    process(S1, S2, A, B, C) :-
        retract(length(S1, L1)),
        retract(length(S2, L2)),
        L3 is L1 + L2,
        asserta(length(S1, L3)),
        asserta(connects(S1, B, C)),
        retract(connects(S1, A, B)),
        retract(connects(S2, A, C)),
        retract(s(S2)),
        retract(p(A)),
        retract(co_ord(A, _, _)), !.
Fig.4.14 PROLOG code for deviation smoothing.
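The deviation smoothing of Fig.4.14 can be paraphrased in Python as follows. This is a sketch under assumptions: the node is merged only when exactly two segments meet at it, the deviation is measured as the departure from a straight line in degrees, and the 10 degree default threshold is hypothetical. As in the PROLOG version, the merged segment keeps the name of the first segment and its length is the sum L1+L2.

```python
import math


def deviation_deg(a, b, c):
    """Departure from a straight line at node a, given neighbours b and c."""
    ang_b = math.atan2(b[1] - a[1], b[0] - a[0])
    ang_c = math.atan2(c[1] - a[1], c[0] - a[0])
    inner = abs(math.degrees(ang_b - ang_c)) % 360.0
    if inner > 180.0:
        inner = 360.0 - inner
    return 180.0 - inner


def smooth(coords, segments, lengths, threshold_deg=10.0):
    """Merge segment pairs that meet at a bend within the threshold."""
    merged = True
    while merged:
        merged = False
        for node in list(coords):
            touching = [s for s, ends in segments.items() if node in ends]
            if len(touching) != 2:
                continue
            s1, s2 = touching
            b = next(p for p in segments[s1] if p != node)
            c = next(p for p in segments[s2] if p != node)
            if b == c:
                continue
            if deviation_deg(coords[node], coords[b], coords[c]) <= threshold_deg:
                segments[s1] = (b, c)             # S1 now joins B and C
                lengths[s1] += lengths.pop(s2)    # L3 = L1 + L2
                del segments[s2]
                del coords[node]                  # the bend point disappears
                merged = True
                break
    return coords, segments, lengths
```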
The classification procedure involves repeated application
of the rules given in section 4.1. Its PROLOG translation is given
in Fig.4.15. A rule defining a geometric model is translated in
the form of a group of queries, where the rule head acts as the
main goal and all the queries are declared as its subgoals. For
example, the rule for triangle is declared in the form:
connected(Seg2,B,C),
Seg2 \= Seg1.
When these four queries are responded to affirmatively by the
database, there exists a triangular pattern (A,B,C) in the
database.
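The search for a triangular pattern can be sketched as a check that three points are pairwise joined by segments. The Python below is an illustration only. It is not a transcription of the PROLOG rule, and the function name and fact layout are assumptions.

```python
from itertools import combinations


def triangles(connects):
    """Return every point triple (A, B, C) whose three sides all exist.

    connects: iterable of (segment, point, point) facts.
    """
    edges = {frozenset((a, b)) for _, a, b in connects}
    points = sorted({p for edge in edges for p in edge})
    found = []
    for a, b, c in combinations(points, 3):
        sides = {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))}
        if sides <= edges:
            found.append((a, b, c))
    return found
```

Storing each side as an unordered pair plays the role of matching a `connects` fact in either direction.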
Because PROLOG uses depth first sequential search, it is
advantageous to reduce the number of queries to arrive at the
results. We observe that in most cases, rules defining various
members of the same family of models repeatedly make the queries
which are common to the whole family. This is a computationally
expensive exercise, especially in sequential processing systems.
This can be reduced to a single query by grouping the common
queries under a separate goal which has priority over the
corresponding class of rules. For example, all the rules of a
triangle search for the existence of a three segment long closed
loop. To avoid this repeated search, these queries are grouped
under a separate rule and the rule is given a priority over other
rules for triangle and the result of this search is then passed
over to all other rules of triangles. As PROLOG gives higher
priority to rules which are earlier in the list of rules of one
kind, if necessary, priority can be given to a particular rule
over the other just by asserting the rule before the other rule.

    /* THIS RECURSIVE CALL CLASSIFIES THE NODES WHICH ARE
       ONE-CONNECTED AS 'OPEN' AND REST OF THE NODES AS 'SOFT' */
    classify :-
        p(X),
        singly_connected(X),
        retract(p(X)),
        asserta(open(X)),
        retract(connects(S, X, Y)),
        asserta(disconnect(S, X, Y)),
        classify(Y).
    classify(X) :-
        p(Y),
        singly_connected(X),
        classify(Y).
    classify(_) :-
        retract(p(X)),
        asserta(soft(X)),
        classify(Y).
    singly_connected(X) :-
        not(double_connected(X)).
    double_connected(X) :-
        connected(S1, X, Y),
        connected(S2, X, Z),
Fig.4.15 PROLOG code for graph classification.
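The node classification of Fig.4.15 can be paraphrased as repeated pruning of nodes with at most one remaining connection. The Python sketch below is an interpretation of the figure's header comment, not a line-by-line translation. The function name and the data layout are assumptions.

```python
def classify_nodes(points, connects):
    """Split nodes into 'open' and 'soft' sets.

    A node with at most one live connection is marked open and its
    connection is dropped, which may expose its neighbour in turn.
    Nodes that survive the pruning are marked soft.
    """
    live = list(connects)              # (segment, point, point) facts
    unclassified = set(points)
    open_nodes = set()
    changed = True
    while changed:
        changed = False
        for node in sorted(unclassified):
            touching = [fact for fact in live if node in fact[1:]]
            if len(touching) <= 1:
                open_nodes.add(node)
                unclassified.discard(node)
                for fact in touching:
                    live.remove(fact)  # counterpart of asserting 'disconnect'
                changed = True
                break
    return open_nodes, unclassified
```

For a triangle with a two-segment tail attached to one corner, the tail points come out open and the three corners come out soft.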
The correction procedures are expressed in the form of
mathematical formulations as explained in section 4.4.2. These
formulations can be directly translated into PROLOG. One such
example is shown in Fig.4.16. PROLOG shows inefficiency when
correction procedures are to be executed. This is because,
instead of treating complete procedure as a single goal, PROLOG
treats every mathematical operation as a separate goal, and
carries out an inherent and expensive exercise of storing and
retraction of the goal environment. Hence, though PROLOG is
effective in the implementation of graph representation and rule
base, it is quite inefficient while carrying out procedural jobs
like correction.
In cases of finding alternate solutions, we felt the need
of the guarded command feature of Communicating Sequential
Processes (CSP). This feature, if included in PROLOG, may affect
the pure logic declaration in PROLOG, but it may be less harmful
than the cut feature of PROLOG.
    /* THE PROCEDURE FINDS THE RECTANGLE POINT NEAREST TO
       THE POINT TO BE CORRECTED */

    /* If the point to be corrected is 'hard', the goal is true */
    make_rect(C, B, A, L2) :-
        hard(A), !.

    /* (C,B) is the reference line segment and A is the
       point to be corrected at a distance L2 */
    make_rect(C, B, A, L2) :-
        retract(co_ord(A, X, Y)),
        co_ord(C, X1, Y1),
        co_ord(B, X2, Y2),
        f_dist(X1, Y1, X2, Y2, L1),
        X3 is X2 + (Y2 - Y1) * L2 / L1,
        Y3 is Y2 - (X2 - X1) * L2 / L1,
        X4 is X2 - (Y2 - Y1) * L2 / L1,
        Y4 is Y2 + (X2 - X1) * L2 / L1, !,
        f_dist(X, Y, X3, Y3, D1),
        f_dist(X, Y, X4, Y4, D2),
        if_dist(D1, X3, Y3, D2, X4, Y4, A).

    /* D is distance between (X1,Y1) and (X2,Y2) */
    f_dist(X1, Y1, X2, Y2, D) :-
        L is (X1-X2) * (X1-X2) + (Y1-Y2) * (Y1-Y2),
        sqrt(L, D).

    /* Correct the point to the nearest rectangle point */
    /* The clause for D1 < D2 is not legible in the source and is
       reconstructed here by symmetry with the surviving clause */
    if_dist(D1, X, Y, D2, _, _, P) :-
        D1 < D2, asserta(co_ord(P, X, Y)).
    if_dist(D1, _, _, D2, X, Y, P) :-
        D1 >= D2, asserta(co_ord(P, X, Y)).
Fig.4.16 Correction of a corner point to the nearest rectangle point.
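The computation in Fig.4.16 can be restated compactly in Python for checking. This is a sketch. The function name is hypothetical, and the two candidates are the points at distance L2 from B on either side of the reference, perpendicular to it.

```python
import math


def nearest_rectangle_point(c, b, a, l2):
    """Move point `a` to the nearer of the two rectangle points at `b`.

    (c, b) is the reference segment; l2 is the length of the side to correct.
    """
    x1, y1 = c
    x2, y2 = b
    l1 = math.hypot(x2 - x1, y2 - y1)
    k = l2 / l1
    # Both candidates lie on the perpendicular to the reference through b.
    candidate_1 = (x2 + (y2 - y1) * k, y2 - (x2 - x1) * k)
    candidate_2 = (x2 - (y2 - y1) * k, y2 + (x2 - x1) * k)
    return min((candidate_1, candidate_2), key=lambda p: math.dist(p, a))
```

Choosing the nearer candidate keeps the corrected corner on the same side of the reference as the drawn one, so the figure is not mirrored by the correction.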
4.7 RESULTS
The recognition and correction stage explained in this
chapter takes line features of a sketch and produces a geometric
representation. Performance of the recognition and correction
stage can be examined through a set of typical inputs and the
corresponding drafted outputs. Fig.4.17 exhibits variations in
the drafted versions of a hand-drawn quadrilateral with respect
to values of thresholds for approximation. Fig.4.17b is the
drafted version with default thresholds. Here, the sketch is
recognized as a parallelogram and corrected accordingly. In
Fig.4.17c, the drafted sketch is a rectangle. This is because of
a large, user specified threshold for absolute orientation
(Ao = 50 degrees). With the same Ao, when the threshold for
equality Eo is raised to 25%, the input is recognized and
corrected as a square (Fig.4.17d). One more example depicting
variations in a drafted sketch with user specified thresholds is
shown in Fig.4.18. Here, an irregular triangle is corrected to an
equilateral triangle, when the threshold for equality is raised to
25%. In all the above cases the sketch size is maintained within
the thresholds specified by the user.
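The dependence of the recognized model on the threshold for equality can be reproduced with a toy classifier. The Python sketch below is illustrative only. The definition of approximate equality (difference within a fraction of the larger side), the function names and the side lengths used with it are assumptions, not values from the experiments.

```python
def approx_equal(a, b, threshold):
    """Two lengths are 'equal' if they differ by at most threshold * larger."""
    return abs(a - b) <= threshold * max(a, b)


def classify_triangle(sides, threshold):
    """Try the triangle models in priority order; the first match wins."""
    a, b, c = sides
    pairs = [
        approx_equal(a, b, threshold),
        approx_equal(b, c, threshold),
        approx_equal(a, c, threshold),
    ]
    rules = [
        ("equilateral triangle", all(pairs)),
        ("isosceles triangle", any(pairs)),
        ("triangle", True),
    ]
    for name, satisfied in rules:
        if satisfied:
            return name
```

With hypothetical sides of 10, 12 and 11 units, a 5% threshold leaves the figure a plain triangle, while a 25% threshold makes all three sides approximately equal and the equilateral model matches first.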
In Fig.4.19, the effect of user specified contexts on a
drafted sketch is illustrated. Fig.4.19b is the drafted version
with default context, where every recognition is contextually
consistent. Here, the sketch is recognized as a combination of a
trapezium and four triangles. Fig.4.19c is the drafted version
when the context specified is "presence of only a quadrilateral
with threshold for equality 15%". Here, the inner quadrilateral
is corrected as a square and rest of the line segments are
corrected with reference to this square. In this context none of
the triangles are recognized. Similarly, when the context is
"presence of only triangles", the drafted version of Fig.4.19d is
obtained. Here, one can observe that the quadrilateral is without
any regular shape.
a. INPUT GRAYTONE IMAGE OF A HAND-DRAWN QUADRILATERAL.
b. DRAFTED VERSION WITH DEFAULT THRESHOLDS.
c. DRAFTED VERSION WITH USER SPECIFIED THRESHOLD OF ABSOLUTE ORIENTATION Ao = 50 DEGREES.
d. DRAFTED VERSION WITH USER SPECIFIED THRESHOLDS Eo = 25% AND Ao = 50 DEGREES.
Fig.4.17 BEHAVIOUR OF DRAFTED OUTPUT OF A QUADRILATERAL WITH THE VARIATION IN SPECIFIED THRESHOLDS.
a. INPUT GRAYTONE IMAGE OF A TRIANGLE.
b. PLOT OF THE FEATURES EXTRACTED IN PREPROCESSING.
c. DRAFTED VERSION OF THE TRIANGLE WITH DEFAULT THRESHOLDS.
d. DRAFTED VERSION WITH USER SPECIFIED THRESHOLD OF EQUALITY Eo = 25%.
Fig.4.18 BEHAVIOUR OF DRAFTED OUTPUTS OF A TRIANGLE WITH THE VARIATION IN SPECIFIED THRESHOLDS.
a. PLOT OF THE FEATURES EXTRACTED IN THE PREPROCESSING STAGE.
b. DRAFTED VERSION WITH DEFAULT CONTEXT.
c. A DRAFTED VERSION WITH A CONTEXT OF PRESENCE OF ONLY QUADRILATERAL.
d. DRAFTED VERSION WITH A CONTEXT OF PRESENCE OF ONLY TRIANGLES.
Fig.4.19 VARIATIONS IN DRAFTED OUTPUTS WITH CONTEXT.
In the absence of models for recognition of alphanumeric
features, the system fails to draft the sketch with alphanumeric
data. One such example is illustrated in Fig.4.20, where the line
features of character ' R ' are left uncorrected. If the system
were to be used for drafting the sketches with alphanumeric data,
corresponding object models must be included in the present
system. These models must be given priority over geometric models
so that the sketches with dimensional specifications can be
corrected according to specified dimensions. Under such
conditions, the only change needed in the correction procedure is
to override the default thresholds by a 'zero' value.
The corrected version of a sketch is in the form of feature
points with their coordinates and connectivity specified. This is
usually referred to as 'soft copy'. It can be observed that the
input image, which would have taken as many bytes of memory as
the number of pixels in the image, for its storage, now needs
only a few bytes of memory. Secondly, this compressed code can be
easily updated because it is in a machine-recognizable form.
Finally, the small size of the data allows efficient
communication and sketch duplication.
a. INPUT GRAYTONE IMAGE.
b. PLOT OF THE FEATURES EXTRACTED IN THE PREPROCESSING STAGE.
c. DRAFTED VERSION WITH DEFAULT THRESHOLDS.
Fig.4.20 A SKETCH WHICH IS INCOMPLETELY DRAFTED IN THE ABSENCE OF ALPHANUMERIC MODELS.
CHAPTER V
CONCLUSION
Various issues involved in computer drafting of hand-drawn
geometric line sketches are explored. While solutions are
suggested for the basic issues, the issues that are involved in
the development of a full-fledged drafting station are not
examined. The described system takes a digitized line sketch as
the input. This input graytone image is binarized and line
features are extracted without 'thinning'. The extracted features
are then recognized and corrected. Recognition of geometric
figures is done through approximate matching of the figures with
various geometric models. Provision is also made to check the
contextual consistency of a recognized model. The output is a
drafted soft copy of the input line sketch which can be
efficiently stored, duplicated or communicated over a Wide Area
Network (WAN).
The proposed binarization scheme uses local thresholds and
it exhibits good noise rejection properties. The scheme also
provides local line vectors at all edge points. It generates low
thresholds in dark (object) regions, so that formation of holes is
suppressed. But the same effect may lead to misclassification of
pixels in low illumination regions.
The line extraction and segmentation scheme uses the line
width and the run length constraints. The algorithm assumes that
the input is strictly a line sketch. It skips all the dark
regions present in the input sketch. Irrespective of the type of
the input sketch, the preprocessing stage extracts lines in the
form of intersection and deviation points. Hence all curves are
piecewise linearized.
The line features extracted are represented in the form of a
graph. New techniques for reduction and classification of a graph
and representation of approximate geometric models in the form of
rules are proposed. As the process is independent of the
orientation or translation of the figures, no effort is made to
preserve the overall orientation of the sketch.
To make the system more versatile, it can be modified to
accept line sketches with dimensions specified on them. It is
essential that in such a system, the features characterizing
dimensional information be separated from those characterizing
the sketch information. The system must also have an additional
stage for recognition of alphanumeric data to understand the
dimensions specified.
REFERENCES
1. Nobuyuki Otsu, "A threshold selection method from gray level histograms", IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-9, No.1, pp.62-66, January (1979).
2. J.R.Ullmann, "Binarization using associative addressing", Pattern Recognition, Vo1.6, pp.127-135, (1974).
3. L.T.Watson, K-Arvind, H.W.Ehrich, R.M.Haralick, "Extraction of lines and regions from graytone line drawing images", Pattern Recognition, Vo1.17, No.5, pp.493-507, (1984).
4. J.S.Weszka, A.Rosenfeld, "Histogram modification for threshold selection", IEEE Transactions on Systems, Man and Cybernetics, Vol.SMC-9, No.1, pp.38-52, January (1979).
5. R.W.Smith, "Computer processing of line images: A survey", Pattern Recognition, Vo1.20, No.1, pp.7-15, (1987).
6. K.Ramachandran, "Coding method for vector representation of engineering drawings", Proceedings IEEE, Vo1.68, No.7, pp.813-817, July (1980).
7. J.J.Sebok, L.E.Roemer, G.S.Malindzak Jr., "An algorithm for line in.tersection identification", Pattern Recognition, Vo1.13, No.2, pp.159-162, (1981).
8. Z.M.Wojcik, "A natural approach in image processing and pattern recognition: Rotating neighborhood technique, self adapting threshold, segmentation and shape recognition, Pattern Recognition, Vo1.18, No.5, pp.299-326, (1985).
9. D.H.Ballard, C.M.Brown, "Computer Vision", Prentice Hall Inc., Englewood Cliffs, New Jersey, pp.65-70, (1982).
10. T. L . Huntsberger, C. Rangaraj an, S.N.Jayaramamurthy, "Representation of uncertainty in computer vision using fuzzy sets", IEEE Transactions on Computers, Vo1.C-35, No.2, pp.145-156, February (1986).
li. P.H.Winston, "Artificial Intelligence", Second Edition, Adison Wesley Publishing Co. (1984).
12. D.L.Goetsch, "Introduction to Computer Aided Drafting", Engle Wood Cliffs, New Jersey, (1983).
1 M E , S.Kakumoto, T.Miyatake, S.Shimada, H.Matsushima, "Automatic recognition of design drawing and maps", International Conference on Pattern Recognition, Montreal, pp.1296-1305, (1984).
14. J.R.Ward, B-Blesser, Pencept Inc., "Interactive recognition
of hand printed character characters for computer input", IEEE CG&A, pp,24-37, September (1985).
15. M.C.Fulford, "The FASTRAK automatic digitizing system", Pattern Recognition, Vo1.14, No.1-6, pp.65-74, (1981).
16. J. P. Bixler, J.P. Sanford, "A technique for encoding lines and regions in engineering drawings", Pattern Recognition, Vo1.18, No.5, pp.367-377, (1985).
17. T.P.Clernent, "Extraction of line structural data from engineering drawings", Pattern Recognition, Vol.14, No.1-6, pp.43-52, (1981).
18. H-Murase, T. wakahara, "Online hand sketched figure recognition, Pattern Recognition, Vo1.19, N0.2, pp.147-160, (1986).
19, W.C.Lin, J.M.Pun, "Machine recognition and plotting of hand sketched line figures, IEEE Transactions on Systems, Man and Cybernetics, Vol.SMC-8, No.1, pp.52-57, January (1978).
20. A.Mitchie, J.K.Aggarwa1, "Image segmentation by conventional and information integrating techniques: A synopsis", Image and Vision Computing, vo1.3, No.2, pp.50-62, May (1985).
22. W-Doyle, "Operations useful for similarity invariant pattern recognition", J.Assoc. Comput. Mac,h.,Vol.9, pp.259-267, (1962).
22. J.M.S.Frewitt, M.L.Mendelsohn, "The analysis of cell images", Ann. N.Y. Acad. Sci. 128, pp.1035-1053, (1966).
23. K. F'ukanaga, "Introduction to Statistical Pattern Recognition", New York Academic Press, pp.260-267, (1972).
212. J.S.Weszka, " A survey of threshold selection techniques", Coinputer Graphics and Image Processing, Vo1.7, pp.259-265, (1978).
25. D.P.Panda, A.Rosenfeld, "Imago segmentation by pixel- classification in (gray level gradient) space", LEEE Transactions on Computers, V~i.27, pp.875-879, (1978).
26. J.S.Weszka, R.S.Nage1, A.Rosenfeld, "A threshold selection technique", IEEE Transactions on Computers, Vo1.23, PP.1322- 1326, ( 1974).
27. S.Watanabe and CYBEST group, "An automated apparatus for cancer prescreening:CYBEST", Computer Graphics, Image Processing, Vo1.3, pp. 350-358, ( 1974).
2 8 . J.S.Weszka, J.A.Verson, A.Rosenfeld, "Threshold selection techniques. 2'" University of Maryland Computer Science Center Tech.Report. 260, (1973).
29. R.N.Wolfe, "A dynamic thresholding technique for quantization of scanned images in automatic pattern recognition", National Security Industrial Association, Wa.shington D.C., pp.143-102, May (1979).
30. H.Ogawa, K.Taniguchi, "Thinning and stroke segmentati-on for hand written Chinese character recognition", Pattern Recognition, Vo1.15, No.4, pp.298-308, (1982).
31. T.W.Ridier, S.Calvard, "Picture thresholding using iterative selection method, "IEEE Transactions on Systems, Man and Cybernetics, Vol.SMC-8, No.8, pp.629-632, August (1978).
32. C.K.Chow, T.Kaneko, "Boundary detection of radiographic images by a threshold method", Proc. IFIP Congress 71, Booklet TA-7, pp.130-134, North - Holland, Amsterdam, (1972).
33. N-Ahuja, A.Rosenfeld, R.M.Haralick, "Neighborhood gray level as feature in pixel classification", Pattern Recognition, V01.12,251-260, (1980).
34. J. Kittler, J. Foglein, "Contextual classification of multispectral pixel data", Image and Vision Computing, Vo1.2, No. 1, pp. 13-29, February (1984).
35. J. Hyde, J.A. Fulwood, B. R. Corsll, "An approach to knowledge driven segmentation", Image and Vision computing, Vo1.3, No.4, pp.198-205, November (1985).
36. P.Zamperoni, "Model based segmentation of graytone images", Image and Vision Computing, Vol.2, No.3, pp.123-133, August (1984).
37. C.J.Hilditch, "Linear skeletons from square cupboards", Machine Intelligence, Vol.4, pp.403-420, (1969).
38. N.J.Naccache, R.Shinghal, "SPTA: A proposed algorithm for thinning binary patterns", IEEE Transactions on Systems, Man and Cybernetics, Vol.SMC-14, No.3, pp.409-418, May/June (1984).
39. F.W.M.Stentiford, R.G.Mortimer, "Some heuristics for thinning hand printed binary characters for OCR", IEEE Transactions on Systems, Man and Cybernetics, Vol.SMC-13, No.1, pp.81-83, January/February, (1983).
40. C.Arcelli, G.S.Di Baja, "A thinning algorithm based on prominence detection", Pattern Recognition, Vol.13, No.3, pp.225-235, (1981).
41. K.J.Udupa, I.S.N.Murthy, "Some new concepts for encoding line patterns", Pattern Recognition, Vol.7, pp.225-233, (1975).
42. T.Wakayama, "A core line tracing algorithm by maximal square moving", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.PAMI-4, No.1, pp.68-74, (1982).
43. H.Freeman, L.S.Davis, "A corner finding algorithm for chain coded curves", IEEE Transactions on Computers, pp.297-303, March (1977).
44. A.Rosenfeld, J.S.Weszka, "An improved method of angle detection in digital curves", IEEE Transactions on Computers, pp.940-941, September (1975).
45. L.S.Davis, "Understanding shapes and angles", IEEE Transactions on Computers, Vol.C-26, No.3, pp.236-242, March (1977).
46. A.Rosenfeld, E.Johnston, "Angle detection on digital curves", IEEE Transactions on Computers, pp.875-878, September (1973).
47. T.Pavlidis, "A hybrid vectorization algorithm", International Conference on Pattern Recognition, Montreal, pp.490-492, (1984).
48. C.Y.Suen, M.Berthod, S.Mori, "Automatic recognition of hand printed characters - The state of art", Proceedings of IEEE, Vol.68, No.4, pp.469-487, April (1980).
49. S.Mori, K.Yamamoto, M.Yasuda, "Research in machine recognition of hand printed characters", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.PAMI-6, No.4, pp.386-405, July (1984).
50. M.A.Fischler, R.C.Bolles, "Perceptual organization and curve partitioning", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.PAMI-8, No.1, pp.100-105, January (1986).
51. H.Bunke, G.Sagerer, "Use and representation of knowledge in image understanding, based on semantic networks", International Conference on Pattern Recognition, Montreal, pp.1135-1137, (1984).
52. M.Numao, M.Shizuka, "A frame like knowledge representation system for computer vision", International Conference on Pattern Recognition, Montreal, pp.1128-1130, (1984).
53. Zvi Kohavi, "Switching and Automata Theory", Second Edition, McGraw Hill Inc., New York, pp.24-36, (1978).
54. Shriram Revankar, B.Yegnanarayana, M.Manohar, "Binarization of line images using edge vectors", IEEE Transactions on Systems, Man and Cybernetics, Communicated (1987).
55. Shriram Revankar, B.Yegnanarayana, "Geometric reconstruction of hand-drawn line sketches", IEEE Transactions on Pattern Analysis and Machine Intelligence, Communicated (1987).