geometrical processing- texture segmentation- OCR
NEHA RATHORE |ID- 5994499980 1
EE 569- Fall’08
Submitted by-
NEHA RATHORE
5994499980
EE569 Project 3: Geometric Modification, Texture Analysis &
Optical Character Recognition (OCR)
EE 569- Fall’08 | ID- 5994499980 2
GEOMETRICAL MODIFICATION
Problem 1- Geometrical modification
Objective
We were given four images, Boat1-Boat4, all in different orientations and scales. Together they
represent a single image, boat.raw. We had to implement an algorithm to properly scale, translate and
rotate these images so that they can be joined to form the final boat.raw.
Motivation
In this problem, the objective is to perform Geometrical Modification on an image. This requires
manipulation to be done on the image co-ordinates and not on the image intensity values. Geometrical
transformations modify the spatial relationship between the pixels in the image. These transformations
are often called rubber-sheet transformations.
Geometric Transform of an image refers to the family of linear operations on an image such as the
spatial translation, spatial rotation, spatial scaling and perspective transformation. These operations are
an integral part of Computer Graphics and Animation which involves a non-linear combination of the
above basic operations. It should be noted that all the above operators when employed in series are not
commutative, which is a basic fact that arises from the property of matrices. Geometric Image
Modification plays an important role in Image registration and Image synthesis.1
In terms of Digital Image Processing, geometrical transformation consists of two basic operations:
• A spatial transformation of coordinates
• Intensity interpolation that assigns intensity values to the spatially transformed pixels.
Image registration is an important application in DIP to align two or more images of the same scene.
The main learning in this assignment was converting image coordinates to Cartesian coordinates and
processing the image in the Cartesian coordinate system. I also learnt how to scale, rotate and
translate images to get the desired output. Finally, I learnt image registration by joining the
4 images together, along with zoom-in, zoom-out and shearing concepts.
PROCEDURES
I modularized this part of assignment in different challenges to be achieved:
• The first challenge in this report was to find the corners of the rotated boat image, which was
located inside a bigger white box.
• The second challenge was to find the center of the image and translate it to (0,0) location.
• The third challenge was to rotate this image about the point(0,0) to the desired angle so as to
make the image straight along horizontal and vertical axis.
1 Digital Image Processing By Rafael C. Gonzalez, Richard Eugene Woods
• The fourth challenge was to translate the rotated image back.
• The fifth challenge was to find the scaling factor to bring the image to size 256x256.
• The sixth challenge was to join the four scaled images together.
Algorithm
We have four different images, all the images have different orientations and are scaled by different
factors.
The main task is to design an algorithm to translate, rotate and scale these images such that they can
be combined to form the desired 512x512 image.
Image coordinate to Cartesian coordinate conversion
As mentioned above, the points are first represented in the Cartesian coordinate system, whose origin
is at the bottom-left corner, and then transformed into the image coordinate system, whose origin is at
the top-left corner of the image2. The relationship between the Cartesian coordinate representation and
the discrete output image array is given by
Xk = k - 1/2
Yj = J + 1/2 - j
where J is the number of rows of the output array. Similarly, the input array relationship is given by
Uq = q - 1/2
Vp = P + 1/2 - p
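The index-to-coordinate relations above can be sketched as small helper functions. This is a minimal illustration in C++ (the function names are mine, not from the report; `J` plays the role of the number of rows):

```cpp
#include <cassert>

// Convert an array index to Pratt-style Cartesian coordinates and back.
// Pixel centers sit at half-integer positions; the Cartesian origin is at
// the bottom-left, while the array origin is at the top-left.
double colToX(int k)           { return k - 0.5; }        // Xk = k - 1/2
double rowToY(int j, int J)    { return J + 0.5 - j; }    // Yj = J + 1/2 - j
int    xToCol(double x)        { return (int)(x + 0.5); } // inverse of colToX
int    yToRow(double y, int J) { return (int)(J + 0.5 - y); }
```

The forward and inverse mappings are exact for integer indices, so a row/column index survives a round trip through Cartesian coordinates unchanged.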
We find that the basic goal of implementing a geometrical modification is to find out where a particular
pixel in the input image maps to in the output image. These transformations often yield non-integer
values, which makes it difficult to determine the correct position in the output. However, the reverse
operation of finding out where a particular pixel in the output image comes from in the input image is
much better behaved. To implement this reverse mapping we need a function, called the reverse address
mapping function. Thus the entire problem of Geometrical Modification is resolved if we find the
"Reverse Address Mapping Function".
For each of the different geometrical operations (translation, scaling and rotation) we have a
different reverse address mapping function, which is obtained by multiplying the inverse of the
corresponding transform matrix with the output coordinates to obtain the input image coordinates.
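For rotation, the reverse address mapping is just the inverse rotation matrix applied to the output coordinates. A minimal sketch in C++ (the function name is mine; angles are in radians and the rotation is about the Cartesian origin, as in the procedure above):

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Reverse address mapping for a rotation about the origin: for each output
// coordinate (x, y) we ask where it came from in the input, i.e. we apply
// the inverse rotation R(-theta) rather than R(theta).
std::pair<double, double> reverseRotate(double x, double y, double theta) {
    double u =  std::cos(theta) * x + std::sin(theta) * y;
    double v = -std::sin(theta) * x + std::cos(theta) * y;
    return {u, v};
}
```

A forward rotation of the input point (1, 0) by 90 degrees lands on the output point (0, 1); applying `reverseRotate` to (0, 1) recovers (1, 0), confirming the inverse relationship.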
Corner Detection
There are different algorithms for detecting corners, such as the Harris corner detector, which detects
corners using the change in intensity levels in neighboring pixels. However, since our images were very
2 Pratt W. Digital image processing 3ed
simple, with only 4 corners, and located entirely within the white space, it is easy for us to
calculate the corners by merely scanning for the first non-white pixel from different directions.
We scan the 256x256 image in the following order:
// Corner detection by scanning for the first non-white pixel from each side
// of the image. N is the image size; the original code mixed 256 and 355/356
// as loop bounds, so a single constant avoids that inconsistency.
const int N = 256;
bool found;

// top-left corner: scan rows top-down, columns left-right
found = false;
for (a = 0; a < N && !found; a++)
    for (b = 0; b < N && !found; b++)
        if (Input[a][b] != 255) {
            x1 = a; y1 = b;
            Input[a][b] = 0;  // mark the detected corner black for checking
            cout << "top-left corner: x1=" << x1 << " y1=" << y1 << endl;
            found = true;
        }

// top-right corner: scan columns right-left, rows top-down
found = false;
for (b = N - 1; b >= 0 && !found; b--)
    for (a = 0; a < N && !found; a++)
        if (Input[a][b] != 255) {
            x2 = a; y2 = b;
            Input[a][b] = 0;
            cout << "top-right corner: x2=" << x2 << " y2=" << y2 << endl;
            found = true;
        }

// bottom-right corner: scan rows bottom-up, columns right-left
found = false;
for (a = N - 1; a >= 0 && !found; a--)
    for (b = N - 1; b >= 0 && !found; b--)
        if (Input[a][b] != 255) {
            x3 = a; y3 = b;
            Input[a][b] = 0;
            cout << "bottom-right corner: x3=" << x3 << " y3=" << y3 << endl;
            found = true;
        }

// bottom-left corner: scan columns left-right, rows bottom-up
found = false;
for (b = 0; b < N && !found; b++)
    for (a = N - 1; a >= 0 && !found; a--)
        if (Input[a][b] != 255) {
            x4 = a; y4 = b;
            Input[a][b] = 0;
            cout << "bottom-left corner: x4=" << x4 << " y4=" << y4 << endl;
            found = true;
        }
OUTPUT:
Since I replaced every detected corner with a black pixel (for checking purposes), I found that the
corners were successfully detected for all four images.
To find the angle of rotation, we form a right-angled triangle using two corners of the image and use
Angle = sin^-1(opposite/hypotenuse)
We use the following relations3:
3 Digital Image Processing By Rafael C. Gonzalez, Richard Eugene Woods
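The angle computation above can be sketched directly from two detected corners. A minimal illustration in C++ (the function name is mine; which coordinate plays the "opposite" side depends on which pair of corners is used, so this sketch assumes the two corners of the top edge):

```cpp
#include <cassert>
#include <cmath>

// Rotation angle from two adjacent detected corners: the difference in row
// coordinates is the side opposite the rotation angle and the distance
// between the corners is the hypotenuse, so angle = asin(opposite/hypotenuse).
double rotationAngleDeg(double x1, double y1, double x2, double y2) {
    double opposite   = std::fabs(x2 - x1);
    double hypotenuse = std::hypot(x2 - x1, y2 - y1);
    return std::asin(opposite / hypotenuse) * 180.0 / std::acos(-1.0);
}
```

For corners separated by (0.5, sqrt(3)/2), the hypotenuse is 1 and the recovered angle is 30 degrees, matching the sine relation.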
Intensity interpolation
GRAY LEVEL or INTENSITY INTERPOLATION:
Once we have calculated the relation between input and output coordinates, the next most important
operation is intensity interpolation. The address mapping done in the previous step may give us
non-integer values. Because the distorted image is digital, its pixel values are defined only at
integer coordinates, so non-integer values map into locations of the image for which no gray levels
are defined. Deciding what the gray level values at those locations should be, based only on the pixel
values at integer coordinate locations, then becomes necessary. Gray level interpolation is used to
obtain this transformation.
In our discussion we use bi-linear interpolation, which uses the gray levels of the four nearest
neighbors to interpolate the value at the new non-integer location. Since the gray level of each of the
four integral nearest neighbors of a non-integral pair of coordinates is known, the gray level value at
these coordinates can be interpolated from the values of its neighbors by using the relationship

.(p,q)      .(p,q+1)
      *(i,j)
.(p+1,q)    .(p+1,q+1)

F(i,j) = (1-a)[(1-b)F(p,q) + b.F(p,q+1)] + a[(1-b)F(p+1,q) + b.F(p+1,q+1)]

where a and b are the distances of the intermediate point (i,j) from its neighboring pixel coordinates
along the vertical and horizontal directions.
Interpolation is basically averaging between neighboring pixels.
Let's say you have a 3x3 image:

10  4  8
 2 12  6
 8  4  2

You might want to make a 6x6 image from this image by bilinear interpolation (O: no value assigned yet):

10  O  4  O  8  O
 O  O  O  O  O  O
 2  O 12  O  6  O
 O  O  O  O  O  O
 8  O  4  O  2  O
 O  O  O  O  O  O

First, obtain the values of unassigned pixels by averaging the horizontally neighboring two pixels:

10  7  4  6  8  8
 O  O  O  O  O  O
 2  7 12  9  6  6
 O  O  O  O  O  O
 8  6  4  3  2  2
 O  O  O  O  O  O

Second, obtain the values of unassigned pixels by averaging the vertically neighboring two pixels:

10  7  4  6  8  8
 6  O  8  O  7  O
 2  7 12  9  6  6
 5  O  8  O  4  O
 8  6  4  3  2  2
 8  O  4  O  2  O

Lastly, obtain the values of the remaining unassigned pixels by averaging the neighboring 4 pixels
(for edge pixels, by averaging the neighboring 3 pixels):

10  7    4  6    8  8
 6  7    8  7.5  7  7
 2  7   12  9    6  6
 5  6.5  8  6    4  4
 8  6    4  3    2  2
 8  6    4  3    2  2
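The bilinear relation can be checked in code. This is a minimal sketch in C++ (the helper names `bilinear` and `sampleImage` are mine, not from the report; the formula follows F(i,j) as given above, with `a` the vertical and `b` the horizontal fractional offset):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Bilinear interpolation:
// F(i,j) = (1-a)[(1-b)F(p,q) + b F(p,q+1)] + a[(1-b)F(p+1,q) + b F(p+1,q+1)]
// where (p,q) is the integer pixel above-left of the non-integer location
// (i,j), a = i - p and b = j - q.
double bilinear(const std::vector<std::vector<double>>& F, double i, double j) {
    int p = (int)std::floor(i), q = (int)std::floor(j);
    double a = i - p, b = j - q;
    return (1 - a) * ((1 - b) * F[p][q]     + b * F[p][q + 1])
         +      a  * ((1 - b) * F[p + 1][q] + b * F[p + 1][q + 1]);
}

// The 3x3 image from the worked example above.
std::vector<std::vector<double>> sampleImage() {
    return {{10, 4, 8}, {2, 12, 6}, {8, 4, 2}};
}
```

Sampling halfway between 10 and 4 gives 7, and at the center of the top-left 2x2 block gives (10+4+2+12)/4 = 7, matching the averaging steps of the worked example.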
RESULT and DISCUSSION
Example of rotation, translation and scaling for one part of boat.raw
Final output
DISCUSSION
We were able to rotate the image and scale it accordingly, but we notice that the image has blurred
to some extent. This is because of the approximation used in intensity interpolation: as we are
distributing the same intensity over a set of surrounding pixels, it has the effect of an averaging
filter, which blurs the image. Also, we were not able to stitch the images together perfectly. This is
because of rounding errors in the decision of boundaries. It could be avoided by copying pixels
slightly outside the boundaries so as to fill the white space.
1B- SPATIAL WARPING
MOTIVATION AND OBJECTIVE:
Spatial warping is a useful technique of determining the coordinate relationship between the input and
the output images so as to get a linear system given the control points or the degrees of freedom. Once
the linear system has been obtained, all the points in the input image will obey or follow this system and
get warped to the corresponding coordinates in the output image. This system of equations or
coefficients can be used to recover an image that has been distorted or warped in a particular manner.
Geometric (or spatial) transformations on an image are typically used to correct for imaging system
distortion or conversely to purposely distort (i.e., warp) for purposes of achieving some desired visual
effect. Geometric correction is an important image processing task in many application areas. Distortion
may arise from aberrations in the sensor. A geometric transformation is given by a mapping function
that relates the points in the input image to corresponding points in the output image. The mapping
may be represented by a pair of equations or a transformation matrix. The matrix is either known a
priori, or, as is true for a vast majority of applications, must be inferred from a set of points of
correspondence, typically called control points. Once the transformation matrix is known, it may be
used to compute a corrected output image from a known distorted input image. For example, they can
be employed to recover an image that has been distorted by a physical imaging system. Typical
examples include barrel and pincushion distortion. In remote sensing and satellite imagery, the common
distortions are due to earth curvature and various attitude and altitude effects.
Non-linear geometrical modification has also a wide range of applications apart
from its usage in multimedia and graphical illusions.
Ex: The pictures taken during aerial surveys or the photographs taken by satellites have considerable
distortions that are non-linear in nature.
Procedure
In general, a spatial transformation is defined by a polynomial function of the form
u = ∑∑ aij x^i y^j,  v = ∑∑ bij x^i y^j   (0 ≤ i + j ≤ N)
where x, y and u, v are point coordinates in the input and output images, respectively, N is the
polynomial order, and aij, bij are mapping coefficients that characterize the transformation.
We are required to apply a transformation whose coordinate equations have degree 2. This implies the
transformation is not linear.
We choose the terms in x and y such that the maximum degree is 2:
U = (a0 a1 a2 a3 a4 a5)(1 x y x^2 xy y^2)^t
V = (b0 b1 b2 b3 b4 b5)(1 x y x^2 xy y^2)^t
Breaking down this into steps
• Finding the control points
• Calculating the A matrix
• Calculating inverse of A
• Finding coefficients a0-a5 and b0-b5 by multiplying Ainv*U and Ainv*V
• Applying these coefficients to the general affine transformation equation to calculate the new
coordinates for input coordinates.
Algorithm
Finding the control points
We have the sample input and sample output images for this problem.
We are also given the radius of the circle in the output image. This makes it easy to calculate the values
of u,v in output image corresponding to the x ,y in the input image. We manually see these points and
find the mapping chart.
This chart is useful in finding the coefficients as these are one of the roots of the above equation.
We then form the A matrix, which is given as follows:
A = [1 X Y X^2 XY Y^2]
The number of rows in this matrix depends on the number of control points.
Ideally, for six unknown coefficients, six control points should be enough. But to make the result
better we can choose more than 6 control points.
We then calculate the inverse.
I used matlab to find the inverse of this 6x6 matrix.
Reason: I was first using an online matrix inverse calculator to find the inverse, but it was giving me
drastically different results. The calculations were not done properly and hence the coefficients were
coming out incorrect.
If the number of control points is more than 6, we have a rectangular matrix, whose inverse does not
exist, so we instead find the pseudo-inverse by the following formula:
Ainv = (A^t A)^-1 A^t
We then multiply this matrix first with the U matrix to get the a coefficients and then with the V
matrix to get the b coefficients. Finally we get the coefficients and hardcode them into the program.
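The fitting steps above (build A from [1 x y x^2 xy y^2] rows, form the pseudo-inverse via the normal equations, multiply by U) can be sketched end-to-end. This is a minimal illustration in C++, not the report's actual program; all function names are mine, and the solver is a plain Gauss-Jordan elimination on the 6x6 normal equations (A^t A) c = A^t u:

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

using Mat = std::vector<std::vector<double>>;
using Vec = std::vector<double>;

// Row of the A matrix for one control point: [1 x y x^2 xy y^2].
Vec warpRow(double x, double y) { return {1, x, y, x * x, x * y, y * y}; }

// Solve M c = r by Gauss-Jordan elimination with partial pivoting.
Vec solve(Mat M, Vec r) {
    int n = (int)r.size();
    for (int col = 0; col < n; ++col) {
        int piv = col;
        for (int row = col + 1; row < n; ++row)
            if (std::fabs(M[row][col]) > std::fabs(M[piv][col])) piv = row;
        std::swap(M[col], M[piv]); std::swap(r[col], r[piv]);
        double d = M[col][col];
        for (int k = 0; k < n; ++k) M[col][k] /= d;
        r[col] /= d;
        for (int row = 0; row < n; ++row) {
            if (row == col) continue;
            double f = M[row][col];
            for (int k = 0; k < n; ++k) M[row][k] -= f * M[col][k];
            r[row] -= f * r[col];
        }
    }
    return r;
}

// Least-squares fit of the six warp coefficients from >= 6 control points
// using the normal equations, i.e. the pseudo-inverse (A^t A)^-1 A^t u.
Vec fitCoefficients(const std::vector<std::pair<double, double>>& xy, const Vec& u) {
    Mat AtA(6, Vec(6, 0)); Vec Atu(6, 0);
    for (size_t i = 0; i < xy.size(); ++i) {
        Vec row = warpRow(xy[i].first, xy[i].second);
        for (int a = 0; a < 6; ++a) {
            Atu[a] += row[a] * u[i];
            for (int b = 0; b < 6; ++b) AtA[a][b] += row[a] * row[b];
        }
    }
    return solve(AtA, Atu);
}

// Evaluate the fitted quadratic mapping at a point.
double applyWarp(const Vec& c, double x, double y) {
    Vec row = warpRow(x, y);
    double u = 0;
    for (int a = 0; a < 6; ++a) u += c[a] * row[a];
    return u;
}

// Round-trip check: generate u from known coefficients at 7 control points
// (an overdetermined system), refit, and compare at a new point.
bool roundTripOk() {
    Vec truth = {256, 0.5, -0.5, 0.001, 0.0, -0.001};
    std::vector<std::pair<double, double>> pts =
        {{0, 0}, {1, 0}, {0, 1}, {1, 1}, {2, 1}, {1, 2}, {2, 2}};
    Vec u;
    for (auto& p : pts) u.push_back(applyWarp(truth, p.first, p.second));
    Vec fit = fitCoefficients(pts, u);
    return std::fabs(applyWarp(fit, 3, 5) - applyWarp(truth, 3, 5)) < 1e-6;
}
```

With 7 control points the system is overdetermined, which is exactly the case where the pseudo-inverse rather than the plain inverse is needed.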
Results:
CHOOSING THE CONTROL POINTS
The Image was divided in four parts as indicated above:
Then A matrix and coeff for each part was calculated .
This was done because it is very difficult to find a single value of coefficients that can warp the image in
4 different directions.
INPUT--------------------->OUTPUT
X,Y-----U,v
A= [1 x y x^2 xy y^2]
Part1
Control points
0,0 → 256,0
128,128 → 256,128
256,256 → 256,256
128,384 → 128,256
0,511 → 0,256
0,256 → 181,181
A1 =
1 0 0 0 0 0
1 128 128 16384 16384 16384
1 256 256 65536 65536 65536
1 128 384 16384 49152 147456
1 0 511 0 0 261121
1 0 256 0 0 65536
a = Ainv * u
b = Ainv * v
a0=256.0000;
a1=0.0841;
a2= -0.0841;
a3= 0.0008;
a4= 0.0000;
a5= -0.0008;
b0=0.0;
b1=0.0861;
b2=0.9139;
b3=0.0008;
b4=-0.0000;
b5=-0.0008;
PART2
Control points
256,256 → 256,256
256,511 → 181,331
511,511 → 256,511
128,384 → 128,256
384,384 → 256,384
0,511 → 0,256
A2 =
1 256 256 65536 65536 65536
1 256 511 65536 130816 261121
1 511 511 261121 261121 261121
1 128 384 16384 49152 147456
1 384 384 147456 147456 147456
1 0 511 0 0 261121
acoef =
a0=256.0000;
a1=0.9132;
a2=-0.9132;
a3=-0.0008;
a4=0.0000;
a5=0.0008;
bcoef =
b0=0;
b1=0.0868;
b2= 0.9132;
b3= 0.0008;
b4=-0.0000;
b5=-0.0008;
Part 3
Control Points
256,256 → 256,256
384,128 → 384,256
511,511 → 256,511
511,256 → 331,331
511,0 → 511,256
384,384 → 256,384
A3 =
1 256 256 65536 65536 65536
1 384 128 147456 49152 16384
1 511 511 261121 261121 261121
1 511 256 261121 130816 65536
1 511 0 261121 0 0
1 384 384 147456 147456 147456
acoef3 =
256.0000
0.9152
-0.9152
-0.0008
0.0000
0.0008
bcoef3 =
0
0.9132
0.0868
-0.0008
-0.0000
0.0008
Part 4
Control Points
256,256 → 256,256
0,0 → 256,0
128,128 → 256,128
256,0 → 331,181
384,128 → 384,256
511,0 → 511,256
A4 =
1 256 256 65536 65536 65536
1 0 0 0 0 0
1 128 128 16384 16384 16384
1 256 0 65536 0 0
1 384 128 147456 49152 16384
1 511 0 261121 0 0
ac4 =
256.0000
0.0861
-0.0861
0.0008
0.0000
-0.0008
bc4 =
0
0.9139
0.0861
-0.0008
-0.0000
0.0008
RESULTS & DISCUSSION
Input image
SCANNING REGIONS
The warped image of PART4
Final output image
DISCUSSION
If we look closely, the image is shifted by 1 pixel at the top and bottom. Also, the curve is not
exactly smooth and we can see a scale-like effect in some regions. The possible reason for this is
that, since the warping used here is not linear, the values of U,V may be fractional for some values of
X,Y. Rounding these values of U,V places the pixel in the nearest possible pixel, which produces the
scale-like effect.
Also, since the angle of warping is discrete, it does not give the exact end points, which makes the
image shift 1 pixel down from above and below.
This was an efficient but tedious way of finding the warped image. This warped image can also be
produced by using a polynomial equation of degree 3.
Even the slightest change in a coefficient value is drastically reflected in the output image; the
process is sensitive to rounding errors and decimal approximations. As this process produces a smaller
image compared to the original image, we did not see a lack of intensity in any region.
We can see from the final image that the input image is spatially warped successfully.
PROBLEM 2- TEXTURE ANALYSIS
Part A- TEXTURE CLASSIFICATION
OBJECTIVE:
We are given 12 samples of textures. There are four groups containing three textures each. We have to
design an algorithm to classify these images into clusters belonging to the different groups.
MOTIVATION :
In this problem, either a group of images are to be classified according to their texture types or a single
image is to be segmented into different parts with each part having a distinct texture type. Texture
analysis plays an important role in the interpretation of remote sensing images, satellite maps, etc. A
texture is related to the visual appearance of the region. It is due to semi regular patterns which are not
strictly periodic. Texture analysis is carried out basically to describe structured patterns. Edge detection
cannot be used here because if the texture is very fine, the edge density will be very high and hence the
output segmented image will not be appealing. Textures can be structured patterns of object surfaces
such as wood, grain, sand, grass, cloth, etc. They are very difficult to define precisely. Each texture
is characterized by a set of characteristics called "features". A feature is summarized information
which captures the essence of a texture type but still has the desired discriminating power. If two
images have the same texture type, their features should be identical.4
Texture classification is used to identify features of an image and find out information about that
image. For example, in a picture taken from the moon, the earth appears in different colors; through
texture classification it is possible to identify the regions of water, land, forest, etc. on earth.
Another motivation is that texture classification can be used to build applications like voice
automation on the basis of texture classification. Imagine a blind person walking through a park who is
about to walk into a tree: if texture classification is done properly in real time, this collision can
be avoided.
Procedure
4 Pratt W. Digital image processing 3ed;
Image Segmentation based on Texture is very challenging and various methods have been proposed for
achieving this goal. However, only a few of them have been successful. One such technique is the "Laws
Filters", which I have used for my implementation. The fifteen input images are read into a cell array
such that each image is stored as an element of the array. Each of the fifteen images is passed through
a filter bank that consists of nine filters. The three basic filters that give rise to these nine
filters are:
Local Average Filter L3 = 1/6 * [1 2 1]
Edge Detector E3 = ½ * [-1 0 1]
Spot Detector S3 = ½ * [1 -2 1]
The idea was to form the tensor product of each pair of these 3 filters to get 9 "3 x 3" filters. The
basic understanding behind the usage of these 3 filters is as follows: upon doing the Fourier analysis
of each of these filters, it is observed that
L3: acts as a Low Pass Filter(L.P.F)
E3: acts as a Band Pass Filter(B.P.F)
S3: acts as a High Pass Filter(H.P.F)
CALCULATION OF THE ENERGY VECTORS:
Therefore, when all 3 filters are put together, we cover the low-frequency, middle-frequency and
high-frequency regions. Once the nine filters have been obtained, each of the 15 images is passed
through the filter bank to produce a set of nine output images per input image.
Gi = input image * lawfilter(i)   [i varies from 1 to 9]
For each of the Gi's produced in the previous step, the energy is computed as follows:
Energy Fk = (1/N^2) ∑i ∑j |Gk(i,j)|^2
where k = 1 … 9
Each input image when passed through the filter bank will give rise to a set of nine energy components
related to the nine output images produced by the filter bank. These nine components can be treated as
a 9 point Energy vector in the 9 dimensional Feature Space.
Since there are 15 images to be classified, there are a total of 15 such energy vectors in the 9D
feature space. Each 9D energy vector is a point in the feature space. Thus the resulting feature space
is a 15 x 9 array, since each image has 9 energy features, one per Laws-filtered image.
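The filter-bank construction and energy computation above can be sketched compactly. This is a minimal illustration in C++ (function names are mine; the 1-D kernel normalizations follow the ones stated in this report, and the convolution is restricted to the valid interior region for simplicity):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Image = std::vector<std::vector<double>>;

// The three 1-D Laws kernels, with the normalizations given in the text.
const double L3[3] = { 1.0 / 6, 2.0 / 6, 1.0 / 6 };  // local average
const double E3[3] = { -0.5,    0.0,     0.5     };  // edge detector
const double S3[3] = { 0.5,    -1.0,     0.5     };  // spot detector

// Tensor product of two 1-D kernels -> one of the nine 3x3 Laws filters.
Image tensor(const double* a, const double* b) {
    Image k(3, std::vector<double>(3));
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) k[i][j] = a[i] * b[j];
    return k;
}

// Valid-region convolution followed by the average-energy computation
// F = (1/N^2) sum_i sum_j |G(i,j)|^2 over the filtered output G.
double filterEnergy(const Image& img, const Image& k) {
    int H = (int)img.size(), W = (int)img[0].size();
    double energy = 0; int n = 0;
    for (int i = 1; i + 1 < H; ++i)
        for (int j = 1; j + 1 < W; ++j) {
            double g = 0;
            for (int di = -1; di <= 1; ++di)
                for (int dj = -1; dj <= 1; ++dj)
                    g += k[di + 1][dj + 1] * img[i + di][j + dj];
            energy += g * g; ++n;
        }
    return energy / n;
}
```

On a constant image, any filter built from E3 or S3 gives zero energy (their kernels sum to zero), while the L3xL3 filter passes a scaled version of the mean, which is why the first feature behaves as the low-pass channel.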
EUCLIDEAN DISTANCE CLASSIFICATION:
The classification can be done based on the proximity of the energy vectors in the feature space. The
nearness of one vector to another can be determined by calculating the Euclidean distance as follows:
Euclidean Distance = ||Ei - Ej|| = sqrt( ∑ (Ei - Ej)^2 )
The Euclidean distance is calculated from every energy vector to every other energy vector in the
feature space.
NEAREST NEIGHBOR ALGORITHM:
In my classification problem I have used the nearest neighbor algorithm. An energy vector in the
feature space whose Euclidean distance from another energy vector is below a threshold is said to be a
nearest neighbor of that vector. The essence of the approach is to determine all such vectors that lie
in close proximity to each other.
Since it is known that there are 4 classes of images among a group of 12 images, the Euclidean
distances of one vector from every other vector are arranged in a row, the least three distances are
determined, and the corresponding energy vectors are said to be in close proximity to it in the 9D
feature space.
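The distance computation and the nearest-neighbor lookup can be sketched as follows. This is a minimal illustration in C++ (function names are mine; it returns only the single nearest neighbor, while the report takes the least three distances to group images four at a time):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Vec = std::vector<double>;

// Euclidean distance between two energy vectors in the feature space.
double dist(const Vec& a, const Vec& b) {
    double s = 0;
    for (size_t k = 0; k < a.size(); ++k) s += (a[k] - b[k]) * (a[k] - b[k]);
    return std::sqrt(s);
}

// Index of the vector closest to vectors[i] (excluding i itself).
int nearest(const std::vector<Vec>& vectors, int i) {
    int best = -1; double bestD = 1e300;
    for (size_t j = 0; j < vectors.size(); ++j) {
        if ((int)j == i) continue;
        double d = dist(vectors[i], vectors[j]);
        if (d < bestD) { bestD = d; best = (int)j; }
    }
    return best;
}
```

Extending `nearest` to keep the three smallest distances instead of one reproduces the grouping rule described above.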
RESULT for classification
Energy Computation
T1 T2 T3 T4 T5 T6 T7 T8 T9
Image1 0.0420 0.0031 0.0050 0.0037 0.0030 0.0078 0.0079 0.0074 0.0203
Image2 0.0616 0.0037 0.0036 0.0040 0.0040 0.0063 0.0042 0.0070 0.0158
Image3 0.0446 0.0040 0.0072 0.0071 0.0074 0.0183 0.0101 0.0204 0.0524
Image4 0.0350 0.0048 0.0083 0.0041 0.0080 0.0196 0.0060 0.0170 0.0529
Image5 0.0616 0.0040 0.0040 0.0041 0.0041 0.0064 0.0045 0.0071 0.0157
Image6 0.0402 0.0023 0.0037 0.0025 0.0022 0.0055 0.0041 0.0057 0.0160
Image7 0.0327 0.0065 0.0111 0.0045 0.0101 0.0254 0.0072 0.0212 0.0627
Image8 0.0489 0.0043 0.0068 0.0049 0.0067 0.0169 0.0077 0.0180 0.0509
Image9 0.0604 0.0038 0.0039 0.0044 0.0047 0.0072 0.0048 0.0081 0.0178
Image10 0.0313 0.0070 0.0111 0.0052 0.0113 0.0263 0.0081 0.0217 0.0723
Image11 0.0450 0.0041 0.0062 0.0059 0.0067 0.0158 0.0082 0.0164 0.0470
Image12 0.0448 0.0023 0.0039 0.0027 0.0022 0.0055 0.0062 0.0055 0.0159
GRAPHICAL CALCULATIONS
DISCUSSION
We see from the graphs that the energy values of all 12 images lie within certain ranges. That is, for
a certain kind of texture, the energy values of different images fall in close proximity to each other.
As seen from the graph, the 9 feature values obtained from images having similar textures lie in close
proximity to each other. For example, for images 1, 12 and 6, we see that all 9 values are more or less
the same. This decreases the Euclidean distance between the images and hence classifies them as the
same texture.
We also notice that our result shows an error in line 4, where it recognized image 4 in the group of
images 3 and 11, but it was able to detect the proper group later on. I tried to find the logical error
in this but could not think of any, so I concluded this might occur due to some transitional stage in
the algorithm.
I decided to take the best of 3 as the result.
Part 2B- TEXTURE SEGMENTATION
Objective:
Now we are given a cluster of different textures within an image. We have to perform segmentation such
that we are able to draw a boundary between the different textures within this image.
In our case, we are given 2 images that have four clusters each. We have to use the method used above
for classification, but modify it in such a way that it applies to pixels rather than whole images.
MOTIVATION:
Instead of considering the entire image, we can perform the analysis of the features associated with
every pixel in the image. It is intuitive to note that the visual vector of each pixel is not stable but that of
the entire image is stable.
We also know that pixels belonging to same type of texture have their corresponding feature vectors
close to each other in the feature space. So an analysis of a pixel instead of the entire image would
provide more information about the texture information.
This kind of image processing is used in applications where we have to detect the features or object of a
picture like face detection, where, eyes, hair, nose, lips etc are separated from each other for different
analysis.
One of the exciting applications that I think of is, extracting the features of a particular city when looked
from a large height.
Procedure:
The input image is scanned pixel by pixel and a 3x3 matrix is formed from each pixel and its
4-connected and 8-connected neighbors. We then apply the Laws filters to each such neighborhood and
generate 9 images for the 9 Laws filters.
We then take a large region of the image surrounding a particular pixel and calculate the energy.
We use the k-means algorithm to find the areas of the different clusters.
ALGORITHM AND IMPLEMENTATION:
The input image is filtered with the 9 (3 x 3) filters given by Laws to produce nine images, namely
T1, T2, T3, ..., T9. We know that pixels belonging to the same type of texture have their corresponding
feature vectors close to each other. The main assumption that leads us to perform the operations on a
pixel-by-pixel basis is that it is more informative than operating on the entire image. The energy
corresponding to each pixel in the set of images 'G' is determined. Since there are nine such images,
each pixel location in the input image will have a 9-dimensional energy vector, as opposed to the part
above, where each image had a single 9-dimensional energy vector in the feature space.
Energy Fk = (1/W^2) ∑(i,j) in window |Gk(i,j)|^2
where k = 1 … 9 and W is the size of the window (15 x 15)
Thus each of the Gi’s is subject to energy computation for each of its pixels. This results in each pixel of
the input image having nine different features namely,
{ f1,f2,f3,f4,f5,f6,f7,f8,f9}.
This kind of averaging is done to reduce the statistical fluctuations in image. This prevents the spreading
of the clusters in the feature space which could make the segmentation process even more complicated.
The size of the window, if large, will cause the clusters to merge in the feature space that could result in
the merging of textures in the output image. On the other hand, a smaller window size could result in
the over spreading of the clusters that could result in over segmentation. Hence the size of the window
should be chosen on an experimental basis so that the above two cases are avoided to the best possible
extent
• I decided to go with a window size of 51x51. I scan the image after extending the size of the image
by 25 pixels on each side, copying the first 25 pixels from each side into the adjacent areas. I do
this to fill the new pixels with values similar to the image.
• This is done to avoid errors due to averaging. For example, the average (5+5+5)/3 is 5, but
(0+0+5)/3 is 5/3, which is smaller than the previous case. So if we leave the extended pixels zero,
we will get faulty energy values.
• I then apply the k-means algorithm.
K-MEANS ALGORITHM:
Initially a set of centroids is chosen. Suppose there are N^2 points in the feature space; if they are
to be classified into k classes, the first step is to assume k centroids.
The distance of each and every point from the centroids is calculated using the Euclidean distance
(nearest neighbor) method discussed above.
Each point is associated with the closest centroid, thus forming the various clusters.
The mean of each cluster is calculated to get a new centroid, and the above two steps are repeated,
i.e. the clustering is performed again.
Converging the iteration: the centroid of each cluster thereby formed is recalculated and the above
steps are repeated until the distance between the centroids of the previous iteration and the current
iteration is less than the threshold passed to the function. The threshold can be as small as desired.
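The steps above can be sketched as a plain k-means loop. This is a minimal illustration in C++, not the report's implementation; function names are mine, the initial centroids are passed in explicitly, and convergence uses the total centroid movement against the threshold described above:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two feature points.
double sqDist(const Point& a, const Point& b) {
    double s = 0;
    for (size_t d = 0; d < a.size(); ++d) s += (a[d] - b[d]) * (a[d] - b[d]);
    return s;
}

// Plain k-means: assign every point to its closest centroid, recompute each
// centroid as its cluster mean, and stop when the centroids move less than
// the threshold. Returns the cluster label of every point.
std::vector<int> kmeans(const std::vector<Point>& pts, std::vector<Point> centroids,
                        double threshold = 1e-6, int maxIter = 100) {
    std::vector<int> label(pts.size(), 0);
    for (int it = 0; it < maxIter; ++it) {
        // assignment step
        for (size_t p = 0; p < pts.size(); ++p) {
            int best = 0;
            for (size_t c = 1; c < centroids.size(); ++c)
                if (sqDist(pts[p], centroids[c]) < sqDist(pts[p], centroids[best]))
                    best = (int)c;
            label[p] = best;
        }
        // update step
        double moved = 0;
        for (size_t c = 0; c < centroids.size(); ++c) {
            Point mean(centroids[c].size(), 0); int n = 0;
            for (size_t p = 0; p < pts.size(); ++p)
                if (label[p] == (int)c) {
                    for (size_t d = 0; d < mean.size(); ++d) mean[d] += pts[p][d];
                    ++n;
                }
            if (n == 0) continue;  // leave an empty cluster's centroid in place
            for (size_t d = 0; d < mean.size(); ++d) mean[d] /= n;
            moved += std::sqrt(sqDist(centroids[c], mean));
            centroids[c] = mean;
        }
        if (moved < threshold) break;
    }
    return label;
}
```

On the 9-dimensional (or 11-dimensional, with the x, y coordinates added) per-pixel energy vectors, the returned labels form the segmentation map directly.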
RESULTS & DISCUSSION
15x15 window in texture1
15x15 window in texture2
We see that the result of the 15x15 window was very poor for texture1 but fairly better for texture2.
However, there is a lot of scope left for improvement here.
The reason for this is that the textures in the first image are very similar to each other, so it is
highly possible that, without any information about their spatial distribution, the total energies
calculated for each pixel are so close that it becomes hard to distinguish them. This gives an error in
detecting the proper segments.
In the second image the textures vary greatly from each other, producing energies that differ by an
intelligible amount, so it is easier to detect these patterns correctly.
DISCUSSION1-CHOOSING A DIMENSION FOR WINDOW:
We started with a 15x15 window and calculated energies accordingly. However, as the results show, this
window size was not sufficient, as it does not compute the segmentation properly. The reason is simple:
say we are considering a brick pattern. A 15x15 window will cover only a small region of a whole brick
and hence produce an energy value similar to that of a selected region in some other part of the image.
This produces errors in segmentation. I therefore decided to work with a 51x51 window.
DISCUSSION2-Adding an extra dimension to the feature vector
Extending the size of the window improved the result. However, there was still some scope for
improvement, so I decided to add the coordinates x, y of the pixel as 2 extra dimensions in the feature
space. This helps bind the pixels by some kind of spatial distribution; basically, we take into
consideration where a pixel is located inside the image.
[Figure: 51x51 window in texture1 with 2 extra dimensions]
[Figure: 51x51 window in texture2 with 2 extra dimensions]
Increasing the window size gives a better result. However, a larger window introduces errors in
deciding the boundaries properly: near a boundary, a 51x51 window straddles both textures, so the
boundary energies are not distinct, and this produces errors in the boundary decision.
PROBLEM 3 -
OPTICAL CHARACTER RECOGNITION (OCR)
OBJECTIVE:
We have been given a training set that contains the alphabets A, B, C, D, E, K, L, M, N, O, P, R, S, T, U,
all in Arial font and of the same font size. We have to use this as our training set to make our program
understand the characteristics of the different features of the alphabets. In the first part we have to
extract the shape features of the alphabets in terms of line numbers and end points. In the second part
we have to read the test image and compare its result with our feature set to find out which character
it is. We have to develop a set of features based on the training data, then scan the test images and
declare each character in these test images as the one that is the closest match in the training data.
MOTIVATION
The idea behind Optical Character Recognition is to extract features from the characters and/or
numerals and special symbols and use them as parameters to segment and detect their presence in
any document. This is the principle of Document Processing. This plays an important part in Pattern
Recognition and also for describing objects in an Image Understanding System.5
OCR is also used for shape analysis of images, wherein a particular symbol is declared to be a
predetermined character.
PROCEDURE & Algorithm:
Steps, part 1:
• Binarize the image (to distinguish between the object and the background).
• Thin the image (used for part 1 only); we need to find end points and diagonals by finding the
hit-and-miss patterns of a given set of masks.
• Find the minimum bounding box for each character and segment the characters.
• Run the algorithm to find end points and line numbers for each character.
• Store them in an array representing the feature vector.
• Compare the characters on the basis of this feature vector.
Steps, part 2:
• Binarize the image.
• Find the minimum bounding box for each character and segment the characters into different
arrays.
• Run the algorithm to find: area, perimeter, Euler number, circularity, spatial moment, symmetry,
aspect ratio, Euclidean distances (my approach).
The important concepts and steps were as follows:
BINARIZATION:
Binarize the training image into two gray levels (0 and 255): if a pixel's value is less than a particular
threshold, set its gray level to 0; otherwise set it to 255. Our given image has values ranging from
0 to 255, so binarizing is an important step.
I used a threshold value of 128, giving value 0 to every pixel below 128 and value 255 to every
pixel at or above 128.
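A one-line sketch of this rule (illustrative, operating on an image stored as a list of rows):

```python
def binarize(image, threshold=128):
    """Map every pixel below `threshold` to 0 and the rest to 255."""
    return [[0 if p < threshold else 255 for p in row] for row in image]
```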
OBJECT SEGMENTATION:
In my program, I have taken advantage of the fact that the characters are uniformly distributed. I
check for all-white rows and all-white columns to segment the image into roughly 15 segments.
Although this is not the best approach, it was a suggestion of the TA, and I found it convincing so as
not to complicate matters. I first determine 15 boxes, each roughly containing one character; the
character might not be in the center, but it is contained inside the box.
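The all-white-column scan can be sketched like this (illustrative; it assumes a binarized image in which the background is 255 and characters are 0):

```python
def split_columns(binary, bg=255):
    """Return (start, end) column ranges separated by all-background
    columns, as used to cut a row of characters apart."""
    cols = len(binary[0])
    blank = [all(row[x] == bg for row in binary) for x in range(cols)]
    segments, start = [], None
    for x, is_blank in enumerate(blank):
        if not is_blank and start is None:
            start = x                       # a character begins
        elif is_blank and start is not None:
            segments.append((start, x))     # a character ends
            start = None
    if start is not None:
        segments.append((start, cols))
    return segments
```

The same scan applied to rows separates lines of text before the characters are cut apart.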
BOUNDING BOX DETERMINATION:
5 Digital Image Processing By Rafael C. Gonzalez, Richard Eugene Woods
We detect the corners (as done in the previous case) and store their x and y coordinates. I find
ymin, ymax, xmin and xmax from this set of values and draw a (virtual) box; this box contains the
character completely, with no extra rows or columns. I use this box to find the different features.
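A sketch of the bounding-box computation (assuming, as above, that object pixels are 0 after binarization):

```python
def bounding_box(binary, object_value=0):
    """Smallest (xmin, ymin, xmax, ymax) enclosing all object pixels."""
    coords = [(x, y) for y, row in enumerate(binary)
                     for x, p in enumerate(row) if p == object_value]
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)
```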
FEATURE EXTRACTION:
I have used the following features in my program in order to characterize the numbers/characters
given in the training image.
Part 1 -
• Line number
• End point number
Part 2 -
• Area
• Perimeter
• Euler number
• Circularity
• Aspect ratio
• Symmetry (upper mass, lower mass, right mass and left mass)
• Central spatial moment
• Elongation
• Euclidean distance from the feature vector (my approach)
Part 1-
Finding the line direction
I check the (segmented, wherever mentioned from now on) image for four patterns indicating the
occurrence of a line. There are four different line directions that can occur in a character:
horizontal, vertical, left diagonal and right diagonal.
I have declared hline, vline, ldline and rdline as the integers that store the number of occurrences of
these patterns in the given image (h = horizontal, v = vertical, ld = left diagonal, rd = right diagonal).
As soon as we get a hit, we record that particular instance and assign the value 1 to the respective
integer.
I have declared an array linenumber = {hline, vline, ldline, rdline}; so, for instance, for A = {1, 0, 1, 1}.
This is one part of my feature vector.
End point number
An end point is defined as a point that is connected in only one direction (under 4-connectivity or
8-connectivity); it is located at the end of a stroke. The end point number can be calculated by
applying a set of end-point masks.
I have declared an array:
numberofendpoints1[15][8] = {leftendpoint, rightendpoint, topendpoint, bottomendpoint,
topleftdiagonalendpoint, toprightdiagonalendpoint, bottomleftdiagonalendpoint,
bottomrightdiagonalendpoint};
This stores the occurrences of these points in a character. This way we have a clear picture of the
spatial distribution of the end points along with their exact information.
This is my second feature vector.
I concatenate both feature vectors into a single feature vector (this helps me find the Euclidean
distance).
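The end-point counting can be sketched as follows (an 8-neighbour formulation rather than the report's explicit masks; it assumes thinned object pixels are stored as 1, and labels an end point by the direction it points, opposite to its single neighbour):

```python
# 8-neighbour offsets paired with the direction labels used in the text
NEIGHBOURS = {(-1, 0): 'left', (1, 0): 'right', (0, -1): 'top', (0, 1): 'bottom',
              (-1, -1): 'topleft', (1, -1): 'topright',
              (-1, 1): 'bottomleft', (1, 1): 'bottomright'}

def end_points(thinned):
    """A pixel of a thinned character is an end point when it has exactly
    one 8-connected neighbour; the neighbour's position gives its type."""
    h, w = len(thinned), len(thinned[0])
    counts = {name: 0 for name in NEIGHBOURS.values()}
    for y in range(h):
        for x in range(w):
            if not thinned[y][x]:
                continue
            hits = [(dx, dy) for (dx, dy) in NEIGHBOURS
                    if 0 <= x + dx < w and 0 <= y + dy < h
                    and thinned[y + dy][x + dx]]
            if len(hits) == 1:
                dx, dy = hits[0]
                # the stroke continues toward the neighbour, so the end
                # point "points" the opposite way
                counts[NEIGHBOURS[(-dx, -dy)]] += 1
    return counts
```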
PART 2
After segmenting each object in the training data, we compare the objects inside the bounding
box with the following bit quad patterns (each a 2x2 mask; masks are separated by "|"):
Q1 consists of four masks:
1 0 | 0 1 | 0 0 | 0 0
0 0 | 0 0 | 1 0 | 0 1
Q2 consists of four masks:
1 1 | 0 1 | 0 0 | 1 0
0 0 | 0 1 | 1 1 | 1 0
Q3 consists of four masks:
1 1 | 0 1 | 1 0 | 1 1
0 1 | 1 1 | 1 1 | 1 0
Q4 consists of one mask (all four pixels set):
1 1
1 1
Qd consists of two masks:
1 0 | 0 1
0 1 | 1 0
AREA:
The area of an object is the number of object pixels that constitute the entire object. If an object pixel
has a value equal to 1,
Area = 0.25 * (n{Q1} + 2*n{Q2} + 3*n{Q3} + 4*n{Q4} + 2*n{Qd})
However, the area of the object can also be calculated by simply counting the number of black pixels
in the object, i.e. looking for the {1} pattern:
Area = n{1}. [I have used this approach.]
PERIMETER:
The perimeter of an object is defined as the number of pixel sides that separate pixels with different
values.
Perimeter = n{Q1} + n{Q2} + n{Q3} + 2*n{Qd}
However, the perimeter can also be calculated by looking at the patterns {1 0}, {0 1}, {1 0}^T and
{0 1}^T. Again, I have used this approach:
Perimeter = n{1 0} + n{0 1} + n{1 0}^T + n{0 1}^T, where n is the number of occurrences.
AREA/PERIMETER RATIO
The area/perimeter ratio is a better feature than the area or perimeter alone, since those are
scaling-variant. Sometimes the area is so large that the ratio is dominated by the area alone, which
introduces errors in decision making. To avoid this, we scale the perimeter so that the ratio is
normalized.
EULER NUMBER:
It is defined as the number of connected components that constitute the object minus the number of
holes within the object.
Euler number = 0.25 * (n{Q1} – n{Q3} – 2*n{Qd})
CIRCULARITY:
The Circularity of an object is defined as the ratio that describes how far the shape of the object
approximates a circle.
Circularity = 4 * pi * Area/ (Perimeter)^2
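The bit-quad counting and the formulas above can be sketched together (illustrative; it assumes object pixels are 1 and uses the 8-connectivity Euler form given above):

```python
from itertools import product

def bit_quad_counts(binary):
    """Slide a 2x2 window (with one-pixel zero padding) and count how many
    object pixels each window contains; diagonal pairs count as Qd."""
    h, w = len(binary), len(binary[0])
    pad = [[0] * (w + 2)] + [[0] + row + [0] for row in binary] + [[0] * (w + 2)]
    n = {1: 0, 2: 0, 3: 0, 4: 0, 'd': 0}
    for y, x in product(range(h + 1), range(w + 1)):
        q = (pad[y][x], pad[y][x + 1], pad[y + 1][x], pad[y + 1][x + 1])
        s = sum(q)
        if s == 2 and q[0] == q[3]:       # the two diagonal patterns -> Qd
            n['d'] += 1
        elif s:
            n[s] += 1
    return n

def shape_features(binary):
    n = bit_quad_counts(binary)
    area = 0.25 * (n[1] + 2 * n[2] + 3 * n[3] + 4 * n[4] + 2 * n['d'])
    perimeter = n[1] + n[2] + n[3] + 2 * n['d']
    euler = 0.25 * (n[1] - n[3] - 2 * n['d'])   # 8-connectivity form
    circularity = 4 * 3.141592653589793 * area / perimeter ** 2
    return area, perimeter, euler, circularity
```

For a solid 2x2 square this gives area 4, perimeter 8 and Euler number 1, as expected for a single component with no holes.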
ASPECT RATIO:
The Aspect Ratio is defined as the ratio of the Height of the object to the width of the object.
Height=height of bounding box;
Width=width of bounding box;
Aspect Ratio = Height/Width
Width Ratio = Width/ (Height + Width)
Height Ratio = Height/ (Height + Width)
SYMMETRY:
An object is said to be symmetric horizontally if the mirror image of one half of the object along the
horizontal direction gives the other half. An object is said to be vertically symmetric if the mirror
image of one half of the object along the vertical direction gives the other half. We use upper mass
and lower mass to find the horizontal symmetry. An object is said to be entirely symmetric if it
exhibits the above two properties.
I have calculated symmetry in terms of two parameters: the left mass/right mass ratio and the
upper mass/lower mass ratio.
To do this, I divided the bounding box into two parts, first horizontally and then vertically. I then
counted the number of object pixels on each side of the axis and took the ratio. For symmetric objects
this ratio is one.
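A sketch of these mass-ratio computations (assuming object pixels are 0 after binarization; for an odd dimension the centre row or column is left out of both halves):

```python
def mass_ratios(binary, object_value=0):
    """Split the bounding box through its centre and compare pixel counts:
    ratios near 1 indicate a symmetric character."""
    h, w = len(binary), len(binary[0])
    upper = sum(row.count(object_value) for row in binary[:h // 2])
    lower = sum(row.count(object_value) for row in binary[h - h // 2:])
    left = sum(1 for row in binary for p in row[:w // 2] if p == object_value)
    right = sum(1 for row in binary for p in row[w - w // 2:] if p == object_value)
    # guard against empty halves so the ratio is always defined
    return upper / max(lower, 1), left / max(right, 1)
```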
SPATIAL MOMENT:
The (m, n)th moment of the joint probability density function can be used to describe the features of
an object. Here the joint probability density function is replaced by the continuous image function.
The shape of the object is characterized by a few of the low-order moments. The use of the features
for OCR can be justified by the fact that these features are invariant to a certain extent for a particular
symbol. Fill the data structure corresponding to the features of the training data symbols.
Mean_x = Σ x · f(x, y)
Mean_y = Σ y · f(x, y)
Moment(m, n) = Σ x^m · y^n · f(x, y)
where f(x, y) = 1 or 0.
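The moments can be sketched directly from the definition (illustrative; it assumes f(x, y) = 1 for object pixels):

```python
def moment(binary, m, n, object_value=1):
    """Moment(m, n) = sum of x^m * y^n over all object pixels f(x, y) = 1."""
    return sum((x ** m) * (y ** n)
               for y, row in enumerate(binary)
               for x, p in enumerate(row) if p == object_value)

def centroid(binary):
    """Mean_x and Mean_y normalized by the area (the zeroth moment)."""
    area = moment(binary, 0, 0)
    return moment(binary, 1, 0) / area, moment(binary, 0, 1) / area
```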
Euclidean Distance:
The Euclidean distance (ED) is defined as:
Euclidean Distance = || Ei – Ej || = sqrt( Σ (Ei – Ej)^2 )
I observed that the feature arrays I have stored for each character can be considered feature
vectors, with each feature contributing one dimension. If my decision tree fails, I can use the ED to
find the character closest to the given character.
The logic behind this is that each character has a certain unique spatial distribution, so the feature
vector gives us a way to define a character by a unique set of values.
Some of the features are scaling-variant, some are font-variant. These variances can sometimes lead
to a faulty decision, so we keep the ED as a final check to find the closest character.
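A sketch of this nearest-vector check (function names and the tiny example vectors are illustrative):

```python
def euclidean(u, v):
    """sqrt of the summed squared component differences."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def closest_character(test_vector, training_vectors):
    """training_vectors: dict letter -> feature vector.
    Returns the letter whose stored vector is nearest to the test vector."""
    return min(training_vectors,
               key=lambda c: euclidean(test_vector, training_vectors[c]))
```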
Implementation and results:
As discussed above, I segmented the image in terms of the characters' spatial distribution6. The set of
pixels that belong to each object is stored as an individual entry in a cell array.
I create 15 arrays, one for each character.
Step 1 - Thin the image of training.raw to calculate the number of lines and end points.
I have designed a feature vector, as mentioned above, which represents endpoints + line numbers.
For example, for A we get:
Endpoints = {1,1,0,0,0,0,0,0}, line numbers = {1,0,1,1}
So the program stores {1,1,0,0,0,0,0,0,1,0,1,1} as the value of A.
To recognize a character in the test image, I match its features against all the training characters.
The training vector closest to the feature vector is picked as the alphabet for the text character under
consideration.
Results & Discussion
Example: comparison of features of A=
A training ={1,1,0,0,0,0,0,0,1,0,1,1}
A test ={0,1,0,0,0,0,2,2,1,0,1,1}
6 Confirmed by TA.
This gives a small error, so there is a possibility that the procedure can detect A if we take the error
within a limit of plus or minus 2:
-2 < error < +2
With hard boundaries (i.e. exact ANDing) this procedure failed for almost all alphabets other than
U, T, S, R, P and O. Those characters are simple in both the training and test sets and have no extra
extensions due to font style, which makes it possible to detect them.
I then chose the second approach.
I calculated the above-mentioned parameters such as the area/perimeter ratio, Euler number, etc.,
and then grouped the characters according to their features:
Euler number = 0 → A, D, O, P, R
Euler number = -1 → B
and similarly for the rest.
When I get the input character, I first test it for symmetries. I have saved the upper mass/lower mass
and left mass/right mass ratios. I check these ratios for the input image to see whether the image is
horizontally symmetric, vertically symmetric or neither.
If symmetric: B, D, C, M, O, S, T, U, A, K, E; else: L, P, R.
This narrows the comparison set. I then check the Euler number and narrow the set further; the Euler
number is (mostly) unique for each character.
Later I check the aspect ratio, which helps me separate the tall characters from the wide ones.
Using all these features, I was able to successfully detect all the characters.
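The narrowing cascade just described can be sketched as follows (the Euler values and the symmetric set follow the groupings stated in the text; the function itself is illustrative, not the project's code):

```python
# Euler numbers per training letter: one hole -> 0, two holes (B) -> -1,
# no holes -> 1, following the grouping given above
EULER = {'A': 0, 'B': -1, 'C': 1, 'D': 0, 'E': 1, 'K': 1, 'L': 1, 'M': 1,
         'N': 1, 'O': 0, 'P': 0, 'R': 0, 'S': 1, 'T': 1, 'U': 1}
# letters the text lists as passing the symmetry check
SYMMETRIC = {'A', 'B', 'C', 'D', 'E', 'K', 'M', 'O', 'S', 'T', 'U'}

def narrow(is_symmetric, euler):
    """Successively intersect the candidate set, as in the text:
    symmetry check first, then the (mostly unique) Euler number."""
    candidates = SYMMETRIC if is_symmetric else set(EULER) - SYMMETRIC
    return {c for c in candidates if EULER[c] == euler}
```

For example, a symmetric character with Euler number -1 can only be B, while an asymmetric character with Euler number 0 narrows to {P, R}, which the mass ratios then separate.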
DISCUSSION & RESULTS, CONTINUED
The reason is that these features are based mostly on the basic structure of the character rather than
just the font style, which was the weak point of the first approach. Being scaling-invariant and
design-invariant, they let us detect the characters according to the basic structure each one should
have.
My approach:
The chart below represents the Euclidean distances of each alphabet in training set to the test set.
The blocks marked in green are the distances for correct detection. Red represents the second closest.
     A     B     C     D     E     K      L     M     N     O     P     R     S     T      U
a  3.87  3.61  1.73  3.32  4.58  10.77  5.10  7.42  6.00  3.00  2.65  5.00  3.74  40.12  6.56
b  4.47  3.46  3.74  2.83  4.00   9.95  4.80  5.83  4.58  3.16  1.41  3.46  3.32   2.83  5.66
c  4.58  5.20  2.65  4.36  5.92  11.14  5.66  8.43  7.07  4.58  3.32  6.40  5.29   4.58  7.14
d  4.47  3.74  3.46  2.45  3.74   9.11  3.87  6.16  5.00  2.83  1.41  4.00  3.32   2.00  4.69
e  4.12  4.58  4.80  4.12  5.00   9.17  4.47  5.00  4.24  4.36  3.32  3.87  3.74   3.32  6.40
k  4.36  4.36  2.24  3.61  4.58  10.20  4.69  7.81  6.48  3.32  3.00  5.57  4.00   3.87  5.74
l  3.74  4.24  2.83  3.16  4.69   9.64  4.12  6.78  5.57  3.46  2.00  4.90  3.61   2.83  5.48
m  3.74  4.24  2.83  3.16  4.69   9.64  4.12  6.78  5.57  3.46  2.00  4.90  3.61   2.83  5.48
n  5.20  5.00  3.87  4.12  4.12   9.70  4.90  7.68  6.63  3.87  3.32  5.57  4.69   3.61  5.74
o  4.12  4.12  2.24  3.32  4.58  10.10  4.69  7.68  6.48  3.00  2.65  5.39  3.74   3.61  5.57
p  4.00  4.69  2.00  3.74  5.48  10.91  5.20  8.12  6.71  4.00  2.45  6.00  4.80   4.00  6.78
r  4.24  3.74  3.16  2.83  3.74   9.95  4.58  6.32  5.00  3.16  0.00  4.00  3.61   2.45  5.66
s  4.24  3.46  2.83  2.83  3.46   9.95  4.58  6.32  4.80  2.83  1.41  4.00  3.32   2.83  5.66
t  3.00  3.87  3.00  2.65  3.87   9.27  3.74  6.25  4.90  3.61  1.73  4.80  4.00   2.24  5.92
u  4.58  4.80  3.00  3.32  4.80   9.38  4.00  7.68  6.48  3.61  2.65  5.74  4.24   3.00  4.80
The chart below shows the results of taking the error into consideration; each row lists the same 15
distances for that test character, sorted in ascending order (so the columns no longer correspond to
particular letters):
a: 1.73 2.65 3.00 3.32 3.61 3.74 3.87 4.58 5.00 5.10 6.00 6.56 7.42 10.77 40.12
b: 1.41 2.83 2.83 3.16 3.32 3.46 3.46 3.74 4.00 4.47 4.58 4.80 5.66 5.83 9.95
c: 2.65 3.32 4.36 4.58 4.58 4.58 5.20 5.29 5.66 5.92 6.40 7.07 7.14 8.43 11.14
d: 1.41 2.00 2.45 2.83 3.32 3.46 3.74 3.74 3.87 4.00 4.47 4.69 5.00 6.16 9.11
e: 3.32 3.32 3.74 3.87 4.12 4.12 4.24 4.36 4.47 4.58 4.80 5.00 5.00 6.40 9.17
k: 2.24 3.00 3.32 3.61 3.87 4.00 4.36 4.36 4.58 4.69 5.57 5.74 6.48 7.81 10.20
l: 2.00 2.83 2.83 3.16 3.46 3.61 3.74 4.12 4.24 4.69 4.90 5.48 5.57 6.78 9.64
m: 2.00 2.83 2.83 3.16 3.46 3.61 3.74 4.12 4.24 4.69 4.90 5.48 5.57 6.78 9.64
n: 3.32 3.61 3.87 3.87 4.12 4.12 4.69 4.90 5.00 5.20 5.57 5.74 6.63 7.68 9.70
o: 2.24 2.65 3.00 3.32 3.61 3.74 4.12 4.12 4.58 4.69 5.39 5.57 6.48 7.68 10.10
p: 2.00 2.45 3.74 4.00 4.00 4.00 4.69 4.80 5.20 5.48 6.00 6.71 6.78 8.12 10.91
r: 0.00 2.45 2.83 3.16 3.16 3.61 3.74 3.74 4.00 4.24 4.58 5.00 5.66 6.32 9.95
s: 1.41 2.83 2.83 2.83 2.83 3.32 3.46 3.46 4.00 4.24 4.58 4.80 5.66 6.32 9.95
t: 1.73 2.24 2.65 3.00 3.00 3.61 3.74 3.87 3.87 4.00 4.80 4.90 5.92 6.25 9.27
u: 2.65 3.00 3.00 3.32 3.61 4.00 4.24 4.58 4.80 4.80 4.80 5.74 6.48 7.68 9.38
The chart shows that using this approach we were able to detect B, C, D, O, P, T and S from the test
set. We can also detect K and L if we widen our error window. However, we cannot find a correct
algorithm that takes a decision on the basis of Euclidean distances alone; our second check should
take each feature into consideration separately.
We find, then, that this is not a very efficient way to detect alphabets when only the end points and
line numbers are taken into consideration.
The reason for the error: thinning has a considerable effect on the detected parameters. As shown
above, the thinning effects in training.raw are not as drastic as in test.raw.
Unfortunately, our algorithm runs only on thinned images and hence is very dependent on font style.
Since O, P, R, S, T and U have very straight strokes in our test image, they are easier to detect.
I also realize there might be some small error somewhere in my programming, which is why the
detection of L and M shows errors, but I could not find any such error in my code.
Part 2
EULER NUMBERS
[Charts: Euler numbers for test.raw and training.raw; index mapping: A=0, B=1, C=2, D=3, E=4,
K=5, L=6, M=7, N=8, O=9, P=10, R=11, S=12, T=13, U=14]
The Euler number was detected wrongly for the alphabets K, L and M, as per our matching set in
training.raw. We take these as wrong detections because we consider the training results to be the
standard.
Detection of B
We simply match the input character's Euler number against -1. This way we can detect B.
Check for vertical symmetry: A & O
A — 2 end points ------------------ correct decision of A; // this is a font-dependent decision.
O — 0 end points ------------------ correct decision of O;
We are not using the circularity property for O because the circularity of O in the training set was
not 1; hence it was not a good parameter to judge by. However, the area/perimeter ratios are still
close. The reason the circularity is not 1 is that the O given in the training set is elongated, not a
perfect circle.
Check for horizontal symmetry: D & O
D -------------------------------------- correct decision of D;
Check the upper mass/lower mass ratio: P < R.
Check the left mass/right mass ratio: P < R.
P & R are detected on the basis of these ratios.
This method is invariant to font size and style, and hence detects these alphabets correctly here.
Checking for vertical symmetry, the options are S, T and U.
S is under consideration here because I calculate symmetry on the basis of the left and right areas;
because of its structure, the two areas happen to be the same, which makes it symmetric.
Number of end points check:
Case T: 3 end points --------------------------- detected successfully;
Case else: S, U.
Upper mass/lower mass ratio: S has a UM/LM ratio of about one, but U has a smaller UM/LM ratio;
U & S are detected.
Checking for horizontal symmetry: C, E (the rest are already detected).
End point check: C has 2 diagonal end points and E has 3 left end points.
C & E are detected properly.
Logically, M, K and L should be detected here. But since the Euler number detection for M is faulty in
our test set, this is reflected in the decision making for M.
Characters left: K, L, M.
We check for vertical symmetry: M could be detected as any of A, M, O, N, S, T and U. M has 2 bottom
end points in the training set but 4 in the test set, so we check for horizontal and vertical lines
instead. From the above set, only M has one left-diagonal and one right-diagonal line. Hence M
matches and is detected properly.
We check for horizontal symmetry: K; plus, K has 4 end points. // Easy decision.
L has one top end point and one left end point, and no symmetry, so the possible choices are P, R
and L. We check the upper mass/lower mass ratio; this matches only L. Hence L is detected.
Thus, all the alphabets were detected in spite of the first check failing. We conclude that the second
approach was much more efficient in this case, for the reasons mentioned in the discussion above.
PART 2
DETECTION OF NUMBERS (0-9)
DECISION TREE FOR THE TRAINING DATA:
When we take a decision, we use parameters that do not change with scaling or rotation; others are
used only when there is no other possibility. I have used the area or perimeter rarely and kept their
use minimal in my program.
1)
Check whether the Euler number is 1, 0 or -1. We first use the Euler number, which gives three groups:
E = -1: {8}
E = 0: {6, 9, 4, 0}
E = 1: {1, 2, 3, 5, 7, +, -, ., /, *}
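This first split can be sketched directly (illustrative):

```python
def group_by_euler(euler):
    """First split of the decision tree: the Euler number alone isolates
    '8' and cuts the remaining symbols into two groups."""
    if euler == -1:
        return ['8']
    if euler == 0:
        return ['0', '4', '6', '9']
    return ['1', '2', '3', '5', '7', '+', '-', '.', '/', '*']
```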
2)
If the Euler number is -1, there is only one possibility, the number '8'. So the number 8 has been
identified!
3)
If the Euler number is 0, there are only four possibilities: the numbers 6, 9, 4 and 0.
Now, when we run the training image, we see that '0' has the maximum circularity and that its upper
mass and lower mass are nearly symmetric. So we can set a small threshold so that it does not deviate
too much in its symmetry, and the number '0' has been identified!
Next, the numbers 6 and 9 can be differentiated by their left mass/right mass (L/R) ratio. In the
training image as well as the test images, the L/R ratio is maximum for 6 and minimum for 9, so we
can differentiate them by this ratio and the numbers '6' and '9' have been identified!
Then, with Euler number 0 and the minimum aspect ratio, we get the number '4', because its aspect
ratio was minimum in the test images as well as the training image.
So the numbers 6, 9, 4 and 0 can be identified.
4)
With Euler number 1 we have {1, 2, 3, 5, 7, +, -, ., /, *}.
Since a lot of characters have Euler number 1, utmost care was taken to differentiate among them. An
approximate way to detect '1' is to note that its aspect ratio is large enough (in fact the largest) and
its circularity index is small. In my program I used a threshold value of 1.6, obtained by trial and
error.
We come to the next three possibilities: '2', '5' and '7'. A crucial feature that characterizes these three
uniquely is the upper mass and lower mass of the bounding box. It is found that the upper mass and
lower mass of '2' do not differ by a great extent, and the value seems to be the biggest among the
three; using a small threshold, '2' can be detected. The upper mass and lower mass of '7' clearly differ
by a large value, which is the case in practice, so a large threshold detects '7'. Another feature that
can differentiate 7 is the R/L ratio, which comes to approximately 2; we can use a threshold from 1.9
to 2.2 to isolate 7. If the above two conditions do not hold, it can mean only one thing: the number '5'.
So we have differentiated 2, 5 and 7!
Sometimes 7 and 1 can be misjudged. Another suggested approach to distinguish 7 and 1 (not
implemented): take a row-wise histogram of the symbol over its entire height. If it remains almost
constant but with a big peak at the beginning, it is a 7; otherwise it is a 1. But this is a difficult and
tedious procedure.
Detecting a '3' was quite difficult because I could not find a unique parameter that differentiates it
from the others, so I had to think of another way. The number '3' can be detected uniquely by noting
that the difference between the mass of the pixels in the right portion of the bounding box and the
mass of the pixels in the left portion is always large. Thus we can choose a certain threshold to detect
the number '3'.
5)
Now comes differentiating the symbols. There are 5 unique symbols, of which '.' (dot) can be found
uniquely: the central moments, moments, area and perimeter are all least for the dot, so it is easily
detected.
Next, among the remaining four symbols only '-' and '+' exhibit symmetry. What do I mean by
symmetry? For plus, left mass = right mass and upper mass = lower mass; for minus, only
left mass = right mass. Once they are found to be symmetric, one way of classifying '+' versus '-' is
that the normalized area of '+' is always greater than that of '-'. This condition holds universally, so
they can be detected.
Then we are left with '*' and '/'. The central moments are approximately the same in the test and
training images for '/', so we can detect '/'. For '*', left mass = right mass and the upper mass is
greater than the lower mass. Though the properties of '-' and '*' are likely to be similar, we must
emphasize that '*' is not a fully symmetric pattern.
Thus, based on the various parameters, all 15 patterns were uniquely identified and isolated, and this
was tested using the test patterns below.
DISCUSSION OF RESULTS:
Training image, used to train my program.
Word1.raw
Output:
The number/character is 9
The number/character is 6
The number/character is 1
The number/character is .
The number/character is 7
My program was able to recognize the characters 7, 1, ., 6 and 9, so my threshold settings for the
above numbers are correct and we get the desired output.
Word2.raw
Output:
The number/character is 4
The number/character is 2
The number/character is *
The number/character is /
My program was not able to recognize '7' in word2.raw but detected the other characters, so my
output was 4, 2, *, /.
METHODS TO OVERCOME THE PROBLEM:
My algorithm detected the number 7 by the upper mass/lower mass ratio, as mentioned above. The
ratio was approximately 2, but in the above image '7' is slightly slanted and has a different structure
than in the training image, so my program was not able to detect it; in fact, the upper/lower mass
ratio came out nearly equal in this test image. This is why I was able to detect '7' in word1.raw but
could not detect '7' in word2.raw.
One method to overcome this problem is to manually work out all possible scripts for the digit '7'.
Once all the scripts are formed, all the possible parameters are analyzed, a suitable threshold is
obtained, and it is fed into the program. If we then run the program, the chances of detecting that
number are greater than with the previous method, in which I trained the program with only one
training image. In short, many training images should be fed to the program before the test images
are given; the program is then less likely to make mistakes in detection.
I have attached an illustration of how and why '7' was not detected by my program.