Post on 07-Apr-2020
260 Nimisha Jain, Kumar Rahul, Ipshita Khamaru. Anish Kumar Jha, Anupam Ghosh
International Journal of Innovations & Advancement in Computer Science
IJIACS
ISSN 2347 – 8616
Volume 6, Issue 5
May 2017
Hand Written Digit Recognition using Convolutional Neural
Network (CNN)
Nimisha Jain1 , Kumar Rahul
1 , Ipshita Khamaru
1 . Anish Kumar Jha
1 , Anupam Ghosh
1*
1 Department of Computer Sc. & Engineering, Netaji Subhash Engineering College, Kolkata
A B S T R A C T
The field of machine learning is a rapidly developing
one. Recent developments in the field of image
processing and machine learning has led to efficient
extraction of features from images of peoples’ faces.
Recognizing handwritten digits from images isn’t easy. It
involves the difficulty of visual pattern recognition which
becomes very apparent when an attempt is made to write
a computer program to recognize digits. The goal of our
work will be to create a model that will be able to
recognize and determine the handwritten digits from its
image. We aim to complete this by using the concepts of
Convolution Neural Network. The aim of our study is to
open the way to digitalization. Though the goal is to just
to create a model which can recognize the digits but it
can be extended to letters and then a person’s
handwriting. Through this work, we aim to learn and
practically apply the concepts of Machine Learning and
Neural Networks. Moreover, digit recognition is an
excellent prototype problem for learning about neural
networks and it gives a great way to develop more
advanced techniques like deep learning.
1. Introduction
Feature Recognition extracts features and their
parameters from solid models.This is a pioneering,
state-of-the-art technology from Geometric with
more than 50 man-years of research and
development. Feature Recognition extracts features
and their parameters from solid models. It finds
applications in modeling, design, finite element
analysis, machining, process planning and cost
estimation.Feature recognition is the ability to
automatically or interactively identify and group
topological entities, such as faces in a boundary
representation (B-rep) solid model, into functionally
significant features such as holes, slots, pockets,
fillets, ribs, etc. The features that are not recognized
automatically can be recognized using the
interactive feature recognition facilities [1]. In short,
Feature Recognition makes smart solids out of
dumb solids! This information is then used in
downstream applications.
Handwriting Recognition has an active community
of academics studying it. The biggest conferences
for handwriting recognition are the International
Conference on Frontiers in Handwriting
Recognition (ICFHR), held in even-numbered
years, and the International Conference on
Document Analysis and Recognition (ICDAR), held
in odd-numbered years. Both of these conferences
are endorsed by the IEEE. Active areas of research
include: Online Recognition, Offline Recognition,
Signature Verification, Postal-Address
Interpretation, Bank-Check Processing, Writer
Recognition [2,3].
A lot of the important work on neural networks
happened in the 80's and in the 90's but back then
computers were slow and datasets very tiny. The
research didn't really find many applications in the
real world. As a result, in the first decade of the 21st
century neural networks have completely
disappeared from the world of machine learning [4].
Working on neural networks was definitely fringe.
It's only in the last few years, first seeing speech
recognition around 2009, and then in computer
vision around 2012, that neural networks made a
big comeback. What changed? Lots of data and
cheap and fast DBU's [5,6]. Today, neural networks
are everywhere. So if anyone is doing anything with
data analytics or prediction, they're definitely
something that we want to get familiar with.
The objective of the work is to develop a model that
will classify a picture of a handwritten digit
according to pre-trained labels, for this we found
relevant image database for training the given
model Engineer features that will be used for
learning lay down the feature extraction
mechanism Choose several supervised classification
261 Nimisha Jain, Kumar Rahul, Ipshita Khamaru. Anish Kumar Jha, Anupam Ghosh
International Journal of Innovations & Advancement in Computer Science
IJIACS
ISSN 2347 – 8616
Volume 6, Issue 5
May 2017
algorithm such as KNN classifier, MLP classifier,
CNN classifier. This also involves fine tuning hyper
parameters.
There is a lot happening at the moment in terms of
A.I. development and machine learning innovation.
For instance major auto mobile manufacturing
companies are investing in technology such as self-
driving cars and intelligent parking systems. Tesla,
Google, Ford to name a few. At the same time
Google deep mind ended up developing a general
purpose A.I [7,8]. that learned by itself to play one
of the hardest playing board games Go. Go in
particular is more challenging for a computer to
learn as it requires extreme level of pattern
recognition then logical decision making as the
game can assume more states then there are in the
perceivable universe [9] The AlphaGo project not
only mastered the game but also beat the world
champion 4-1 in a tournament. This feat piqued our
interest in the field of artificial intelligence in
general. With availability of learning resources such
as online learning websites like edx, coursera and
abundance of books written in this domain gave us
a better insight to understanding this field of science
[10]. Also the application of it in the field of image
recognition helped us narrow our topic to something
related to computer vision.
Classification of images and patterns has been one
of the major implementation of Machine Learning
and Artificial Intelligence. People are continuously
trying to make computers intelligent so that they
can do almost all the work done by humans [11].
This continuous effort of making computer
intelligent is to reduce the workload on humans.
Nowadays, machines are able to understand the
conversations, handwriting or any sort of actions by
humans.
The motivation behind this study is to take a step
towards the technology. Due to the increasing
demands of intelligent agents acting like humans,
the understanding of Machine Learning has become
very important in the field of Computer Science.
The work has been taken by keeping this in mind.
Handwriting recognition system is the most basic
and an important step towards this huge and
interesting area of Computer Vision.
2. Methodology
Deep Learning has emerged as a central tool for
self-perception problems like understanding images,
voice from humans, robots exploring the world. The
project aims to implement the concept of
Convolution Neural Network which is one of the
important architecture of deep learning.
Understanding CNN and applying it to the
handwritten recognition system, is the major target
of the proposed system [12-15].
There is a reason behind using CNN for handwritten
digit recognition. Let us consider a multi-layer
feedforward neural network to be applied on
MNIST dataset which contains images of size
28×28 pixels (roughly 784 pixels). So if a hidden
layer has about 100 units, then the first layer
weights comes up to about 78k parameters, which is
large but manageable. However, in the natural
world the size of the image is much larger [16-18].
If we consider the size of the typical image which is
around 256×256 pixels (roughly about 56,000
pixels), then the first layer weights will have about
560k parameters! So that becomes too many
parameters and hence make it unscalable for real
images. Hence, it will be so large that it will
become very difficult to generalize the new data fed
into the network.
Convolution Neural Network extracts the feature
maps from the 2D images by applying filters and
hence making the task of feature extraction from the
images easier [19-22]. Basically, convolution neural
network considers the mapping of image pixels with
the neighborhood space rather than having a fully
connected layer of neurons [23-26].
Convolution Neural Networks has been proved to
be a very important and powerful tool in signal and
image processing. Even in the fields of computer
vision such has handwriting recognition, natural
object classification and segmentation, CNN has
been a much better tool compared to all other
previously implemented tools [27-35].
The broader aim in mind was to develop a M.L.
model that could recognize people's handwriting.
However, as we began developing the model we
realized that the topic in hand was too tough and
would require tremendous data to learn. Example to
accurately classify a cursive handwriting will be
262 Nimisha Jain, Kumar Rahul, Ipshita Khamaru. Anish Kumar Jha, Anupam Ghosh
International Journal of Innovations & Advancement in Computer Science
IJIACS
ISSN 2347 – 8616
Volume 6, Issue 5
May 2017
very tough. Thus we settled on classifying a given
handwritten digit image as the required digit using
three different algorithms and consequently testing
its accuracy.
Architecture of the Proposed System
Figure. 1–Handwritten Recognition system.
3. Explanation of the architecture:
• The first layer of the architecture is the User layer.
User layer will comprise of the people who interacts
with the app and for the required results.
• The next three layers is the frontend architecture
of the application. The application will be
developed using Bootstrap which is the open source
platform for HTML, CSS and JavaScript. The
application is deployed in the localhost which is
shown on the browser. Through the app, the user
will be able to upload pictures of the handwritten
digits and convert it into the digitalized form.
• The one in between the database and view layer is
the business layer which is the logical calculations
on the basis of the request from the client side. It
also has the service interface.
• The backend layer consists of two datasets:
Training Data and Test Data. The MNIST database
has been used for that which is already divided into
training set of 60,000 examples and test of 10,000
examples.
• The training algorithm used is Convolution Neural
Network. This will prepare the trained model which
will be used to classify the digits present in the test
data. Thus, we can classify the digits present in the
images as: Class 0,1,2,3,4,5,6,7,8,9.
Algorithm:
Forward Propagation:
Convolutional layer:
Suppose that we have some N×NN×N square
neuron layer which is followed by our
convolutional layer. If we use an m×m filter ω, our
convolutional layer output will be of
size (N−m+1)×(N−m+1)(N−m+1)×(N−m+1). In
order to compute the pre-nonlinearity input to some
unit xij𝑙 in our layer, we need to sum up the
contributions (weighted by the filter components)
from the previous layer cells:
xij𝑙= 𝛚𝐚𝐛y(i+a)(j+b)𝑙−1𝑚−1
𝑏=0𝑚−1𝑎=0
Then, the convolutional layer applies its
nonlinearity:
yij𝑙=σ (xij𝑙).
Architecture:
Below shown is a small workflow of how CNN
module will extract the features and classify the
image based on it. The architecture shows the input
layer, hidden layers and output layer of the network.
There are many layers involved in the feature
extraction phase of the network which involves
convolution and subsampling .
Figure 2: Architecture of CNN.
263 Nimisha Jain, Kumar Rahul, Ipshita Khamaru. Anish Kumar Jha, Anupam Ghosh
International Journal of Innovations & Advancement in Computer Science
IJIACS
ISSN 2347 – 8616
Volume 6, Issue 5
May 2017
How it works:
Neural Networks receive an input, and
transform it through a series of hidden layers.
Each hidden layer is made up of a set of
neurons, where each neuron is fully connected
to all neurons in the previous layer.
Neurons in a single layer function completely
independently.
The last fully-connected layer is called the
"output layer“.
Convolution Layer:
The Convolutional layer is the core building block
of a CNN. The layer's parameters consist of a set of
learnable filters (or kernels), which have a small
receptive field, but extend through the full depth of
the input volume. During the forward pass, each
filter is convolved across the width and height of
the input volume, computing the dot product
between the entries of the filter and the input and
producing a 2-dimensional activation map of that
filter. As a result, the network learns filters that
activate when they see some specific type of feature
at some spatial position in the input..
Feature Extraction:All neurons in a feature share the
same weights .In this way all neurons detect the
same feature at different positions in the input
image. Reduce the number of free parameters.
Subsampling Layer: Subsampling, or down-
sampling, refers to reducing the overall size of a
signal .The subsampling layers reduce the spatial
resolution of each feature map. Reduce the effect of
noises and shift or distortion invariance is achieved.
Max Pooling: Only the locations on the image that
showed the strongest correlation to each feature (the
maximum value) are preserved, and those
maximum values combine to form a lower-
dimensional output. It provides a form Of
translation invariance. Imagine cascading a max-
pooling layer with a convolutional layer.
Figure 3: Max Pooling
4. Results:
As with any work or project taken up in the field of
machine learning and image processing we are not
considering our results to be perfect. Machine
learning is a constantly evolving field and there is
always room for improvement in your
methodology; there is always going to be another
new approach that gives better results for the same
problem.
The application has been tested using three models:
Multi-Layer Perceptron (MLP), Convolution Neural
Network (CNN). With each model we get a
different accuracy of the classifier which shows
which one is better. The results of training the
network is stored in .npz format so that whenever a
user tries to recognize the digit, the application does
not go into the training loop again. For
classification, we have used logistic classifier,
softmax function, one hot encoding, cross entropy
and loss minimization using mini batch gradient
descent. These are some of the basics of Neural
Network which are required to process the output
from the network and display in the form the user
can understand.
4.1.Dataset used:
The dataset used is the MNIST database of
handwritten digits.It consists of a training set of
60,000 examples, and a test set of 10,000
examples.The digits have been size-normalized and
centered in a fixed-size image.The images are of
size 28*28 pixels.It is a good database for people
who want to try learning techniques and pattern
recognition methods on real-world data while
spending minimal efforts on preprocessing and
formatting.
264 Nimisha Jain, Kumar Rahul, Ipshita Khamaru. Anish Kumar Jha, Anupam Ghosh
International Journal of Innovations & Advancement in Computer Science
IJIACS
ISSN 2347 – 8616
Volume 6, Issue 5
May 2017
The dataset is taken from-
http://yann.lecun.com/exdb/mnist/
4.2 Analysis of the results:
In business, System Analysis and Design refers to
the process of examining a business situation with
the intent of improving it through better procedures
and methods. System analysis and design relates to
shaping organizations, improving performance and
achieving objectives for profitability and growth.
The emphasis is on systems in action, the
relationships among subsystems and their
contribution to meeting a common goal.
Looking at a system and determining how
adequately it functions, the changes to be made and
the quality of the output are parts of system
analysis. Organizations are complex systems that
consist of interrelated and interlocking subsystems.
Changes in one part of the system have both
anticipated and unanticipated consequences in other
parts of the system. The systems approval is a way
of thinking about the analysis and design of
computer based applications. It provides a
framework for visualizing the organizational and
environmental factors that operate on a system.
Proposed Application Module:
The proposed application has been implemented
using Python on Django Framework. The user is
given two options in the home image: Simple
Upload, Model Form Upload. Simple Upload will
allow the user to upload the image and predict it
then and there. After navigating away from that
page, the link to the uploaded image is lost. The
Model Form Upload will allow the user to upload
the image with description. With this link, the user
will be able to store the image and see its link on the
home page itself. By clicking on the link, the user
will be able to get the result from the CNN
classifier.
4.3 Validation of the Results
The application shows the results of the processing
of an image, and feeding it to the network. The
screenshots for the application showing results are
given in the next section. There has been hardly any
cases where the CNN model gives wrong prediction
for a particular image.
5. Conclusion
Handwritten digit recognition is the first step to the
vast field of Artificial Intelligence and Computer
Vision. With the advance of technology, every day
there are new algorithms coming up which can
make computers do wonders. Machines can
recognize images and understand handwriting of
different people which humans themselves have
failed to understand or comprehend from the image.
As seen from the results of the experiment, CNN
proves to be far better than other classifiers. The
results can be made more accurate with more
convolution layers and more number of hidden
neurons.
The target of the proposed work was to open the
way to digitalization. Though the goal is to just
95.50%
96.00%
96.50%
97.00%
97.50%
98.00%
98.50%
99.00%
99.50%
K-NN MLP CNN
Accuracy
Accuracy
Models Accuracy
K-Nearest Neighbour(K-NN) 96.8%
Multi-Layer Perceptron (MLP) 97.3%
Convolution Neural Network
(CNN)
99.1%
265 Nimisha Jain, Kumar Rahul, Ipshita Khamaru. Anish Kumar Jha, Anupam Ghosh
International Journal of Innovations & Advancement in Computer Science
IJIACS
ISSN 2347 – 8616
Volume 6, Issue 5
May 2017
recognize the digits but it can be extended to letters
and then a person’s handwriting. This can
completely abolish the need of typing. People can
straightaway write down in their own handwriting
and then convert it in digital form by just clicking
pictures of it. Digit recognition is an excellent
prototype problem for learning about neural
networks and it gives a great way to develop more
advanced techniques of deep learning.
The broader aim in mind was to develop a M.L.
model that could recognize people's handwriting.
However, as we began developing the model we
realized that the topic in hand was too tough and
would require tremendous data to learn. Example to
accurately classify a cursive handwriting will be
very tough. Thus we settled on classifying a given
handwritten digit image as the required digit using
three different algorithms and consequently testing
its accuracy. In future we are planning to further
explore the topic to recognize people’s handwriting.
REFERENCES [1]DR.KUSUMGUPTA2 ,"A COMPREHENSIVE REVIEW ON
HANDWRITTEN DIGIT RECOGNITION USING VARIOUS
NEURAL NETWORK APPROACHES", INTERNATIONAL
JOURNAL OF ENHANCED RESEARCH IN
MANAGEMENT & COMPUTER APPLICATIONS, VOL. 5,
NO. 5, PP. 22-25, 2016.
[2] Ishani Patel, ViragJagtap and Ompriya Kale."A
Survey on Feature Extraction Methods for
Handwritten Digits Recognition", International
Journal of Computer Applications, vol. 107, no. 12,
pp. 11-17, 2014.
[3] Y LeCun,"COMPARISON OF LEARNING
ALGORITHMS FOR HANDWRITTEN DIGIT
RECOGNISATION",”. In: International conference
on Artificial Neural networks, France, pp. 53–60.
1995.
[4]Faisal Tehseen Shah, Kamran Yousaf,"Handwritten
Digit Recognition Using Image Processing and
Neural Networks", Proceedings of the World
Congress on Engineering, vol., 2007.
[5]Viragkumar N. Jagtap , Shailendra K. Mishra,"Fast
Efficient Artificial Neural Network for Handwritten
Digit Recognition", International Journal of
Computer Science and Information Technologies,
vol. 52, no. 0975-9646, pp. 2302-2306, 2014.
[6]Saeed AL-Mansoori,"Intelligent Handwritten Digit
Recognition using Artificial Neural Network", Int.
Journal of Engineering Research and Applications,
vol. 5, no. 5, pp. 46-51, 2015.
[7]Gil Levi and Tal Hassner,"OFFLINE
HANDWRITTEN DIGIT RECOGNITION USING
NEURAL NETWORK", International Journal of
Advanced Research in Electrical, Electronics and
Instrumentation Engineering, vol. 2, no. 9, pp. 4373-
4377, 2013.
[8]Haider A. Alwzwazy1 , Hayder M. Albehadili2 ,
Younes S. Alwan3 , Naz E. Islam4 ,"Handwritten
Digit Recognition Using Convolutional Neural
Networks", International Journal of Innovative
Research in Computer and Communication
Engineering, vol. 4, no. 2, pp. 1101-1106, 2016.
[9]XuanYang , Jing Pu ,"MDig: Multi-digit Recognition
using Convolutional Neural Network on Mobile",
Stanford University.
[10]Yuhao Zhang,"Deep Convolutional Network for
Handwritten Chinese Character Recognition",
Stanford University.
[11]Jung Kim,Xiaohui Xie,"Handwritten Hangul
recognition using deep convolutional neural
networks", University of California.
[12]Nada Basit , Yutong Zhang , Hao Wu , Haoran
Liu , Jieming Bin , Yijun He , Abdeltawab M.
Hendawi,"MapReduce-based Deep Learning with
Handwritten Digit Recognition Case Study",
University of Virginia, VA, USA.
[13]"Handwritten Digit Recognition using various Neural
Network Approaches", International Journal of
Advanced Research in Computer and
Communication Engineering, vol. 4, no. 2, pp. 78-
80, 2015.
[14]"Handwritten Digit Recognition Using Image
Processing and Neural Networks", Proceedings of
the World Congress on Engineering, vol. 1, 2007.
[15]"Recognizing Handwritten Digits and Characters",
Stanford University.
[16]”Hand Written Digit Recognition using Kalman
Filter”, International Journal of Electronics and
266 Nimisha Jain, Kumar Rahul, Ipshita Khamaru. Anish Kumar Jha, Anupam Ghosh
International Journal of Innovations & Advancement in Computer Science
IJIACS
ISSN 2347 – 8616
Volume 6, Issue 5
May 2017
Communication Engineering, Volume 5, Number 4,
pp. 425-434, 2012.
[17]”Detection and Classification of Brain Tumor Using
SVM and K-NN Based Clustering”, SSRG
International Journal of Electronics and
Communication Engineering, March 2017.\
[18]Hayder M. Albeahdili, Haider A. Alwzwazy, Naz E.
Islam.” Robust Convolutional Neural Networks for
Image Recognition”. (IJACSA) International Journal
of Advanced Computer Science and Applications,
Vol. 6, No. 11, 2015.
[19] Fabien Lauer, Ching Y. Suen, and G´erard Bloch “A
trainable feature extractor for handwritten digit
recognition‖”, Journal Pattern Recognition,Elsevier,
40 (6), pp.1816-1824, 2007.
[20] Chen-Yu Lee, SainingXie, Patrick Gallagher,
Zhengyou Zhang, ZhuowenTu, “ Deeply-Supervised
Nets “ NIPS 2014.
[21] M. Fischler and R. Elschlager, “The representation
and matching of pictorial structures”, IEEE
Transactions on Computer, vol. 22, no. 1, 1973.
[22] Kevin Jarrett, KorayKavukcuoglu,
Marc’AurelioRanzato and YannLeCun “What is the
Best Multi-Stage Architecture for Object
Recognition” CCV’09, IEEE, 2009.
[23] Kaiming, He and Xiangyu, Zhang and Shaoqing,
Ren and Jian Sun “ Spatial pyramid pooling in deep
convolutional networks for visual recognition‖
European”, Conference on Computer Vision,
arXiv:1406.4729v4 [cs.CV] 23 Apr 2015.
[24] X. Wang, M. Yang, S. Zhu, and Y. Lin. “Regionlets
for generic object detection”. In ICCV, 2013.
[25] Krizhevsky, Alex, Sutskever, Ilya, and Hinton,
Geoffrey. “ImageNet classification with deep
convolutional neural networks”.In Advances in
Neural Information Processing Systems 25
(NIPS’2012). 2012.
[26] C. Couprie, C. Farabet, L. Najman, and Y. LeCun.“
Indoor semantic segmentation using depth
information”. International Conference on Learning
Representation, 2013.
[27] R. Girshick, J. Donahue, T. Darrell, and J. Malik.
“Rich feature hierarchies for accurate object
detection and semantic segmentation”. CoRR,
abs/1311.2524, 2013.
[28] O. Russakovsky, J. Deng, H. Su, J. Krause, S.
Satheesh, S. Ma, S. Huang, A. Karpathy, A.
Khosla,M. Bernstein, A.C. Berg, and F.F. Li. “
Imagenet large scale visual recognition challenge”.
International Journal of Computer Vision”
December 2015, Volume 115, Issue 3, pp 211- 252.
[29] RaiaHadsell, Sumit Chopra, YannLeCun,
“Dimensionality Reduction by Learning an Invariant
Mapping” CVPR , vol. 2, pp. 1735- 1742, 2006. 13.
Omkar M. Parkhi, Andrea Vedaldi,and Andrew
Zisserman “Deep Face Recognition” British
Machine Vision Conference, 2015
[30] Dan ClaudiuCiresan, Ueli Meier, Luca Maria
Gambardella, JuergenSchmidhuber, “Deep Big
Simple Neural Nets Excel on Handwritten Digit
Recognition” Neural Computation, Volume 22,
Number 12, December 2010
[31] Sherif Abdel Azeem, Maha El Meseery,
HanyAhmed ,”Online Arabic Handwritten Digits
Recognition “,Frontiers in Handwriting Recognition
(ICFHR), 2012
[32] Liu, C.L. et al.,.” Handwritten digit recognition:
Benchmarking of state-of-the-art techniques”.
Pattern Recognition 36, 2271–2285. 2003
[33] Xiao-Xiao Niun ,Ching Y. Suen “A novel hybrid
CNN–SVM classifier for recognizing handwritten
digits” Pattern Recognition 45 (2012) 1318–1325,
2011.
[34] M. D. Zeiler and R. Fergus, “Visualizing and
understanding convolutional neural networks,”
arXiv:1311.2901, 2013.
[35] Gil Levi and Tal Hassner,” Age and Gender
Classification using Convolutional Neural
Networks”, Computer Vision and Pattern
Recognition Workshops (CVPRW) IEEE, 2015