
Journal of Computing and Information Technology - CIT 17, 2009, 4, 359–370, doi:10.2498/cit.1001363


Approaches for Automated Object Recognition and Extraction from Images – a Study

S. Margret Anouncia and J. Godwin Joseph
School of Computing Sciences, VIT University, Vellore, Tamil Nadu, India

Digital image analysis is one of the most challenging and important tasks in many scientific and engineering applications. The two vital subtasks in image analysis are the recognition and extraction of object(s) of interest (OOI) from an image. When these tasks are performed manually, they call for human experts, making them more time consuming, more expensive and highly constrained. These negative factors led to the development of various computer systems that perform automatic recognition and extraction of visual information to bring consistency, efficiency and accuracy to image analysis. This paper studies various existing automated approaches for the recognition and extraction of OOI from an image in scientific and engineering applications. In this study, a categorization is made based on the four principal factors (Input, Object, Feature, Attention) by which each approach is driven. All the approaches discussed in this paper have been shown to work efficiently in real environments.

Keywords: image analysis, segmentation, feature extraction, computer vision, image retrieval, ontology

1. Introduction

Automatic image analysis is a vital process which is indispensable today in numerous applications across different domains. When such tasks are hand-engineered, they demand human experts, making the system more constrained, difficult to maintain, less portable and economically weak. In response to these problems, many researchers have contributed solutions by developing methodologies which automate the entire image analysis process with little or no human intervention. Generally, image analysis incorporates two key tasks, namely object recognition and extraction of the Object(s) of Interest (OOI) from a digital image. The vital tasks involved are segmenting the image into meaningful parts, extracting the desired features from the image and finally carrying out the extraction procedure. This work presents a tentative taxonomy of existing systems in the domain of computer vision for extracting OOI, so that a conceptual framework for the discussion, analysis and evolution of novel techniques in the domain can be achieved in the near future. All the works studied in this paper were selected based on the complexity involved, their advantages and the adaptable nature of each technique. This study will enable researchers in this domain to review these works quickly and find which technique best fits their application. It also provides a way to bring novel ideas into this domain through an in-depth analysis of pitfalls in the surveyed works.

The paper is organized into seven sections. Input-driven approaches are discussed in Section 2, and Section 3 deals with Feature-driven approaches. In Section 4, a study of Object-driven approaches is made. Section 5 deals with Attention-driven approaches. Section 6 draws a summary of all studied approaches and, finally, Section 7 concludes this study and points out a new approach which may perform better in analyzing digital images.

2. Input-driven Approach

The methodologies categorized under this approach depend heavily on the images stored in memory and the image under analysis. The study shows that when this kind of approach is adopted, preprocessing activities such as enhancing the image under analysis become mandatory. The images used for referencing must also be of the desired quality. Image compression must also be considered to a certain extent, so that an efficient storage structure is established for storing images. In this section, three methodologies under this approach are studied. The first relies heavily on camera inputs; it is followed by a method based on affinely invariant regions that brings automation into the extraction process, and finally a method which depends completely on the images stored in the database and which is highly domain specific.

2.1. Object Extraction Using Multiple Camera Inputs

A novel algorithm was developed by Jiro Katto and Mutsumi Ohta in which they extract the OOI using multiple camera inputs [2]. The framework developed applies segmentation techniques to multiple camera inputs. The entire methodology is depicted in Figure 1.

Methodology

The methodology developed by Jiro Katto et al. uses four cameras, where each of the two pairs of cameras shares optical axes. One pair of cameras focuses on the front objects of the scene, while the other pair focuses on the rear objects, following which a fusion (by K-means clustering) is done between the pair of images obtained with the different focuses. The resulting pair of images is compared to detect disparity, deriving a one-dimensional disparity vector. This disparity vector reduces prediction errors and contributes to the compression function which derives efficient disparity maps. This prevents encoding of both obtained stereo pairs, so only one image is appended with the disparity vector. Image fusion and disparity detection make the segmentation task efficient. From the focus map and the disparity map, an extraction map is derived. When this extraction map is masked with one of the fused images, an initial object extraction is done, but it may include a loss of certain correct clusters. A parallel region synthesis is carried out by gathering clusters which have similar statistical properties. The number of masked pixels is then counted in each region determined by the region synthesis. Finally, regions containing a higher ratio of masked pixels than a predetermined threshold are selected and integrated to retrieve the OOI.
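The final selection step can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: regions are given as a label grid, the extraction map is a binary mask, and a region is kept when the share of its pixels covered by the mask exceeds a preset ratio.

```python
# Sketch of masked-pixel region selection (hypothetical data layout).
def select_regions(labels, mask, ratio=0.5):
    """labels: 2-D list of region ids; mask: 2-D list of 0/1 extraction map.
    Returns the set of region ids whose masked-pixel share exceeds ratio."""
    counts, masked = {}, {}
    for row_l, row_m in zip(labels, mask):
        for lab, m in zip(row_l, row_m):
            counts[lab] = counts.get(lab, 0) + 1
            masked[lab] = masked.get(lab, 0) + m
    return {lab for lab in counts if masked[lab] / counts[lab] > ratio}

labels = [[1, 1, 2, 2],
          [1, 1, 2, 2],
          [3, 3, 2, 2]]
mask   = [[1, 1, 0, 0],
          [1, 0, 0, 1],
          [0, 0, 1, 1]]
print(select_regions(labels, mask))  # {1}: region 1 is 3/4 masked
```

Region 2 sits exactly at the 0.5 ratio and is rejected, showing why the paper's threshold is a strict cut-off.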

2.2. Object Extraction Using Invariant Regions

The system under study was developed by T. Tuytelaars, A. Zaatri, Luc Van Gool and H. Van Brussel, in which a generic algorithm based on local, affinely invariant regions is introduced to bring automation into object recognition [3]. The entire methodology is depicted in Figure 2.

Methodology

Figure 1. Object extraction using multiple camera inputs.

Figure 2. Automated object recognition system.

The methodology developed by T. Tuytelaars et al. captures the image under study in such a way that the distance between the camera and the scene is sufficiently large and the scene is locally planar. The methodology is appearance-based: an image object is selected from a database of images to derive a relevant object that matches the captured image. This selection process depends purely on affine moment invariants computed over the affinely invariant regions. Initially, some points which deliver a high degree of information are selected within the captured image. These are corners detected with the Harris corner detector. Around each of the chosen points, affinely invariant regions are selected, and a feature vector of affine moment invariants (i.e. grey value invariants on a local neighborhood) is computed. Finally, these vectors are used to retrieve similar regions from the database of images using the Mahalanobis distance (i.e. distances are measured in units of standard deviation). The most similar image is then selected using a voting mechanism, so the image in the database with the highest number of votes is automatically recognized as the desired object.
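The voting step can be sketched as below. All names and data shapes are assumptions for illustration, and the Mahalanobis distance is simplified to a diagonal covariance (per-dimension standard deviations), which matches the "distances in units of standard deviation" description.

```python
# Sketch of region matching plus voting (diagonal Mahalanobis distance).
def mahalanobis_diag(u, v, std):
    """Distance between vectors u, v in units of per-dimension std."""
    return sum(((a - b) / s) ** 2 for a, b, s in zip(u, v, std)) ** 0.5

def vote(query_regions, db, std):
    """db: list of (image_id, feature_vector); each query region votes
    for the database image holding its nearest feature vector."""
    tally = {}
    for q in query_regions:
        img, _ = min(db, key=lambda e: mahalanobis_diag(q, e[1], std))
        tally[img] = tally.get(img, 0) + 1
    return max(tally, key=tally.get)

std = [1.0, 2.0]
db = [("A", [0.0, 0.0]), ("A", [1.0, 1.0]), ("B", [5.0, 5.0])]
print(vote([[0.1, 0.2], [0.9, 1.1], [4.8, 5.0]], db, std))  # A (wins 2:1)
```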

2.3. Flag Identification Using Object Extraction Technique

Researchers Eduardo Hart, Sung-Hyuk Cha and Charles Tappert proposed an interactive image retrieval technique for identifying flags [4]. Figure 3 depicts the entire methodology.

Methodology

Figure 3. Flag identification using image retrieval technique.

The proposal made by Eduardo Hart et al. is an interactive object extraction approach. Users interact with the system twice: first, in cropping the Region of Interest (ROI), i.e. the flag part, from the photo; second, to choose the desired flag from a minimal choice set. The technique adopted is a content-based image retrieval scheme in which the flag is first cropped and the image is divided into blocks of 8×8 or 10×10 pixels. The color and texture feature extraction process is carried out for each block. The target features are the colors and their corresponding percentage contribution to the segmented portion of the flag. The extracted features are compared with a flag database, which is built in the following way. Apart from the original image in the database, each flag image is divided into five sections, where the first four each contain one quarter of the cropped flag image and the fifth is obtained by considering 30% of the central area of the total image. There are thus six images in total, to which the Wiener deconvolution technique is applied so that the effects of camera movement and 'out of scope' problems are eliminated. Finally, for structuring the database, 24 more samples are created from the set of 12 samples by applying a linear conformal transform over those samples. The color feature extraction is then done by clustering the colors into 9 colors using the HSV color value. The final task in this methodology is retrieval based upon country and synthesis. This methodology showed a performance between 82% and 93%.
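The per-block colour-percentage feature described above can be sketched as follows. The colour labels and block size here are illustrative assumptions; the paper quantises to 9 HSV-derived colours, which this sketch takes as already done.

```python
# Sketch: cut an image of colour labels into blocks and record, per block,
# the fraction of pixels carrying each (pre-quantised) colour label.
def block_color_features(img, bs):
    """img: 2-D list of colour labels; bs: block size; returns per-block dicts."""
    h, w = len(img), len(img[0])
    feats = []
    for by in range(0, h, bs):
        for bx in range(0, w, bs):
            hist, n = {}, 0
            for y in range(by, min(by + bs, h)):
                for x in range(bx, min(bx + bs, w)):
                    hist[img[y][x]] = hist.get(img[y][x], 0) + 1
                    n += 1
            feats.append({c: cnt / n for c, cnt in hist.items()})
    return feats

flag = [["red", "red"], ["white", "red"]]
print(block_color_features(flag, 2))  # [{'red': 0.75, 'white': 0.25}]
```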

3. Feature-driven Approach

Following the input-driven approaches, the feature-driven approach focuses entirely on the high dimensionality of the segmented region for object recognition. Suppose an image segmentation technique divides an image into n regions; each region can then be characterized by various attributes and properties. These features fall into two distinct types, namely geometric features (e.g. angle, shape) and statistical features (e.g. distance, mean) [10, 8], so that when a best input subset from a given image space is selected, the recognition and extraction of objects can be done more accurately and in a more cost-effective manner than with the input-driven approach. In this section, four methodologies are surveyed. The first methodology analyzed under this category uses artificial intelligence for classifying outdoor images. The second is a research work in which an edge-based segmentation technique is proposed for object recognition. The third methodology automates the entire feature extraction process using a novel framework, and the final methodology discussed in this section is a robust real-time object detection technique.

3.1. Artificial Intelligence Technique for Outdoor Image Classification

Researchers Neill W. Campbell, William P. J. Mackeown, Barry T. Thomas and Tom Troscianko developed a methodology for the automated classification of outdoor images [5].

Methodology

The methodology proposed by Neill W. Campbell et al. uses the Bristol image database, containing a large set of high-quality color images. The image under consideration is segmented using K-means clustering of the grey-level histogram. The value of K is taken to be 4, so that an optimal balance is kept between under- and over-segmentation. From these segmented closed regions, 28 features are extracted, among which some of the important ones are average color, position, size, rotation, texture (using Gabor filters) and shape (using principal components). The feature measure, in the context of the segmented region, is also derived. Finally, a neural approach is brought in for classification, where the system is trained with 7000 sample regions and tested on 3000 test regions. Using conjugate gradient descent optimization, the neural net is found to need 24 nodes in the hidden layer, and the encoding is 1-in-n with 11 possible output labels. Misclassifications do occur in this type of approach (e.g. road and pavement) due to under- or over-segmentation. This approach classifies 80% of the regions correctly, corresponding to 91.1% of the image area. The classification accuracy of the multilayer perceptron classifier is 91.1% and that of learning vector quantization is 76.7%.
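The grey-level clustering step can be illustrated with a toy 1-D K-means (Lloyd) iteration; this is a sketch of the general technique, not the authors' implementation, and the grey values and initialisation are invented for the example.

```python
# Toy 1-D K-means over grey levels (the paper uses K = 4).
def kmeans_1d(values, k, iters=20):
    # Spread the initial centres across the sorted value range.
    centres = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        buckets = [[] for _ in centres]
        for v in values:
            buckets[min(range(len(centres)),
                        key=lambda i: abs(v - centres[i]))].append(v)
        centres = [sum(b) / len(b) if b else c
                   for b, c in zip(buckets, centres)]
    return sorted(centres)

greys = [10, 12, 11, 100, 102, 200, 198, 250]
print(kmeans_1d(greys, 4))  # [10.5, 12.0, 101.0, 216.0]
```

Pixels are then labelled by their nearest centre, which is what yields the closed regions the 28 features are computed from.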

3.2. Scale Map-based Image Segmentation for Object Recognition

A research work by S. Kondra focused on the use of low-level visual features for edge detection, segmentation and recognition of objects [6].


Methodology

In the research by S. Kondra, a scale map is introduced which is very similar to the Lindeberg scale formula, but uses 8 directions and takes the sum over all orientations in order to determine the scale for each point in the image. The work also introduces a rotational scale map where, instead of differential filters, rotational filters are employed. In the resulting image, black indicates a lower scale while white indicates a higher scale. These two scale maps play a vital role in the segmentation. Initially, the image is segmented using watershed segmentation. A novel LINK method is introduced in which more than two segments can be merged in one step. For comparing segments, histogram distributions of the scale map, the rotational map and CIELAB color are used. Each segment is compared with its neighboring segments, and in the next step they are aggregated. Some small segments remain which cannot be merged based only on texture, so for these segments the difference of the CIELAB distribution is considered only at the boundary of the segment; if the boundary is not salient, the technique proceeds with merging the segments. This technique has a global consistency error (GCE) of 0.184, which is comparable to the human GCE of 0.08.
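The merge test implied above can be sketched as follows. The histogram distance (total variation) and the threshold are illustrative choices of ours, not values from the paper; the point is that neighbouring segments merge when their feature histograms agree closely enough.

```python
# Sketch: merge two neighbouring segments when their feature histograms
# (scale map, rotational map, CIELAB colour) are similar on average.
def hist_distance(h1, h2):
    """Total-variation distance between two normalised histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 2

def should_merge(seg_a, seg_b, threshold=0.2):
    """seg_*: dicts mapping feature name -> normalised histogram."""
    dists = [hist_distance(seg_a[f], seg_b[f]) for f in seg_a]
    return sum(dists) / len(dists) < threshold

a = {"scale": [0.5, 0.5], "lab": [0.9, 0.1]}
b = {"scale": [0.6, 0.4], "lab": [0.8, 0.2]}
print(should_merge(a, b))  # True: average distance 0.1 < 0.2
```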

3.3. Adaptive Object Recognition Framework for Image Analysis

Research by Ilya Levner, Vadim Bulitko, Lihong Li, Greg Lee and Russell Greiner automates the image feature extraction process through a novel Multi-Resolution Adaptive Object Recognition framework (MR ADORE) [7].

Methodology

The MR ADORE framework developed by Ilya Levner et al. uses a readily available off-the-shelf image processing library (IPL). The process involves a significant amount of complexity, as the libraries are domain independent. It calls for an intelligent control policy for applying the various library operators, since a series of operators is required to make the entire image interpretation process efficient and reliable. The chosen IPL operators are termed MDP actions, and the results of applying these operators are termed MDP states. In order to recognize the object, off-line and on-line machine learning techniques are employed. Each training datum consists of two images, the input image and its user-annotated counterpart, allowing the output of the system to be compared with the desired image label. In the off-line training scheme, several sequences of IPL operators, up to a certain user-controlled length, are applied to the initial image, resulting in MDP states. This is followed by computing a Q-function for each of the sampled states, defined as Q(s, a), where s is the state and a is the action. This Q-value is evaluated against a scoring-metric rewarding scheme, and a desired label for the image is learned by the system. When features are also considered, supervised machine learning methods become practically feasible; in this case the Q-function can be modified to Q(f(s), a), where f is the feature extractor on which the choice of IPL actions is based. The sole purpose of this off-line learning phase is to construct an on-line control policy. This prevents decisions based on the frequently incomplete information provided by imperfect features, which in turn leads to a significant increase in interpretation quality. As the control policy calls for automated extraction of high-quality features, a K-nearest neighbor (KNN) algorithm (basically a case-based approach) is adopted, which approximates the Q-value of the input token based on the distance(s) to the nearest training example(s). This work implements and tests five policies based on the aforementioned techniques and finds that 1-NN and an ANN (artificial neural network), together with hand-crafted features, provide the best interpretation, with an accuracy of about 83%. The only pitfall is that a huge training set is needed to enhance the performance of the 1-NN algorithm. A higher-degree K-NN algorithm can be adopted to overcome this pitfall.
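The 1-NN Q-value approximation mentioned above can be sketched as below. The feature vectors, action names and Q-values are invented placeholders: the idea is simply that an unseen state borrows the Q-values of its nearest training example.

```python
# Sketch: case-based (1-NN) approximation of Q(f(s), a).
def q_1nn(f, training):
    """training: list of (feature_vector, {action: q_value}) pairs.
    Returns the action->Q table of the nearest training example."""
    _, qs = min(training,
                key=lambda e: sum((a - b) ** 2 for a, b in zip(f, e[0])))
    return qs

train = [([0.0, 0.0], {"crop": 0.9, "blur": 0.1}),
         ([1.0, 1.0], {"crop": 0.2, "blur": 0.8})]
q = q_1nn([0.1, 0.2], train)
print(max(q, key=q.get))  # crop: the nearest neighbour favours it
```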

3.4. Rapid & Robust Object Detection Framework for Face Recognition

This technique was developed by Paul Viola and Michael Jones for robust real-time object detection [12].

Methodology

The technique proposed by Viola and Jones aims at developing an object detection framework capable of processing images rapidly while achieving high detection rates. Three key contributions enable this technique to achieve the desired framework. First, a new image representation called the integral image, wherein the pixel value at (x, y) is the sum of the pixel values of the original image above and to the left of (x, y). This integral image can be computed in one pass through the image, allowing quick computation of the features used by the detector. The second contribution is the AdaBoost learning algorithm, which selects a small number of critical visual features, yielding an efficient classifier. The classifier is learned from labeled data: about 4916 hand-labeled faces are used, all of them normalized. In addition, the training data also considers variation factors such as illumination and pose. The final contribution is a method of combining classifiers in a cascade, so that background regions can be quickly discarded while more time is spent on object-like regions. This technique, when implemented on a conventional desktop, executes face detection at a speed of 15 frames per second.
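The integral image described above can be written out in a few lines: each cell holds the sum of all pixels above and to the left (inclusive), built in one pass, after which any rectangle sum costs four lookups. A one-cell zero border is added here to avoid special cases; this is a common convention, not necessarily the authors' exact layout.

```python
# Integral image with a zero border, plus four-lookup rectangle sums.
def integral_image(img):
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            # One-pass recurrence: current pixel + above + left - overlap.
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x0, y0, x1, y1):
    """Sum of img[y0:y1][x0:x1] from four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

img = [[1, 2], [3, 4]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 2, 2))  # 10, the sum of all pixels
```

Constant-time rectangle sums are exactly what makes the Haar-like features of the detector cheap to evaluate at every scale and position.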

4. Object-driven Approach

Following the feature-based techniques, the methodologies under this approach analyze digital images in a way that depends heavily upon the attributes contained in the OOI. Adopting this approach to the image interpretation problem is widespread due to its intrinsic flexibility and the effectiveness of its results. Four methodologies are surveyed under this category. The first is used for classifying CAD models. The second takes an ontological approach to object recognition, followed by a novel technique for enhanced segmentation-based object recognition and, finally, a method for detecting objects based on foreground image extraction.

4.1. A Supervised Learning Technique for Classification of CAD Models

The methodology developed by researchers Cheuk Yiu Ip and William C. Regli focuses on content-based classification of CAD (Computer Aided Design) models [9].

Methodology

The method proposed by Cheuk Yiu Ip et al. is an a posteriori technique wherein an algorithm runs on a model and finally produces a category or label for it. In comparing 3D CAD models, two basic techniques exist: feature-based and shape-based. In the first, a graph-based data structure is used for process-plan generation; by integrating the ideas in the database with the generated plan, indexing and clustering of CAD models is achieved. The second technique is based on computational geometry, computer vision and computer graphics.

As most engineering models have multiple classifications based on manufacturing cost and manufacturing process, a machine learning mechanism is applied for classification. Initially, a selection of invariant features is made from the models to form a set of n comparable vectors represented as <a1, a2, ..., an>. For samples S1, S2, S3, suppose we need to check whether these samples belong to the same category C. If the distance between two samples, D(S1, S2), is derived from the feature vectors, and S1 and S2 belong to the same category while S3 does not, then D(S1, S2) is always less than D(S1, S3). A technical approach is adopted wherein, for each CAD model, a shape distribution histogram is derived. Matching two shape distribution models calls for the following steps: first a shape function is selected, followed by the sampling of random points, after which point-pair distances are calculated, from which a shape distribution histogram is derived. For comparing the shape distributions, the Minkowski distance is employed. Classification is done using the nearest-neighbor technique. Experiments were conducted to test classification correctness. The proposed method achieves 76% best-case and 63% average classification correctness. These correctness rates are not very attractive; still, the method can be considered since there is no other automated system for classifying 3D models, and the correctness rate can be increased significantly by improving the quality of the training sets.
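The histogram-matching steps above can be sketched as follows. The shape function here is the common D2 choice (distance between two random surface points, here random 2-D points for brevity); the sample count and bin edges are our own illustrative choices, not values from the paper.

```python
# Sketch: D2-style shape distribution histograms compared with a
# Minkowski distance (p = 1 here, i.e. an L1 comparison).
import random

def d2_histogram(points, bins, d_max, samples=1000, seed=0):
    rng = random.Random(seed)
    hist = [0] * bins
    for _ in range(samples):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        d = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
        hist[min(bins - 1, int(d / d_max * bins))] += 1
    return [h / samples for h in hist]  # normalised histogram

def minkowski(h1, h2, p=1):
    return sum(abs(a - b) ** p for a, b in zip(h1, h2)) ** (1 / p)

square = [(0, 0), (0, 1), (1, 0), (1, 1)]
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(minkowski(d2_histogram(square, 4, 4.5),
                d2_histogram(line, 4, 4.5)))  # > 0: the shapes differ
```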


4.2. Recognition Model for Semantic Image Indexing

The methodology developed by Nicolas Maillot and Monique Thonnat is an object recognition model for semantic image indexing [11], where the main focus lies in finding the semantic class of a physical object observed in the image using the concept of ontology.

Methodology

The research work of Nicolas Maillot et al. focuses on finding the semantic class of a physical object observed in the image using the concept of ontology. An ontology, in general, is a set of concepts and relations used to describe a domain. The proposed methodology for image indexing calls for two phases: a knowledge formalization phase, in which a hierarchy of object classes is described by visual concepts, and a machine learning phase, in which numerical features are mapped to semantic concepts. The learning is done using a multi-layer neural network in which both positive and negative samples of the visual concept are considered, followed by feature extraction for each sample, in which the required features (color, shape and texture) are selected for training, enabling the task of visual concept detection. When an input image is given for analysis, automatic segmentation is performed, followed by the feature extraction process, so that all the classes from the root to the leaf of the ontological structure are scanned to obtain a confidence level for the given image. After the confidence level is obtained, a semantic query is generated implicitly and the image is retrieved in accordance with the query. Experimental results show that the proposed approach achieves a 73% precision value and a 19% recall value. Today the concept of ontology is applied to various problems in image interpretation. Medical image analysis, defect detection and real-time object monitoring systems are some core areas in which many researchers adopt this ontological approach to image analysis.
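The root-to-leaf scan can be made concrete with a small sketch. The class hierarchy, detector scores, and the multiplicative way confidences are combined along a path are all invented for illustration; the paper only states that every class from root to leaf is scanned for a confidence level.

```python
# Schematic root-to-leaf scan over a class hierarchy: each class has a
# detector confidence, and the best-scoring leaf path is reported.
def best_leaf(tree, scores, node="object", conf=1.0):
    """tree: parent -> list of children; scores: class -> confidence."""
    children = tree.get(node, [])
    if not children:
        return node, conf
    return max((best_leaf(tree, scores, c, conf * scores[c])
                for c in children), key=lambda t: t[1])

tree = {"object": ["vehicle", "animal"], "vehicle": ["car", "bike"]}
scores = {"vehicle": 0.9, "animal": 0.2, "car": 0.8, "bike": 0.3}
print(best_leaf(tree, scores))  # car wins: 0.9 * 0.8 is the best path score
```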

4.3. Reinforced Segmentation Mechanism for Object Recognition

The two approaches surveyed earlier have their performance determined by a rich training set and expert knowledge. In domains where a priori knowledge is lacking and there are fewer samples for training, a learning mechanism called Reinforcement Learning (RL) can be adopted. An attempt was made by Farhang Sahba, Hamid R. Tizhoosh and Magdy M. A. Salama to increase the object recognition rate using a reinforced segmentation mechanism [13].

Methodology

The methodology proposed by Farhang Sahba et al. focuses on making the object recognition (OR) system more adaptable so that it reacts cleverly to changes in the environment. The concept of reinforcement learning is mainly based on rewards or punishments assigned to a behavior in a specific context, which enables the system to perform better. The RL method adopted is the Q-learning method proposed by Watkins [1]. In this technique, the construction of the RL system relies heavily on three defined components: state, action and reward. A Q-matrix is built depending upon the definitions of states and actions, followed by a matching criterion wherein, if more than one object matches the features, the object with the closest match with respect to a threshold is considered. For defining states, the authors considered an object descriptor (a shape descriptor representing the pixel arrangement), area (the area of the extracted object) and, finally, the number of objects (considered with respect to the threshold value). To define the actions, the agent must adjust the threshold value and the size of the structuring element for morphological operations. The assignment of reward or punishment is based on a similarity measure. The system is trained using some reference objects, with a Q-matrix developed using a Boltzmann policy. Agents must find appropriate threshold and post-processing parameters so that a reference object can be verified. Incorporating more states and more data sets would enable this approach to be compared with well-established recognition techniques.

4.4. Object Extraction Technique for Foreground Images

The final methodology discussed under this category is a work done by researchers Yea-Shuan Huang and Fang-Hsuan Cheng, resulting in a technique for foreground image extraction and object detection [14]. The proposed methodology is depicted in Figure 4.

Methodology

In the methodology proposed by Yea-Shuan Huang et al., objects projected onto the foreground of the image are considered essential for detection and extraction. Initially, a frame-based distinct pixel extraction (FDPE) is performed, which extracts pixels having quite different features from their background, after which an object detection task is initiated that checks for instances of the needed object. Finally, a region-based foreground object extraction (RFOE) is made, wherein the focus is on extracting the foreground pixels of the Object of Interest (OOI). As this method basically focuses on foreground pixel extraction, background modeling plays a vital role (as the foreground pixels are both object and application dependent). The object-oriented approach employed here combines the conventional background subtraction and object detection techniques. In this method, I(x, y) represents a pixel in the image, B(x, y) is a pixel in the background, and D(x, y) is the distinctness index: D(x, y) = 1 when Prob(I(x, y), B(x, y)) < T (where T is a predefined probability threshold value), and D(x, y) = 0 otherwise. So, when the distinctness index approaches 1, the pixel is a foreground pixel. Following this task, a region scan is initiated in which the shape of the region is considered, and those regions having a sufficient number of distinct pixels are used for object detection. An AdaBoost algorithm is adopted, which uses a re-weighting rule to construct a strong classifier. Finally, for extracting the objects, a criterion denoted by L(x, y) is used: L(x, y) = 1 when Prob(I(x, y), B(x, y)) < T(x, y), where T(x, y) is the threshold of pixel (x, y) as a function of the position x and y. This determines whether a pixel is in the foreground or the background. Once L(x, y) and D(x, y) are determined, an integration defined as L(x, y) OR D(x, y) is evaluated; if the OR operation results in 1, the pixel is a foreground pixel. Background updating is also done in order to obtain more satisfactory results. The experimental results of the proposed approach clearly show that it performs speedy and accurate extraction.
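The pixel-level decision just described can be sketched directly from the notation in the text: D(x, y) flags pixels whose background match probability falls below the global threshold T, L(x, y) uses the per-pixel threshold T(x, y), and their OR gives the final foreground mask. The probability and threshold values below are invented for illustration.

```python
# Sketch of the D(x,y) OR L(x,y) foreground decision.
def foreground_mask(prob, T_global, T_local):
    """prob: 2-D list of Prob(I, B); T_local: 2-D list of T(x, y).
    Returns a 2-D 0/1 mask where 1 marks a foreground pixel."""
    mask = []
    for row_p, row_t in zip(prob, T_local):
        mask.append([1 if (p < T_global or p < t) else 0
                     for p, t in zip(row_p, row_t)])
    return mask

prob = [[0.05, 0.90],
        [0.20, 0.60]]
T_local = [[0.10, 0.10],
           [0.30, 0.70]]
print(foreground_mask(prob, 0.10, T_local))  # [[1, 0], [1, 1]]
```

Note how pixel (0, 1) survives as background only because it clears both thresholds, while pixel (1, 0) fails the global test but is still caught by its local threshold.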

5. Attention-driven Approach

Visual attention plays a vital role in analyzing images by focusing on objects that attract the attention of human observers. The general open problem is the gap between low-level feature content and the semantic image interpretation. Most working models based on the attention-driven technique are considered potentially well suited for addressing content-based image retrieval (CBIR) problems. Most of the early CBIR systems rely completely on region-based extraction, whereas the attention-driven approach identifies those regions which are more informative and feature-rich, in order to perform an effective image retrieval task.

Figure 4. Object-oriented foreground image extraction.


5.1. Attention-driven Model for Content-based Image Retrieval

Researchers Ying-Hua Lu, Xiao-Hua Zhang, Jun Kong and Xue-Feng Wang proposed an attention-driven model for addressing the CBIR problem [15]. The entire methodology is depicted in Figure 5.

Methodology

The methodology proposed by Ying-Hua Lu et al. carries out the segmentation task with a clustering algorithm, Expectation Maximization (EM), which automatically segments the image into several homogeneous regions based only on the objects. This EM algorithm incorporates two key activities. Initially, feature extraction is done based on a three-dimensional color descriptor.
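The EM clustering step can be illustrated with a toy implementation. This is a minimal sketch fitting a two-component, one-dimensional Gaussian mixture to pixel values (the paper's version clusters a three-dimensional color descriptor); all names are hypothetical.

```python
import numpy as np

def em_segment(pixels, n_iter=50):
    """Toy EM: fit a 2-component 1-D Gaussian mixture to pixel values and
    return, per pixel, the index of the more likely component (cluster)."""
    x = pixels.ravel().astype(float)
    mu = np.array([x.min(), x.max()])            # initial component means
    var = np.array([x.var() + 1e-6] * 2)         # initial component variances
    pi = np.array([0.5, 0.5])                    # initial mixing weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each pixel
        lik = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
              / np.sqrt(2 * np.pi * var)
        r = lik / lik.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture parameters from responsibilities
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / len(x)
    return r.argmax(axis=1).reshape(pixels.shape)
```

On an image whose pixel values form two well-separated clusters, the returned label map splits the image into two homogeneous regions.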

Following the segmentation, a modified Itti-Koch model of visual attention is employed for attention selection from a bottom-up perspective, and salient points are extracted by this model. Initially, the feature extraction process is carried out using wavelets, which decrease the interference of background information. Wavelets are used to create 9 different spatial scales, and each level is decomposed into red, green, blue, yellow, intensity and local orientation channels. The red, green and blue channels are normalized by image intensity, and local orientation information is obtained by applying wavelets to the intensity. From these channels, center-surround feature maps are constructed. In their work, 36 feature maps are computed: 6 for intensity, 12 for color and 18 for orientation. A combination scheme then normalizes each feature map to a fixed dynamic range, and the summation of these maps yields the desired saliency map.
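The map-combination step can be sketched as follows. Note that this illustrates only the range normalization and summation described above, not the full Itti-Koch normalization operator (which additionally promotes maps with few strong peaks); all names are hypothetical.

```python
import numpy as np

def normalize_map(m, M=1.0):
    """Scale a feature map to the fixed dynamic range [0, M]."""
    lo, hi = m.min(), m.max()
    if hi == lo:                    # flat map carries no contrast information
        return np.zeros_like(m, dtype=float)
    return M * (m - lo) / (hi - lo)

def saliency_map(feature_maps):
    """Normalize each feature map to a fixed range and sum into a saliency map."""
    return sum(normalize_map(m) for m in feature_maps)
```

Locations that are strong in several feature maps accumulate the largest values in the summed saliency map, which is what makes them candidate attention peaks.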

Following the saliency map construction, the technique concentrates on the extraction of the OOI, using the segmented image and the derived saliency map. The salient points produced by the Itti-Koch algorithm are used to initiate a controlled region growth up to the boundaries established by the EM (Expectation Maximization) algorithm. The resulting image contains the extracted object of interest, i.e. the prominent object in the scene. From the extracted objects, feature vectors are built from color (HSV) and texture features (mean, standard deviation, smoothness, third moment, uniformity and energy). The model proposed by Ying-Hua Lu et al. relies completely on the result of the segmentation; objects cannot be extracted from poorly segmented regions.
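The controlled region growth from a salient seed, bounded by the EM segment containing it, might be sketched as a breadth-first flood fill over the segmentation label image; the names here are hypothetical.

```python
from collections import deque
import numpy as np

def grow_from_seed(labels, seed):
    """Grow a region from a salient seed point, stopping at the boundaries of
    the segment (connected component of equal labels) containing the seed."""
    h, w = labels.shape
    target = labels[seed]                         # segment label at the seed
    mask = np.zeros_like(labels, dtype=bool)
    mask[seed] = True
    q = deque([seed])
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w
                    and not mask[ny, nx] and labels[ny, nx] == target):
                mask[ny, nx] = True               # still inside the segment
                q.append((ny, nx))
    return mask
```

Intersecting the grown mask with the original image then yields the extracted object of interest, as in the approach described above.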

A similarity measure is carried out by matching regions, using a novel distance method called the region-based quadratic distance, to retrieve the desired OOI. This R-quadratic distance measure is achieved by calculating the distance between any two regions; the underlying distance used is the Euclidean distance. Experiments were conducted on 5000 images from the COREL database, showing that the proposed model achieves the best overall retrieval accuracy. To qualitatively evaluate the retrieval effectiveness of the proposed model, 75 images of 5 categories are randomly selected from the entire image database as query images. A comparison with 7 peer retrieval methods states that the proposed model has the highest overall effectiveness.
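The Euclidean matching underlying this distance measure can be sketched as follows, assuming each region is summarized by a fixed-length feature vector (e.g., the HSV and texture features mentioned above); the names are hypothetical.

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two region feature vectors."""
    return float(np.linalg.norm(np.asarray(a, dtype=float)
                                - np.asarray(b, dtype=float)))

def best_match(query_vec, region_db):
    """Return the key of the database region nearest to the query region."""
    return min(region_db, key=lambda k: euclidean(query_vec, region_db[k]))
```

Retrieval then reduces to ranking database regions by their distance to the query region's feature vector and returning the closest matches.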

Figure 5. Attention-driven model for CBIR.


The significance of the proposed attention-driven model is its effectiveness in overcoming some of the main limitations prevailing in several region-based image retrieval models. As it employs the result of a biologically-inspired bottom-up model to detect the most salient peaks in an image, these peaks can be used to extract attentive objects, which in many real-time cases correspond to semantically meaningful objects.

6. Summary

Table 1 summarizes all the discussed methodologies. Many methodologies are available today to address the image analysis task, but those surveyed in this paper showed attractive accuracy rates in the recognition and extraction of the OOI. A study of the advantages and limitations of every method is made, and the domains where each method can be applied are also analyzed.

Input-driven Approach

1. Methodology: Uses multiple camera inputs, fuses the inputs and then detects disparity for segmentation, followed by region synthesis and masking, to extract the OOI.
   Advantage: Improved extraction performance.
   Limitation: Performance directly depends on the number of differently focused images.
   Application: 1. MPEG-4 region standardization. 2. Virtual studio – extracting objects from a uniform background.

2. Methodology: A Harris corner detector gives the points from which affinely invariant regions are selected. Feature vectors are derived for each region and an appropriate match is derived from the database by a voting mechanism.
   Advantage: Works efficiently even when there are large changes in the viewpoint.
   Limitation: Sensitive to environmental changes (illumination, shadow, etc.).
   Application: In supervisory control systems to recognize objects.

3. Methodology: The object of interest is identified interactively; the cropped flag image is divided into image blocks to which Wiener deconvolution is applied, followed by a linear conformal transformation. Finally, HSV color feature extraction is done to obtain a minimal choice set from which the user selects a flag.
   Advantage: Attractive flag identification results (82%–93%).
   Limitation: Demands a more effective segmentation task and feature extraction process.
   Application: In flag identification systems.

Feature-driven Approach

1. Methodology: A Bristol database is used for storing high-quality images. K-means clustering is employed for the segmentation task, followed by feature extraction which extracts about 28 features. Finally, a neural network is trained with 7000 sample regions to enhance the accuracy of the classification process.
   Advantage: Increased classification accuracy (91.1%).
   Limitation: More training data is required.
   Application: In monitoring systems which recognize the captured objects.

2. Methodology: Two maps, namely a scale map and a rotational scale map, are introduced. A novel LINK method is introduced which combines more than one region in one step by comparing the histogram distributions of the scale map, the rotational map and the CIELAB color. Finally, the aggregated region results in the extraction of the OOI.
   Advantage: Easily adaptable to grey-scale images using intensity.
   Limitation: Illumination problems occur when linking smaller segments.
   Application: Can be deployed widely in extracting the OOI from color and grey-scale images.

3. Methodology: A novel MR ADORE framework which uses an IPL is employed to extract the OOI. Both off-line and on-line learning mechanisms are adopted, and finally a KNN algorithm is used for recognizing the objects.
   Advantage: Completely automated feature extraction process; accuracy is about 83%.
   Limitation: A large training set is required to enhance the performance.
   Application: Can be deployed widely in automatic recognition systems.

4. Methodology: A robust real-time object detection technique enables rapid image processing with high detection rates. It uses a new integral image representation, an AdaBoost learning algorithm and a cascaded classifier for the desired face detection.
   Advantage: High-speed detection; experimentally, face detection proceeds at 15 frames per second.
   Limitation: Large constrained training data is required (e.g., under variations of illumination).
   Application: In digital cameras for face detection.

Object-driven Approach

1. Methodology: An a posteriori technique wherein an invariant feature vector for the model under consideration is derived and tested against the pre-determined feature set to find the appropriate match, in order to decide a label for the model.
   Advantage: The technique is fully automated and customizable.
   Limitation: 1. The classification rate heavily depends on the quality of the training set. 2. Less adaptable to other domains.
   Application: In automated systems which perform part classification schemes for 3D CAD data.

2. Methodology: Focused on finding the semantic class of a physical object observed in the image using the concept of ontology; a purely object-oriented approach.
   Advantage: Highly adaptable in nature.
   Limitation: Depends on training data and ontology structuring.
   Application: In diverse domains where knowledge can be easily structured in terms of concepts and relations.

3. Methodology: An effective reinforced learning mechanism is introduced by adopting the Q-learning technique, with a thresholding technique for recognizing and extracting the objects from the image.
   Advantage: Highly adaptable, as it relies less on expert knowledge.
   Limitation: Depends on large training data sets.
   Application: Being generic, it can be employed in systems which identify and analyze captured objects.

4. Methodology: An FDPE is done, followed by a region scan to decide whether the objects are detected from the extracted pixels, if there are enough pixels; otherwise, an RFOE is made to recognize and extract the required OOI.
   Advantage: Speedy and more accurate classification.
   Limitation: Demands constrained environments.
   Application: In virtual studios to extract objects from a uniform background.

Attention-driven Approach

1. Methodology: Only those regions which are feature-rich are considered. An EM algorithm results in clustered groups, and the Itti-Koch model derives a saliency map from which a mask is generated. When this mask is intersected with the image under study, the OOI is extracted.
   Advantage: Best overall retrieval accuracy (comparison made with 7 peer retrieval models).
   Limitation: Demands refinement of the extracted OOI.
   Application: In CBIR systems such as medical image analysis, traffic monitoring systems, etc.

Table 1. Comparison of various methodologies.

7. Conclusion

The paper presents a survey of various approaches evolved in the last decade which facilitate the image analysis task by bringing in automation in object recognition and extraction. The approaches discussed in this survey are categorized based on the defined principles (Input, Feature, Object and Attention) with which each approach is driven. In spite of the fact that most of the discussed approaches perform well, there is a possibility to bring more objectivity into the interpretation task of the systems which have adopted these approaches. From the tentative taxonomy of the examined systems in a real environment, as most image interpretation systems are highly domain-specific and rely heavily on expert knowledge, a memory-based problem-solving principle may be adopted in the near future to enhance the analysis performance of such systems. Instead of relying completely on expert knowledge, an adaptation capability can be incorporated within the system, enabling it to derive better results. One of the novel methodologies under this principle is Case-based Reasoning (CBR), which can facilitate domain-specific systems to perform with more accuracy, consistency and efficiency.

References

[1] C. J. C. H. WATKINS, P. DAYAN, Q-Learning. Machine Learning, Vol. 8, pp. 279–292, 1992.

[2] J. KATTO, M. OHTA, Novel Algorithm for Object Extraction using Multiple Camera Inputs. Proc. of International Conference on Image Processing, Vol. 2, pp. 863–866, 1996.

[3] T. TUYTELAARS, A. ZAATRI, L. VAN GOOL, H. VAN BRUSSEL, Automatic Object Recognition as Part of an Integrated Supervisory Control System. Proc. of International Conference on Robotics & Automation, Vol. 4, pp. 3707–3712, 2000.

[4] E. HART, S.-H. CHA, C. C. TAPPERT, Interactive Flag Identification using Image Retrieval Techniques. In: CISST 2004, pp. 441–445, 2004.

[5] N. W. CAMPBELL, W. P. J. MACKEOWN, B. T. THOMAS, T. TROSCIANKO, The Automatic Classification of Outdoor Images. Proc. of International Conference on Engineering Applications of Neural Networks, pp. 339–342, 1996.

[6] S. KONDRA, Image Segmentation for Object Recognition. In: Sixth Framework Programme, Marie Curie Actions, Human Resources and Mobility, Marie Curie Research Training Networks (RTN), 2004.

[7] I. LEVNER, V. BULITKO, L. LI, G. LEE, R. GREINER, Automated Feature Extraction for Object Recognition. In: Image and Vision Computing '03 New Zealand, Palmerston North, New Zealand, 2003.

[8] S. MANSOURI ALGHALANDIS, GH. NOZAD ALAMDARI, Welding Defect Pattern Recognition in Radiographic Images of Gas Pipelines using Adaptive Feature Extraction Method and Neural Network Classifier. In: 23rd World Gas Conference, Amsterdam, 2006.

[9] C. Y. IP, W. C. REGLI, Content-based Classification of CAD Models with Supervised Learning. In: Computer-aided Design and Applications, Vol. 2, No. 5, pp. 609–617, 2005.

[10] S. M. ANOUNCIA, R. SARAVANAN, Non-destructive Testing using Radiographic Images – a Survey. British Journal of Non-Destructive Testing, Insight, Vol. 48, No. 10, pp. 592–596, 2006.

[11] N. MAILLOT, M. THONNAT, Object Recognition for Semantic Image Indexing. Orion team, INRIA Sophia Antipolis, France.

[12] P. VIOLA, M. JONES, Robust Real Time Object Detection. In: Second International Workshop on Statistical and Computational Theories of Vision – Modeling, Learning, Computing, and Sampling, 2001.

[13] F. SAHBA, H. R. TIZHOOSH, M. M. A. SALAMA, Increasing Object Recognition Rate Using Reinforced Segmentation. Proc. of IEEE International Conference on Image Processing, pp. 781–784, 2006.

[14] Y.-S. HUANG, F.-H. CHENG, Object-oriented Foreground Image Extraction. Proc. of 2nd International Conference on Innovative Computing, Information and Control, p. 407, 2007.

[15] Y.-H. LU, X.-H. ZHANG, J. KONG, X.-F. WANG, A Novel Content-based Image Retrieval Approach Based on Attention-driven Model. Proc. of International Conference on Wavelet Analysis and Pattern Recognition, pp. 510–515, 2007.

Received: October, 2008
Revised: May, 2009

Accepted: May, 2009

Contact address:

S. Margret Anouncia
J. Godwin Joseph

School of Computing Sciences
VIT University, Vellore
Tamil Nadu, India
e-mail: [email protected]

DR. S. MARGRET ANOUNCIA is a Professor at the School of Computing Science and Engineering at VIT University, Vellore, India. She received a Bachelor's degree in Computer Science and Engineering from Bharathidasan University in 1993 and a Master's in Software Engineering from Anna University in 2000. She was awarded a doctorate in Computer Science and Engineering at VIT University in 2008. She has teaching and research experience of about fifteen years. Her areas of specialization include Digital Image Processing, Software Engineering and Knowledge Engineering. She is a life member of the Computer Society of India and of the Indian Society for Technical Education. She has published five papers in international journals and presented around fifteen papers at various national and international conferences.

MR. J. GODWIN JOSEPH is currently an M.S. scholar at VIT University.