CLASSIFICATION OF VEGETABLES BASED ON DECISION TREE FOR MULTICLASS PROBLEM




  • International Journal of Image Processing and Visual Communication

    ISSN (Online) 2319-1724 : Volume 1 , Issue 2 , October 2012

    Classification of Vegetables based on Decision Tree for Multiclass Problem

    Suresha M 1, Ravikumar M 2

    Department of Computer Science, Kuvempu University, Karnataka, India

    1 [email protected] 2 [email protected]

    Abstract: In this paper, we propose a method for classification of vegetables based on the extraction of texture properties. Segmentation is carried out using the watershed method. Vegetable texture features such as red component, green component, skewness, kurtosis, variance, and energy are extracted. Vegetable images are normalized by resizing them with proper scaling, which eliminates the effects of orientation. Classification is done using Mean-around features, Gray Level Co-occurrence Matrix (GLCM) features and combined (Mean around-GLCM) features. A decision tree classifier is used to classify the vegetables into eight classes. The splitting rules used for growing the decision tree are the Gini diversity index (gdi), the twoing rule, and entropy. The results obtained from the proposed method are well accepted and in good agreement with the experts. The proposed approach was tested on a vegetable data set using cross validation and achieved a good success rate.

    Keywords: Decision Tree Classifier, GLCM, Mean-around Features, Texture Features, Vegetables Classification.

    I. INTRODUCTION

    Recognizing different kinds of vegetables and fruits is a recurrent task in supermarkets and food processing industries. Often, one needs to deal with complex classification problems. In such scenarios, a single feature set might not be enough to capture the separability of the classes, and more features may become necessary to improve the accuracy of classification. Adding features increases the accuracy, but it has the drawback of increasing the dimensionality of the data, which might require more training samples. This paper presents a method for classification of vegetables with Mean-around features, GLCM features and combined features.

    Eight types of vegetables are considered in this work, namely Cabbage, Beetroot, Capsicum, Carrot, Chillies, Cucumber, Bittermelon and Onion. In the proposed method, vegetables are classified based on Mean-around features such as red color component, green color component, Kurtosis and Variance, and on Gray Level Co-occurrence Matrix (GLCM) features such as Contrast, Correlation, Energy and Homogeneity.

    In the rest of this paper, we briefly describe some related work in Section II. We present the segmentation methodology, based on watershed segmentation, in Section III. Feature extraction and classification of vegetables using decision trees are presented in Sections IV and V respectively, and experimental results and analysis are given in Section VI. Finally, conclusions are drawn in Section VII. The block diagram of the overall process is given in Fig. 1.

    Fig. 1 Block diagram of overall process: Input Image, Segmentation [Watershed Segmentation], Feature Extraction (Mean around Features, GLCM Features, Combined (Mean around-GLCM) Features), Decision Tree Classifier for Classification, Labelled Image.

    II. LITERATURE SURVEY

    In [18], a new model of an automated grading system for oil palm fruit is developed using the RGB color model and artificial fuzzy logic. The mean color intensity based on the RGB color model is determined, and the system achieved 86.67% accuracy over all categories. In [14], a methodology is presented for recognition and classification of fruits in fruit salad image samples. Samples of different fruits such as Apple, Chikku, Banana, Orange and Pineapple are considered. Each fruit sample is sliced into pieces and placed on a tray. The RGB color features extracted from the images form the knowledge base. A K-means classifier is proposed, with a classification efficiency of around 98%. In [12], an efficient fusion of color and texture features for fruit recognition is presented. Recognition is done by a minimum distance classifier based on the statistical and co-occurrence features derived from the wavelet-transformed sub-bands. Experimental results on a database of about 2635 fruits from 15 different classes confirm the effectiveness of the proposed approach.

    In [16], a new fruit recognition system is proposed that combines three feature analysis methods (color-based, shape-based and size-based) in order to increase the accuracy of recognition. The method classifies and recognizes fruit images based on the obtained feature values using nearest neighbors classification. The system then shows the fruit name and a short description to the user. It analyzes, classifies and identifies fruits with up to 90% accuracy. In [19], a novel method for color classification in yarn-dyed fabric is proposed. The color image of the yarn-dyed fabric is obtained with a flat scanner and then converted from RGB color space to Lab color space. FCM is selected as the color clustering method. The number of yarn colors is detected based on the validity of the FCM clusters.

    In [17], a method is presented to identify the human skin region in an image based on color classification. The method classifies the colors of all pixels in the image into several classes using the K-means algorithm and segments the image into several parts according to the color class to which each pixel belongs. It then finds the class whose feature vector has the minimum distance to the skin color feature vector previously defined in the color space. In [3], an image classification technique is presented that uses the Bayes decision rule for minimum cost to classify pixels into skin color and non-skin color. Color statistics are collected from the YCbCr color space. In [8], a method is presented to classify more than ten categories of seed defects using color features, texture features and a support vector machine (SVM) classifier. In the image classification part, color histograms in RGB and HSV color space, together with texture based on the Grey Level Co-occurrence Matrix (GLCM) and Local Binary Pattern (LBP), are adopted as features. The proposed system was evaluated on more than 10,000 sample images. The obtained accuracies are 95.6% for the normal seed type and 80.6% for the group of defect seed types.

    III. SEGMENTATION

    Image segmentation is a process that partitions an image into its constituent regions or objects. Effective segmentation of complex images is one of the most difficult tasks in image processing. Various image segmentation algorithms have been proposed to achieve efficient and accurate results. Among these algorithms, watershed segmentation is particularly attractive. Its central idea is to treat image intensity as a topographic surface. Watershed segmentation also embodies other principal image segmentation methods, including discontinuity detection, thresholding and region processing. Because of these factors, watershed segmentation is more effective and stable than other segmentation algorithms [11].

    Watershed segmentation is an effective method for gray level vegetable image segmentation. To apply it to binary vegetable images, we first pre-process them with a distance transform, which converts them to gray level images suitable for watershed segmentation. The common Distance Transforms (DTs) include Euclidean, City block and Chessboard. Different DTs produce very different watershed segmentation results for the binary vegetable images. For vegetable images containing components of different shapes, we find that the Chessboard DT achieves better watershed segmentation results than the Euclidean DT and the City block DT.
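To make the difference between the three distance transforms concrete, here is a minimal brute-force sketch in plain Python. It covers only the distance-transform step, not the watershed itself, and it is not the authors' implementation. The names `METRICS` and `distance_transform` and the tiny test image are assumptions made for illustration.

```python
import math

# Distance between two pixels, given their row and column offsets.
METRICS = {
    "euclidean": lambda dr, dc: math.hypot(dr, dc),
    "cityblock": lambda dr, dc: abs(dr) + abs(dc),
    "chessboard": lambda dr, dc: max(abs(dr), abs(dc)),
}


def distance_transform(binary, metric):
    """Map each foreground pixel (1) to its distance from the nearest
    background pixel (0). Assumes at least one background pixel exists."""
    dist = METRICS[metric]
    background = [(r, c) for r, row in enumerate(binary)
                  for c, value in enumerate(row) if value == 0]
    result = []
    for r, row in enumerate(binary):
        out_row = []
        for c, value in enumerate(row):
            if value == 0:
                out_row.append(0.0)
            else:
                out_row.append(float(min(dist(r - br, c - bc)
                                         for br, bc in background)))
        result.append(out_row)
    return result


# Hypothetical 3 x 3 binary image whose only background pixel is the corner.
image = [
    [0, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
]

for name in METRICS:
    print(name, distance_transform(image, name))
```

For pixels that lie diagonally from the background, the three metrics give different values, which is why the gray level surface handed to the watershed step depends on the chosen transform.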

    IV. FEATURE EXTRACTION

    In the feature extraction process, we determine distribution patterns of the data based on the red color component, green color component, Skewness, Kurtosis and Variance around the mean of the sample data. We also determine GLCM features such as Contrast, Correlation, Energy and Homogeneity based on the intensity of pixels. A confusion matrix is then determined for the Mean-around features, the GLCM features and the combined (Mean around-GLCM) features.

    A. Mean-around Features

    Color components and texture features are the prominent features for classification of vegetables. Five distribution features of vegetables are considered: Red component, Green component, Kurtosis, Variance and Skewness.

    The average value of the red component ($\bar{R}$) of an RGB image is obtained using equation (1).

    $$\bar{R} = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} f_R(i, j) \tag{1}$$

    where $f_R$ is the red component of an RGB image and M and N are the numbers of rows and columns of the image.

    The average value of the green component ($\bar{G}$) of an RGB image is obtained using equation (2).

    $$\bar{G} = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} f_G(i, j) \tag{2}$$

    where $f_G$ is the green component of an RGB image and M and N are the numbers of rows and columns of the image.

    Skewness is a measure of the asymmetry of the data around the sample mean. If skewness is negative, the data are spread out more to the left of the mean than to the right. If skewness is positive, the data are spread out more to the right. The skewness of the normal distribution (or any perfectly symmetric distribution) is zero. The skewness of a distribution is defined as in equation (3).

    $$S = \frac{E\left[(x - \mu)^3\right]}{\sigma^3} \tag{3}$$

    where x is the pixel value, $\mu$ is the mean of x, $\sigma$ is the standard deviation of x, and $E[(x - \mu)^3]$ represents the expected value of the quantity $(x - \mu)^3$.


    Kurtosis is a measure of how outlier-prone a distribution is. The kurtosis of the normal distribution is 3. Distributions that are more outlier-prone than the normal distribution have kurtosis greater than 3; distributions that are less outlier-prone have kurtosis less than 3. The kurtosis of a distribution is defined as in equation (4).

    $$K = \frac{E\left[(x - \mu)^4\right]}{\sigma^4} \tag{4}$$

    where x is the pixel value, $\mu$ is the mean of x, $\sigma$ is the standard deviation of x, and $E[(x - \mu)^4]$ represents the expected value of the quantity $(x - \mu)^4$.

    Variance measures the variation among the pixels of an image. The variance of a distribution is defined in equation (5).

    $$V = \sum_{i=1}^{M} \sum_{j=1}^{N} \left( f(i, j) - \bar{f} \right)^2 \tag{5}$$

    where $f$ is the gray level image and $\bar{f}$ is the average value of the pixels in a gray level image of a vegetable.
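The Mean-around features can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' code. The function names and sample arrays are assumptions, images are taken to be nested lists of numbers, and the variance term is computed as an unnormalized sum of squared deviations, which is an assumption about how to read equation (5) as printed.

```python
import math


def channel_mean(channel):
    """Average of an M x N colour channel, as in equations (1) and (2)."""
    rows, cols = len(channel), len(channel[0])
    return sum(sum(row) for row in channel) / (rows * cols)


def moment_features(gray):
    """Skewness, kurtosis and the variance term of a gray level image.

    Assumes the image is not constant; otherwise sigma is zero and the
    skewness and kurtosis ratios are undefined.
    """
    pixels = [value for row in gray for value in row]
    count = len(pixels)
    mu = sum(pixels) / count
    # Sum of squared deviations (assumed form of equation (5), no 1/(M*N)).
    squared_dev = sum((x - mu) ** 2 for x in pixels)
    sigma = math.sqrt(squared_dev / count)  # population standard deviation
    skewness = sum((x - mu) ** 3 for x in pixels) / count / sigma ** 3
    kurtosis = sum((x - mu) ** 4 for x in pixels) / count / sigma ** 4
    return {"variance": squared_dev, "skewness": skewness, "kurtosis": kurtosis}


# Hypothetical 2 x 2 samples, used only to exercise the functions.
red = [[10, 20], [30, 40]]
gray = [[1, 2], [3, 4]]

print(channel_mean(red))
print(moment_features(gray))
```

Dividing `squared_dev` by the pixel count gives the usual population variance if a normalized value is preferred.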

    B. GLCM Features

    Texture features use the contents of the GLCM to measure the variation in intensity at a pixel of interest. Co-occurrence texture features were first proposed in 1973 [6]; they characterize texture using a variety of quantities derived from second order image statistics. Co-occurrence texture features are extracted from an image in two steps. First, the pairwise spatial co-occurrences of pixels separated by a particular angle and distance are tabulated in a GLCM. Second, the GLCM is used to compute a set of scalar quantities that characterize different aspects of the underlying texture. The GLCM is a tabulation of how often different combinations of gray levels co-occur in an image or image section [6].

    TABLE I
    GLCM FEATURES

    Contrast: $\sum_{i,j} |i - j|^2 \, p(i, j)$

    Correlation: $\sum_{i,j} \frac{(i - \mu_i)(j - \mu_j) \, p(i, j)}{\sigma_i \sigma_j}$

    Energy: $\sum_{i,j} p(i, j)^2$

    Homogeneity: $\sum_{i,j} \frac{p(i, j)}{1 + |i - j|}$

    The GLCM is an N x N square matrix, where N is the number of different gray levels in an image. An element p(i, j, d, $\theta$) of the GLCM of an image represents a relative frequency, where i is the gray level of a pixel at location (x, y), and j is the gray level of a pixel located at a distance d from it in the orientation $\theta$. While GLCMs provide a quantitative description of a spatial pattern, they are too unwieldy for practical image analysis. A set of scalar quantities for summarizing the information contained in a GLCM was proposed in [6]. A total of fourteen features were originally proposed; however, only subsets of these are used [9]. The four derived features used in our work are given in TABLE I.
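The two-step procedure (tabulate co-occurrences, then reduce them to scalars) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the non-symmetric counting, the default offset of one pixel to the right, and the small 4-level test image are all choices made for this example.

```python
import math


def glcm(gray, levels, dr=0, dc=1):
    """Normalized co-occurrence matrix for the pixel offset (dr, dc).

    gray holds integer gray levels in range(levels). Counts are not
    symmetrized (an assumption of this sketch).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(gray), len(gray[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[gray[r][c]][gray[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]


def glcm_features(p):
    """Contrast, correlation, energy and homogeneity of a normalized GLCM.

    Correlation is undefined (division by zero) for a constant image.
    """
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, j, v in cells)
    mu_j = sum(j * v for i, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, j, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for i, j, v in cells))
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "correlation": sum((i - mu_i) * (j - mu_j) * v
                           for i, j, v in cells) / (sd_i * sd_j),
        "energy": sum(v ** 2 for i, j, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


# Hypothetical 4 x 4 image with 4 gray levels.
sample = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(sample, levels=4)
features = glcm_features(matrix)
print(features)
```

Other orientations are obtained by changing the offset, for example `dr=-1, dc=1` for the upper-right diagonal neighbour.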

    V. DECISION TREE CLASSIFIER

    Decision trees are easy to interpret, computationally inexpensive, and capable of coping with noisy data. Therefore, the technique has been widely used in various applications, such as pattern recognition [13], credit and loan evaluation [13], [7], fraud and network intrusion detection [15], [4], and medical diagnosis and healthcare management [10]. Decision tree learning, used in statistics, data mining and machine learning, uses a decision tree as a predictive model that maps observations about an item to conclusions about the item's target value. More descriptive names for such tree models are classification trees or regression trees. The majority of decision trees deal with the classification problem, which is also the main goal of this paper; in this context, the technique is referred to as classification trees. In this paper, we deal with binary trees, where each split produces exactly two child nodes.

    Three splitting rules that are widely available for growing a decision tree are gini, twoing, and entropy. Each splitting rule attempts to segregate the data using a different approach. The gini index is defined as:

    $$Gini(t) = \sum_{i} p_i (1 - p_i) \tag{6}$$

    where $p_i$ is the relative frequency of class i at node t (the number of observations of the class divided by the total number of observations), and node t represents any node (parent or child) at which a given split of the data is performed [1]. The gini index is a measure of impurity for a given node that is at a maximum when all observations are equally distributed among all classes. In general terms, the gini splitting rule attempts to find the largest homogeneous category within the dataset and isolate it from the remaining data. Subsequent nodes are then segregated in the same manner until further divisions are not possible. An alternative measure of node impurity is the twoing index:

    $$Twoing(t) = \frac{P_L P_R}{4} \left( \sum_{i} \left| p(i|t_L) - p(i|t_R) \right| \right)^2 \tag{7}$$

    where L and R refer to the left and right sides of a given split respectively, $P_L$ and $P_R$ are the proportions of the observations at node t that go to the left and right sides, and p(i|t) is the relative frequency of class i at node t [2]. Twoing attempts to segregate data more evenly than the gini rule, separating whole groups of data and identifying groups that make up 50 percent of the remaining data at each successive node. Entropy, often referred to as the information rule, is a measure of homogeneity of a node and is defined as:

    $$Entropy(t) = -\sum_{i} p_i \log p_i \tag{8}$$

    where $p_i$ is the relative frequency of class i at node t [1]. The entropy rule attempts to identify splits where as many groups as possible are divided as precisely as possible, and forms groups by minimizing the within-group diversity [5]. This rule can be interpreted as the expected value of the minimized negative log-likelihood of a given split result, and it tends to identify rare classes more accurately than the previous rules.
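The three splitting criteria in equations (6) to (8) can be evaluated directly from class labels. The sketch below is a minimal plain-Python illustration, not the authors' implementation. The function names are hypothetical, and the natural logarithm is an assumption because the text does not state a base.

```python
import math
from collections import Counter


def class_freqs(labels):
    """Relative frequency of each class among the labels at a node."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


def gini(labels):
    """Gini index of a node, equation (6)."""
    return sum(p * (1 - p) for p in class_freqs(labels).values())


def entropy(labels):
    """Entropy of a node, equation (8), with the natural log (assumed)."""
    return -sum(p * math.log(p) for p in class_freqs(labels).values())


def twoing(left, right):
    """Twoing value of a split into left and right children, equation (7)."""
    total = len(left) + len(right)
    p_left, p_right = len(left) / total, len(right) / total
    freq_left, freq_right = class_freqs(left), class_freqs(right)
    classes = set(freq_left) | set(freq_right)
    spread = sum(abs(freq_left.get(c, 0.0) - freq_right.get(c, 0.0))
                 for c in classes)
    return p_left * p_right / 4 * spread ** 2


# Hypothetical node with two classes, split perfectly into two children.
node = ["onion", "onion", "carrot", "carrot"]
print(gini(node), entropy(node))
print(twoing(["onion", "onion"], ["carrot", "carrot"]))
```

A tree-growing routine would evaluate each candidate split with one of these criteria: for gini and entropy it compares the impurity of the parent with the weighted impurity of the children, while twoing scores the split itself and is maximized.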

    a) Color Image b) Labeled Image c) Grayscale Image

    Fig. 2 Sample Experimental Results

    VI. RESULTS AND DISCUSSION

    In this work we created our own vegetable database. We collected vegetable images from the World Wide Web; in addition, some images were taken in and around our place using a Canon digital camera in natural daylight. All the images were taken to approximately fill the camera field of view, in natural daylight with a white background. Images were resized to a resolution of 300 x 300 pixels to speed up computation. We considered the most commonly available vegetables. The vegetable database contains a total of 582 vegetable images in 8 classes. Fig. 2 shows sample experimental results.

    We used decision trees for classification of the vegetables. The feature set contains average red component, average green component, skewness, kurtosis, variance and energy. The confusion matrices show the accuracy of the decision tree. When we evaluated the training samples, we obtained good classification accuracy for the combined features.

    TABLE II
    CONFUSION MATRIX FOR GLCM FEATURES USING ENTROPY

                  Cabbage  Beetroot  Capsicum  Carrot  Chillies  Cucumber  Bittermelon  Onion  Total  Success Rate in %
    Cabbage            93         1         1       1         0         2            1      5    104              89.42
    Beetroot            5        48         0       1         0         2            1      5     62              77.41
    Capsicum            5         0        37       0         0         0            0      2     44              84.09
    Carrot              2         2         1      60         0         2            0      0     67              89.55
    Chillies            1         2         0       3        17         1            0      1     25              68.00
    Cucumber            2         0         0       3         0        36            1      3     45              80.00
    Bittermelon         1         2         1       3         0         0           46      1     54              85.18
    Onion               2         1         0       1         0         1            0    176    181              97.23
    Total                                                                                        582              88.14
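The success rates in a confusion matrix follow from simple arithmetic: each class rate is the diagonal entry divided by the row total, and the overall rate is the sum of the diagonal divided by the number of images. The sketch below recomputes them for the GLCM/entropy matrix above. The function names are hypothetical, and treating each row as one class of input images is an assumption based on the row totals.

```python
CLASSES = ["Cabbage", "Beetroot", "Capsicum", "Carrot",
           "Chillies", "Cucumber", "Bittermelon", "Onion"]

# Counts copied from the confusion matrix for GLCM features using entropy.
CONFUSION = [
    [93, 1, 1, 1, 0, 2, 1, 5],
    [5, 48, 0, 1, 0, 2, 1, 5],
    [5, 0, 37, 0, 0, 0, 0, 2],
    [2, 2, 1, 60, 0, 2, 0, 0],
    [1, 2, 0, 3, 17, 1, 0, 1],
    [2, 0, 0, 3, 0, 36, 1, 3],
    [1, 2, 1, 3, 0, 0, 46, 1],
    [2, 1, 0, 1, 0, 1, 0, 176],
]


def success_rates(matrix):
    """Per-class success rate in percent: diagonal entry over row total."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_rate(matrix):
    """Overall success rate in percent: correct decisions over all images."""
    correct = sum(row[i] for i, row in enumerate(matrix))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


for name, rate in zip(CLASSES, success_rates(CONFUSION)):
    print(f"{name:12s}{rate:7.2f}")
print(f"{'Overall':12s}{overall_rate(CONFUSION):7.2f}")
```

Here 513 of the 582 images lie on the diagonal, which gives the overall rate of about 88.14% reported in the table. The printed values are rounded, so they can differ in the last digit from table entries that appear to be truncated.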

    TABLE III
    CONFUSION MATRIX FOR MEAN AROUND FEATURES USING ENTROPY

                  Cabbage  Beetroot  Capsicum  Carrot  Chillies  Cucumber  Bittermelon  Onion  Total  Success Rate in %
    Cabbage            99         0         2       1         0         0            2      0    104              95.19
    Beetroot            1        57         0       1         0         0            0      3     62              91.93
    Capsicum            0         0        42       0         1         0            0      1     44              95.45
    Carrot              0         2         3      59         0         0            0      3     67              88.05
    Chillies            0         0         0       1        23         0            0      1     25              92.00
    Cucumber            1         0         0       0         0        44            0      0     45              97.77
    Bittermelon         4         2         1       0         0         2           45      0     54              83.33
    Onion               0         0         1       0         1         1            0    178    181              98.34
    Total                                                                                        582              93.98

    TABLE IV
    CONFUSION MATRIX FOR COMBINED FEATURES USING ENTROPY

                  Cabbage  Beetroot  Capsicum  Carrot  Chillies  Cucumber  Bittermelon  Onion  Total  Success Rate in %
    Cabbage           103         0         0       0         0         0            1      0    104              99.03
    Beetroot            0        57         1       1         0         0            1      2     62              91.93
    Capsicum            0         0        38       1         0         0            2      3     44              86.36
    Carrot              1         0         1      63         1         0            1      0     67              94.02
    Chillies            0         0         0       1        24         0            0      0     25              96.00
    Cucumber            1         0         0       0         0        43            1      0     45              95.55
    Bittermelon         1         0         0       1         0         0           49      3     54              90.74
    Onion               0         0         1       0         0         0            0    180    181              99.44
    Total                                                                                        582              95.70

    Fig. 3: Estimated cost for each tree using cross validation for splitting rule entropy (x-axis: Tree size (Number of Terminal Nodes); y-axis: Misclassification rate)
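Cross validation estimates the misclassification cost by repeatedly holding out part of the data for testing. The text does not state how many folds were used, so the sketch below assumes k = 10 purely for illustration. The function name and the fixed shuffle seed are also assumptions, and only the fold splitting is shown, not the tree growing or pruning behind the figure.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    Indices are shuffled once with a fixed seed, then dealt into k folds
    whose sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 582 images as in the vegetable database; k = 10 is an assumed value.
splits = list(k_fold_indices(582, 10))
print([len(test) for _, test in splits])
```

Each image is used for testing exactly once. Averaging the test misclassification rate over the folds, for each candidate tree size, would give one estimated cost per tree size.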

    TABLE V
    CONFUSION MATRIX FOR GLCM FEATURES USING GDI

                  Cabbage  Beetroot  Capsicum  Carrot  Chillies  Cucumber  Bittermelon  Onion  Total  Success Rate in %
    Cabbage            96         1         1       1         1         1            1      2    104              92.30
    Beetroot            5        50         1       2         0         0            2      2     62              80.64
    Capsicum            4         0        35       0         0         2            0      3     44              79.54
    Carrot              2         1         1      54         1         4            2      2     67              80.59
    Chillies            0         0         2       2        19         1            0      1     25              76.00
    Cucumber            3         1         0       2         0        35            0      4     45              77.77
    Bittermelon         2         3         0       2         0         2           44      1     54              81.48
    Onion               6         0         1       0         0         0            0    174    181              96.13
    Total                                                                                        582              87.11

    TABLE VI
    CONFUSION MATRIX FOR MEAN AROUND FEATURES USING GDI

                  Cabbage  Beetroot  Capsicum  Carrot  Chillies  Cucumber  Bittermelon  Onion  Total  Success Rate in %
    Cabbage           101         1         0       0         0         0            1      1    104              97.11
    Beetroot            2        49         1       0         1         1            2      6     62              79.03
    Capsicum            2         0        38       0         0         1            2      1     44              86.36
    Carrot              0         0         0      64         0         0            3      0     67              95.52
    Chillies            1         0         2       1        18         0            0      3     25              72.00
    Cucumber            1         0         0       0         0        43            1      0     45              95.55
    Bittermelon         1         0         0       0         0         0           52      1     54              96.29
    Onion               0         1         3       0         0         0            0    177    181              97.79
    Total                                                                                        582              93.12

    TABLE VII
    CONFUSION MATRIX FOR COMBINED FEATURES USING GDI

                  Cabbage  Beetroot  Capsicum  Carrot  Chillies  Cucumber  Bittermelon  Onion  Total  Success Rate in %
    Cabbage            98         1         1       1         0         1            0      2    104              94.23
    Beetroot            0        59         1       0         0         0            1      1     62              95.16
    Capsicum            1         0        38       0         0         0            3      2     44              86.36
    Carrot              0         1         2      62         0         0            2      0     67              92.53
    Chillies            0         1         1       2        18         0            0      3     25              72.00
    Cucumber            1         0         0       0         0        43            1      0     45              95.55
    Bittermelon         0         0         0       0         0         1           53      0     54              98.14
    Onion               1         2         1       0         0         0            0    177    181              97.79
    Total                                                                                        582              94.15

    Fig. 4: Estimated cost for each tree using cross validation for splitting rule gdi (x-axis: Tree size (Number of Terminal Nodes); y-axis: Misclassification rate)

    TABLE VIII
    CONFUSION MATRIX FOR GLCM FEATURES USING TWOING RULE

                  Cabbage  Beetroot  Capsicum  Carrot  Chillies  Cucumber  Bittermelon  Onion  Total  Success Rate in %
    Cabbage            96         2         0       0         0         0            2      4    104              92.30
    Beetroot            2        43         0       3         1         4            5      4     62              69.35
    Capsicum            1         0        30       1         1         4            1      6     44              68.18
    Carrot              1         1         1      55         0         5            1      3     67              82.08
    Chillies            1         0         1       1        20         1            0      1     25              80.00
    Cucumber            4         1         0       3         0        37            0      0     45              82.22
    Bittermelon         0         2         1       1         0         0           45      5     54              83.33
    Onion               9         1         2       1         1         1            1    165    181              91.16
    Total                                                                                        582              84.36

    TABLE IX
    CONFUSION MATRIX FOR MEAN AROUND FEATURES USING TWOING RULE

                  Cabbage  Beetroot  Capsicum  Carrot  Chillies  Cucumber  Bittermelon  Onion  Total  Success Rate in %
    Cabbage           103         0         0       0         0         0            1      0    104              99.03
    Beetroot            4        54         0       0         1         0            0      3     62              87.09
    Capsicum            2         0        39       0         0         0            0      3     44              88.63
    Carrot              2         2         0      61         0         1            1      0     67              91.04
    Chillies            1         0         0       3        20         0            0      1     25              80.00
    Cucumber            0         0         0       1         0        44            0      0     45              97.77
    Bittermelon         2         0         0       0         0         0           52      0     54              96.29
    Onion               4         0         1       1         0         1            0    174    181              96.13
    Total                                                                                        582              93.98

    TABLE X
    CONFUSION MATRIX FOR COMBINED FEATURES USING TWOING RULE

                  Cabbage  Beetroot  Capsicum  Carrot  Chillies  Cucumber  Bittermelon  Onion  Total  Success Rate in %
    Cabbage           100         0         3       0         0         0            1      0    104              96.15
    Beetroot            0        59         1       0         1         1            0      0     62              95.16
    Capsicum            0         0        40       1         0         2            0      1     44              90.90
    Carrot              1         3         1      59         0         2            1      0     67              88.05
    Chillies            1         0         1       2        20         0            0      1     25              80.00
    Cucumber            2         0         0       0         0        43            0      0     45              95.55
    Bittermelon         1         0         1       0         0         0           52      0     54              96.29
    Onion               4         0         1       0         0         0            1    175    181              96.68
    Total                                                                                        582              94.15

    Fig. 5: Estimated cost for each tree using cross validation for splitting rule Twoing (x-axis: Tree size (Number of Terminal Nodes); y-axis: Misclassification rate)

    VII. CONCLUSIONS

    In this paper, we used watershed segmentation to segment the vegetable images in the dataset. Mean Around features and GLCM features are extracted from the segmented region. Testing was conducted on Mean Around features, GLCM features and combined features using a decision tree classifier with the tree splitting rules gdi, twoing, and entropy. Testing was conducted using the cross validation method, with the following observations:

    Splitting rule entropy for growing the decision tree:
    GLCM features gave a success rate of 88.14%.
    Mean Around features gave a success rate of 93.98%.
    Mean Around-GLCM features gave a success rate of 95.70%.

    Splitting rule gdi for growing the decision tree:
    GLCM features gave a success rate of 87.11%.
    Mean Around features gave a success rate of 93.12%.
    Mean Around-GLCM features gave a success rate of 94.15%.

    Splitting rule twoing for growing the decision tree:
    GLCM features gave a success rate of 84.36%.
    Mean Around features gave a success rate of 93.98%.
    Mean Around-GLCM features gave a success rate of 94.15%.

    The experimental results reveal that combining Mean Around features and GLCM features increases the accuracy of classification. This method can be extended to other objects, such as classification of flowers, fruits, seeds and other vegetables, where human intervention is currently needed for classification.

    ACKNOWLEDGMENT

    Authors would like to thank Sandeep Kumar K.S and Shiva

    Kumar G for their help.

    REFERENCES

    [1] Apte C. and Weiss S., Data mining with decision trees and decision rules, Future Generation Computer Systems, 13:197-210, 1997.

    [2] Breiman L., Some properties of splitting criteria, Machine Learning, 24:41-47, 1996.

    [3] Chai D., Bouzerdoum A., A Bayesian approach to skin color classification in YCbCr color space, Proceedings TENCON, Vol. 2, 421-424, 2000.

    [4] D. Zhu, G. Premkumar, X. Zhang, C. H. Chu, Data mining for network intrusion detection: a comparison of alternative methods, Decision Sciences, 32 (4), 635-660, 2001.

    [5] De'ath G., Fabricius K. E., Classification and regression trees: a powerful yet simple technique for ecological data analysis, Ecology, 81(11):3178-3192, 2000.

    [6] Haralick R. M., Shanmugam K. and Dinstein I., Textural Features for image classification, IEEE Trans. on Systems, Man and Cybernetics, 610-621, 1973.

    [7] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993.

    [8] Kiratiratanapruk K., Sinthupinyo W., Color and texture for corn seed classification by machine vision, International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS), 1-5, 2011.

    [9] Newsam S. D. and Kamath C., Retrieval using texture features in high resolution multi-spectral satellite imagery, In SPIE Conference on Data Mining and Knowledge Discovery, 2004.

    [10] Olivia R. L. Sheng, Chih P. Wei, Paul J. H. Hu and Namsik C., Automated learning of patient image retrieval knowledge: neural networks versus inductive decision trees, Decision Support Systems, 30 (2), 105-124, 2000.

    [11] Rafael C. G. and Richard E. Woods and Steven L. Eddins, Digital Image Processing using MATLAB, PPH, 2009.

    [12] S. Arivazhagan, R. Newlin Shebiah, S. Selva Nidhyanandhan, L. Ganesan, Fruit Recognition using Color and Texture Features, Journal of Emerging Trends in Computing and Information Sciences, Vol. 1, No. 2, 90-94, Oct 2010.

    [13] S. Piramuthu, On learning to predict web traffic, Decision Support Systems, 35 (2), 213-229, 2003.

    [14] Vishwanath B. C., S. A. Madival, Sharanbasava Madole, Recognition of Fruits in Fruits Salad Based on Color and Texture Features, International Journal of Engineering Research & Technology, Vol. 1, Issue 7, 1-6, September 2012.

    [15] W. Lee, S. J. Stolfo, A framework for constructing features and models for intrusion detection systems, ACM Trans. on Information and System Security, 3 (4), 227-261, 2000.

    [16] Woo Chaw Seng, Seyed Hadi Mirisaee, A new method for fruits recognition system, International Conference on Electrical Engineering and Informatics, Vol. 1, 130-134, 2009.

    [17] Xiaoying Fang, Wenquan Gu, Chang Huang, A method of skin color identification based on color classification, International Conference on Computer Science and Network Technology (ICCSNT), Vol. 4, 2355-2358, 2011.

    [18] Z. May, M. H. Amaran, Automated Oil Palm Fruit Grading System using Artificial Intelligence, International Journal of Video & Image Processing and Network Security, 30-35, 2011.

    [19] Zhang Ronghua, Chen Hongwu, Zhang Xiaoting, Pan Ruru, Liu Jihong, Unsupervised Color Classification for Yarn-dyed Fabric Based on FCM Algorithm, International Conference on Artificial Intelligence and Computational Intelligence (AICI), Vol. 1, 497-501, 2010.