
A Transplantable System for Weed Classification by Agricultural Robotics

David Hall1, Feras Dayoub2, Tristan Perez1 and Chris McCool1

Abstract— This work presents a rapidly deployable system for automated precision weeding with minimal human labeling time. This overcomes a limiting factor in robotic precision weeding related to the use of vision-based classification systems trained for species that may not be relevant to specific farms. We present a novel approach to overcome this problem by employing unsupervised weed scouting, weed-group labeling, and finally, weed classification that is trained on the labeled scouting data. This work demonstrates a novel labeling approach designed to maximize labeling accuracy whilst needing to label as few images as possible. The labeling approach is able to provide the best classification results of any of the examined exemplar-based labeling approaches whilst needing to label over seven times fewer images than full data labeling.

I. INTRODUCTION

In recent years, we have seen an ever-increasing rise of herbicide resistance in many weed species [1]. To deal with this, integrated weed management has become an increasingly crucial part of modern farming practice [2]. It has become more important than ever that weeds be treated in the manner most appropriate for each individual weed so as to reduce herbicide resistance and increase weed management effectiveness. Two key elements of an effective integrated weed management strategy are having full knowledge of the species present in the field and in what quantities they appear [2] [3], a process we refer to as weed scouting, and treating weed species in the most appropriate manner for each individual species, a process we refer to as weed destruction.

In the past, scouting was done through manual inspection and sampling of the field to get an impression of weed species and distributions. This can, however, be a laborious process taking up much of a farmer's time. Several attempts have been made to automate the process of mapping and/or eliminating weeds [4]; however, some key questions have yet to be answered before widespread adoption of such approaches can become a reality.

Firstly, how can we autonomously scout an area and provide farmers with the species distribution information needed for integrated weed management without needing to know the species a priori? Secondly, after providing such information, can we train a weed classification system to henceforth identify the weeds found during scouting, so

1The authors are with the School of Electrical Engineering and Computer Science, Queensland University of Technology (QUT), Brisbane, Australia. email: {d20.hall, tristan.perez, c.mccool}@qut.edu.au

2The author is with the ARC Centre of Excellence for Robotic Vision, Queensland University of Technology (QUT), Brisbane, Australia. http://www.roboticvision.org/ email: [email protected]

Fig. 1: AgBotII agricultural robotics platform

that integrated weed management strategies can be appliedautonomously?

In this paper, we present preliminary work on a transplantable, rapidly deployable weed classification system which can scout a field, present minimal data for labeling by human workers, and train a classification system designed for the weeds found in the scouted field, without any prior knowledge of weed species before deployment. This is an expansion upon our previous work on unsupervised weed scouting shown in [5]. This work develops a labeling strategy to gain the highest classification accuracy with low data labeling time. We demonstrate different techniques aimed at minimizing human decision time while maximizing classification accuracy based upon initial unsupervised scouting information, using data collected from the AgBotII platform shown in Fig. 1. We demonstrate our own label-refinement technique designed to increase classification accuracy. We are able to show how our approach to data labeling can greatly reduce the time needed for labeling data while minimizing the reduction in classification accuracy caused by inaccurate initial clustering.

II. LITERATURE REVIEW

Automated weed classification within agricultural robotics has been investigated thoroughly over recent years with the goal of allowing fully automated robots to take over the laborious and repetitive tasks of weed scouting and weed destruction [4]. There have been two major approaches: weed vs crop classification and species-wise classification.

In weed vs crop classification, the focus is on successfully identifying the desired crop and treating all other detected plants as weeds [6] [7]. This is one of the most effective and easily implemented systems available as it deals with a binary classification problem. However, this approach is limited as the system would need to be completely retrained


for any new crop and does not allow for species-specific weed treatment.

One of the best practical examples of species-wise classification being used for weed destruction is the work by Gerhards and Oebel [8], which allowed for herbicide savings of up to 81% as well as a weeding efficacy of 85-95%. Other systems such as those shown in [9] [10] [11] also demonstrate how species-wise classification can be used for small sets of species. The main drawback of such approaches is similar to the problem of weed vs crop classification: information as to which species are present is needed a priori, which gives the systems limited transplantability unless the fields contain exactly the species that they have been trained to identify. The process for retraining such classification systems is also very laborious, requiring data collection or at least manual image labeling to be performed by human workers.

As a new approach to this problem within the context of weed scouting, we recently developed an unsupervised weed scouting system which sought to group visually similar plants together rather than classifying them outright, with the idea that the farmer can simply identify the clusters and examine where plants of this type were located within their field [5]. While an important first step, it did not address how scouting information could be presented to, and utilized by, the farmer. These are issues which shall be addressed within this work.

Within the field of clustering, there have been a few key approaches for presenting cluster summaries to a user using exemplars which are indicative of each cluster. A common technique is to simply present the sample closest to the centroid/mean of the cluster as the representative exemplar, such as was done for spectral clustering results in [12]. Sometimes this can be expanded upon to show the top 'n' samples closest to the mean to give a better impression of the data within. In 2007, the exemplar-based affinity propagation (AP) algorithm was developed with the express purpose of clustering data by finding an unknown number of exemplars which are said to represent the entire dataset [13]. When this technique is used for clustering, the exemplars used for each AP cluster are simply the exemplars found by the algorithm. Since AP was first introduced, a range of variations of AP have been created, mostly focused on increasing the speed of AP, such as is done in [14]. What these exemplar approaches have in common, however, is that they all assume a near-perfect clustering result and show what should be the majority/central exemplars of the cluster, and rarely give an overall impression of the range of data found within.

As shown above, current state-of-the-art weed classification systems are inflexible and ill-suited for widespread deployment at the current time. We build on the idea of unsupervised weed scouting, training weed classifiers using data collected and grouped through an initial unsupervised scouting. We shall evaluate how best to present the grouped data to a user for labeling and address how this can be achieved with the least amount of human input for the highest classification accuracy.

Fig. 2: Outline of the proposed three-stage process for transplantable precision weeding. Users request a general weed scouting to occur. This is followed by some simplified data labeling which is used to train a classifier. Once the classifier is trained, autonomous precision weeding can be ordered continuously until scouting is next desired.

III. METHODOLOGY

This work demonstrates a novel, transplantable approach to automated precision weeding. Building upon our previous work outlined in [5], this approach, as summarized in Fig. 2, is split into three main parts:

• Unsupervised weed scouting, where a robot scouts a field, detecting plants and clustering them into meaningful groups.

• Data labeling, where a summary of the unsupervised weed scouting groups is labeled by the user. Compared to regular data labeling, this presents the user with far fewer images they need to label.

• Autonomous precision weeding, which uses a classifier trained from the labeled data to classify and eliminate weeds in an appropriate manner.

The remainder of this section describes these parts in some detail, focusing on the data labeling stage, which allows us to obviate the need to label each image while still obtaining similar classification performance.

A. Unsupervised Weed Scouting

Unsupervised weed scouting is the first stage in our approach, allowing us to summarize the weeds present in the field without the need for any prior information about the field. To achieve this, we follow the best approach outlined in our previous work [5], which we shall briefly summarize here.

First, plants are detected using a multivariate Gaussian trained with a feature vector comprising the non-brightness channels from the HSV, Lab and Luv colour spaces, forming the vector [H,S,u,a,v,b]. This Gaussian is trained on plant images which are not related to our clustering or classification datasets, and a threshold is applied to perform plant segmentation. After noise is removed, we find close contours which are merged to form a single weed segmentation mask region which is later used for feature extraction.
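As a minimal sketch of this detection step (the function names and the threshold value are our own illustrative choices, not the paper's), the multivariate Gaussian colour model can be fitted and thresholded as follows, assuming each pixel has already been converted to a [H,S,u,a,v,b] feature vector:

```python
import numpy as np

def fit_plant_model(plant_pixels):
    """Fit a multivariate Gaussian to [H, S, u, a, v, b] pixel features
    taken from known plant regions (unrelated training imagery)."""
    mean = plant_pixels.mean(axis=0)
    cov = np.cov(plant_pixels, rowvar=False)
    return mean, np.linalg.inv(cov)

def segment(pixels, mean, inv_cov, threshold=9.0):
    """Return a boolean plant mask: pixels whose squared Mahalanobis
    distance to the plant colour model falls below the threshold."""
    diff = pixels - mean
    d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)
    return d2 < threshold
```

In practice the threshold would be tuned on held-out plant imagery, and the noise removal and contour merging described above would follow the thresholding.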

In this work, we extract bottleneck deep convolutional neural network (DCNN) features, which we shall refer to as bottleneck features, from the segmented plant images attained after plant detection. Bottleneck features are extracted from


Fig. 3: A simplified visual representation of how exemplars can be extracted from a given cluster: (a) cluster distribution where each different shape and colour represents a different type of ground truth data (e.g. black dots may be cotton and red triangles sowthistle); (b) example of mean-based exemplars where the 5 closest samples to the mean (shown as a red x) are chosen as exemplars (coloured blue); (c) an example of AP-based exemplars where exemplars are shown in blue, with samples grouped with said exemplars represented by the lines joining samples to exemplars.

a modified GoogLeNet DCNN [15] with a low-dimensional bottleneck layer added before the classification layer. This network was fine-tuned on data from a training subset of leaf images from the PlantCLEF dataset [16].

Once all plants in an area have been detected and had features extracted, the clustering process within unsupervised weed scouting is initiated. The clustering stage of unsupervised weed scouting utilizes locked agglomerative hierarchical clustering, the technique found to produce the best clustering results in [5]. This is a slightly modified version of agglomerative hierarchical clustering [17] where clusters are iteratively merged together based upon a relative distance metric until a given stopping criterion is met. The locking modification takes advantage of plants that are detected multiple times as the robot traverses over them. For hierarchical clustering, multiple instances of the same plant are locked together at initialization to improve the model representation of each plant for calculating the initial distance metric and stopping criterion. Full details regarding the distance metrics and stopping criterion used here are outlined further in [5]. Other clustering approaches could be used in this stage but, for consistency, we only consider this one clustering algorithm, with combinations of clustering algorithms and labeling methods being reserved for future work.
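A simplified sketch of this clustering stage (our own illustration: the paper's relative distance metric and data-driven stopping criterion from [5] are replaced here by plain average linkage and a fixed cluster count) shows how the locking can be implemented by pooling repeated detections of the same plant before any merging takes place:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def locked_agglomerative(features, plant_ids, n_clusters):
    """Simplified sketch: detections sharing a plant_id are 'locked'
    (pooled into one initial cluster by averaging their features),
    then merged bottom-up with average linkage."""
    ids = sorted(set(plant_ids))
    id_arr = np.asarray(plant_ids)
    pooled = np.array([features[id_arr == i].mean(axis=0) for i in ids])
    Z = linkage(pooled, method='average')
    plant_cluster = fcluster(Z, t=n_clusters, criterion='maxclust')
    # map the per-plant cluster labels back to every detection
    lookup = dict(zip(ids, plant_cluster))
    return [lookup[i] for i in plant_ids]
```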

Once the scouting process is complete, the end result is that all plants detected will have been grouped into an unspecified number of visually similar groups. Whereas our previous work used this clustering result to give an immediate map indicating where the cluster groups were located within the field, the focus of this work is to label the clusters and utilize the result to train a classification system that can perform precision weeding without further human interaction.

B. Data Labeling

In the context of this work, data labeling is the process by which we take the plant clustering results achieved through unsupervised plant scouting and give them meaningful real-world labels. This should label all samples detected in the scouting process by manually labeling only a select few representative samples, which we shall refer to as exemplars. We evaluate two types of exemplar in this work, these being mean-based and AP-based exemplars. We investigate some standard labeling strategies for utilizing these exemplar methods, as well as our own label refinement strategy utilizing AP-based exemplars.

1) Mean-based Exemplars: Mean-based exemplars are one of the simplest methods for extracting exemplars with which to summarize clusters. Here we take the n samples which have the smallest Euclidean distance from the mean of the cluster in the feature space used for clustering, where n is the number of exemplars that are chosen to be shown to the user. This is shown visually in Fig. 3-(b). These exemplars, in effect, find the samples which are closest to the mean, with the idea that this best represents the majority of the sample points. The final label for the cluster can be defined as the label of the majority of the presented exemplars. It should be noted that this can lead to labeling errors if the clusters are not pure, which can occur when the clustering problem is challenging. We examine the use of n = 1, the closest, and n = 10 exemplars, referred to as Mean and Top10 labeling respectively from this point forward.
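Mean-based exemplar selection reduces to a nearest-to-centroid lookup; a minimal sketch (function name is ours):

```python
import numpy as np

def mean_exemplars(cluster_features, n=10):
    """Indices of the n samples nearest (Euclidean) to the cluster mean,
    in order of increasing distance."""
    mean = cluster_features.mean(axis=0)
    dists = np.linalg.norm(cluster_features - mean, axis=1)
    return np.argsort(dists)[:n]
```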

2) Affinity Propagation Exemplars: In comparison to mean-based exemplars, an AP-based exemplar method attempts to find the exemplars that best represent the sub-groups which can be found within each cluster, as shown in Fig. 3-(c). As mentioned in Section II, AP is a clustering method normally applied to the entire dataset with the purpose of finding an unknown number of exemplars and assigning samples to them. The highlighted advantages of this process are that it does not require a predefined number of clusters (n is variable for each cluster) and that exemplars can be calculated consistently (i.e. they are not affected by random initialization). These properties make AP ideal as a method for finding the hidden sub-groups within each cluster and presenting exemplars which best show the variety of samples within each cluster.
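Using scikit-learn's AffinityPropagation as a stand-in for the AP algorithm of [13] (a sketch under our own naming, not the authors' implementation), exemplar extraction for one scouting cluster might look like:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def ap_exemplars(cluster_features, random_state=0):
    """Run affinity propagation inside one scouting cluster and return
    the indices of the exemplars AP selects (n varies per cluster)
    plus the sub-group assignment of every sample."""
    ap = AffinityPropagation(random_state=random_state).fit(cluster_features)
    return ap.cluster_centers_indices_, ap.labels_
```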

Once extracted, these exemplars can be utilized to label the entire cluster as was done for mean-based exemplars, where we label the cluster according to the class of the majority of exemplars. In our experiments, this labeling approach is referred to as AP-Majority labeling. Whilst giving the user greater breadth of information regarding the data contained within the cluster than a mean-based approach, this leads to the same potential source of labeling error when clusters are not pure, even if exemplars which describe the impurities are presented to the user. As an alternative to simply labeling all data in a cluster as being of one class, we introduce a new labeling technique using AP exemplars to further refine clusters to attain higher labeling accuracy at the cost of extra human labeling time.

3) AP-based Label Refinement: In this work, we propose a novel approach for data labeling where we refine clusters using AP exemplars. Making the assumption that each AP exemplar is a strong representative of an unknown subgroup


of similar-looking plants within the main cluster, we allow the user to label individual exemplars so as to refine the clustering result and improve the final labels. For example, if a cluster containing mostly cotton exemplars contains a single sowthistle exemplar, we assume that this sowthistle exemplar is representative of a small subgroup of predominantly sowthistle plants found within the main cluster, which should be transferred to a different cluster group or form its own group if no sowthistle group is present. Here, assuming that at least the majority of the samples in the subgroup belong to the same class, this proposed approach improves the final labeling accuracy over the other labeling techniques presented here. Within our experiments, this approach is referred to as AP-Refinement labeling.
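The refinement rule can be sketched as a simple relabeling of each sample by the user-supplied class of its AP exemplar (a hypothetical illustration; `sample_to_exemplar` maps each sample to the exemplar AP assigned it to, and `exemplar_class` holds the user's label for each exemplar):

```python
def refine_labels(sample_to_exemplar, exemplar_class):
    """Label every sample with the class its AP exemplar received from
    the user; subgroups whose exemplar disagrees with the cluster
    majority are thereby split off into their own class."""
    return [exemplar_class[e] for e in sample_to_exemplar]
```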

C. Classification

Using the now-labeled clustering data as training data, we train a multi-class linear support vector machine (SVM). We train k linear SVMs, where k is equal to the number of classes found within the training data. New samples are classified according to which linear SVM best represents them (i.e. places the samples on the positive side of, and furthest away from, the decision boundary). It should be noted that the detection and feature extraction process for images which undergo classification is identical to the process undergone for unsupervised scouting outlined in Section III-A.
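A sketch of this classification stage using scikit-learn's LinearSVC, which trains one-vs-rest linear SVMs and whose decision_function scores mirror the argmax-of-margin rule described above (function names and toy parameters are ours, not the paper's):

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_weed_classifier(features, labels, C=1.0):
    """LinearSVC fits k one-vs-rest linear SVMs, one per class."""
    return LinearSVC(C=C).fit(features, labels)

def classify(clf, features):
    """Pick the class whose hyperplane gives the highest margin score
    (equivalent to clf.predict, written out to mirror the text)."""
    scores = clf.decision_function(features)  # shape (n_samples, k) for k >= 3
    return clf.classes_[np.argmax(scores, axis=1)]
```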

While this system will again suffer from the inherent inflexibility of classification systems with regards to unexpected classes, through use of an initial unsupervised clustering, the classifier should now be calibrated for classification of the specific species expected in the location where it is being used. The classification system will allow for less human interaction than using clustering approaches alone while being specifically suited to the environment where it is being implemented. This classifier should only need to be retrained in the circumstance that a human operator has deemed it necessary to re-scout the field (e.g. due to a new, previously unidentified species entering the environment).

IV. EXPERIMENTAL SETUP

In order to demonstrate the potential effectiveness of our approach towards a transplantable weed classification system, we set up a test scenario using data collected from the AgBotII robotic platform. A test field was created, populated by four known plant species: cotton, feathertop, sowthistle, and wild oats, as shown in Fig. 4. Note that these plants are known for evaluation purposes only; the system under test, as outlined in Section III, has no prior knowledge as to the contents of the field. We split the data randomly into a Scouting and a Classification set. The split is performed by randomly selecting half of the observed plants for each species to make up the Scouting set and leaving the remaining plants as the Classification set. This was done to ensure similar conditions between the Scouting and the Classification sets. Note that there are usually multiple images of the same observed plant due to the robot detecting it multiple times as it traverses over it. The distribution of

TABLE I: Dataset Image Summary

             Scouting   Classification
Cotton       134        132
Feathertop   128        126
Sowthistle   25         28
Wild Oats    131        126

(a) Cotton (b) Feathertop (c) Sowthistle (d) Wild Oats

Fig. 4: Example of each plant species within our dataset. Black backgrounds have been replaced with white for visualization purposes.

images across the Scouting and Classification sets is shown in TABLE I.

Image frames were collected at a rate of 5 Hz as the AgBotII traversed the desired sections of the test field. After autonomous detection and background segmentation were performed, as outlined in Section III-A, we attain images such as those shown in Fig. 4. Each detected plant image was manually annotated according to the four known plant species noted above for our experiment.

As in our previous work [5], features describing each plant image are extracted from a bottleneck layer within a modified GoogLeNet model which has been fine-tuned on leaf images from the PlantCLEF dataset [16]. This produces a 128-dimensional descriptor which undergoes L2 normalization before being utilized within our clustering and classification procedures.
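The L2 normalization step is a per-descriptor rescaling to unit Euclidean norm; a minimal sketch (name is ours):

```python
import numpy as np

def l2_normalize(descriptors, eps=1e-12):
    """Scale each 128-D bottleneck descriptor to unit Euclidean norm;
    eps guards against division by zero for all-zero descriptors."""
    norms = np.linalg.norm(descriptors, axis=1, keepdims=True)
    return descriptors / np.maximum(norms, eps)
```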

As explained in Section III-B, we test a variety of different methods for labeling data using a set of given exemplars. To automate the evaluation process we make a few assumptions. First, we assume that the human will always make the correct labeling decision when presented with the correct data (e.g. a human will not mistake a cotton image for a sowthistle image). Secondly, labeling using exemplars is carried out in the same manner as we assume a human would, defining any given cluster's class as the majority class of the exemplars provided. Finally, when using AP-Refinement labeling, we assume that the user shall not stop refinement until all the presented exemplars have been sorted into their correct classes. Note that we only extract AP exemplars once; we do not recalculate exemplars for the refined clusters.

In our tests we evaluate the following labeling approaches:
Full - The classical approach of labeling all images in the training set.
Mean - The approach of labeling each cluster as the class of the sample closest to the mean of said cluster.
Top10 - Showing the 10 samples closest to the mean of a cluster and labeling the cluster as belonging to the majority class of those 10 samples.
AP-Majority - Showing the exemplars generated through


TABLE II: Results Summary

Label Method    Avg. LTPR (%)   Avg. CTPR (%)   #Exemplars
Full            100.0           75.6            418
AP-Refinement   78.7            61.0            58
AP-Majority     49.9            48.9            58
Mean            49.9            48.9            3
Top10           42.4            44.5            30

TABLE III: Complete LTPR

Label Method    Cotton (%)   Feathertop (%)   Sowthistle (%)   Wild Oats (%)
Full            100.0        100.0            100.0            100.0
AP-Refinement   82.1         78.1             96.0             58.8
AP-Majority     69.4         48.4             0.0              81.7
Mean            69.4         48.4             0.0              81.7
Top10           69.4         100.0            0.0              0.0

AP for each cluster and assigning the majority class label of the exemplars to said cluster.
AP-Refinement - Our approach, where AP exemplars are generated and the user can further refine their labels using the exemplars and groups generated using AP.

To improve classification results, after labeling all samples using the above-mentioned methods, we augmented our data using a total of 15 transformations. These included image reflection along both axes, rotation, cropping, and image brightening. Rotations were performed with angles of θ = [5, 10, 90, 180, 270, 350, 355] degrees. Cropping was performed to shrink the original images down to 0.9 and 0.8 times their original sizes. Image brightening was attained by transforming the image to the HSV colourspace and scaling the brightness channel (V) of plant pixels by values of φ = [0.8, 0.9, 1.1, 1.2].
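The 15 transformations enumerated above (2 reflections + 7 rotations + 2 crops + 4 brightness scalings) can be sketched as follows. This is our own illustration: brightness is shown here as a plain intensity scaling of the whole image, whereas the paper scales the HSV V channel of plant pixels only.

```python
import numpy as np
from scipy.ndimage import rotate

def augment(img):
    """Generate the 15 augmented copies of one plant image."""
    out = [np.flipud(img), np.fliplr(img)]                  # reflections
    out += [rotate(img, a, reshape=False, mode='nearest')
            for a in (5, 10, 90, 180, 270, 350, 355)]       # rotations
    for s in (0.9, 0.8):                                    # centre crops
        h, w = img.shape[:2]
        ch, cw = int(h * s), int(w * s)
        y0, x0 = (h - ch) // 2, (w - cw) // 2
        out.append(img[y0:y0 + ch, x0:x0 + cw])
    out += [np.clip(img * phi, 0, 255)
            for phi in (0.8, 0.9, 1.1, 1.2)]                # brightness
    return out
```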

The linear SVMs underwent tenfold cross-validation to determine the value of the parameter C. The values tested were C = [0.001, 0.01, 0.1, 1, 10, 100, 1000].

We evaluate our labeling method in terms of average labeling true positive rate (LTPR), average classification true positive rate (CTPR), and the number of exemplars shown to the user. Average LTPR refers to how frequently each class is labeled correctly on average, and average CTPR refers to how frequently a trained classifier correctly classifies each class on average. The number of exemplars is simply equal to the total number of images shown to, and labeled by, the user across all clusters. The goal of this work is to achieve high labeling and classification TPR with the user having to look at, and label, as few images as possible.
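Both metrics are unweighted averages of per-class true positive rates; a minimal sketch (function name is ours), where the assigned labels come from the labeling stage for LTPR or from the trained classifier for CTPR:

```python
import numpy as np

def average_tpr(true_labels, assigned_labels, classes):
    """For each class, the fraction of its samples given the correct
    label, then the unweighted mean over classes."""
    true_labels = np.asarray(true_labels)
    assigned_labels = np.asarray(assigned_labels)
    tprs = [np.mean(assigned_labels[true_labels == c] == c) for c in classes]
    return float(np.mean(tprs))
```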

V. RESULTS

The results of our test scenario as outlined in Section IV are summarized in TABLE II in terms of average LTPR, average CTPR and the total number of exemplars shown to the user.

In this table we are able to see the effect of different labeling methods, particularly the benefits of our refinement strategy. We see that the highest classification accuracy comes with Full labeling, with an average CTPR of 75.6%.

TABLE IV: Complete CTPR

Label Method    Cotton (%)   Feathertop (%)   Sowthistle (%)   Wild Oats (%)
Full            94.7         69.8             71.4             66.7
AP-Refinement   81.8         61.1             60.7             40.5
AP-Majority     78.0         48.4             0.0              69.0
Mean            78.0         48.4             0.0              69.0
Top10           78.0         100.0            0.0              0.0

This is to be expected as there is 100% LTPR, allowing for the most accurate training data available for each class. The next highest is our AP-Refinement approach with an average CTPR of 61.0%, a decrease of 14.6% from full labeling. This exceeds the next best labeling system by 12.1% and 28.8% in CTPR and LTPR respectively. This is achieved whilst labeling over 7 times fewer images than Full labeling.

We note that the LTPR and CTPR are the same for both Mean and AP-Majority labeling within our experiment. This is because both labeling techniques have led the user to assign the majority class of the cluster as the cluster label, which is the best possible scenario for a labeling system that gives one class to the entire cluster. Top10 is also seen to produce slightly worse results than Mean labeling. When examining the labels provided by Top10 labeling, it was seen that one cluster had a 50-50 split of exemplars between the majority class and one of the minority classes. In this case, the user chose incorrectly, which adversely affected average LTPR and CTPR.

When we further examined the LTPR and CTPR for each individual class (summarized in TABLE III and IV respectively), we see further benefits from AP-Refinement. The most significant benefit was found when examining sowthistle TPRs. Because the other labeling techniques only allow for a single label per cluster, and only three clusters were found by the utilized clustering method, only three classes can be represented by these labeling methods. In this case sowthistle, which is the minority class within the data as shown in TABLE I, gets completely ignored by these techniques, giving TPRs of 0.0%. In comparison, AP-Refinement is able to find the samples incorrectly clustered with other plants and use them to train a sowthistle classifier. We see a 60.7% CTPR and a 96.0% LTPR for sowthistle when using AP-Refinement, as opposed to the 0.0% TPRs for the other labeling methods. This shows that our technique of AP-Refinement can facilitate the labeling of classes which might have otherwise gone unnoticed by the system.

We also note that AP-Refinement provided improvements for two of the other classes. The TPRs for cotton and feathertop increased using our approach, with CTPR rising from 78.0% to 81.8% and from 48.4% to 61.1% respectively when compared to the next best method. Wild oats CTPR, however, decreased with AP-Refinement, dropping from 69.0% to 40.3%. When examining the final labeling distributions of Mean and AP-Refinement, we observed that AP-Refinement incorrectly relabeled 21 samples of wild oats as feathertop. This problem arises from the similarity in appearance between wild oats and feathertop; however, it should be noted that this occurred during the process of relabeling 38 feathertop samples which had incorrectly been labeled as wild oats under the original clustering.

Fig. 5: Visual representation of the exemplars generated using the mean, the top 10 samples closest to the mean, and affinity propagation exemplars for a cluster generated from initial weed scouting. Colours around images relate to the ground-truth label of each image: blue=wild oats, red=feathertop, green=cotton, and orange=sowthistle. Best viewed in colour.

The differences and benefits of AP-based exemplars, such as those used for our AP-Refinement, compared to mean-based exemplars can also be shown qualitatively by examining the generated exemplars in Fig. 5. This shows the least pure of the generated clusters, with wild oats as the dominant class. Here, we see that the Mean for this cluster belongs to the dominant class; however, this gives no extra information about the composition of the cluster. The Top10 approach, in comparison, shows an even split of feathertop and wild oats plants, creating ambiguity as to the identity of the cluster. Visually, the presented plants and plant species are quite similar, with some images even being multiple instances of the same plant, providing a very narrow representation of the plants found within the cluster. In contrast, AP provides a wide distribution, describing the variety of samples within the cluster as well as displaying all of the different species found within it. No method that does not allow for cluster refinement can deal with this issue of impure clustering, and this leads to the lower performance shown in TABLE II.
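The qualitative difference between a single mean-based exemplar and affinity-propagation exemplars can be reproduced on toy data with scikit-learn. The 2-D features below stand in for the deep features of one scouted cluster; the specific values and blob layout are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy features: one "cluster" that actually mixes three sub-groups,
# mimicking an impure cluster containing several weed species.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(m, 0.3, size=(20, 2)) for m in (0.0, 2.0, 4.0)])

# Mean-based exemplar: the single sample closest to the centroid.
centroid = features.mean(axis=0)
mean_exemplar = int(np.argmin(np.linalg.norm(features - centroid, axis=1)))

# Affinity propagation instead returns several exemplars spread
# across the cluster, exposing its internal variety.
ap = AffinityPropagation(random_state=0).fit(features)
ap_exemplars = ap.cluster_centers_indices_
```

A single mean exemplar can only ever show one appearance, whereas the AP exemplars cover each sub-group; this coverage is what makes an impure cluster like the one in Fig. 5 visible to the user and therefore refinable.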

VI. CONCLUSION

In this work, we introduce and demonstrate a new approach towards a transplantable system for weed classification in agricultural robotics. We present a three-stage strategy consisting of unsupervised weed scouting, quick data labeling, and species-specific precision weeding, which can operate without any prior knowledge about the weeds within the field where the system is deployed. Focusing on the data labeling stage, we create and test a novel approach to data labeling, achieving the best classification true positive rate of all exemplar-based approaches whilst labeling over seven times fewer images than full data labeling. This preliminary work demonstrates how, through clustering and efficient labeling strategies, we can create a classification system specifically designed for a given field's needs, which could then be used to implement autonomous integrated weed management strategies without the need for any prior weed-species knowledge.

ACKNOWLEDGMENTS

We would like to acknowledge and thank the Grains Research and Development Corporation for contributing funds towards this research through their grains research scholarship program.

REFERENCES

[1] N. Gilbert, "A hard look at GM crops," Nature, vol. 497, no. 7447, pp. 24–26, 2013.

[2] G. Charles and T. Leven, "Integrated weed management (IWM) for Australian cotton," Cotton Pest Management Guide, pp. 88–119, 2011.

[3] GRDC. (2016) Integrated weed management hub. Accessed 07-09-2016. [Online]. Available: www.grdc.com.au/Resources/IWMhub

[4] D. C. Slaughter, D. K. Giles, and D. Downey, "Autonomous robotic weed control systems: A review," Computers and Electronics in Agriculture, vol. 61, no. 1, pp. 63–78, 2008.

[5] D. Hall, F. Dayoub, J. Kulk, and C. McCool, "Towards unsupervised weed scouting for agricultural robotics," in Proceedings of the 2017 IEEE International Conference on Robotics and Automation, Singapore, May 2017.

[6] B. Åstrand and A.-J. Baerveldt, "An Agricultural Mobile Robot with Vision-Based Perception for Mechanical Weed Control," Autonomous Robots, vol. 13, no. 1, pp. 21–35, July 2002.

[7] F.-M. De Rainville, A. Durand, F.-A. Fortin, K. Tanguy, X. Maldague, B. Panneton, and M.-J. Simard, "Bayesian classification and unsupervised learning for isolating weeds in row crops," Pattern Analysis and Applications, pp. 1–14, 2012.

[8] R. Gerhards and H. Oebel, "Practical experiences with a system for site-specific weed control in arable crops using real-time image analysis and GPS-controlled patch spraying," Weed Research, vol. 46, no. 3, pp. 185–193, June 2006.

[9] C. Lin, "A support vector machine embedded weed identification system," Ph.D. dissertation, University of Illinois, 2009.

[10] S. A. Shearer, R. G. Holmes, and others, "Plant identification using color co-occurrence matrices," Transactions of the ASAE, vol. 33, no. 6, pp. 2037–2044, 1990.

[11] T. F. Burks, S. A. Shearer, F. A. Payne, and others, "Classification of weed species using color texture features and discriminant analysis," Transactions of the ASAE, vol. 43, no. 2, pp. 441–448, 2000.

[12] Z. Liu, P. Li, Y. Zheng, and M. Sun, "Clustering to Find Exemplar Terms for Keyphrase Extraction," in Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1, ser. EMNLP '09. Stroudsburg, PA, USA: Association for Computational Linguistics, 2009, pp. 257–266.

[13] D. Dueck and B. J. Frey, "Non-metric affinity propagation for unsupervised image categorization," in 2007 IEEE 11th International Conference on Computer Vision. IEEE, 2007, pp. 1–8.

[14] Y. Jia, J. Wang, C. Zhang, and X.-S. Hua, "Finding Image Exemplars Using Fast Sparse Affinity Propagation," in Proceedings of the 16th ACM International Conference on Multimedia, ser. MM '08. New York, NY, USA: ACM, 2008, pp. 639–642.

[15] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going Deeper With Convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[16] P. Bonnet, W.-P. Vellinga, R. Planqué, A. Rauber, S. Palazzo, B. Fisher, and H. Müller, "LifeCLEF 2015: Multimedia Life Species Identification Challenges," in Experimental IR Meets Multilinguality, Multimodality, and Interaction: 6th International Conference of the CLEF Association, CLEF'15, Toulouse, France, September 8-11, 2015, Proceedings, vol. 9283. Springer, 2015, p. 462.

[17] S. C. Johnson, "Hierarchical clustering schemes," Psychometrika, vol. 32, no. 3, pp. 241–254, Sept. 1967.