Pixel-by-Pixel Arterial Spin Labeling Blood Flow Pattern ...
SemanticSegmentationunderaComplexBackgroundforMachine ... · 2020. 5. 2. · analysis module to...
Transcript of SemanticSegmentationunderaComplexBackgroundforMachine ... · 2020. 5. 2. · analysis module to...
Research ArticleSemanticSegmentationunderaComplexBackgroundforMachineVision Detection Based on Modified UPerNet with ComponentAnalysis Modules
Jian Huang Guixiong Liu and Bodi Wang
School of Mechanical and Automotive Engineering South China University of Technology Guangzhou 510640 China
Correspondence should be addressed to Guixiong Liu megxliuscuteducn
Received 2 May 2020 Revised 31 July 2020 Accepted 17 August 2020 Published 12 September 2020
Academic Editor Yang Li
Copyright copy 2020 Jian Huang et al )is is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited
Semantic segmentation with convolutional neural networks under a complex background using the encoder-decoder networkincreases the overall performance of online machine vision detection and identification To maximize the accuracy of semanticsegmentation under a complex background it is necessary to consider the semantic response values of objects and componentsand their mutually exclusive relationship In this study we attempt to improve the low accuracy of component segmentation )ebasic network of the encoder is selected for the semantic segmentation and the UPerNet is modified based on the componentanalysis module )e experimental results show that the accuracy of the proposed method improves from 4889 to 5562 andthe segmentation time decreases from 721 to 496ms )e method also shows good performance in vision-based detection of 2019Chinese Yuan features
1 Introduction
As one of the primary tasks of machine vision semanticsegmentation differs from image classification and objectdetection )e image classification process involved therecognition of the the type of object but cannot provideposition information [1] whereas object detection can beused to detect the boundary and type of the object but cannotprovide the actual boundary information [2] On the otherhand semantic segmentation can recognize the type of theobject and divide the actual area at the pixel level as well asimplement certain machine vision detection functions suchas positioning and recognition [3] As we start from imageclassification move to object detection and finally reachsemantic segmentation the accuracy of the output range andposition information improves [4] In the same manner therecognition precision increases from the image-level to thepixel-level Semantic segmentation achieves the best rec-ognition accuracy therefore it is useful in (1) distinguishingthe entity from the background (2) obtaining the positioninformation (centroid) clearly physically defined by indirect
calculation and (3) performing machine vision detectionand identification organization which require high spatialresolution and reliability [5 6]
)e online semantic segmentation with convolutionalneural networks (CNNs) under a complex background iseffective for improving the overall performance of onlinemachine vision detection and identification [7] whenmaintaining the same architecture of the encoder-decodernetwork and convolutional and pooling layer and equiv-alently transforming the fully connected layer thusyielding broad generalization In recent years ResNet hasbeen used to replace the shallow CNN to optimize semanticsegmentation results significantly [8] For machine visiondetection and identification under a random-texturecomplex background it is necessary to eliminate therandom-texture complex background to extract the objectwithout affecting the original features of the object [9] )edifficulty lies in the randomness of the textured back-ground which makes it difficult to employ typical periodictexture elimination techniques such as the frequencydomain filtering and image matrix methods [10 11] On the
HindawiMathematical Problems in EngineeringVolume 2020 Article ID 6903130 13 pageshttpsdoiorg10115520206903130
contrary the encoder-decoder semantic segmentationnetwork ultimately retains the classification components inthe network backbone thus exhibiting larger receptivefields and better pixel recognition ability [12 13] asdepicted in Figure 1 Unreasonably selected and conse-quently incorrectly used component analysis modules willlead to an excessively small foreground range resulting inthe misjudgment of component pixels If the componentanalysis module is too sensitive the foreground range willbe too broad thus it would be difficult to remove mis-judged pixels [14] )erefore in the process of semanticsegmentation under the complex background it is nec-essary to consider objects the contradiction between thecomponent semantic response values and their mutualexclusion relationship while maximizing the accuracy ofthe semantic segmentation under the complex backgroundusing the encoder-decoder network
Figure 2 shows a flowchart of the semantic segmen-tation under the complex background using the encoder-decoder network )e process can be described as followsthe component classifier of the encoder-decoder networkrecognizes the pixel-level semantics and response of thepixels in the image the object classifier recognizes thepixel-level object semantics and the response and extractsmisjudged pixels of the foreground object in semanticsegmentation finally the mutually exclusive relationshipbetween component semantics and object semantics isconsidered and non-background-independent semanticsare determined to achieve effective semantic segmentationunder a complex background to improve the model ac-curacy [15]
In this study we focus on online semantic segmentationunder a complex background using the encoder-decodernetwork to solve the above described mutual exclusionrelationship problem between component semantics andobject semantics )e main contributions of this study arethreefold
(i) We attempted to improve the low accuracy ofcomponent segmentation and selected the superiorbasic encoder-decoder network according to theperformance
(ii) We modified the UPerNet based on the componentanalysis module to maximize the accuracy of thesemantic segmentation under a complex back-ground using the encoder-decoder network whilemaintaining an appropriate segmentation time
(iii) We show that the proposed method is superior toprevious encoder-decoder network and has satis-factory accuracy and segmentation time We alsoshow the application of the proposedmethod in bill-note anticounterfeiting identification
)e rest of this paper is organized as follows In Section1 we outline related works In Section 2 we introduce amethod for semantic segmentation under a complexbackground using the encoder-decoder network In Section3 we verify the proposed method In Section 4 we presentthe conclusions
2 Related Work
21 Evaluation of the Semantic Segmentation PerformanceWe can generally evaluate the CNN semantic segmentationperformance from the accuracy and running speed )eaccuracy indicators usually include the pixel accuracy [16]mean intersection over union [16] and mean averageprecision [17] )e pixel accuracy PA is defined as thenumber of pixels segmented correctly accounting for thetotal number of image pixels the mean intersection overunion IoU is defined as the degree of coincidence betweenthe segmentation results and their ground-truth the meanaverage precision APIoUT is the mean of average precisionscores for segmentation results whose intersection overunion no less than IoUT for each classes
If the object detected by machine vision has k categoriesthe semantic segmentation model requires the label of thek + 1 categories denoted as L l0 l1 lk1113864 1113865 including thebackground Denoting the number of pixels of li mis-rec-ognized as the pixel of lj and li(ine j) as pij and pii re-spectively the numbers of detected objects of li mis-recognized as lj and li(ine j) as Nij and Nii respectively thepixel accuracy can be calculated as follows
PA 1113944
k
i0pii
1113944k
i01113944k
j0pij
IoU 1113944K
i0pii
1113944k
j0pij + 1113944k
j0pji minus pii
(1)
APIoUT 1113944
k
i0Nii
1113944k
i01113944k
j0Nij
(2)
)e running speed of CNN semantic segmentation canbe measured by indicators including the segmentation timeTseg [18] which is defined as the time needed to segment theimage by running the algorithm )e theoretically shortestpossible time required to segment the image is also labeled asthe theoretical segmentation time Tsegminust and the time re-quired for the algorithm to actually segment the image isknown as the actual segmentation time Tsegminusa If not oth-erwise specified Tsegminusa is denoted as Tseg
22 End-To-End Encoder-Decoder Semantic SegmentationFramework Although CNN semantic segmentation per-forms as a single-step end-to-end process which is notfurther divided into multiple modules to deal with theconnection of numerous modules directly affects the CNN)e end-to-end semantic segmentation framework using theencoder-decoder enables the CNN to detect images with anyresolution and output prediction map results with constantresolution Typical networks include fully convolutionalnetworks (FCN) [19] SegNet [20] and U-Net [21]
Figure 3 shows a schematic of the FCN model )e FCNis an end-to-end semantic segmentation framework
2 Mathematical Problems in Engineering
proposed by Jonathan Long et al (University of CaliforniaBerkeley) in 2014 )e main idea is as follows the operationof a fully connected layer is equivalent to the convolution ofa feature map and a kernel function of identical size )efully connected layer is converted into a convolution layerwhich converts the CNN into a full convolution operationnetwork consisting of a complete convolution layer (con-volution operation) and pooling layer (convolution opera-tion) to process images of any resolution In this manner thelimitation of the fully connected layer is overcome ieimages with different resolutions can be processed )eoriginal resolution is restored after eight times bilinearupsampling by taking the pooling layer as an encoder de-signing a cross-layer superimposed architecture as a de-coder yielding the final output feature map of the networkby upsampling and adding to the output feature map of eachpooling layer (namely the encoder) to obtain a feature mapwith higher resolution )e CNN can perform end-to-endsemantic segmentation through a fully convolutional andcross-layer superimposed architecture therefore various
CNNs are capable of achieving end-to-end semantic seg-mentation Using the framework described the IoU reached622 in the VOC2012 semantic segmentation testing setwhich is 106 higher than the classic methods and 122(its IoU is 500) higher than the SDS [22] further seg-mented by CNN object detection and classical method
)e ResNet proposed by the Amazon Artificial Intelli-gence Laboratory serves as a basic network for constructingFCNs for semantic segmentation the IoU in VOC2012reaches 86 [23] )e prediction results of the FCN ap-plication are obtained by eight-fold bilinear interpolation ofthe feature map including the problems of detail losssmoothing of complex boundaries and poor detectionsensitivity of small objects)e results ignore the global scaleof the image possibly exhibiting regional discontinuity forlarge objects that exceed the receptive field Incorporatingfull connection and upsampling increase the size of thenetwork and introduces a large number of parameters to belearned
Figure 4 shows a schematic of the SegNet model which isan efficient real-time end-to-end semantic segmentationnetwork proposed by Alex Kendall et al (Cambridge Uni-versity) in 2015 )e idea is that the encoder and the decoderhave a one-to-one correspondence and the network appliesthe pooled index in the encoderrsquos maximum pooling toperform nonlinear upsampling thus forming a sparse featuremap then it performs convolution to generate a dense featuremap SegNet defines the basic network of the encoder-de-coder and deletes the fully connected layer to generate globalsemantic information )e decoder utilizes the encoder in-formation without training while the required amount oftraining parameters is 217 of that of the FCN For theprediction of the results SegNet and FCN occupy a GPUmemory of 1052 and 1806MB respectively and the GPUmemory occupancy on GPU GTX 980 (video memory4096MB) is 2568 and 4409 respectively )erefore theoccupancy of SegNet is 1841 lower than that of FCN In[20] the design of SegNet on ResNet was described and theIoU in VOC2012 reached 804 [24] )e IoU of SegNettested in VOC2012 was reported to be 599 and the
Object response maps
Componentsegmentation
Objectsegmentation
Valid component segmentation
Relation between component and object
Object regionand label
Component response maps
Component regionand label
Figure 1 Flowchart of semantic segmentation under the complex background using encoder-decoder network
Start
The encoder-decoder network transforms images to feature maps and the component classifier recognizes the
part response map and generates part segment results
The object classifier recognizes the object response map and generates object segmentation results with the
pixels of the foreground object extracted
End
The relationship between the component and objectis analyzed to improve the accuracy of part segmentation
Figure 2 Flowchart of the semantic segmentation under a complexbackground using the encoder-decoder network
Mathematical Problems in Engineering 3
efficiency was found to be 23 lower than that of FCNfurthermore there was the problem of false detection at theboundary
Figure 5 shows a schematic of the U-Net model whichwas proposed by Olaf Ronneberger (University of FreiburgGermany) in 2015 )e idea was to design a basic networkthat can be trained by semantic segmentation images andmodify the FCN cross-layer overlay architecture with thehigh-resolution feature map channels retained in theupsampling section and then connect it to the decoderoutput feature map in the third dimension Furthermore atiling strategy without limited by GPU memory was pro-posed with this strategy a seamless semantic segmentationof arbitrary high-resolution images was achieved WithU-Net a IoU of 920 and 776 was achieved in the
grayscale image semantic segmentation datasets PhC-U373and DIC-HeLa respectively )e skip connection was usedin the ResNet framework to improve U-Net and a IoU of827 was achieved in the VOC2012 [25] )ere are two keyproblems with the application of U-Net the basic networkneeds to be trained and it can only be applied to specifictask ie it has poor universality
Figure 6 shows a schematic of the UPerNet model whichwas proposed by Tete Xiao (Peking University China) in2018 In the UPerNet framework a feature pyramid network(FPN) with a pyramid pooling module (PPM) is appendedon the last layer of the backbone network before feeding itinto the top-down branch of the FPN Object and part headsare attached on the feature map and are fused by all thelayers put out by the FPN
Decoderskip connect
Encoder (Backbone)
Pooling layersConvolutional layers
2times upsample
2times upsample
8times upsample sum sum
Figure 3 Schematic of FCN model
Decoder
Encoder (Backbone)
IndexIndexIndexIndexIndex
Pooling layersConvolutional layersDepool layers
Figure 4 Schematic of SegNet model
4 Mathematical Problems in Engineering
3 Material and Methods
)e semantic segmentation under a complex backgroundbased on the encoder-decoder network will establish anoptimized mathematical model with minimal segmentationtime Tsegminusmin segmentation time Tseg and accuracy PAUnder the encoder-decoder network the backbone network
ηmain the depth dmain and the decoder ηdecoder are obtainedto form an encoder By selecting the relatively better ηmainand ηdecoder of the basic network the component analysismodule to improve the optimized architecture is proposedand the encoder-decoder network with optimized PA forsemantic segmentation under a complex background isobtained In the encoder-decoder network the encoder
Decoder
Encoder (Backbone)
ConcatConcatConcatConcat
Pooling layersConvolutional layersDeconv layers
Figure 5 Schematic of U-Net model
PPM head
Fuse
Head
Head
Head
Fused feature map
Head
Material
Object
Part
Scene
No grad
Head
Texture(Testing)
orImage
(~450 times 720)
Image for texture(~48 times 48)
= Marbled
Feature Pyramid Network
Conv 3 times 3 GlobalAvgPooling Classifier
Scene head
Conv 3 times 3 Classifier
ObjectPartMaterial head
Conv 3 times 3 Conv 3 times 3 Classifier
Texture head
4x Conv (128 channels)
(Training)
Living room
14
18
116
132
14
14
18
116
132
Figure 6 Flowchart of UPerNet model
Mathematical Problems in Engineering 5
transforms color images (three 2D arrays) to 2048 2D arrays)e encoder is composed of convolutional layers andpooling layers and it could be trained on large-scale clas-sification datasets such as ImageNet to gain greater featureextraction capability
Modeling of semantic segmentation under a complexbackground using the encoder-decoder network and se-lection of ηmain and ηdecoder
)e encoder network is determined by the backbonenetwork ηmain depth dmain and decoder ηdecoder Segmen-tation time Tseg and accuracy PA depend on ηdecoder ηmainand dmain which can be expressed asPA(ηdecoder ηmain dmain) and Tseg(ηdecoder ηmain dmain)Denoting the minimal segmentation time as Tsegminusmin (therecommended value is 600ms) the mathematical model ofthe optimization for semantic segmentation under a com-plex background based on the encoder-decoder network isas follows
max PA ηdecoder ηmain dmain( 1113857
st Tseg ηdecoder ηmain dmain( 1113857leTsegminusmin
⎧⎨
⎩ (3)
)e parameters of the model to be optimized are dmainηmain and ηdecoder
First dmain ηmain and ηdecoder are combined )en theobject segmentation accuracy PAobj component segmen-tation accuracy PAcomp and Tseg are compared to select therelatively better ηmain and ηdecoder for the basic network
)e ADE20K dataset which has diverse annotations ofscenes objects parts of objects and parts of parts [26] isselected In this paper we denote parts of objects as com-ponent Using a GeForce GTX 1080Ti GPU and the trainingmethod described in [27] we obtained PAobj and PAcomp forimproved FCN [19] PSPNet [28] UPerNet [29] and othermajor encoder-decoder networks for semantic segmentationused in the ADE20K [26] objectcomponent segmentationdataset We evaluated Tseg of different network on theADE20K test set which consist of 3000 different resolutionimages with average image size of 13 million pixels Table 1displays the pixel accuracy and segmentation time of themain network architectures on ADE20K objectcomponentsegmentation tasks where the relatively better indices areindicated by a rectangular contour
From Table 1 the following observation can be made①In all networks PAcomp is less than PAobj by about 30 ②ηmain and dmain are equal in networks 1 2 and 3 PAcomp andPAobj are better in ηdecoder FPN + PPM compared toηdecoder FCN or ηdecoder PPM ③ηmain and ηdecoder areequal in networks 3 and 4 When dmain is doubled PAcompimproves slightly and Tseg improves significantly After acomprehensive consideration we selected the UPerNet [23]encoder-decoder network where ηmain ResNetdmain 50 and ηdecoder PPM + FPN
Figure 7 shows the architecture of semantic segmen-tation under a complex background implemented byUPerNet [29] )e encoder ResNet reduces the featuremap resolution by 12 at each stage )e resolution of the
output feature maps within five stages is respectively re-duced to 12 14 18 116 and 132 )e decoder isPPM+ FPN )rough pooling layers with different stridesthe feature maps are analyzed in a multiscale mannerwithin PPM )rough three transposed convolutionlayers the resolution of the feature maps is increased twotimes to 116 18 and 14 )e upsampling restores thefeature map resolution to 11 )e component analysismodule recognizes the feature map and outputs both theobjectcomponent segmentation results
Figure 8 shows the component analysis module ofUPerNet )e module is composed of the object classifiercomponent classifier and component analysis module)e input of each classifier is a 1 1 feature map )e objectclassifier implements the semantic recognition of NObjkinds of objects and outputs the object probability vectorpuvObj and the object label Cuv
Obj )e component classifierimplements the semantic recognition of NComp kinds ofcomponents and outputs the component probabilityvector puv
Comp and the component label CuvComp According
to CuvObj and the component object set CObjminusThings the
component analysis module only segments the CuvComp that
satisfies CuvObj isin CObjminusThings and outputs the valid component
label 1113954Cuv
Comp UPerNet outputs the object segmentation result(the object labelCuv
Obj) and the component segmentation result(the valid component label 1113954C
uv
Comp))e component analysis module of UPerNet can be
expressed as follows
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThings1113872 1113873
1 times C
uvComp C
uvObj isin CObjminusThings
0 times CuvComp C
uvObj notin CObjminusThings
⎧⎨
⎩
(4)
A greater PAcomp of 1113954Cuv
Comp leads to a higher componentsegmentation efficiency
Equation (4) outputs 1113954Cuv
Comp that satisfiesCuvObj isin CObjminusThings By identifying deviations of Cuv
Obj due tothe relationship between Cuv
Comp and CuvObj the optimized
component analysis module can improve the efficiency ofcomponent segmentation it both meets the requirement ofTsegminusmin and improves PAcomp
Improvements of UPerNet for semantic segmentationunder a complex background based on the componentanalysis module
In this subsection we describe the derivation of thecomponent analysis module the optimization of the func-tion expression of the module and the construction of thearchitecture of the component analysis module
As shown in Figure 8 the component classifier recog-nizes Ncomp component semantics and outputs the com-ponent labels Cuv
Comp of the pixel with image position (u v)
and the probability vector puvComp corresponding to the
various component labels )e relationship between CuvComp
and puvComp [31] is as follows
CuvComp argmaxkpCompminusk k 1 2 NComp (5)
From equation (4) and (5) we obtain
6 Mathematical Problems in Engineering
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThings1113872 1113873
1 times argmaxkpCompminusk C
uvObj isin CObjminusThings
0 times argmaxkpCompminusk CuvObj notin CObjminusThings
⎧⎨
⎩ k 1 2 NComp(6)
where pObjminusj is the probability of CuvComp Weighting pObjminusj
over pCompminusk to get 1113954pCompminusk instead of 1 times argmaxkpCompminuskreducing the weight of low-probability object labels andincreasing PAcomp With Cuv
Obj isin CObjminusThings if
CuvComp notin C
CObjCompminusObj letting 1113954pCompminusk 0 can increase the
detection rate of background pixels )erefore the modulecan be expressed as follows
DecoderEncoder ηmain = ResNet
CS3 CS4 CS5CS1 CS2
Poolingpyramid module
PPM
Cup-1 Cup-2 Cup-3 Fusion
Object classifier
Pixel accuracyPA
Ups
ampl
e
Component segmentation
Objectsegmentation
Component classifier
Feature map pramid network FPN
Component analysis module
Componentanalysis model
Inputimage
ηdecoder = PPM + FPN
Segmentation time Tseg (ηmain dmain ηdecoder)
Depth dmain = 50
Figure 7 Flowchart of semantic segmentation under a complex background implemented by UPerNet
Componentclassifier
Validcomponent
label
Object classifier
Componentanalysis model1 1
Featuremap
Object label Cuv
Comp label Cuv
Valid object setℂObj-Things
Object segmentation
Componentsegmentation
CompfO (Cuv
CuvObj
Obj
Comp)CompC uv
Figure 8 Flowchart of component analysis module of UPerNet
Table 1 Pixel accuracy and segmentation time of the main network architectures on ADE20K objectcomponent segmentation task )erectangular contour indicates the best indices
Network Backboneηmain
Backbonedepth dmain
Decoderηdecoder
Object segmentationaccuracy () PAobj
Component segmentationaccuracy () PAcomp
Segmentation time(ms) Tseg
1 FCN [30] ResNet 50 FCN 7132 4081 333
2 PSPNet[28] ResNet 50 PPM 8004 4723 483
3 UPerNet[29] ResNet 50 PPM+FPN 8023 4830 496
4 UPerNet[29] ResNet 101 PPM+FPN 8101 4871 604
Mathematical Problems in Engineering 7
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThingsC
CObj
CompminusObj1113874 1113875 argmaxk1113954pCompminusk k 1 2 NComp
1113954pCompminusk
1113936
jisinCObjminusThingsandkisinCj
CompminusObj
j1pObjminusj
⎛⎜⎜⎝ ⎞⎟⎟⎠ kltNComp
1 minus 1113936
jnotinCObjminusThingsork notin Cj
CompminusObj
j1pObjminusj
⎛⎜⎜⎝ ⎞⎟⎟⎠pCompminusk k NComp
⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎪⎩
j 1 2 Nobj
⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩
(7)
which is the component analysis module yielded by replacing1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and considering
CuvComp notin C
CObjCompminusObj
)e optimized architecture of the UPerNet componentanalysis module is proposed based on equation (7)Figures 9(a)ndash9(c) show the optimized architecture obtainedby replacing 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk by
considering CuvComp notin C
CObjCompminusObj and by both replacing 1 times
argmaxkpCompminusk with argmaxk1113954pCompminusk and considering
CuvComp notin C
CObjCompminusObj in the component analysis module
respectively
31 Experimental Results
311 ADE20K Component Segmentation Task For theUPerNet model the backbone network of the encoder wasResNet dmain 50 and the decoders arePPM+FPN+component analysis modules (beforeaftermodification) We trained each network on the objectcom-ponent segmentation task dataset ADE20K [26] to demonstratethe pixel accuracy PA1113955Part
and segmentation time Tseg )eexperiments were run on a GeForce GTX 1080Ti GPU
Table 2 reports PA1113955Partand Tseg of the UPerNet obtained
with different component analysis modules in ADE20Kcomponent segmentation task From the results the fol-lowing observations can be made
(i) )e pixel accuracy of ResNet(dmain 50)+PPM+FPN+ the proposed modifiedcomponent analysis modules with different settingsincreased from 4830 (without component analysismodules) to 5403 5513 and 5562 while thesegmentation time lengthened marginally from 483to 492 486 and 496ms respectively
)e UPerNet with modified component analysis mod-ules showed significantly high segmentation performanceBoth PA1113955Part
and Tseg outperformed the UPerNet with adeeper dmain PA1113955Part
and Tseg of the architecture (dmain 50)are 5562 and 496ms while those of the architectures withno modification with dmain 101 and 152 were 4871 and598ms and 4889 and 721ms respectively as shown inFigure 9(c)
312 CITYSCAPES Instance-Level Semantic Labeling TaskWe trained each UPerNet (withwithout component anal-ysis module) on the instance-level semantic labeling task of
the CITYSCAPES dataset [32] To assess the instance-levelperformance CITYSCAPES uses themean average precisionAP and average precision AP05 [32] We also report thesegmentation time of each network run on a GeForce GTX1080Ti GPU and an Intel i7-5960X CPU Table 3 presents theperformances of different methods on a CITYSCAPES in-stance-level semantic labeling task Table 4 presents the meanaverage precision AP on class-level of the UPerNet withwithout the component analysis module in the CITYSCAPESinstance-level semantic labeling task From the table it can beseen that the modified component analysis modules effec-tively improved the performance of the UPerNet With thecomponent analysis module both AP and AP05 are im-proved and the segmentation time Tseg increased slightlyfrom 447 to 451msMost of the UPerNet AP on class-level areimproved Figure 10 shows someCITYSCAPES instance-levelsemantic labeling results obtained with the UPerNet withwithout component analysis module
Taking banknote detection as an example we set up thesemantic segmentation model by the component analysismodules (beforeafter modification) to vision-based detec-tion of 2019 Chinese Yuan (CNY) feature in the backlight todemonstrate the segmentation performance of the proposedmethod
)e vision-based detection system consisted of an MV-CA013-10GC industrial camera an MVL-HF2528M-6MPlens and a LED strip light )e field of view was 1833deg andthe resolution was 1280times1024 Under the backlight wecollected 25 CNY images of various denomination frontsand backs at random angles )en we marked four types oflight-transmitting anticounterfeiting features namely se-curity lines pattern watermarks denomination watermarksand Yin-Yang denominations All four features were de-tected in the CNY images to generate our dataset (200images) We trained the model with different componentanalysis modules from our dataset to demonstrate PA1113955Partand Tseg Table 3 presents the pixel accuracy and segmen-tation time of UPerNet with different component analysismodules for CNY anticounterfeit features via vision-baseddetection and Figure 11 shows the segmentation results ofthe anticounterfeiting features detected by UPerNet withwithout the component analysis module
From Table 5 it can be seen that the proposed methodimproved PA1113955Part
from 9038 to 9529 Tseg from 490 to496ms Moreover APIoUT05 increased from 961 to 100detecting all the light transmission anti-counterfeiting fea-tures without false detection missing detection or repeateddetection
8 Mathematical Problems in Engineering
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Comp responsepuv
Component segmentation
Validcomplabel
Validcomp
response
Comp analysismodel
fOp (Cuv Cuv
Obj
Obj
Obj
Comp) puv Comp C uv
Comp
Object label Cuv
Object responsepuv
Valid object setℂObj-Things
Comp
(a)
11Feature
map
Object classifier
Componentclassifier
Object segmentation
Componentsegmentation
Relation between comp and obj
ℂCObjComp-Obj
ObjObject label Cuv
CompComp label Cuv
Validcomplabel
Comp analysismodel
Valid object setℂObj-Things
fO (Cuv Cuv
Obj Comp) C uv Comp
(b)
Relation between comp and obj
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Componentsegmentation
Comp response Validcomplabel
Validcomp
response
Object responsepuv
Valid object setℂObj-Things
Comp analysismodel
fOp (Cuv Cuv
Obj Comp) p uv Comp C uv
Comp
ObjObject label Cuv
Obj
Comppuv
ℂCObjComp-Obj
(c)
Figure 9 Optimized architecture with the component analysis module (a) Replace 1 times argmaxkpCompminusk with argmaxk1113954pCompminusk to optimize
the module (b) Analyze CuvComp notin C
CObjCompminusObj to optimize the module (c) Replace 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and analyzeCuvComp notin C
CObjCompminusObj to optimize the module
Table 2 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAMs) on ADE20K componentsegmentation task
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
1 ResNet 50 PPM+FPN mdash 4830 4832 ResNet 101 PPM+FPN mdash 4871 5983 ResNet 152 PPM+FPN mdash 4889 7214 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5362 4905 ResNet 101 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5396 6046 ResNet 152 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5418 726
Mathematical Problems in Engineering 9
UPerNet UPerNet with the component analysis module
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar
CarCar
Person
PersonPerson
PersonCarCar
CarPerson
PersonPerson
Person
Figure 10 CITYSCAPES instance-level semantic labeling by UPerNet
Table 4 Mean average precision AP on class-level of the UPerNet withwithout CAM in CITYSCAPES instance-level semantic labelingtask
Method Person () Rider () Car () Truck () Bus () Train () Motorcycle () Bicycle ()UperNet 360 288 516 300 387 273 239 194UperNet +CAM 360 288 530 343 570 375 223 238
Table 2 Continued
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
7 ResNet 50 PPM+FPN+CAM argmaxk1113954pCompminusk 5403 492
8 ResNet 50 PPM+FPN+CAM CuvComp notin C
CObj
CompminusObj 5513 4869 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj 5562 496
Table 3 Performances of different methods on CITYSCAPES instance-level semantic labeling task
Method AP () AP050 () Segmentation time (ms)
SegNet 295 556 mdashMask R-CNN 320 581 mdashUperNet 320 573 447UperNet +CAM 365 622 451CAM Component Analysis Module
10 Mathematical Problems in Engineering
4 Conclusions
In this study we performed semantic segmentation under acomplex background using the encoder-decoder network tosolve the issue of the mutually exclusive relationship be-tween the semantic response value and the semantics ofobjectcomponent in the semantic segmentation under acomplex background for online machine vision detection)e following conclusions can be drawn from this study
(i) Considering the mutually exclusive relationshipbetween the semantic response value and the se-mantics of objectcomponent we selected themathematical model of semantic segmentationunder a complex background based on the encoder-decoder network for optimization It was found that
ηmain ResNet dmain 50 is the best encoder andηdecoder PPM + FPN is the best selected decoder
(ii) We replaced 1 times argmaxkpCompminusk withargmaxk
1113954pCompminusk )e component analysis module
of CuvComp notin C
CObjCompminusObj and UPerNet are considered
to improve the performance of the encoder-decodernetwork
(iii) )e experimental results show that the componentanalysis module improves the performance of se-mantic segmentation under a complex backgroundBoth PA1113955Part
and Tseg of the proposed model werebetter than those of the UPerNet with deeper dmainSpecifically the accuracy improved from 4889 to5562 and Tseg from 721 to 496ms By performing
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
Pattern
Watermark
Serial numberWatermark Watermark
SafelineChinese
OVMI
Safeline
OVMI
Pattern
PatternDenomination
Serial number
Denomination
Watermark
Watermark Watermark
SafelineSafeline
Watermark
Watermark Watermark
SafelineSafeline
Watermark
WatermarkWatermark
SafelineSafeline
UPerNet UPerNet with the component analysis module
Figure 11 Anticounterfeiting features detected by the UPerNet withwithout the component analysis module
Table 5 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAM) for CNY anticounterfeitfeatures via vision-based detection
Backbone ηmain Depth dmain Decoder ηdecoder Component analysis module PA1113955Part() APIoUT05() Tseg (ms)
1 ResNet 50 PPM+FPN mdash 8850 853 4832 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 9038 961 4903 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj
9529 100 496
Mathematical Problems in Engineering 11
vision-based detection with the 2019 CNY featureswe showed that the proposed method improvedPA1113955Part
from 9038 to 9529 while Tseg increasedonly slightly from 490 to 496ms APIoUT01 alsoincreased from 961 to 100 detecting all the lighttransmission anticounterfeiting features withoutfalse detection missing detection or repeateddetection
)e model in which 1 times argmaxkpPartminusk was replacedwith argmaxk
1113954pPartminusk and the corresponding componentanalysis module improved the performance of the UPerNetencoder-decoder network However the efficiency im-provement is affected by the accuracy of object segmenta-tion In our next study we will investigate the applicability ofmachine learning to the component analysis module toachieve a higher performance in different applications
Data Availability
)e ADE20K Dataset used to support the findings of thisstudy is available at httpgroupscsailmiteduvisiondatasets )e CITYSCAPES Dataset used to support thefindings of this study is available at httpswwwcityscapes-datasetcom Its pretrained models and code are released athttpsgithubcomCSAILVisionsemantic-segmentation325pytorch
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
)is research was funded by the Key-Area Research andDevelopment Program of Guangdong Province (Grant no2019B010154003) and the Guangzhou Science and Tech-nology Plan Project (Grant no 201802030006)
References
[1] C Szegedy W Liu and Y Jia ldquoGoing deeper with convo-lutionsrdquo in Proceedings of the Computer Vision and PatternRecognition IEEE Boston MA USA pp 1ndash9 June 2015
[2] K He X Zhang and S Ren ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the Computer vision andpattern recognition IEEE Las Vegas NV USA pp 770ndash778June 2016
[3] K He G Gkioxari P Dollar and R Girshick ldquoMaskR-CNNrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 42 no 2 pp 386ndash397 2020
[4] G Liu B He and S Liu ldquoChassis assembly detection andidentification based on deep learning component instancesegmentationrdquo Symmetry vol 11 no 8 2019
[5] R Manish A Venkatesh and S Denis Ashok ldquoMachinevision based image processing techniques for surface finishand defect inspection in a grinding processrdquoMaterials TodayProceedings vol 5 no 5 pp 12792ndash12802 2018
[6] L Geng Y Wen and F Zhang ldquoMachine vision detectionmethod for surface defects of automobile stamping partsrdquo
American Scientific Research Journal for Engineering Tech-nology and Sciences vol 53 no 1 pp 128ndash144 2019
[7] M M Islam and J Kim ldquoVision-based autonomous crackdetection of concrete structures using a fully convolutionalencoderndashdecoder networkrdquo Sensors vol 19 no 19 2019
[8] S Zhou D Nie E Adeli J Yin J Lian and D Shen ldquoHigh-resolution encoder-decoder networks for low-contrast medicalimage segmentationrdquo IEEE Transactions on Image Processingvol 29 pp 461ndash475 2020
[9] S Liu J Huang and G Liu ldquoTechnology of multi-categorylegal currency identification under multi-light conditionsbased on AleNetrdquo China Measurement ampTest vol 45 no 9pp 118ndash122 2019 in Chinese
[10] H Kang and C Chen ldquoFruit detection and segmentation forapple harvesting using visual sensor in orchardsrdquo Sensorsvol 19 no 20 p 4599 2019
[11] E Pardo J M T Morgado and N Malpica ldquoSemanticsegmentation of mFISH images using convolutional net-worksrdquo Cytometry Part A vol 93 no 6 pp 620ndash627 2018
[12] G Liu S Liu and J Wu ldquoMachine vision object detectionalgorithm based on deep learning and application in banknotedetectionrdquo China Measurement ampTest vol 45 no 5 pp 1ndash92019 in Chinese
[13] H Gao H Yuan and Z Wang ldquoPixel transposed convolu-tional networksrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 42 no 5 pp 1218ndash1227 2019
[14] Q D Vu and J T Kwak ldquoA densemulti-path decoder for tissuesegmentation in histopathology imagesrdquo Computer Methodsand Programs in Biomedicine vol 173 pp 119ndash129 2019
[15] J Huang and G Liu ldquo)e development of CNN-based se-mantic segmentation methodrdquo Laser Journal vol 40 no 5pp 10ndash16 2019 in Chinese
[16] S Nowozin ldquoOptimal decisions from probabilistic modelsthe intersection-over-union caserdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE ColumbusOH USA pp 548ndash555 June 2014
[17] D Hoiem Y Chodpathumwan and Q Dai ldquoDiagnosingerror in object detectorsrdquo in Proceedings of the EuropeanConference on Computer Vision pp 340ndash353 IEEE FlorenceItaly October 2012
[18] K He and J Sun ldquoConvolutional neural networks at con-strained time costrdquo in Proceedings of the Computer Vision andPattern Recognition IEEE Boston MA USA pp 5353ndash5360June 2015
[19] J Long E Shelhamer and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE Boston MAUSA pp 3431ndash3440 June 2015
[20] V Badrinarayanan A Kendall and R Cipolla ldquoSegNet adeep convolutional encoder-decoder architecture for imagesegmentationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 39 no 12 pp 2481ndash2495 2017
[21] O Ronneberger P Fischer and T Brox ldquoU-net convolu-tional networks for biomedical image segmentationrdquo inProceedings of the Medical Image Computing and Computer-Assisted Intervention pp 234ndash241 IEEE Munich GermanyOctober 2015
[22] B Hariharan P Arbelaez R Girshick and J Malik ldquoSi-multaneous detection and segmentationrdquo in Proceedings ofthe European Conference on Computer Vision pp 297ndash312IEEE Zurich Switzerland March 2014
[23] N D Lane and P Warden ldquo)e deep (learning) transfor-mation of mobile and embedded computingrdquo Computervol 51 no 5 pp 12ndash16 2018
12 Mathematical Problems in Engineering
[24] C Qing J Ruan X Xu J Ren and J Zabalza ldquoSpatial-spectral classification of hyperspectral images a deep learningframework with Markov Random fields based modellingrdquo IetImage Processing vol 13 no 2 pp 235ndash245 2019
[25] X Li Z Liu and P Luo ldquoNot all pixels are equal difficulty-aware semantic segmentation via deep layer cascaderdquo inProceedings of the Computer Vision and Pattern RecognitionIEEE Honolulu HI USA pp 6459ndash6468 July 2017
[26] B Zhou H Zhao X Puig et al ldquoSemantic understanding ofscenes through the ADE20K datasetrdquo International Journal ofComputer Vision vol 127 no 3 pp 302ndash321 2019
[27] L-C Chen ldquoRethinking atrous convolution for semanticimage segmentationrdquo 2017 httpsarxivorgabs170605587
[28] H Zhao J Shi and X Qi ldquoPyramid scene parsing networkrdquoin Proceedings of the Computer Vision and Pattern Recogni-tion IEEE Honolulu HI USA pp 6230ndash6239 July 2017
[29] T Xiao Y Liu B Zhou Y Jiang and J Sun ldquoUnifiedperceptual parsing for scene understandingrdquo in Proceedings ofthe European Conference on Computer Vision pp 432ndash448IEEE Munich Germany September 2018
[30] D Kim J Kwon and J Kim ldquoLow-complexity online modelselection with lyapunov control for reward maximization instabilized real-time deep learning platformsrdquo in Proceedings ofthe Systems Man and Cybernetics pp 4363ndash4368 MiyazakiJapan January 2018
[31] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquoCommunications of the Acm vol 60 no 6 pp 84ndash90 2017
[32] M Cordts O Mohamed and S Ramos ldquo)e cityscapesdataset for semantic urban scene understandingrdquo in Pro-ceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR) IEEE Las Vegas NV USApp 3213ndash3223 June 2016
Mathematical Problems in Engineering 13
contrary the encoder-decoder semantic segmentationnetwork ultimately retains the classification components inthe network backbone thus exhibiting larger receptivefields and better pixel recognition ability [12 13] asdepicted in Figure 1 Unreasonably selected and conse-quently incorrectly used component analysis modules willlead to an excessively small foreground range resulting inthe misjudgment of component pixels If the componentanalysis module is too sensitive the foreground range willbe too broad thus it would be difficult to remove mis-judged pixels [14] )erefore in the process of semanticsegmentation under the complex background it is nec-essary to consider objects the contradiction between thecomponent semantic response values and their mutualexclusion relationship while maximizing the accuracy ofthe semantic segmentation under the complex backgroundusing the encoder-decoder network
Figure 2 shows a flowchart of the semantic segmen-tation under the complex background using the encoder-decoder network )e process can be described as followsthe component classifier of the encoder-decoder networkrecognizes the pixel-level semantics and response of thepixels in the image the object classifier recognizes thepixel-level object semantics and the response and extractsmisjudged pixels of the foreground object in semanticsegmentation finally the mutually exclusive relationshipbetween component semantics and object semantics isconsidered and non-background-independent semanticsare determined to achieve effective semantic segmentationunder a complex background to improve the model ac-curacy [15]
In this study we focus on online semantic segmentationunder a complex background using the encoder-decodernetwork to solve the above described mutual exclusionrelationship problem between component semantics andobject semantics )e main contributions of this study arethreefold
(i) We attempted to improve the low accuracy ofcomponent segmentation and selected the superiorbasic encoder-decoder network according to theperformance
(ii) We modified the UPerNet based on the componentanalysis module to maximize the accuracy of thesemantic segmentation under a complex back-ground using the encoder-decoder network whilemaintaining an appropriate segmentation time
(iii) We show that the proposed method is superior toprevious encoder-decoder network and has satis-factory accuracy and segmentation time We alsoshow the application of the proposedmethod in bill-note anticounterfeiting identification
)e rest of this paper is organized as follows In Section1 we outline related works In Section 2 we introduce amethod for semantic segmentation under a complexbackground using the encoder-decoder network In Section3 we verify the proposed method In Section 4 we presentthe conclusions
2 Related Work
21 Evaluation of the Semantic Segmentation PerformanceWe can generally evaluate the CNN semantic segmentationperformance from the accuracy and running speed )eaccuracy indicators usually include the pixel accuracy [16]mean intersection over union [16] and mean averageprecision [17] )e pixel accuracy PA is defined as thenumber of pixels segmented correctly accounting for thetotal number of image pixels the mean intersection overunion IoU is defined as the degree of coincidence betweenthe segmentation results and their ground-truth the meanaverage precision APIoUT is the mean of average precisionscores for segmentation results whose intersection overunion no less than IoUT for each classes
If the object detected by machine vision has k categoriesthe semantic segmentation model requires the label of thek + 1 categories denoted as L l0 l1 lk1113864 1113865 including thebackground Denoting the number of pixels of li mis-rec-ognized as the pixel of lj and li(ine j) as pij and pii re-spectively the numbers of detected objects of li mis-recognized as lj and li(ine j) as Nij and Nii respectively thepixel accuracy can be calculated as follows
PA 1113944
k
i0pii
1113944k
i01113944k
j0pij
IoU 1113944K
i0pii
1113944k
j0pij + 1113944k
j0pji minus pii
(1)
APIoUT 1113944
k
i0Nii
1113944k
i01113944k
j0Nij
(2)
)e running speed of CNN semantic segmentation canbe measured by indicators including the segmentation timeTseg [18] which is defined as the time needed to segment theimage by running the algorithm )e theoretically shortestpossible time required to segment the image is also labeled asthe theoretical segmentation time Tsegminust and the time re-quired for the algorithm to actually segment the image isknown as the actual segmentation time Tsegminusa If not oth-erwise specified Tsegminusa is denoted as Tseg
22 End-To-End Encoder-Decoder Semantic SegmentationFramework Although CNN semantic segmentation per-forms as a single-step end-to-end process which is notfurther divided into multiple modules to deal with theconnection of numerous modules directly affects the CNN)e end-to-end semantic segmentation framework using theencoder-decoder enables the CNN to detect images with anyresolution and output prediction map results with constantresolution Typical networks include fully convolutionalnetworks (FCN) [19] SegNet [20] and U-Net [21]
Figure 3 shows a schematic of the FCN model )e FCNis an end-to-end semantic segmentation framework
2 Mathematical Problems in Engineering
proposed by Jonathan Long et al (University of CaliforniaBerkeley) in 2014 )e main idea is as follows the operationof a fully connected layer is equivalent to the convolution ofa feature map and a kernel function of identical size )efully connected layer is converted into a convolution layerwhich converts the CNN into a full convolution operationnetwork consisting of a complete convolution layer (con-volution operation) and pooling layer (convolution opera-tion) to process images of any resolution In this manner thelimitation of the fully connected layer is overcome ieimages with different resolutions can be processed )eoriginal resolution is restored after eight times bilinearupsampling by taking the pooling layer as an encoder de-signing a cross-layer superimposed architecture as a de-coder yielding the final output feature map of the networkby upsampling and adding to the output feature map of eachpooling layer (namely the encoder) to obtain a feature mapwith higher resolution )e CNN can perform end-to-endsemantic segmentation through a fully convolutional andcross-layer superimposed architecture therefore various
CNNs are capable of achieving end-to-end semantic seg-mentation Using the framework described the IoU reached622 in the VOC2012 semantic segmentation testing setwhich is 106 higher than the classic methods and 122(its IoU is 500) higher than the SDS [22] further seg-mented by CNN object detection and classical method
)e ResNet proposed by the Amazon Artificial Intelli-gence Laboratory serves as a basic network for constructingFCNs for semantic segmentation the IoU in VOC2012reaches 86 [23] )e prediction results of the FCN ap-plication are obtained by eight-fold bilinear interpolation ofthe feature map including the problems of detail losssmoothing of complex boundaries and poor detectionsensitivity of small objects)e results ignore the global scaleof the image possibly exhibiting regional discontinuity forlarge objects that exceed the receptive field Incorporatingfull connection and upsampling increase the size of thenetwork and introduces a large number of parameters to belearned
Figure 4 shows a schematic of the SegNet model which isan efficient real-time end-to-end semantic segmentationnetwork proposed by Alex Kendall et al (Cambridge Uni-versity) in 2015 )e idea is that the encoder and the decoderhave a one-to-one correspondence and the network appliesthe pooled index in the encoderrsquos maximum pooling toperform nonlinear upsampling thus forming a sparse featuremap then it performs convolution to generate a dense featuremap SegNet defines the basic network of the encoder-de-coder and deletes the fully connected layer to generate globalsemantic information )e decoder utilizes the encoder in-formation without training while the required amount oftraining parameters is 217 of that of the FCN For theprediction of the results SegNet and FCN occupy a GPUmemory of 1052 and 1806MB respectively and the GPUmemory occupancy on GPU GTX 980 (video memory4096MB) is 2568 and 4409 respectively )erefore theoccupancy of SegNet is 1841 lower than that of FCN In[20] the design of SegNet on ResNet was described and theIoU in VOC2012 reached 804 [24] )e IoU of SegNettested in VOC2012 was reported to be 599 and the
Object response maps
Componentsegmentation
Objectsegmentation
Valid component segmentation
Relation between component and object
Object regionand label
Component response maps
Component regionand label
Figure 1 Flowchart of semantic segmentation under the complex background using encoder-decoder network
Start
The encoder-decoder network transforms images to feature maps and the component classifier recognizes the
part response map and generates part segment results
The object classifier recognizes the object response map and generates object segmentation results with the
pixels of the foreground object extracted
End
The relationship between the component and objectis analyzed to improve the accuracy of part segmentation
Figure 2 Flowchart of the semantic segmentation under a complexbackground using the encoder-decoder network
Mathematical Problems in Engineering 3
efficiency was found to be 23 lower than that of FCNfurthermore there was the problem of false detection at theboundary
Figure 5 shows a schematic of the U-Net model whichwas proposed by Olaf Ronneberger (University of FreiburgGermany) in 2015 )e idea was to design a basic networkthat can be trained by semantic segmentation images andmodify the FCN cross-layer overlay architecture with thehigh-resolution feature map channels retained in theupsampling section and then connect it to the decoderoutput feature map in the third dimension Furthermore atiling strategy without limited by GPU memory was pro-posed with this strategy a seamless semantic segmentationof arbitrary high-resolution images was achieved WithU-Net a IoU of 920 and 776 was achieved in the
grayscale image semantic segmentation datasets PhC-U373and DIC-HeLa respectively )e skip connection was usedin the ResNet framework to improve U-Net and a IoU of827 was achieved in the VOC2012 [25] )ere are two keyproblems with the application of U-Net the basic networkneeds to be trained and it can only be applied to specifictask ie it has poor universality
Figure 6 shows a schematic of the UPerNet model whichwas proposed by Tete Xiao (Peking University China) in2018 In the UPerNet framework a feature pyramid network(FPN) with a pyramid pooling module (PPM) is appendedon the last layer of the backbone network before feeding itinto the top-down branch of the FPN Object and part headsare attached on the feature map and are fused by all thelayers put out by the FPN
Decoderskip connect
Encoder (Backbone)
Pooling layersConvolutional layers
2times upsample
2times upsample
8times upsample sum sum
Figure 3 Schematic of FCN model
Decoder
Encoder (Backbone)
IndexIndexIndexIndexIndex
Pooling layersConvolutional layersDepool layers
Figure 4 Schematic of SegNet model
4 Mathematical Problems in Engineering
3 Material and Methods
)e semantic segmentation under a complex backgroundbased on the encoder-decoder network will establish anoptimized mathematical model with minimal segmentationtime Tsegminusmin segmentation time Tseg and accuracy PAUnder the encoder-decoder network the backbone network
ηmain the depth dmain and the decoder ηdecoder are obtainedto form an encoder By selecting the relatively better ηmainand ηdecoder of the basic network the component analysismodule to improve the optimized architecture is proposedand the encoder-decoder network with optimized PA forsemantic segmentation under a complex background isobtained In the encoder-decoder network the encoder
Decoder
Encoder (Backbone)
ConcatConcatConcatConcat
Pooling layersConvolutional layersDeconv layers
Figure 5 Schematic of U-Net model
PPM head
Fuse
Head
Head
Head
Fused feature map
Head
Material
Object
Part
Scene
No grad
Head
Texture(Testing)
orImage
(~450 times 720)
Image for texture(~48 times 48)
= Marbled
Feature Pyramid Network
Conv 3 times 3 GlobalAvgPooling Classifier
Scene head
Conv 3 times 3 Classifier
ObjectPartMaterial head
Conv 3 times 3 Conv 3 times 3 Classifier
Texture head
4x Conv (128 channels)
(Training)
Living room
14
18
116
132
14
14
18
116
132
Figure 6 Flowchart of UPerNet model
Mathematical Problems in Engineering 5
transforms color images (three 2D arrays) to 2048 2D arrays)e encoder is composed of convolutional layers andpooling layers and it could be trained on large-scale clas-sification datasets such as ImageNet to gain greater featureextraction capability
Modeling of semantic segmentation under a complexbackground using the encoder-decoder network and se-lection of ηmain and ηdecoder
)e encoder network is determined by the backbonenetwork ηmain depth dmain and decoder ηdecoder Segmen-tation time Tseg and accuracy PA depend on ηdecoder ηmainand dmain which can be expressed asPA(ηdecoder ηmain dmain) and Tseg(ηdecoder ηmain dmain)Denoting the minimal segmentation time as Tsegminusmin (therecommended value is 600ms) the mathematical model ofthe optimization for semantic segmentation under a com-plex background based on the encoder-decoder network isas follows
max PA ηdecoder ηmain dmain( 1113857
st Tseg ηdecoder ηmain dmain( 1113857leTsegminusmin
⎧⎨
⎩ (3)
)e parameters of the model to be optimized are dmainηmain and ηdecoder
First dmain ηmain and ηdecoder are combined )en theobject segmentation accuracy PAobj component segmen-tation accuracy PAcomp and Tseg are compared to select therelatively better ηmain and ηdecoder for the basic network
)e ADE20K dataset which has diverse annotations ofscenes objects parts of objects and parts of parts [26] isselected In this paper we denote parts of objects as com-ponent Using a GeForce GTX 1080Ti GPU and the trainingmethod described in [27] we obtained PAobj and PAcomp forimproved FCN [19] PSPNet [28] UPerNet [29] and othermajor encoder-decoder networks for semantic segmentationused in the ADE20K [26] objectcomponent segmentationdataset We evaluated Tseg of different network on theADE20K test set which consist of 3000 different resolutionimages with average image size of 13 million pixels Table 1displays the pixel accuracy and segmentation time of themain network architectures on ADE20K objectcomponentsegmentation tasks where the relatively better indices areindicated by a rectangular contour
From Table 1 the following observation can be made①In all networks PAcomp is less than PAobj by about 30 ②ηmain and dmain are equal in networks 1 2 and 3 PAcomp andPAobj are better in ηdecoder FPN + PPM compared toηdecoder FCN or ηdecoder PPM ③ηmain and ηdecoder areequal in networks 3 and 4 When dmain is doubled PAcompimproves slightly and Tseg improves significantly After acomprehensive consideration we selected the UPerNet [23]encoder-decoder network where ηmain ResNetdmain 50 and ηdecoder PPM + FPN
Figure 7 shows the architecture of semantic segmen-tation under a complex background implemented byUPerNet [29] )e encoder ResNet reduces the featuremap resolution by 12 at each stage )e resolution of the
output feature maps within five stages is respectively re-duced to 12 14 18 116 and 132 )e decoder isPPM+ FPN )rough pooling layers with different stridesthe feature maps are analyzed in a multiscale mannerwithin PPM )rough three transposed convolutionlayers the resolution of the feature maps is increased twotimes to 116 18 and 14 )e upsampling restores thefeature map resolution to 11 )e component analysismodule recognizes the feature map and outputs both theobjectcomponent segmentation results
Figure 8 shows the component analysis module ofUPerNet )e module is composed of the object classifiercomponent classifier and component analysis module)e input of each classifier is a 1 1 feature map )e objectclassifier implements the semantic recognition of NObjkinds of objects and outputs the object probability vectorpuvObj and the object label Cuv
Obj )e component classifierimplements the semantic recognition of NComp kinds ofcomponents and outputs the component probabilityvector puv
Comp and the component label CuvComp According
to CuvObj and the component object set CObjminusThings the
component analysis module only segments the CuvComp that
satisfies CuvObj isin CObjminusThings and outputs the valid component
label 1113954Cuv
Comp UPerNet outputs the object segmentation result(the object labelCuv
Obj) and the component segmentation result(the valid component label 1113954C
uv
Comp))e component analysis module of UPerNet can be
expressed as follows
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThings1113872 1113873
1 times C
uvComp C
uvObj isin CObjminusThings
0 times CuvComp C
uvObj notin CObjminusThings
⎧⎨
⎩
(4)
A greater PAcomp of 1113954Cuv
Comp leads to a higher componentsegmentation efficiency
Equation (4) outputs 1113954Cuv
Comp that satisfiesCuvObj isin CObjminusThings By identifying deviations of Cuv
Obj due tothe relationship between Cuv
Comp and CuvObj the optimized
component analysis module can improve the efficiency ofcomponent segmentation it both meets the requirement ofTsegminusmin and improves PAcomp
Improvements of UPerNet for semantic segmentationunder a complex background based on the componentanalysis module
In this subsection we describe the derivation of thecomponent analysis module the optimization of the func-tion expression of the module and the construction of thearchitecture of the component analysis module
As shown in Figure 8 the component classifier recog-nizes Ncomp component semantics and outputs the com-ponent labels Cuv
Comp of the pixel with image position (u v)
and the probability vector puvComp corresponding to the
various component labels )e relationship between CuvComp
and puvComp [31] is as follows
CuvComp argmaxkpCompminusk k 1 2 NComp (5)
From equation (4) and (5) we obtain
6 Mathematical Problems in Engineering
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThings1113872 1113873
1 times argmaxkpCompminusk C
uvObj isin CObjminusThings
0 times argmaxkpCompminusk CuvObj notin CObjminusThings
⎧⎨
⎩ k 1 2 NComp(6)
where pObjminusj is the probability of CuvComp Weighting pObjminusj
over pCompminusk to get 1113954pCompminusk instead of 1 times argmaxkpCompminuskreducing the weight of low-probability object labels andincreasing PAcomp With Cuv
Obj isin CObjminusThings if
CuvComp notin C
CObjCompminusObj letting 1113954pCompminusk 0 can increase the
detection rate of background pixels )erefore the modulecan be expressed as follows
DecoderEncoder ηmain = ResNet
CS3 CS4 CS5CS1 CS2
Poolingpyramid module
PPM
Cup-1 Cup-2 Cup-3 Fusion
Object classifier
Pixel accuracyPA
Ups
ampl
e
Component segmentation
Objectsegmentation
Component classifier
Feature map pramid network FPN
Component analysis module
Componentanalysis model
Inputimage
ηdecoder = PPM + FPN
Segmentation time Tseg (ηmain dmain ηdecoder)
Depth dmain = 50
Figure 7 Flowchart of semantic segmentation under a complex background implemented by UPerNet
Componentclassifier
Validcomponent
label
Object classifier
Componentanalysis model1 1
Featuremap
Object label Cuv
Comp label Cuv
Valid object setℂObj-Things
Object segmentation
Componentsegmentation
CompfO (Cuv
CuvObj
Obj
Comp)CompC uv
Figure 8 Flowchart of component analysis module of UPerNet
Table 1 Pixel accuracy and segmentation time of the main network architectures on ADE20K objectcomponent segmentation task )erectangular contour indicates the best indices
Network Backboneηmain
Backbonedepth dmain
Decoderηdecoder
Object segmentationaccuracy () PAobj
Component segmentationaccuracy () PAcomp
Segmentation time(ms) Tseg
1 FCN [30] ResNet 50 FCN 7132 4081 333
2 PSPNet[28] ResNet 50 PPM 8004 4723 483
3 UPerNet[29] ResNet 50 PPM+FPN 8023 4830 496
4 UPerNet[29] ResNet 101 PPM+FPN 8101 4871 604
Mathematical Problems in Engineering 7
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThingsC
CObj
CompminusObj1113874 1113875 argmaxk1113954pCompminusk k 1 2 NComp
1113954pCompminusk
1113936
jisinCObjminusThingsandkisinCj
CompminusObj
j1pObjminusj
⎛⎜⎜⎝ ⎞⎟⎟⎠ kltNComp
1 minus 1113936
jnotinCObjminusThingsork notin Cj
CompminusObj
j1pObjminusj
⎛⎜⎜⎝ ⎞⎟⎟⎠pCompminusk k NComp
⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎪⎩
j 1 2 Nobj
⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩
(7)
which is the component analysis module yielded by replacing1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and considering
CuvComp notin C
CObjCompminusObj
)e optimized architecture of the UPerNet componentanalysis module is proposed based on equation (7)Figures 9(a)ndash9(c) show the optimized architecture obtainedby replacing 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk by
considering CuvComp notin C
CObjCompminusObj and by both replacing 1 times
argmaxkpCompminusk with argmaxk1113954pCompminusk and considering
CuvComp notin C
CObjCompminusObj in the component analysis module
respectively
31 Experimental Results
311 ADE20K Component Segmentation Task For theUPerNet model the backbone network of the encoder wasResNet dmain 50 and the decoders arePPM+FPN+component analysis modules (beforeaftermodification) We trained each network on the objectcom-ponent segmentation task dataset ADE20K [26] to demonstratethe pixel accuracy PA1113955Part
and segmentation time Tseg )eexperiments were run on a GeForce GTX 1080Ti GPU
Table 2 reports PA1113955Partand Tseg of the UPerNet obtained
with different component analysis modules in ADE20Kcomponent segmentation task From the results the fol-lowing observations can be made
(i) )e pixel accuracy of ResNet(dmain 50)+PPM+FPN+ the proposed modifiedcomponent analysis modules with different settingsincreased from 4830 (without component analysismodules) to 5403 5513 and 5562 while thesegmentation time lengthened marginally from 483to 492 486 and 496ms respectively
)e UPerNet with modified component analysis mod-ules showed significantly high segmentation performanceBoth PA1113955Part
and Tseg outperformed the UPerNet with adeeper dmain PA1113955Part
and Tseg of the architecture (dmain 50)are 5562 and 496ms while those of the architectures withno modification with dmain 101 and 152 were 4871 and598ms and 4889 and 721ms respectively as shown inFigure 9(c)
312 CITYSCAPES Instance-Level Semantic Labeling TaskWe trained each UPerNet (withwithout component anal-ysis module) on the instance-level semantic labeling task of
the CITYSCAPES dataset [32] To assess the instance-levelperformance CITYSCAPES uses themean average precisionAP and average precision AP05 [32] We also report thesegmentation time of each network run on a GeForce GTX1080Ti GPU and an Intel i7-5960X CPU Table 3 presents theperformances of different methods on a CITYSCAPES in-stance-level semantic labeling task Table 4 presents the meanaverage precision AP on class-level of the UPerNet withwithout the component analysis module in the CITYSCAPESinstance-level semantic labeling task From the table it can beseen that the modified component analysis modules effec-tively improved the performance of the UPerNet With thecomponent analysis module both AP and AP05 are im-proved and the segmentation time Tseg increased slightlyfrom 447 to 451msMost of the UPerNet AP on class-level areimproved Figure 10 shows someCITYSCAPES instance-levelsemantic labeling results obtained with the UPerNet withwithout component analysis module
Taking banknote detection as an example we set up thesemantic segmentation model by the component analysismodules (beforeafter modification) to vision-based detec-tion of 2019 Chinese Yuan (CNY) feature in the backlight todemonstrate the segmentation performance of the proposedmethod
)e vision-based detection system consisted of an MV-CA013-10GC industrial camera an MVL-HF2528M-6MPlens and a LED strip light )e field of view was 1833deg andthe resolution was 1280times1024 Under the backlight wecollected 25 CNY images of various denomination frontsand backs at random angles )en we marked four types oflight-transmitting anticounterfeiting features namely se-curity lines pattern watermarks denomination watermarksand Yin-Yang denominations All four features were de-tected in the CNY images to generate our dataset (200images) We trained the model with different componentanalysis modules from our dataset to demonstrate PA1113955Partand Tseg Table 3 presents the pixel accuracy and segmen-tation time of UPerNet with different component analysismodules for CNY anticounterfeit features via vision-baseddetection and Figure 11 shows the segmentation results ofthe anticounterfeiting features detected by UPerNet withwithout the component analysis module
From Table 5 it can be seen that the proposed methodimproved PA1113955Part
from 9038 to 9529 Tseg from 490 to496ms Moreover APIoUT05 increased from 961 to 100detecting all the light transmission anti-counterfeiting fea-tures without false detection missing detection or repeateddetection
8 Mathematical Problems in Engineering
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Comp responsepuv
Component segmentation
Validcomplabel
Validcomp
response
Comp analysismodel
fOp (Cuv Cuv
Obj
Obj
Obj
Comp) puv Comp C uv
Comp
Object label Cuv
Object responsepuv
Valid object setℂObj-Things
Comp
(a)
11Feature
map
Object classifier
Componentclassifier
Object segmentation
Componentsegmentation
Relation between comp and obj
ℂCObjComp-Obj
ObjObject label Cuv
CompComp label Cuv
Validcomplabel
Comp analysismodel
Valid object setℂObj-Things
fO (Cuv Cuv
Obj Comp) C uv Comp
(b)
Relation between comp and obj
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Componentsegmentation
Comp response Validcomplabel
Validcomp
response
Object responsepuv
Valid object setℂObj-Things
Comp analysismodel
fOp (Cuv Cuv
Obj Comp) p uv Comp C uv
Comp
ObjObject label Cuv
Obj
Comppuv
ℂCObjComp-Obj
(c)
Figure 9 Optimized architecture with the component analysis module (a) Replace 1 times argmaxkpCompminusk with argmaxk1113954pCompminusk to optimize
the module (b) Analyze CuvComp notin C
CObjCompminusObj to optimize the module (c) Replace 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and analyzeCuvComp notin C
CObjCompminusObj to optimize the module
Table 2 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAMs) on ADE20K componentsegmentation task
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
1 ResNet 50 PPM+FPN mdash 4830 4832 ResNet 101 PPM+FPN mdash 4871 5983 ResNet 152 PPM+FPN mdash 4889 7214 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5362 4905 ResNet 101 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5396 6046 ResNet 152 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5418 726
Mathematical Problems in Engineering 9
UPerNet UPerNet with the component analysis module
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar
CarCar
Person
PersonPerson
PersonCarCar
CarPerson
PersonPerson
Person
Figure 10 CITYSCAPES instance-level semantic labeling by UPerNet
Table 4 Mean average precision AP on class-level of the UPerNet withwithout CAM in CITYSCAPES instance-level semantic labelingtask
Method Person () Rider () Car () Truck () Bus () Train () Motorcycle () Bicycle ()UperNet 360 288 516 300 387 273 239 194UperNet +CAM 360 288 530 343 570 375 223 238
Table 2 Continued
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
7 ResNet 50 PPM+FPN+CAM argmaxk1113954pCompminusk 5403 492
8 ResNet 50 PPM+FPN+CAM CuvComp notin C
CObj
CompminusObj 5513 4869 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj 5562 496
Table 3 Performances of different methods on CITYSCAPES instance-level semantic labeling task
Method AP () AP050 () Segmentation time (ms)
SegNet 295 556 mdashMask R-CNN 320 581 mdashUperNet 320 573 447UperNet +CAM 365 622 451CAM Component Analysis Module
10 Mathematical Problems in Engineering
4 Conclusions
In this study we performed semantic segmentation under acomplex background using the encoder-decoder network tosolve the issue of the mutually exclusive relationship be-tween the semantic response value and the semantics ofobjectcomponent in the semantic segmentation under acomplex background for online machine vision detection)e following conclusions can be drawn from this study
(i) Considering the mutually exclusive relationshipbetween the semantic response value and the se-mantics of objectcomponent we selected themathematical model of semantic segmentationunder a complex background based on the encoder-decoder network for optimization It was found that
ηmain ResNet dmain 50 is the best encoder andηdecoder PPM + FPN is the best selected decoder
(ii) We replaced 1 times argmaxkpCompminusk withargmaxk
1113954pCompminusk )e component analysis module
of CuvComp notin C
CObjCompminusObj and UPerNet are considered
to improve the performance of the encoder-decodernetwork
(iii) )e experimental results show that the componentanalysis module improves the performance of se-mantic segmentation under a complex backgroundBoth PA1113955Part
and Tseg of the proposed model werebetter than those of the UPerNet with deeper dmainSpecifically the accuracy improved from 4889 to5562 and Tseg from 721 to 496ms By performing
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
Pattern
Watermark
Serial numberWatermark Watermark
SafelineChinese
OVMI
Safeline
OVMI
Pattern
PatternDenomination
Serial number
Denomination
Watermark
Watermark Watermark
SafelineSafeline
Watermark
Watermark Watermark
SafelineSafeline
Watermark
WatermarkWatermark
SafelineSafeline
UPerNet UPerNet with the component analysis module
Figure 11 Anticounterfeiting features detected by the UPerNet withwithout the component analysis module
Table 5 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAM) for CNY anticounterfeitfeatures via vision-based detection
Backbone ηmain Depth dmain Decoder ηdecoder Component analysis module PA1113955Part() APIoUT05() Tseg (ms)
1 ResNet 50 PPM+FPN mdash 8850 853 4832 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 9038 961 4903 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj
9529 100 496
Mathematical Problems in Engineering 11
vision-based detection with the 2019 CNY featureswe showed that the proposed method improvedPA1113955Part
from 9038 to 9529 while Tseg increasedonly slightly from 490 to 496ms APIoUT01 alsoincreased from 961 to 100 detecting all the lighttransmission anticounterfeiting features withoutfalse detection missing detection or repeateddetection
)e model in which 1 times argmaxkpPartminusk was replacedwith argmaxk
1113954pPartminusk and the corresponding componentanalysis module improved the performance of the UPerNetencoder-decoder network However the efficiency im-provement is affected by the accuracy of object segmenta-tion In our next study we will investigate the applicability ofmachine learning to the component analysis module toachieve a higher performance in different applications
Data Availability
)e ADE20K Dataset used to support the findings of thisstudy is available at httpgroupscsailmiteduvisiondatasets )e CITYSCAPES Dataset used to support thefindings of this study is available at httpswwwcityscapes-datasetcom Its pretrained models and code are released athttpsgithubcomCSAILVisionsemantic-segmentation325pytorch
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
)is research was funded by the Key-Area Research andDevelopment Program of Guangdong Province (Grant no2019B010154003) and the Guangzhou Science and Tech-nology Plan Project (Grant no 201802030006)
References
[1] C Szegedy W Liu and Y Jia ldquoGoing deeper with convo-lutionsrdquo in Proceedings of the Computer Vision and PatternRecognition IEEE Boston MA USA pp 1ndash9 June 2015
[2] K He X Zhang and S Ren ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the Computer vision andpattern recognition IEEE Las Vegas NV USA pp 770ndash778June 2016
[3] K He G Gkioxari P Dollar and R Girshick ldquoMaskR-CNNrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 42 no 2 pp 386ndash397 2020
[4] G Liu B He and S Liu ldquoChassis assembly detection andidentification based on deep learning component instancesegmentationrdquo Symmetry vol 11 no 8 2019
[5] R Manish A Venkatesh and S Denis Ashok ldquoMachinevision based image processing techniques for surface finishand defect inspection in a grinding processrdquoMaterials TodayProceedings vol 5 no 5 pp 12792ndash12802 2018
[6] L Geng Y Wen and F Zhang ldquoMachine vision detectionmethod for surface defects of automobile stamping partsrdquo
American Scientific Research Journal for Engineering Tech-nology and Sciences vol 53 no 1 pp 128ndash144 2019
[7] M M Islam and J Kim ldquoVision-based autonomous crackdetection of concrete structures using a fully convolutionalencoderndashdecoder networkrdquo Sensors vol 19 no 19 2019
[8] S Zhou D Nie E Adeli J Yin J Lian and D Shen ldquoHigh-resolution encoder-decoder networks for low-contrast medicalimage segmentationrdquo IEEE Transactions on Image Processingvol 29 pp 461ndash475 2020
[9] S Liu J Huang and G Liu ldquoTechnology of multi-categorylegal currency identification under multi-light conditionsbased on AleNetrdquo China Measurement ampTest vol 45 no 9pp 118ndash122 2019 in Chinese
[10] H Kang and C Chen ldquoFruit detection and segmentation forapple harvesting using visual sensor in orchardsrdquo Sensorsvol 19 no 20 p 4599 2019
[11] E Pardo J M T Morgado and N Malpica ldquoSemanticsegmentation of mFISH images using convolutional net-worksrdquo Cytometry Part A vol 93 no 6 pp 620ndash627 2018
[12] G Liu S Liu and J Wu ldquoMachine vision object detectionalgorithm based on deep learning and application in banknotedetectionrdquo China Measurement ampTest vol 45 no 5 pp 1ndash92019 in Chinese
[13] H Gao H Yuan and Z Wang ldquoPixel transposed convolu-tional networksrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 42 no 5 pp 1218ndash1227 2019
[14] Q D Vu and J T Kwak ldquoA densemulti-path decoder for tissuesegmentation in histopathology imagesrdquo Computer Methodsand Programs in Biomedicine vol 173 pp 119ndash129 2019
[15] J Huang and G Liu ldquo)e development of CNN-based se-mantic segmentation methodrdquo Laser Journal vol 40 no 5pp 10ndash16 2019 in Chinese
[16] S Nowozin ldquoOptimal decisions from probabilistic modelsthe intersection-over-union caserdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE ColumbusOH USA pp 548ndash555 June 2014
[17] D Hoiem Y Chodpathumwan and Q Dai ldquoDiagnosingerror in object detectorsrdquo in Proceedings of the EuropeanConference on Computer Vision pp 340ndash353 IEEE FlorenceItaly October 2012
[18] K He and J Sun ldquoConvolutional neural networks at con-strained time costrdquo in Proceedings of the Computer Vision andPattern Recognition IEEE Boston MA USA pp 5353ndash5360June 2015
[19] J Long E Shelhamer and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE Boston MAUSA pp 3431ndash3440 June 2015
[20] V Badrinarayanan A Kendall and R Cipolla ldquoSegNet adeep convolutional encoder-decoder architecture for imagesegmentationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 39 no 12 pp 2481ndash2495 2017
[21] O Ronneberger P Fischer and T Brox ldquoU-net convolu-tional networks for biomedical image segmentationrdquo inProceedings of the Medical Image Computing and Computer-Assisted Intervention pp 234ndash241 IEEE Munich GermanyOctober 2015
[22] B Hariharan P Arbelaez R Girshick and J Malik ldquoSi-multaneous detection and segmentationrdquo in Proceedings ofthe European Conference on Computer Vision pp 297ndash312IEEE Zurich Switzerland March 2014
[23] N D Lane and P Warden ldquo)e deep (learning) transfor-mation of mobile and embedded computingrdquo Computervol 51 no 5 pp 12ndash16 2018
12 Mathematical Problems in Engineering
[24] C Qing J Ruan X Xu J Ren and J Zabalza ldquoSpatial-spectral classification of hyperspectral images a deep learningframework with Markov Random fields based modellingrdquo IetImage Processing vol 13 no 2 pp 235ndash245 2019
[25] X Li Z Liu and P Luo ldquoNot all pixels are equal difficulty-aware semantic segmentation via deep layer cascaderdquo inProceedings of the Computer Vision and Pattern RecognitionIEEE Honolulu HI USA pp 6459ndash6468 July 2017
[26] B Zhou H Zhao X Puig et al ldquoSemantic understanding ofscenes through the ADE20K datasetrdquo International Journal ofComputer Vision vol 127 no 3 pp 302ndash321 2019
[27] L-C Chen ldquoRethinking atrous convolution for semanticimage segmentationrdquo 2017 httpsarxivorgabs170605587
[28] H Zhao J Shi and X Qi ldquoPyramid scene parsing networkrdquoin Proceedings of the Computer Vision and Pattern Recogni-tion IEEE Honolulu HI USA pp 6230ndash6239 July 2017
[29] T Xiao Y Liu B Zhou Y Jiang and J Sun ldquoUnifiedperceptual parsing for scene understandingrdquo in Proceedings ofthe European Conference on Computer Vision pp 432ndash448IEEE Munich Germany September 2018
[30] D Kim J Kwon and J Kim ldquoLow-complexity online modelselection with lyapunov control for reward maximization instabilized real-time deep learning platformsrdquo in Proceedings ofthe Systems Man and Cybernetics pp 4363ndash4368 MiyazakiJapan January 2018
[31] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquoCommunications of the Acm vol 60 no 6 pp 84ndash90 2017
[32] M Cordts O Mohamed and S Ramos ldquo)e cityscapesdataset for semantic urban scene understandingrdquo in Pro-ceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR) IEEE Las Vegas NV USApp 3213ndash3223 June 2016
Mathematical Problems in Engineering 13
proposed by Jonathan Long et al (University of CaliforniaBerkeley) in 2014 )e main idea is as follows the operationof a fully connected layer is equivalent to the convolution ofa feature map and a kernel function of identical size )efully connected layer is converted into a convolution layerwhich converts the CNN into a full convolution operationnetwork consisting of a complete convolution layer (con-volution operation) and pooling layer (convolution opera-tion) to process images of any resolution In this manner thelimitation of the fully connected layer is overcome ieimages with different resolutions can be processed )eoriginal resolution is restored after eight times bilinearupsampling by taking the pooling layer as an encoder de-signing a cross-layer superimposed architecture as a de-coder yielding the final output feature map of the networkby upsampling and adding to the output feature map of eachpooling layer (namely the encoder) to obtain a feature mapwith higher resolution )e CNN can perform end-to-endsemantic segmentation through a fully convolutional andcross-layer superimposed architecture therefore various
CNNs are capable of achieving end-to-end semantic seg-mentation Using the framework described the IoU reached622 in the VOC2012 semantic segmentation testing setwhich is 106 higher than the classic methods and 122(its IoU is 500) higher than the SDS [22] further seg-mented by CNN object detection and classical method
)e ResNet proposed by the Amazon Artificial Intelli-gence Laboratory serves as a basic network for constructingFCNs for semantic segmentation the IoU in VOC2012reaches 86 [23] )e prediction results of the FCN ap-plication are obtained by eight-fold bilinear interpolation ofthe feature map including the problems of detail losssmoothing of complex boundaries and poor detectionsensitivity of small objects)e results ignore the global scaleof the image possibly exhibiting regional discontinuity forlarge objects that exceed the receptive field Incorporatingfull connection and upsampling increase the size of thenetwork and introduces a large number of parameters to belearned
Figure 4 shows a schematic of the SegNet model which isan efficient real-time end-to-end semantic segmentationnetwork proposed by Alex Kendall et al (Cambridge Uni-versity) in 2015 )e idea is that the encoder and the decoderhave a one-to-one correspondence and the network appliesthe pooled index in the encoderrsquos maximum pooling toperform nonlinear upsampling thus forming a sparse featuremap then it performs convolution to generate a dense featuremap SegNet defines the basic network of the encoder-de-coder and deletes the fully connected layer to generate globalsemantic information )e decoder utilizes the encoder in-formation without training while the required amount oftraining parameters is 217 of that of the FCN For theprediction of the results SegNet and FCN occupy a GPUmemory of 1052 and 1806MB respectively and the GPUmemory occupancy on GPU GTX 980 (video memory4096MB) is 2568 and 4409 respectively )erefore theoccupancy of SegNet is 1841 lower than that of FCN In[20] the design of SegNet on ResNet was described and theIoU in VOC2012 reached 804 [24] )e IoU of SegNettested in VOC2012 was reported to be 599 and the
Object response maps
Componentsegmentation
Objectsegmentation
Valid component segmentation
Relation between component and object
Object regionand label
Component response maps
Component regionand label
Figure 1 Flowchart of semantic segmentation under the complex background using encoder-decoder network
Start
The encoder-decoder network transforms images to feature maps and the component classifier recognizes the
part response map and generates part segment results
The object classifier recognizes the object response map and generates object segmentation results with the
pixels of the foreground object extracted
End
The relationship between the component and objectis analyzed to improve the accuracy of part segmentation
Figure 2 Flowchart of the semantic segmentation under a complexbackground using the encoder-decoder network
Mathematical Problems in Engineering 3
efficiency was found to be 23 lower than that of FCNfurthermore there was the problem of false detection at theboundary
Figure 5 shows a schematic of the U-Net model whichwas proposed by Olaf Ronneberger (University of FreiburgGermany) in 2015 )e idea was to design a basic networkthat can be trained by semantic segmentation images andmodify the FCN cross-layer overlay architecture with thehigh-resolution feature map channels retained in theupsampling section and then connect it to the decoderoutput feature map in the third dimension Furthermore atiling strategy without limited by GPU memory was pro-posed with this strategy a seamless semantic segmentationof arbitrary high-resolution images was achieved WithU-Net a IoU of 920 and 776 was achieved in the
grayscale image semantic segmentation datasets PhC-U373and DIC-HeLa respectively )e skip connection was usedin the ResNet framework to improve U-Net and a IoU of827 was achieved in the VOC2012 [25] )ere are two keyproblems with the application of U-Net the basic networkneeds to be trained and it can only be applied to specifictask ie it has poor universality
Figure 6 shows a schematic of the UPerNet model whichwas proposed by Tete Xiao (Peking University China) in2018 In the UPerNet framework a feature pyramid network(FPN) with a pyramid pooling module (PPM) is appendedon the last layer of the backbone network before feeding itinto the top-down branch of the FPN Object and part headsare attached on the feature map and are fused by all thelayers put out by the FPN
Decoderskip connect
Encoder (Backbone)
Pooling layersConvolutional layers
2times upsample
2times upsample
8times upsample sum sum
Figure 3 Schematic of FCN model
Decoder
Encoder (Backbone)
IndexIndexIndexIndexIndex
Pooling layersConvolutional layersDepool layers
Figure 4 Schematic of SegNet model
4 Mathematical Problems in Engineering
3 Material and Methods
)e semantic segmentation under a complex backgroundbased on the encoder-decoder network will establish anoptimized mathematical model with minimal segmentationtime Tsegminusmin segmentation time Tseg and accuracy PAUnder the encoder-decoder network the backbone network
ηmain the depth dmain and the decoder ηdecoder are obtainedto form an encoder By selecting the relatively better ηmainand ηdecoder of the basic network the component analysismodule to improve the optimized architecture is proposedand the encoder-decoder network with optimized PA forsemantic segmentation under a complex background isobtained In the encoder-decoder network the encoder
Decoder
Encoder (Backbone)
ConcatConcatConcatConcat
Pooling layersConvolutional layersDeconv layers
Figure 5 Schematic of U-Net model
PPM head
Fuse
Head
Head
Head
Fused feature map
Head
Material
Object
Part
Scene
No grad
Head
Texture(Testing)
orImage
(~450 times 720)
Image for texture(~48 times 48)
= Marbled
Feature Pyramid Network
Conv 3 times 3 GlobalAvgPooling Classifier
Scene head
Conv 3 times 3 Classifier
ObjectPartMaterial head
Conv 3 times 3 Conv 3 times 3 Classifier
Texture head
4x Conv (128 channels)
(Training)
Living room
14
18
116
132
14
14
18
116
132
Figure 6 Flowchart of UPerNet model
Mathematical Problems in Engineering 5
transforms color images (three 2D arrays) to 2048 2D arrays)e encoder is composed of convolutional layers andpooling layers and it could be trained on large-scale clas-sification datasets such as ImageNet to gain greater featureextraction capability
Modeling of semantic segmentation under a complexbackground using the encoder-decoder network and se-lection of ηmain and ηdecoder
)e encoder network is determined by the backbonenetwork ηmain depth dmain and decoder ηdecoder Segmen-tation time Tseg and accuracy PA depend on ηdecoder ηmainand dmain which can be expressed asPA(ηdecoder ηmain dmain) and Tseg(ηdecoder ηmain dmain)Denoting the minimal segmentation time as Tsegminusmin (therecommended value is 600ms) the mathematical model ofthe optimization for semantic segmentation under a com-plex background based on the encoder-decoder network isas follows
max PA ηdecoder ηmain dmain( 1113857
st Tseg ηdecoder ηmain dmain( 1113857leTsegminusmin
⎧⎨
⎩ (3)
)e parameters of the model to be optimized are dmainηmain and ηdecoder
First dmain ηmain and ηdecoder are combined )en theobject segmentation accuracy PAobj component segmen-tation accuracy PAcomp and Tseg are compared to select therelatively better ηmain and ηdecoder for the basic network
)e ADE20K dataset which has diverse annotations ofscenes objects parts of objects and parts of parts [26] isselected In this paper we denote parts of objects as com-ponent Using a GeForce GTX 1080Ti GPU and the trainingmethod described in [27] we obtained PAobj and PAcomp forimproved FCN [19] PSPNet [28] UPerNet [29] and othermajor encoder-decoder networks for semantic segmentationused in the ADE20K [26] objectcomponent segmentationdataset We evaluated Tseg of different network on theADE20K test set which consist of 3000 different resolutionimages with average image size of 13 million pixels Table 1displays the pixel accuracy and segmentation time of themain network architectures on ADE20K objectcomponentsegmentation tasks where the relatively better indices areindicated by a rectangular contour
From Table 1 the following observation can be made①In all networks PAcomp is less than PAobj by about 30 ②ηmain and dmain are equal in networks 1 2 and 3 PAcomp andPAobj are better in ηdecoder FPN + PPM compared toηdecoder FCN or ηdecoder PPM ③ηmain and ηdecoder areequal in networks 3 and 4 When dmain is doubled PAcompimproves slightly and Tseg improves significantly After acomprehensive consideration we selected the UPerNet [23]encoder-decoder network where ηmain ResNetdmain 50 and ηdecoder PPM + FPN
Figure 7 shows the architecture of semantic segmen-tation under a complex background implemented byUPerNet [29] )e encoder ResNet reduces the featuremap resolution by 12 at each stage )e resolution of the
output feature maps within five stages is respectively re-duced to 12 14 18 116 and 132 )e decoder isPPM+ FPN )rough pooling layers with different stridesthe feature maps are analyzed in a multiscale mannerwithin PPM )rough three transposed convolutionlayers the resolution of the feature maps is increased twotimes to 116 18 and 14 )e upsampling restores thefeature map resolution to 11 )e component analysismodule recognizes the feature map and outputs both theobjectcomponent segmentation results
Figure 8 shows the component analysis module ofUPerNet )e module is composed of the object classifiercomponent classifier and component analysis module)e input of each classifier is a 1 1 feature map )e objectclassifier implements the semantic recognition of NObjkinds of objects and outputs the object probability vectorpuvObj and the object label Cuv
Obj )e component classifierimplements the semantic recognition of NComp kinds ofcomponents and outputs the component probabilityvector puv
Comp and the component label CuvComp According
to CuvObj and the component object set CObjminusThings the
component analysis module only segments the CuvComp that
satisfies CuvObj isin CObjminusThings and outputs the valid component
label 1113954Cuv
Comp UPerNet outputs the object segmentation result(the object labelCuv
Obj) and the component segmentation result(the valid component label 1113954C
uv
Comp))e component analysis module of UPerNet can be
expressed as follows
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThings1113872 1113873
1 times C
uvComp C
uvObj isin CObjminusThings
0 times CuvComp C
uvObj notin CObjminusThings
⎧⎨
⎩
(4)
A greater PAcomp of 1113954Cuv
Comp leads to a higher componentsegmentation efficiency
Equation (4) outputs 1113954Cuv
Comp that satisfiesCuvObj isin CObjminusThings By identifying deviations of Cuv
Obj due tothe relationship between Cuv
Comp and CuvObj the optimized
component analysis module can improve the efficiency ofcomponent segmentation it both meets the requirement ofTsegminusmin and improves PAcomp
Improvements of UPerNet for semantic segmentationunder a complex background based on the componentanalysis module
In this subsection we describe the derivation of thecomponent analysis module the optimization of the func-tion expression of the module and the construction of thearchitecture of the component analysis module
As shown in Figure 8 the component classifier recog-nizes Ncomp component semantics and outputs the com-ponent labels Cuv
Comp of the pixel with image position (u v)
and the probability vector puvComp corresponding to the
various component labels )e relationship between CuvComp
and puvComp [31] is as follows
CuvComp argmaxkpCompminusk k 1 2 NComp (5)
From equation (4) and (5) we obtain
6 Mathematical Problems in Engineering
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThings1113872 1113873
1 times argmaxkpCompminusk C
uvObj isin CObjminusThings
0 times argmaxkpCompminusk CuvObj notin CObjminusThings
⎧⎨
⎩ k 1 2 NComp(6)
where pObjminusj is the probability of CuvComp Weighting pObjminusj
over pCompminusk to get 1113954pCompminusk instead of 1 times argmaxkpCompminuskreducing the weight of low-probability object labels andincreasing PAcomp With Cuv
Obj isin CObjminusThings if
CuvComp notin C
CObjCompminusObj letting 1113954pCompminusk 0 can increase the
detection rate of background pixels )erefore the modulecan be expressed as follows
DecoderEncoder ηmain = ResNet
CS3 CS4 CS5CS1 CS2
Poolingpyramid module
PPM
Cup-1 Cup-2 Cup-3 Fusion
Object classifier
Pixel accuracyPA
Ups
ampl
e
Component segmentation
Objectsegmentation
Component classifier
Feature map pramid network FPN
Component analysis module
Componentanalysis model
Inputimage
ηdecoder = PPM + FPN
Segmentation time Tseg (ηmain dmain ηdecoder)
Depth dmain = 50
Figure 7 Flowchart of semantic segmentation under a complex background implemented by UPerNet
Componentclassifier
Validcomponent
label
Object classifier
Componentanalysis model1 1
Featuremap
Object label Cuv
Comp label Cuv
Valid object setℂObj-Things
Object segmentation
Componentsegmentation
CompfO (Cuv
CuvObj
Obj
Comp)CompC uv
Figure 8 Flowchart of component analysis module of UPerNet
Table 1 Pixel accuracy and segmentation time of the main network architectures on ADE20K objectcomponent segmentation task )erectangular contour indicates the best indices
Network Backboneηmain
Backbonedepth dmain
Decoderηdecoder
Object segmentationaccuracy () PAobj
Component segmentationaccuracy () PAcomp
Segmentation time(ms) Tseg
1 FCN [30] ResNet 50 FCN 7132 4081 333
2 PSPNet[28] ResNet 50 PPM 8004 4723 483
3 UPerNet[29] ResNet 50 PPM+FPN 8023 4830 496
4 UPerNet[29] ResNet 101 PPM+FPN 8101 4871 604
Mathematical Problems in Engineering 7
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThingsC
CObj
CompminusObj1113874 1113875 argmaxk1113954pCompminusk k 1 2 NComp
1113954pCompminusk
1113936
jisinCObjminusThingsandkisinCj
CompminusObj
j1pObjminusj
⎛⎜⎜⎝ ⎞⎟⎟⎠ kltNComp
1 minus 1113936
jnotinCObjminusThingsork notin Cj
CompminusObj
j1pObjminusj
⎛⎜⎜⎝ ⎞⎟⎟⎠pCompminusk k NComp
⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎪⎩
j 1 2 Nobj
⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩
(7)
which is the component analysis module yielded by replacing1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and considering
CuvComp notin C
CObjCompminusObj
)e optimized architecture of the UPerNet componentanalysis module is proposed based on equation (7)Figures 9(a)ndash9(c) show the optimized architecture obtainedby replacing 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk by
considering CuvComp notin C
CObjCompminusObj and by both replacing 1 times
argmaxkpCompminusk with argmaxk1113954pCompminusk and considering
CuvComp notin C
CObjCompminusObj in the component analysis module
respectively
31 Experimental Results
311 ADE20K Component Segmentation Task For theUPerNet model the backbone network of the encoder wasResNet dmain 50 and the decoders arePPM+FPN+component analysis modules (beforeaftermodification) We trained each network on the objectcom-ponent segmentation task dataset ADE20K [26] to demonstratethe pixel accuracy PA1113955Part
and segmentation time Tseg )eexperiments were run on a GeForce GTX 1080Ti GPU
Table 2 reports PA1113955Partand Tseg of the UPerNet obtained
with different component analysis modules in ADE20Kcomponent segmentation task From the results the fol-lowing observations can be made
(i) )e pixel accuracy of ResNet(dmain 50)+PPM+FPN+ the proposed modifiedcomponent analysis modules with different settingsincreased from 4830 (without component analysismodules) to 5403 5513 and 5562 while thesegmentation time lengthened marginally from 483to 492 486 and 496ms respectively
)e UPerNet with modified component analysis mod-ules showed significantly high segmentation performanceBoth PA1113955Part
and Tseg outperformed the UPerNet with adeeper dmain PA1113955Part
and Tseg of the architecture (dmain 50)are 5562 and 496ms while those of the architectures withno modification with dmain 101 and 152 were 4871 and598ms and 4889 and 721ms respectively as shown inFigure 9(c)
312 CITYSCAPES Instance-Level Semantic Labeling TaskWe trained each UPerNet (withwithout component anal-ysis module) on the instance-level semantic labeling task of
the CITYSCAPES dataset [32] To assess the instance-levelperformance CITYSCAPES uses themean average precisionAP and average precision AP05 [32] We also report thesegmentation time of each network run on a GeForce GTX1080Ti GPU and an Intel i7-5960X CPU Table 3 presents theperformances of different methods on a CITYSCAPES in-stance-level semantic labeling task Table 4 presents the meanaverage precision AP on class-level of the UPerNet withwithout the component analysis module in the CITYSCAPESinstance-level semantic labeling task From the table it can beseen that the modified component analysis modules effec-tively improved the performance of the UPerNet With thecomponent analysis module both AP and AP05 are im-proved and the segmentation time Tseg increased slightlyfrom 447 to 451msMost of the UPerNet AP on class-level areimproved Figure 10 shows someCITYSCAPES instance-levelsemantic labeling results obtained with the UPerNet withwithout component analysis module
Taking banknote detection as an example we set up thesemantic segmentation model by the component analysismodules (beforeafter modification) to vision-based detec-tion of 2019 Chinese Yuan (CNY) feature in the backlight todemonstrate the segmentation performance of the proposedmethod
)e vision-based detection system consisted of an MV-CA013-10GC industrial camera an MVL-HF2528M-6MPlens and a LED strip light )e field of view was 1833deg andthe resolution was 1280times1024 Under the backlight wecollected 25 CNY images of various denomination frontsand backs at random angles )en we marked four types oflight-transmitting anticounterfeiting features namely se-curity lines pattern watermarks denomination watermarksand Yin-Yang denominations All four features were de-tected in the CNY images to generate our dataset (200images) We trained the model with different componentanalysis modules from our dataset to demonstrate PA1113955Partand Tseg Table 3 presents the pixel accuracy and segmen-tation time of UPerNet with different component analysismodules for CNY anticounterfeit features via vision-baseddetection and Figure 11 shows the segmentation results ofthe anticounterfeiting features detected by UPerNet withwithout the component analysis module
From Table 5 it can be seen that the proposed methodimproved PA1113955Part
from 9038 to 9529 Tseg from 490 to496ms Moreover APIoUT05 increased from 961 to 100detecting all the light transmission anti-counterfeiting fea-tures without false detection missing detection or repeateddetection
8 Mathematical Problems in Engineering
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Comp responsepuv
Component segmentation
Validcomplabel
Validcomp
response
Comp analysismodel
fOp (Cuv Cuv
Obj
Obj
Obj
Comp) puv Comp C uv
Comp
Object label Cuv
Object responsepuv
Valid object setℂObj-Things
Comp
(a)
11Feature
map
Object classifier
Componentclassifier
Object segmentation
Componentsegmentation
Relation between comp and obj
ℂCObjComp-Obj
ObjObject label Cuv
CompComp label Cuv
Validcomplabel
Comp analysismodel
Valid object setℂObj-Things
fO (Cuv Cuv
Obj Comp) C uv Comp
(b)
Relation between comp and obj
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Componentsegmentation
Comp response Validcomplabel
Validcomp
response
Object responsepuv
Valid object setℂObj-Things
Comp analysismodel
fOp (Cuv Cuv
Obj Comp) p uv Comp C uv
Comp
ObjObject label Cuv
Obj
Comppuv
ℂCObjComp-Obj
(c)
Figure 9 Optimized architecture with the component analysis module (a) Replace 1 times argmaxkpCompminusk with argmaxk1113954pCompminusk to optimize
the module (b) Analyze CuvComp notin C
CObjCompminusObj to optimize the module (c) Replace 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and analyzeCuvComp notin C
CObjCompminusObj to optimize the module
Table 2 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAMs) on ADE20K componentsegmentation task
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
1 ResNet 50 PPM+FPN mdash 4830 4832 ResNet 101 PPM+FPN mdash 4871 5983 ResNet 152 PPM+FPN mdash 4889 7214 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5362 4905 ResNet 101 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5396 6046 ResNet 152 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5418 726
Mathematical Problems in Engineering 9
UPerNet UPerNet with the component analysis module
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar
CarCar
Person
PersonPerson
PersonCarCar
CarPerson
PersonPerson
Person
Figure 10 CITYSCAPES instance-level semantic labeling by UPerNet
Table 4 Mean average precision AP on class-level of the UPerNet withwithout CAM in CITYSCAPES instance-level semantic labelingtask
Method Person () Rider () Car () Truck () Bus () Train () Motorcycle () Bicycle ()UperNet 360 288 516 300 387 273 239 194UperNet +CAM 360 288 530 343 570 375 223 238
Table 2 Continued
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
7 ResNet 50 PPM+FPN+CAM argmaxk1113954pCompminusk 5403 492
8 ResNet 50 PPM+FPN+CAM CuvComp notin C
CObj
CompminusObj 5513 4869 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj 5562 496
Table 3 Performances of different methods on CITYSCAPES instance-level semantic labeling task
Method AP () AP050 () Segmentation time (ms)
SegNet 295 556 mdashMask R-CNN 320 581 mdashUperNet 320 573 447UperNet +CAM 365 622 451CAM Component Analysis Module
10 Mathematical Problems in Engineering
4 Conclusions
In this study we performed semantic segmentation under acomplex background using the encoder-decoder network tosolve the issue of the mutually exclusive relationship be-tween the semantic response value and the semantics ofobjectcomponent in the semantic segmentation under acomplex background for online machine vision detection)e following conclusions can be drawn from this study
(i) Considering the mutually exclusive relationshipbetween the semantic response value and the se-mantics of objectcomponent we selected themathematical model of semantic segmentationunder a complex background based on the encoder-decoder network for optimization It was found that
ηmain ResNet dmain 50 is the best encoder andηdecoder PPM + FPN is the best selected decoder
(ii) We replaced 1 times argmaxkpCompminusk withargmaxk
1113954pCompminusk )e component analysis module
of CuvComp notin C
CObjCompminusObj and UPerNet are considered
to improve the performance of the encoder-decodernetwork
(iii) )e experimental results show that the componentanalysis module improves the performance of se-mantic segmentation under a complex backgroundBoth PA1113955Part
and Tseg of the proposed model werebetter than those of the UPerNet with deeper dmainSpecifically the accuracy improved from 4889 to5562 and Tseg from 721 to 496ms By performing
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
Pattern
Watermark
Serial numberWatermark Watermark
SafelineChinese
OVMI
Safeline
OVMI
Pattern
PatternDenomination
Serial number
Denomination
Watermark
Watermark Watermark
SafelineSafeline
Watermark
Watermark Watermark
SafelineSafeline
Watermark
WatermarkWatermark
SafelineSafeline
UPerNet UPerNet with the component analysis module
Figure 11 Anticounterfeiting features detected by the UPerNet withwithout the component analysis module
Table 5 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAM) for CNY anticounterfeitfeatures via vision-based detection
Backbone ηmain Depth dmain Decoder ηdecoder Component analysis module PA1113955Part() APIoUT05() Tseg (ms)
1 ResNet 50 PPM+FPN mdash 8850 853 4832 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 9038 961 4903 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj
9529 100 496
Mathematical Problems in Engineering 11
vision-based detection with the 2019 CNY featureswe showed that the proposed method improvedPA1113955Part
from 9038 to 9529 while Tseg increasedonly slightly from 490 to 496ms APIoUT01 alsoincreased from 961 to 100 detecting all the lighttransmission anticounterfeiting features withoutfalse detection missing detection or repeateddetection
)e model in which 1 times argmaxkpPartminusk was replacedwith argmaxk
1113954pPartminusk and the corresponding componentanalysis module improved the performance of the UPerNetencoder-decoder network However the efficiency im-provement is affected by the accuracy of object segmenta-tion In our next study we will investigate the applicability ofmachine learning to the component analysis module toachieve a higher performance in different applications
Data Availability
)e ADE20K Dataset used to support the findings of thisstudy is available at httpgroupscsailmiteduvisiondatasets )e CITYSCAPES Dataset used to support thefindings of this study is available at httpswwwcityscapes-datasetcom Its pretrained models and code are released athttpsgithubcomCSAILVisionsemantic-segmentation325pytorch
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
)is research was funded by the Key-Area Research andDevelopment Program of Guangdong Province (Grant no2019B010154003) and the Guangzhou Science and Tech-nology Plan Project (Grant no 201802030006)
References
[1] C Szegedy W Liu and Y Jia ldquoGoing deeper with convo-lutionsrdquo in Proceedings of the Computer Vision and PatternRecognition IEEE Boston MA USA pp 1ndash9 June 2015
[2] K He X Zhang and S Ren ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the Computer vision andpattern recognition IEEE Las Vegas NV USA pp 770ndash778June 2016
[3] K He G Gkioxari P Dollar and R Girshick ldquoMaskR-CNNrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 42 no 2 pp 386ndash397 2020
[4] G Liu B He and S Liu ldquoChassis assembly detection andidentification based on deep learning component instancesegmentationrdquo Symmetry vol 11 no 8 2019
[5] R Manish A Venkatesh and S Denis Ashok ldquoMachinevision based image processing techniques for surface finishand defect inspection in a grinding processrdquoMaterials TodayProceedings vol 5 no 5 pp 12792ndash12802 2018
[6] L Geng Y Wen and F Zhang ldquoMachine vision detectionmethod for surface defects of automobile stamping partsrdquo
American Scientific Research Journal for Engineering Tech-nology and Sciences vol 53 no 1 pp 128ndash144 2019
[7] M M Islam and J Kim ldquoVision-based autonomous crackdetection of concrete structures using a fully convolutionalencoderndashdecoder networkrdquo Sensors vol 19 no 19 2019
[8] S Zhou D Nie E Adeli J Yin J Lian and D Shen ldquoHigh-resolution encoder-decoder networks for low-contrast medicalimage segmentationrdquo IEEE Transactions on Image Processingvol 29 pp 461ndash475 2020
[9] S Liu J Huang and G Liu ldquoTechnology of multi-categorylegal currency identification under multi-light conditionsbased on AleNetrdquo China Measurement ampTest vol 45 no 9pp 118ndash122 2019 in Chinese
[10] H Kang and C Chen ldquoFruit detection and segmentation forapple harvesting using visual sensor in orchardsrdquo Sensorsvol 19 no 20 p 4599 2019
[11] E Pardo J M T Morgado and N Malpica ldquoSemanticsegmentation of mFISH images using convolutional net-worksrdquo Cytometry Part A vol 93 no 6 pp 620ndash627 2018
[12] G Liu S Liu and J Wu ldquoMachine vision object detectionalgorithm based on deep learning and application in banknotedetectionrdquo China Measurement ampTest vol 45 no 5 pp 1ndash92019 in Chinese
[13] H Gao H Yuan and Z Wang ldquoPixel transposed convolu-tional networksrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 42 no 5 pp 1218ndash1227 2019
[14] Q D Vu and J T Kwak ldquoA densemulti-path decoder for tissuesegmentation in histopathology imagesrdquo Computer Methodsand Programs in Biomedicine vol 173 pp 119ndash129 2019
[15] J Huang and G Liu ldquo)e development of CNN-based se-mantic segmentation methodrdquo Laser Journal vol 40 no 5pp 10ndash16 2019 in Chinese
[16] S Nowozin ldquoOptimal decisions from probabilistic modelsthe intersection-over-union caserdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE ColumbusOH USA pp 548ndash555 June 2014
[17] D Hoiem Y Chodpathumwan and Q Dai ldquoDiagnosingerror in object detectorsrdquo in Proceedings of the EuropeanConference on Computer Vision pp 340ndash353 IEEE FlorenceItaly October 2012
[18] K He and J Sun ldquoConvolutional neural networks at con-strained time costrdquo in Proceedings of the Computer Vision andPattern Recognition IEEE Boston MA USA pp 5353ndash5360June 2015
[19] J Long E Shelhamer and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE Boston MAUSA pp 3431ndash3440 June 2015
[20] V Badrinarayanan A Kendall and R Cipolla ldquoSegNet adeep convolutional encoder-decoder architecture for imagesegmentationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 39 no 12 pp 2481ndash2495 2017
[21] O Ronneberger P Fischer and T Brox ldquoU-net convolu-tional networks for biomedical image segmentationrdquo inProceedings of the Medical Image Computing and Computer-Assisted Intervention pp 234ndash241 IEEE Munich GermanyOctober 2015
[22] B Hariharan P Arbelaez R Girshick and J Malik ldquoSi-multaneous detection and segmentationrdquo in Proceedings ofthe European Conference on Computer Vision pp 297ndash312IEEE Zurich Switzerland March 2014
[23] N D Lane and P Warden ldquo)e deep (learning) transfor-mation of mobile and embedded computingrdquo Computervol 51 no 5 pp 12ndash16 2018
12 Mathematical Problems in Engineering
[24] C Qing J Ruan X Xu J Ren and J Zabalza ldquoSpatial-spectral classification of hyperspectral images a deep learningframework with Markov Random fields based modellingrdquo IetImage Processing vol 13 no 2 pp 235ndash245 2019
[25] X Li Z Liu and P Luo ldquoNot all pixels are equal difficulty-aware semantic segmentation via deep layer cascaderdquo inProceedings of the Computer Vision and Pattern RecognitionIEEE Honolulu HI USA pp 6459ndash6468 July 2017
[26] B Zhou H Zhao X Puig et al ldquoSemantic understanding ofscenes through the ADE20K datasetrdquo International Journal ofComputer Vision vol 127 no 3 pp 302ndash321 2019
[27] L-C Chen ldquoRethinking atrous convolution for semanticimage segmentationrdquo 2017 httpsarxivorgabs170605587
[28] H Zhao J Shi and X Qi ldquoPyramid scene parsing networkrdquoin Proceedings of the Computer Vision and Pattern Recogni-tion IEEE Honolulu HI USA pp 6230ndash6239 July 2017
[29] T Xiao Y Liu B Zhou Y Jiang and J Sun ldquoUnifiedperceptual parsing for scene understandingrdquo in Proceedings ofthe European Conference on Computer Vision pp 432ndash448IEEE Munich Germany September 2018
[30] D Kim J Kwon and J Kim ldquoLow-complexity online modelselection with lyapunov control for reward maximization instabilized real-time deep learning platformsrdquo in Proceedings ofthe Systems Man and Cybernetics pp 4363ndash4368 MiyazakiJapan January 2018
[31] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquoCommunications of the Acm vol 60 no 6 pp 84ndash90 2017
[32] M Cordts O Mohamed and S Ramos ldquo)e cityscapesdataset for semantic urban scene understandingrdquo in Pro-ceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR) IEEE Las Vegas NV USApp 3213ndash3223 June 2016
Mathematical Problems in Engineering 13
efficiency was found to be 23 lower than that of FCNfurthermore there was the problem of false detection at theboundary
Figure 5 shows a schematic of the U-Net model whichwas proposed by Olaf Ronneberger (University of FreiburgGermany) in 2015 )e idea was to design a basic networkthat can be trained by semantic segmentation images andmodify the FCN cross-layer overlay architecture with thehigh-resolution feature map channels retained in theupsampling section and then connect it to the decoderoutput feature map in the third dimension Furthermore atiling strategy without limited by GPU memory was pro-posed with this strategy a seamless semantic segmentationof arbitrary high-resolution images was achieved WithU-Net a IoU of 920 and 776 was achieved in the
grayscale image semantic segmentation datasets PhC-U373and DIC-HeLa respectively )e skip connection was usedin the ResNet framework to improve U-Net and a IoU of827 was achieved in the VOC2012 [25] )ere are two keyproblems with the application of U-Net the basic networkneeds to be trained and it can only be applied to specifictask ie it has poor universality
Figure 6 shows a schematic of the UPerNet model whichwas proposed by Tete Xiao (Peking University China) in2018 In the UPerNet framework a feature pyramid network(FPN) with a pyramid pooling module (PPM) is appendedon the last layer of the backbone network before feeding itinto the top-down branch of the FPN Object and part headsare attached on the feature map and are fused by all thelayers put out by the FPN
Decoderskip connect
Encoder (Backbone)
Pooling layersConvolutional layers
2times upsample
2times upsample
8times upsample sum sum
Figure 3 Schematic of FCN model
Decoder
Encoder (Backbone)
IndexIndexIndexIndexIndex
Pooling layersConvolutional layersDepool layers
Figure 4 Schematic of SegNet model
4 Mathematical Problems in Engineering
3 Material and Methods
)e semantic segmentation under a complex backgroundbased on the encoder-decoder network will establish anoptimized mathematical model with minimal segmentationtime Tsegminusmin segmentation time Tseg and accuracy PAUnder the encoder-decoder network the backbone network
ηmain the depth dmain and the decoder ηdecoder are obtainedto form an encoder By selecting the relatively better ηmainand ηdecoder of the basic network the component analysismodule to improve the optimized architecture is proposedand the encoder-decoder network with optimized PA forsemantic segmentation under a complex background isobtained In the encoder-decoder network the encoder
Decoder
Encoder (Backbone)
ConcatConcatConcatConcat
Pooling layersConvolutional layersDeconv layers
Figure 5 Schematic of U-Net model
PPM head
Fuse
Head
Head
Head
Fused feature map
Head
Material
Object
Part
Scene
No grad
Head
Texture(Testing)
orImage
(~450 times 720)
Image for texture(~48 times 48)
= Marbled
Feature Pyramid Network
Conv 3 times 3 GlobalAvgPooling Classifier
Scene head
Conv 3 times 3 Classifier
ObjectPartMaterial head
Conv 3 times 3 Conv 3 times 3 Classifier
Texture head
4x Conv (128 channels)
(Training)
Living room
14
18
116
132
14
14
18
116
132
Figure 6 Flowchart of UPerNet model
Mathematical Problems in Engineering 5
transforms color images (three 2D arrays) to 2048 2D arrays)e encoder is composed of convolutional layers andpooling layers and it could be trained on large-scale clas-sification datasets such as ImageNet to gain greater featureextraction capability
Modeling of semantic segmentation under a complexbackground using the encoder-decoder network and se-lection of ηmain and ηdecoder
)e encoder network is determined by the backbonenetwork ηmain depth dmain and decoder ηdecoder Segmen-tation time Tseg and accuracy PA depend on ηdecoder ηmainand dmain which can be expressed asPA(ηdecoder ηmain dmain) and Tseg(ηdecoder ηmain dmain)Denoting the minimal segmentation time as Tsegminusmin (therecommended value is 600ms) the mathematical model ofthe optimization for semantic segmentation under a com-plex background based on the encoder-decoder network isas follows
max PA ηdecoder ηmain dmain( 1113857
st Tseg ηdecoder ηmain dmain( 1113857leTsegminusmin
⎧⎨
⎩ (3)
)e parameters of the model to be optimized are dmainηmain and ηdecoder
First dmain ηmain and ηdecoder are combined )en theobject segmentation accuracy PAobj component segmen-tation accuracy PAcomp and Tseg are compared to select therelatively better ηmain and ηdecoder for the basic network
)e ADE20K dataset which has diverse annotations ofscenes objects parts of objects and parts of parts [26] isselected In this paper we denote parts of objects as com-ponent Using a GeForce GTX 1080Ti GPU and the trainingmethod described in [27] we obtained PAobj and PAcomp forimproved FCN [19] PSPNet [28] UPerNet [29] and othermajor encoder-decoder networks for semantic segmentationused in the ADE20K [26] objectcomponent segmentationdataset We evaluated Tseg of different network on theADE20K test set which consist of 3000 different resolutionimages with average image size of 13 million pixels Table 1displays the pixel accuracy and segmentation time of themain network architectures on ADE20K objectcomponentsegmentation tasks where the relatively better indices areindicated by a rectangular contour
From Table 1 the following observation can be made①In all networks PAcomp is less than PAobj by about 30 ②ηmain and dmain are equal in networks 1 2 and 3 PAcomp andPAobj are better in ηdecoder FPN + PPM compared toηdecoder FCN or ηdecoder PPM ③ηmain and ηdecoder areequal in networks 3 and 4 When dmain is doubled PAcompimproves slightly and Tseg improves significantly After acomprehensive consideration we selected the UPerNet [23]encoder-decoder network where ηmain ResNetdmain 50 and ηdecoder PPM + FPN
Figure 7 shows the architecture of semantic segmen-tation under a complex background implemented byUPerNet [29] )e encoder ResNet reduces the featuremap resolution by 12 at each stage )e resolution of the
output feature maps within five stages is respectively re-duced to 12 14 18 116 and 132 )e decoder isPPM+ FPN )rough pooling layers with different stridesthe feature maps are analyzed in a multiscale mannerwithin PPM )rough three transposed convolutionlayers the resolution of the feature maps is increased twotimes to 116 18 and 14 )e upsampling restores thefeature map resolution to 11 )e component analysismodule recognizes the feature map and outputs both theobjectcomponent segmentation results
Figure 8 shows the component analysis module ofUPerNet )e module is composed of the object classifiercomponent classifier and component analysis module)e input of each classifier is a 1 1 feature map )e objectclassifier implements the semantic recognition of NObjkinds of objects and outputs the object probability vectorpuvObj and the object label Cuv
Obj )e component classifierimplements the semantic recognition of NComp kinds ofcomponents and outputs the component probabilityvector puv
Comp and the component label CuvComp According
to CuvObj and the component object set CObjminusThings the
component analysis module only segments the CuvComp that
satisfies CuvObj isin CObjminusThings and outputs the valid component
label 1113954Cuv
Comp UPerNet outputs the object segmentation result(the object labelCuv
Obj) and the component segmentation result(the valid component label 1113954C
uv
Comp))e component analysis module of UPerNet can be
expressed as follows
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThings1113872 1113873
1 times C
uvComp C
uvObj isin CObjminusThings
0 times CuvComp C
uvObj notin CObjminusThings
⎧⎨
⎩
(4)
A greater PAcomp of 1113954Cuv
Comp leads to a higher componentsegmentation efficiency
Equation (4) outputs 1113954Cuv
Comp that satisfiesCuvObj isin CObjminusThings By identifying deviations of Cuv
Obj due tothe relationship between Cuv
Comp and CuvObj the optimized
component analysis module can improve the efficiency ofcomponent segmentation it both meets the requirement ofTsegminusmin and improves PAcomp
Improvements of UPerNet for semantic segmentationunder a complex background based on the componentanalysis module
In this subsection we describe the derivation of thecomponent analysis module the optimization of the func-tion expression of the module and the construction of thearchitecture of the component analysis module
As shown in Figure 8 the component classifier recog-nizes Ncomp component semantics and outputs the com-ponent labels Cuv
Comp of the pixel with image position (u v)
and the probability vector puvComp corresponding to the
various component labels )e relationship between CuvComp
and puvComp [31] is as follows
CuvComp argmaxkpCompminusk k 1 2 NComp (5)
From equation (4) and (5) we obtain
6 Mathematical Problems in Engineering
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThings1113872 1113873
1 times argmaxkpCompminusk C
uvObj isin CObjminusThings
0 times argmaxkpCompminusk CuvObj notin CObjminusThings
⎧⎨
⎩ k 1 2 NComp(6)
where pObjminusj is the probability of CuvComp Weighting pObjminusj
over pCompminusk to get 1113954pCompminusk instead of 1 times argmaxkpCompminuskreducing the weight of low-probability object labels andincreasing PAcomp With Cuv
Obj isin CObjminusThings if
CuvComp notin C
CObjCompminusObj letting 1113954pCompminusk 0 can increase the
detection rate of background pixels )erefore the modulecan be expressed as follows
DecoderEncoder ηmain = ResNet
CS3 CS4 CS5CS1 CS2
Poolingpyramid module
PPM
Cup-1 Cup-2 Cup-3 Fusion
Object classifier
Pixel accuracyPA
Ups
ampl
e
Component segmentation
Objectsegmentation
Component classifier
Feature map pramid network FPN
Component analysis module
Componentanalysis model
Inputimage
ηdecoder = PPM + FPN
Segmentation time Tseg (ηmain dmain ηdecoder)
Depth dmain = 50
Figure 7 Flowchart of semantic segmentation under a complex background implemented by UPerNet
Componentclassifier
Validcomponent
label
Object classifier
Componentanalysis model1 1
Featuremap
Object label Cuv
Comp label Cuv
Valid object setℂObj-Things
Object segmentation
Componentsegmentation
CompfO (Cuv
CuvObj
Obj
Comp)CompC uv
Figure 8 Flowchart of component analysis module of UPerNet
Table 1 Pixel accuracy and segmentation time of the main network architectures on ADE20K objectcomponent segmentation task )erectangular contour indicates the best indices
Network Backboneηmain
Backbonedepth dmain
Decoderηdecoder
Object segmentationaccuracy () PAobj
Component segmentationaccuracy () PAcomp
Segmentation time(ms) Tseg
1 FCN [30] ResNet 50 FCN 7132 4081 333
2 PSPNet[28] ResNet 50 PPM 8004 4723 483
3 UPerNet[29] ResNet 50 PPM+FPN 8023 4830 496
4 UPerNet[29] ResNet 101 PPM+FPN 8101 4871 604
Mathematical Problems in Engineering 7
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThingsC
CObj
CompminusObj1113874 1113875 argmaxk1113954pCompminusk k 1 2 NComp
1113954pCompminusk
1113936
jisinCObjminusThingsandkisinCj
CompminusObj
j1pObjminusj
⎛⎜⎜⎝ ⎞⎟⎟⎠ kltNComp
1 minus 1113936
jnotinCObjminusThingsork notin Cj
CompminusObj
j1pObjminusj
⎛⎜⎜⎝ ⎞⎟⎟⎠pCompminusk k NComp
⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎪⎩
j 1 2 Nobj
⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩
(7)
which is the component analysis module yielded by replacing1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and considering
CuvComp notin C
CObjCompminusObj
)e optimized architecture of the UPerNet componentanalysis module is proposed based on equation (7)Figures 9(a)ndash9(c) show the optimized architecture obtainedby replacing 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk by
considering CuvComp notin C
CObjCompminusObj and by both replacing 1 times
argmaxkpCompminusk with argmaxk1113954pCompminusk and considering
CuvComp notin C
CObjCompminusObj in the component analysis module
respectively
31 Experimental Results
311 ADE20K Component Segmentation Task For theUPerNet model the backbone network of the encoder wasResNet dmain 50 and the decoders arePPM+FPN+component analysis modules (beforeaftermodification) We trained each network on the objectcom-ponent segmentation task dataset ADE20K [26] to demonstratethe pixel accuracy PA1113955Part
and segmentation time Tseg )eexperiments were run on a GeForce GTX 1080Ti GPU
Table 2 reports PA1113955Partand Tseg of the UPerNet obtained
with different component analysis modules in ADE20Kcomponent segmentation task From the results the fol-lowing observations can be made
(i) )e pixel accuracy of ResNet(dmain 50)+PPM+FPN+ the proposed modifiedcomponent analysis modules with different settingsincreased from 4830 (without component analysismodules) to 5403 5513 and 5562 while thesegmentation time lengthened marginally from 483to 492 486 and 496ms respectively
)e UPerNet with modified component analysis mod-ules showed significantly high segmentation performanceBoth PA1113955Part
and Tseg outperformed the UPerNet with adeeper dmain PA1113955Part
and Tseg of the architecture (dmain 50)are 5562 and 496ms while those of the architectures withno modification with dmain 101 and 152 were 4871 and598ms and 4889 and 721ms respectively as shown inFigure 9(c)
312 CITYSCAPES Instance-Level Semantic Labeling TaskWe trained each UPerNet (withwithout component anal-ysis module) on the instance-level semantic labeling task of
the CITYSCAPES dataset [32] To assess the instance-levelperformance CITYSCAPES uses themean average precisionAP and average precision AP05 [32] We also report thesegmentation time of each network run on a GeForce GTX1080Ti GPU and an Intel i7-5960X CPU Table 3 presents theperformances of different methods on a CITYSCAPES in-stance-level semantic labeling task Table 4 presents the meanaverage precision AP on class-level of the UPerNet withwithout the component analysis module in the CITYSCAPESinstance-level semantic labeling task From the table it can beseen that the modified component analysis modules effec-tively improved the performance of the UPerNet With thecomponent analysis module both AP and AP05 are im-proved and the segmentation time Tseg increased slightlyfrom 447 to 451msMost of the UPerNet AP on class-level areimproved Figure 10 shows someCITYSCAPES instance-levelsemantic labeling results obtained with the UPerNet withwithout component analysis module
Taking banknote detection as an example we set up thesemantic segmentation model by the component analysismodules (beforeafter modification) to vision-based detec-tion of 2019 Chinese Yuan (CNY) feature in the backlight todemonstrate the segmentation performance of the proposedmethod
)e vision-based detection system consisted of an MV-CA013-10GC industrial camera an MVL-HF2528M-6MPlens and a LED strip light )e field of view was 1833deg andthe resolution was 1280times1024 Under the backlight wecollected 25 CNY images of various denomination frontsand backs at random angles )en we marked four types oflight-transmitting anticounterfeiting features namely se-curity lines pattern watermarks denomination watermarksand Yin-Yang denominations All four features were de-tected in the CNY images to generate our dataset (200images) We trained the model with different componentanalysis modules from our dataset to demonstrate PA1113955Partand Tseg Table 3 presents the pixel accuracy and segmen-tation time of UPerNet with different component analysismodules for CNY anticounterfeit features via vision-baseddetection and Figure 11 shows the segmentation results ofthe anticounterfeiting features detected by UPerNet withwithout the component analysis module
From Table 5 it can be seen that the proposed methodimproved PA1113955Part
from 9038 to 9529 Tseg from 490 to496ms Moreover APIoUT05 increased from 961 to 100detecting all the light transmission anti-counterfeiting fea-tures without false detection missing detection or repeateddetection
8 Mathematical Problems in Engineering
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Comp responsepuv
Component segmentation
Validcomplabel
Validcomp
response
Comp analysismodel
fOp (Cuv Cuv
Obj
Obj
Obj
Comp) puv Comp C uv
Comp
Object label Cuv
Object responsepuv
Valid object setℂObj-Things
Comp
(a)
11Feature
map
Object classifier
Componentclassifier
Object segmentation
Componentsegmentation
Relation between comp and obj
ℂCObjComp-Obj
ObjObject label Cuv
CompComp label Cuv
Validcomplabel
Comp analysismodel
Valid object setℂObj-Things
fO (Cuv Cuv
Obj Comp) C uv Comp
(b)
Relation between comp and obj
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Componentsegmentation
Comp response Validcomplabel
Validcomp
response
Object responsepuv
Valid object setℂObj-Things
Comp analysismodel
fOp (Cuv Cuv
Obj Comp) p uv Comp C uv
Comp
ObjObject label Cuv
Obj
Comppuv
ℂCObjComp-Obj
(c)
Figure 9 Optimized architecture with the component analysis module (a) Replace 1 times argmaxkpCompminusk with argmaxk1113954pCompminusk to optimize
the module (b) Analyze CuvComp notin C
CObjCompminusObj to optimize the module (c) Replace 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and analyzeCuvComp notin C
CObjCompminusObj to optimize the module
Table 2 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAMs) on ADE20K componentsegmentation task
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
1 ResNet 50 PPM+FPN mdash 4830 4832 ResNet 101 PPM+FPN mdash 4871 5983 ResNet 152 PPM+FPN mdash 4889 7214 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5362 4905 ResNet 101 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5396 6046 ResNet 152 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5418 726
Mathematical Problems in Engineering 9
UPerNet UPerNet with the component analysis module
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar
CarCar
Person
PersonPerson
PersonCarCar
CarPerson
PersonPerson
Person
Figure 10 CITYSCAPES instance-level semantic labeling by UPerNet
Table 4 Mean average precision AP on class-level of the UPerNet withwithout CAM in CITYSCAPES instance-level semantic labelingtask
Method Person () Rider () Car () Truck () Bus () Train () Motorcycle () Bicycle ()UperNet 360 288 516 300 387 273 239 194UperNet +CAM 360 288 530 343 570 375 223 238
Table 2 Continued
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
7 ResNet 50 PPM+FPN+CAM argmaxk1113954pCompminusk 5403 492
8 ResNet 50 PPM+FPN+CAM CuvComp notin C
CObj
CompminusObj 5513 4869 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj 5562 496
Table 3 Performances of different methods on CITYSCAPES instance-level semantic labeling task
Method AP () AP050 () Segmentation time (ms)
SegNet 295 556 mdashMask R-CNN 320 581 mdashUperNet 320 573 447UperNet +CAM 365 622 451CAM Component Analysis Module
10 Mathematical Problems in Engineering
4 Conclusions
In this study we performed semantic segmentation under acomplex background using the encoder-decoder network tosolve the issue of the mutually exclusive relationship be-tween the semantic response value and the semantics ofobjectcomponent in the semantic segmentation under acomplex background for online machine vision detection)e following conclusions can be drawn from this study
(i) Considering the mutually exclusive relationshipbetween the semantic response value and the se-mantics of objectcomponent we selected themathematical model of semantic segmentationunder a complex background based on the encoder-decoder network for optimization It was found that
ηmain ResNet dmain 50 is the best encoder andηdecoder PPM + FPN is the best selected decoder
(ii) We replaced 1 times argmaxkpCompminusk withargmaxk
1113954pCompminusk )e component analysis module
of CuvComp notin C
CObjCompminusObj and UPerNet are considered
to improve the performance of the encoder-decodernetwork
(iii) )e experimental results show that the componentanalysis module improves the performance of se-mantic segmentation under a complex backgroundBoth PA1113955Part
and Tseg of the proposed model werebetter than those of the UPerNet with deeper dmainSpecifically the accuracy improved from 4889 to5562 and Tseg from 721 to 496ms By performing
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
Pattern
Watermark
Serial numberWatermark Watermark
SafelineChinese
OVMI
Safeline
OVMI
Pattern
PatternDenomination
Serial number
Denomination
Watermark
Watermark Watermark
SafelineSafeline
Watermark
Watermark Watermark
SafelineSafeline
Watermark
WatermarkWatermark
SafelineSafeline
UPerNet UPerNet with the component analysis module
Figure 11 Anticounterfeiting features detected by the UPerNet withwithout the component analysis module
Table 5 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAM) for CNY anticounterfeitfeatures via vision-based detection
Backbone ηmain Depth dmain Decoder ηdecoder Component analysis module PA1113955Part() APIoUT05() Tseg (ms)
1 ResNet 50 PPM+FPN mdash 8850 853 4832 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 9038 961 4903 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj
9529 100 496
Mathematical Problems in Engineering 11
vision-based detection with the 2019 CNY featureswe showed that the proposed method improvedPA1113955Part
from 9038 to 9529 while Tseg increasedonly slightly from 490 to 496ms APIoUT01 alsoincreased from 961 to 100 detecting all the lighttransmission anticounterfeiting features withoutfalse detection missing detection or repeateddetection
)e model in which 1 times argmaxkpPartminusk was replacedwith argmaxk
1113954pPartminusk and the corresponding componentanalysis module improved the performance of the UPerNetencoder-decoder network However the efficiency im-provement is affected by the accuracy of object segmenta-tion In our next study we will investigate the applicability ofmachine learning to the component analysis module toachieve a higher performance in different applications
Data Availability
)e ADE20K Dataset used to support the findings of thisstudy is available at httpgroupscsailmiteduvisiondatasets )e CITYSCAPES Dataset used to support thefindings of this study is available at httpswwwcityscapes-datasetcom Its pretrained models and code are released athttpsgithubcomCSAILVisionsemantic-segmentation325pytorch
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
)is research was funded by the Key-Area Research andDevelopment Program of Guangdong Province (Grant no2019B010154003) and the Guangzhou Science and Tech-nology Plan Project (Grant no 201802030006)
References
[1] C Szegedy W Liu and Y Jia ldquoGoing deeper with convo-lutionsrdquo in Proceedings of the Computer Vision and PatternRecognition IEEE Boston MA USA pp 1ndash9 June 2015
[2] K He X Zhang and S Ren ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the Computer vision andpattern recognition IEEE Las Vegas NV USA pp 770ndash778June 2016
[3] K He G Gkioxari P Dollar and R Girshick ldquoMaskR-CNNrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 42 no 2 pp 386ndash397 2020
[4] G Liu B He and S Liu ldquoChassis assembly detection andidentification based on deep learning component instancesegmentationrdquo Symmetry vol 11 no 8 2019
[5] R Manish A Venkatesh and S Denis Ashok ldquoMachinevision based image processing techniques for surface finishand defect inspection in a grinding processrdquoMaterials TodayProceedings vol 5 no 5 pp 12792ndash12802 2018
[6] L Geng Y Wen and F Zhang ldquoMachine vision detectionmethod for surface defects of automobile stamping partsrdquo
American Scientific Research Journal for Engineering Tech-nology and Sciences vol 53 no 1 pp 128ndash144 2019
[7] M M Islam and J Kim ldquoVision-based autonomous crackdetection of concrete structures using a fully convolutionalencoderndashdecoder networkrdquo Sensors vol 19 no 19 2019
[8] S Zhou D Nie E Adeli J Yin J Lian and D Shen ldquoHigh-resolution encoder-decoder networks for low-contrast medicalimage segmentationrdquo IEEE Transactions on Image Processingvol 29 pp 461ndash475 2020
[9] S Liu J Huang and G Liu ldquoTechnology of multi-categorylegal currency identification under multi-light conditionsbased on AleNetrdquo China Measurement ampTest vol 45 no 9pp 118ndash122 2019 in Chinese
[10] H Kang and C Chen ldquoFruit detection and segmentation forapple harvesting using visual sensor in orchardsrdquo Sensorsvol 19 no 20 p 4599 2019
[11] E Pardo J M T Morgado and N Malpica ldquoSemanticsegmentation of mFISH images using convolutional net-worksrdquo Cytometry Part A vol 93 no 6 pp 620ndash627 2018
[12] G Liu S Liu and J Wu ldquoMachine vision object detectionalgorithm based on deep learning and application in banknotedetectionrdquo China Measurement ampTest vol 45 no 5 pp 1ndash92019 in Chinese
[13] H Gao H Yuan and Z Wang ldquoPixel transposed convolu-tional networksrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 42 no 5 pp 1218ndash1227 2019
[14] Q D Vu and J T Kwak ldquoA densemulti-path decoder for tissuesegmentation in histopathology imagesrdquo Computer Methodsand Programs in Biomedicine vol 173 pp 119ndash129 2019
[15] J Huang and G Liu ldquo)e development of CNN-based se-mantic segmentation methodrdquo Laser Journal vol 40 no 5pp 10ndash16 2019 in Chinese
[16] S Nowozin ldquoOptimal decisions from probabilistic modelsthe intersection-over-union caserdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE ColumbusOH USA pp 548ndash555 June 2014
[17] D Hoiem Y Chodpathumwan and Q Dai ldquoDiagnosingerror in object detectorsrdquo in Proceedings of the EuropeanConference on Computer Vision pp 340ndash353 IEEE FlorenceItaly October 2012
[18] K He and J Sun ldquoConvolutional neural networks at con-strained time costrdquo in Proceedings of the Computer Vision andPattern Recognition IEEE Boston MA USA pp 5353ndash5360June 2015
[19] J Long E Shelhamer and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE Boston MAUSA pp 3431ndash3440 June 2015
[20] V Badrinarayanan A Kendall and R Cipolla ldquoSegNet adeep convolutional encoder-decoder architecture for imagesegmentationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 39 no 12 pp 2481ndash2495 2017
[21] O Ronneberger P Fischer and T Brox ldquoU-net convolu-tional networks for biomedical image segmentationrdquo inProceedings of the Medical Image Computing and Computer-Assisted Intervention pp 234ndash241 IEEE Munich GermanyOctober 2015
[22] B Hariharan P Arbelaez R Girshick and J Malik ldquoSi-multaneous detection and segmentationrdquo in Proceedings ofthe European Conference on Computer Vision pp 297ndash312IEEE Zurich Switzerland March 2014
[23] N D Lane and P Warden ldquo)e deep (learning) transfor-mation of mobile and embedded computingrdquo Computervol 51 no 5 pp 12ndash16 2018
12 Mathematical Problems in Engineering
[24] C Qing J Ruan X Xu J Ren and J Zabalza ldquoSpatial-spectral classification of hyperspectral images a deep learningframework with Markov Random fields based modellingrdquo IetImage Processing vol 13 no 2 pp 235ndash245 2019
[25] X Li Z Liu and P Luo ldquoNot all pixels are equal difficulty-aware semantic segmentation via deep layer cascaderdquo inProceedings of the Computer Vision and Pattern RecognitionIEEE Honolulu HI USA pp 6459ndash6468 July 2017
[26] B Zhou H Zhao X Puig et al ldquoSemantic understanding ofscenes through the ADE20K datasetrdquo International Journal ofComputer Vision vol 127 no 3 pp 302ndash321 2019
[27] L-C Chen ldquoRethinking atrous convolution for semanticimage segmentationrdquo 2017 httpsarxivorgabs170605587
[28] H Zhao J Shi and X Qi ldquoPyramid scene parsing networkrdquoin Proceedings of the Computer Vision and Pattern Recogni-tion IEEE Honolulu HI USA pp 6230ndash6239 July 2017
[29] T Xiao Y Liu B Zhou Y Jiang and J Sun ldquoUnifiedperceptual parsing for scene understandingrdquo in Proceedings ofthe European Conference on Computer Vision pp 432ndash448IEEE Munich Germany September 2018
[30] D Kim J Kwon and J Kim ldquoLow-complexity online modelselection with lyapunov control for reward maximization instabilized real-time deep learning platformsrdquo in Proceedings ofthe Systems Man and Cybernetics pp 4363ndash4368 MiyazakiJapan January 2018
[31] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquoCommunications of the Acm vol 60 no 6 pp 84ndash90 2017
[32] M Cordts O Mohamed and S Ramos ldquo)e cityscapesdataset for semantic urban scene understandingrdquo in Pro-ceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR) IEEE Las Vegas NV USApp 3213ndash3223 June 2016
Mathematical Problems in Engineering 13
3 Material and Methods
)e semantic segmentation under a complex backgroundbased on the encoder-decoder network will establish anoptimized mathematical model with minimal segmentationtime Tsegminusmin segmentation time Tseg and accuracy PAUnder the encoder-decoder network the backbone network
ηmain the depth dmain and the decoder ηdecoder are obtainedto form an encoder By selecting the relatively better ηmainand ηdecoder of the basic network the component analysismodule to improve the optimized architecture is proposedand the encoder-decoder network with optimized PA forsemantic segmentation under a complex background isobtained In the encoder-decoder network the encoder
Decoder
Encoder (Backbone)
ConcatConcatConcatConcat
Pooling layersConvolutional layersDeconv layers
Figure 5 Schematic of U-Net model
PPM head
Fuse
Head
Head
Head
Fused feature map
Head
Material
Object
Part
Scene
No grad
Head
Texture(Testing)
orImage
(~450 times 720)
Image for texture(~48 times 48)
= Marbled
Feature Pyramid Network
Conv 3 times 3 GlobalAvgPooling Classifier
Scene head
Conv 3 times 3 Classifier
ObjectPartMaterial head
Conv 3 times 3 Conv 3 times 3 Classifier
Texture head
4x Conv (128 channels)
(Training)
Living room
14
18
116
132
14
14
18
116
132
Figure 6 Flowchart of UPerNet model
Mathematical Problems in Engineering 5
transforms color images (three 2D arrays) to 2048 2D arrays)e encoder is composed of convolutional layers andpooling layers and it could be trained on large-scale clas-sification datasets such as ImageNet to gain greater featureextraction capability
Modeling of semantic segmentation under a complexbackground using the encoder-decoder network and se-lection of ηmain and ηdecoder
)e encoder network is determined by the backbonenetwork ηmain depth dmain and decoder ηdecoder Segmen-tation time Tseg and accuracy PA depend on ηdecoder ηmainand dmain which can be expressed asPA(ηdecoder ηmain dmain) and Tseg(ηdecoder ηmain dmain)Denoting the minimal segmentation time as Tsegminusmin (therecommended value is 600ms) the mathematical model ofthe optimization for semantic segmentation under a com-plex background based on the encoder-decoder network isas follows
max PA ηdecoder ηmain dmain( 1113857
st Tseg ηdecoder ηmain dmain( 1113857leTsegminusmin
⎧⎨
⎩ (3)
)e parameters of the model to be optimized are dmainηmain and ηdecoder
First dmain ηmain and ηdecoder are combined )en theobject segmentation accuracy PAobj component segmen-tation accuracy PAcomp and Tseg are compared to select therelatively better ηmain and ηdecoder for the basic network
)e ADE20K dataset which has diverse annotations ofscenes objects parts of objects and parts of parts [26] isselected In this paper we denote parts of objects as com-ponent Using a GeForce GTX 1080Ti GPU and the trainingmethod described in [27] we obtained PAobj and PAcomp forimproved FCN [19] PSPNet [28] UPerNet [29] and othermajor encoder-decoder networks for semantic segmentationused in the ADE20K [26] objectcomponent segmentationdataset We evaluated Tseg of different network on theADE20K test set which consist of 3000 different resolutionimages with average image size of 13 million pixels Table 1displays the pixel accuracy and segmentation time of themain network architectures on ADE20K objectcomponentsegmentation tasks where the relatively better indices areindicated by a rectangular contour
From Table 1 the following observation can be made①In all networks PAcomp is less than PAobj by about 30 ②ηmain and dmain are equal in networks 1 2 and 3 PAcomp andPAobj are better in ηdecoder FPN + PPM compared toηdecoder FCN or ηdecoder PPM ③ηmain and ηdecoder areequal in networks 3 and 4 When dmain is doubled PAcompimproves slightly and Tseg improves significantly After acomprehensive consideration we selected the UPerNet [23]encoder-decoder network where ηmain ResNetdmain 50 and ηdecoder PPM + FPN
Figure 7 shows the architecture of semantic segmen-tation under a complex background implemented byUPerNet [29] )e encoder ResNet reduces the featuremap resolution by 12 at each stage )e resolution of the
output feature maps within five stages is respectively re-duced to 12 14 18 116 and 132 )e decoder isPPM+ FPN )rough pooling layers with different stridesthe feature maps are analyzed in a multiscale mannerwithin PPM )rough three transposed convolutionlayers the resolution of the feature maps is increased twotimes to 116 18 and 14 )e upsampling restores thefeature map resolution to 11 )e component analysismodule recognizes the feature map and outputs both theobjectcomponent segmentation results
Figure 8 shows the component analysis module ofUPerNet )e module is composed of the object classifiercomponent classifier and component analysis module)e input of each classifier is a 1 1 feature map )e objectclassifier implements the semantic recognition of NObjkinds of objects and outputs the object probability vectorpuvObj and the object label Cuv
Obj )e component classifierimplements the semantic recognition of NComp kinds ofcomponents and outputs the component probabilityvector puv
Comp and the component label CuvComp According
to CuvObj and the component object set CObjminusThings the
component analysis module only segments the CuvComp that
satisfies CuvObj isin CObjminusThings and outputs the valid component
label 1113954Cuv
Comp UPerNet outputs the object segmentation result(the object labelCuv
Obj) and the component segmentation result(the valid component label 1113954C
uv
Comp))e component analysis module of UPerNet can be
expressed as follows
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThings1113872 1113873
1 times C
uvComp C
uvObj isin CObjminusThings
0 times CuvComp C
uvObj notin CObjminusThings
⎧⎨
⎩
(4)
A greater PAcomp of 1113954Cuv
Comp leads to a higher componentsegmentation efficiency
Equation (4) outputs 1113954Cuv
Comp that satisfiesCuvObj isin CObjminusThings By identifying deviations of Cuv
Obj due tothe relationship between Cuv
Comp and CuvObj the optimized
component analysis module can improve the efficiency ofcomponent segmentation it both meets the requirement ofTsegminusmin and improves PAcomp
Improvements of UPerNet for semantic segmentationunder a complex background based on the componentanalysis module
In this subsection we describe the derivation of thecomponent analysis module the optimization of the func-tion expression of the module and the construction of thearchitecture of the component analysis module
As shown in Figure 8 the component classifier recog-nizes Ncomp component semantics and outputs the com-ponent labels Cuv
Comp of the pixel with image position (u v)
and the probability vector puvComp corresponding to the
various component labels )e relationship between CuvComp
and puvComp [31] is as follows
CuvComp argmaxkpCompminusk k 1 2 NComp (5)
From equation (4) and (5) we obtain
6 Mathematical Problems in Engineering
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThings1113872 1113873
1 times argmaxkpCompminusk C
uvObj isin CObjminusThings
0 times argmaxkpCompminusk CuvObj notin CObjminusThings
⎧⎨
⎩ k 1 2 NComp(6)
where pObjminusj is the probability of CuvComp Weighting pObjminusj
over pCompminusk to get 1113954pCompminusk instead of 1 times argmaxkpCompminuskreducing the weight of low-probability object labels andincreasing PAcomp With Cuv
Obj isin CObjminusThings if
CuvComp notin C
CObjCompminusObj letting 1113954pCompminusk 0 can increase the
detection rate of background pixels )erefore the modulecan be expressed as follows
DecoderEncoder ηmain = ResNet
CS3 CS4 CS5CS1 CS2
Poolingpyramid module
PPM
Cup-1 Cup-2 Cup-3 Fusion
Object classifier
Pixel accuracyPA
Ups
ampl
e
Component segmentation
Objectsegmentation
Component classifier
Feature map pramid network FPN
Component analysis module
Componentanalysis model
Inputimage
ηdecoder = PPM + FPN
Segmentation time Tseg (ηmain dmain ηdecoder)
Depth dmain = 50
Figure 7 Flowchart of semantic segmentation under a complex background implemented by UPerNet
Componentclassifier
Validcomponent
label
Object classifier
Componentanalysis model1 1
Featuremap
Object label Cuv
Comp label Cuv
Valid object setℂObj-Things
Object segmentation
Componentsegmentation
CompfO (Cuv
CuvObj
Obj
Comp)CompC uv
Figure 8 Flowchart of component analysis module of UPerNet
Table 1 Pixel accuracy and segmentation time of the main network architectures on ADE20K objectcomponent segmentation task )erectangular contour indicates the best indices
Network Backboneηmain
Backbonedepth dmain
Decoderηdecoder
Object segmentationaccuracy () PAobj
Component segmentationaccuracy () PAcomp
Segmentation time(ms) Tseg
1 FCN [30] ResNet 50 FCN 7132 4081 333
2 PSPNet[28] ResNet 50 PPM 8004 4723 483
3 UPerNet[29] ResNet 50 PPM+FPN 8023 4830 496
4 UPerNet[29] ResNet 101 PPM+FPN 8101 4871 604
Mathematical Problems in Engineering 7
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThingsC
CObj
CompminusObj1113874 1113875 argmaxk1113954pCompminusk k 1 2 NComp
1113954pCompminusk
1113936
jisinCObjminusThingsandkisinCj
CompminusObj
j1pObjminusj
⎛⎜⎜⎝ ⎞⎟⎟⎠ kltNComp
1 minus 1113936
jnotinCObjminusThingsork notin Cj
CompminusObj
j1pObjminusj
⎛⎜⎜⎝ ⎞⎟⎟⎠pCompminusk k NComp
⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎪⎩
j 1 2 Nobj
⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩
(7)
which is the component analysis module yielded by replacing1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and considering
CuvComp notin C
CObjCompminusObj
)e optimized architecture of the UPerNet componentanalysis module is proposed based on equation (7)Figures 9(a)ndash9(c) show the optimized architecture obtainedby replacing 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk by
considering CuvComp notin C
CObjCompminusObj and by both replacing 1 times
argmaxkpCompminusk with argmaxk1113954pCompminusk and considering
CuvComp notin C
CObjCompminusObj in the component analysis module
respectively
31 Experimental Results
311 ADE20K Component Segmentation Task For theUPerNet model the backbone network of the encoder wasResNet dmain 50 and the decoders arePPM+FPN+component analysis modules (beforeaftermodification) We trained each network on the objectcom-ponent segmentation task dataset ADE20K [26] to demonstratethe pixel accuracy PA1113955Part
and segmentation time Tseg )eexperiments were run on a GeForce GTX 1080Ti GPU
Table 2 reports PA1113955Partand Tseg of the UPerNet obtained
with different component analysis modules in ADE20Kcomponent segmentation task From the results the fol-lowing observations can be made
(i) )e pixel accuracy of ResNet(dmain 50)+PPM+FPN+ the proposed modifiedcomponent analysis modules with different settingsincreased from 4830 (without component analysismodules) to 5403 5513 and 5562 while thesegmentation time lengthened marginally from 483to 492 486 and 496ms respectively
)e UPerNet with modified component analysis mod-ules showed significantly high segmentation performanceBoth PA1113955Part
and Tseg outperformed the UPerNet with adeeper dmain PA1113955Part
and Tseg of the architecture (dmain 50)are 5562 and 496ms while those of the architectures withno modification with dmain 101 and 152 were 4871 and598ms and 4889 and 721ms respectively as shown inFigure 9(c)
312 CITYSCAPES Instance-Level Semantic Labeling TaskWe trained each UPerNet (withwithout component anal-ysis module) on the instance-level semantic labeling task of
the CITYSCAPES dataset [32] To assess the instance-levelperformance CITYSCAPES uses themean average precisionAP and average precision AP05 [32] We also report thesegmentation time of each network run on a GeForce GTX1080Ti GPU and an Intel i7-5960X CPU Table 3 presents theperformances of different methods on a CITYSCAPES in-stance-level semantic labeling task Table 4 presents the meanaverage precision AP on class-level of the UPerNet withwithout the component analysis module in the CITYSCAPESinstance-level semantic labeling task From the table it can beseen that the modified component analysis modules effec-tively improved the performance of the UPerNet With thecomponent analysis module both AP and AP05 are im-proved and the segmentation time Tseg increased slightlyfrom 447 to 451msMost of the UPerNet AP on class-level areimproved Figure 10 shows someCITYSCAPES instance-levelsemantic labeling results obtained with the UPerNet withwithout component analysis module
Taking banknote detection as an example we set up thesemantic segmentation model by the component analysismodules (beforeafter modification) to vision-based detec-tion of 2019 Chinese Yuan (CNY) feature in the backlight todemonstrate the segmentation performance of the proposedmethod
)e vision-based detection system consisted of an MV-CA013-10GC industrial camera an MVL-HF2528M-6MPlens and a LED strip light )e field of view was 1833deg andthe resolution was 1280times1024 Under the backlight wecollected 25 CNY images of various denomination frontsand backs at random angles )en we marked four types oflight-transmitting anticounterfeiting features namely se-curity lines pattern watermarks denomination watermarksand Yin-Yang denominations All four features were de-tected in the CNY images to generate our dataset (200images) We trained the model with different componentanalysis modules from our dataset to demonstrate PA1113955Partand Tseg Table 3 presents the pixel accuracy and segmen-tation time of UPerNet with different component analysismodules for CNY anticounterfeit features via vision-baseddetection and Figure 11 shows the segmentation results ofthe anticounterfeiting features detected by UPerNet withwithout the component analysis module
From Table 5 it can be seen that the proposed methodimproved PA1113955Part
from 9038 to 9529 Tseg from 490 to496ms Moreover APIoUT05 increased from 961 to 100detecting all the light transmission anti-counterfeiting fea-tures without false detection missing detection or repeateddetection
8 Mathematical Problems in Engineering
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Comp responsepuv
Component segmentation
Validcomplabel
Validcomp
response
Comp analysismodel
fOp (Cuv Cuv
Obj
Obj
Obj
Comp) puv Comp C uv
Comp
Object label Cuv
Object responsepuv
Valid object setℂObj-Things
Comp
(a)
11Feature
map
Object classifier
Componentclassifier
Object segmentation
Componentsegmentation
Relation between comp and obj
ℂCObjComp-Obj
ObjObject label Cuv
CompComp label Cuv
Validcomplabel
Comp analysismodel
Valid object setℂObj-Things
fO (Cuv Cuv
Obj Comp) C uv Comp
(b)
Relation between comp and obj
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Componentsegmentation
Comp response Validcomplabel
Validcomp
response
Object responsepuv
Valid object setℂObj-Things
Comp analysismodel
fOp (Cuv Cuv
Obj Comp) p uv Comp C uv
Comp
ObjObject label Cuv
Obj
Comppuv
ℂCObjComp-Obj
(c)
Figure 9 Optimized architecture with the component analysis module (a) Replace 1 times argmaxkpCompminusk with argmaxk1113954pCompminusk to optimize
the module (b) Analyze CuvComp notin C
CObjCompminusObj to optimize the module (c) Replace 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and analyzeCuvComp notin C
CObjCompminusObj to optimize the module
Table 2 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAMs) on ADE20K componentsegmentation task
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
1 ResNet 50 PPM+FPN mdash 4830 4832 ResNet 101 PPM+FPN mdash 4871 5983 ResNet 152 PPM+FPN mdash 4889 7214 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5362 4905 ResNet 101 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5396 6046 ResNet 152 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5418 726
Mathematical Problems in Engineering 9
UPerNet UPerNet with the component analysis module
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar
CarCar
Person
PersonPerson
PersonCarCar
CarPerson
PersonPerson
Person
Figure 10 CITYSCAPES instance-level semantic labeling by UPerNet
Table 4 Mean average precision AP on class-level of the UPerNet withwithout CAM in CITYSCAPES instance-level semantic labelingtask
Method Person () Rider () Car () Truck () Bus () Train () Motorcycle () Bicycle ()UperNet 360 288 516 300 387 273 239 194UperNet +CAM 360 288 530 343 570 375 223 238
Table 2 Continued
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
7 ResNet 50 PPM+FPN+CAM argmaxk1113954pCompminusk 5403 492
8 ResNet 50 PPM+FPN+CAM CuvComp notin C
CObj
CompminusObj 5513 4869 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj 5562 496
Table 3 Performances of different methods on CITYSCAPES instance-level semantic labeling task
Method AP () AP050 () Segmentation time (ms)
SegNet 295 556 mdashMask R-CNN 320 581 mdashUperNet 320 573 447UperNet +CAM 365 622 451CAM Component Analysis Module
10 Mathematical Problems in Engineering
4 Conclusions
In this study we performed semantic segmentation under acomplex background using the encoder-decoder network tosolve the issue of the mutually exclusive relationship be-tween the semantic response value and the semantics ofobjectcomponent in the semantic segmentation under acomplex background for online machine vision detection)e following conclusions can be drawn from this study
(i) Considering the mutually exclusive relationshipbetween the semantic response value and the se-mantics of objectcomponent we selected themathematical model of semantic segmentationunder a complex background based on the encoder-decoder network for optimization It was found that
ηmain ResNet dmain 50 is the best encoder andηdecoder PPM + FPN is the best selected decoder
(ii) We replaced 1 times argmaxkpCompminusk withargmaxk
1113954pCompminusk )e component analysis module
of CuvComp notin C
CObjCompminusObj and UPerNet are considered
to improve the performance of the encoder-decodernetwork
(iii) )e experimental results show that the componentanalysis module improves the performance of se-mantic segmentation under a complex backgroundBoth PA1113955Part
and Tseg of the proposed model werebetter than those of the UPerNet with deeper dmainSpecifically the accuracy improved from 4889 to5562 and Tseg from 721 to 496ms By performing
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
Pattern
Watermark
Serial numberWatermark Watermark
SafelineChinese
OVMI
Safeline
OVMI
Pattern
PatternDenomination
Serial number
Denomination
Watermark
Watermark Watermark
SafelineSafeline
Watermark
Watermark Watermark
SafelineSafeline
Watermark
WatermarkWatermark
SafelineSafeline
UPerNet UPerNet with the component analysis module
Figure 11 Anticounterfeiting features detected by the UPerNet withwithout the component analysis module
Table 5 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAM) for CNY anticounterfeitfeatures via vision-based detection
Backbone ηmain Depth dmain Decoder ηdecoder Component analysis module PA1113955Part() APIoUT05() Tseg (ms)
1 ResNet 50 PPM+FPN mdash 8850 853 4832 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 9038 961 4903 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj
9529 100 496
Mathematical Problems in Engineering 11
vision-based detection with the 2019 CNY featureswe showed that the proposed method improvedPA1113955Part
from 9038 to 9529 while Tseg increasedonly slightly from 490 to 496ms APIoUT01 alsoincreased from 961 to 100 detecting all the lighttransmission anticounterfeiting features withoutfalse detection missing detection or repeateddetection
)e model in which 1 times argmaxkpPartminusk was replacedwith argmaxk
1113954pPartminusk and the corresponding componentanalysis module improved the performance of the UPerNetencoder-decoder network However the efficiency im-provement is affected by the accuracy of object segmenta-tion In our next study we will investigate the applicability ofmachine learning to the component analysis module toachieve a higher performance in different applications
Data Availability
)e ADE20K Dataset used to support the findings of thisstudy is available at httpgroupscsailmiteduvisiondatasets )e CITYSCAPES Dataset used to support thefindings of this study is available at httpswwwcityscapes-datasetcom Its pretrained models and code are released athttpsgithubcomCSAILVisionsemantic-segmentation325pytorch
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
)is research was funded by the Key-Area Research andDevelopment Program of Guangdong Province (Grant no2019B010154003) and the Guangzhou Science and Tech-nology Plan Project (Grant no 201802030006)
References
[1] C Szegedy W Liu and Y Jia ldquoGoing deeper with convo-lutionsrdquo in Proceedings of the Computer Vision and PatternRecognition IEEE Boston MA USA pp 1ndash9 June 2015
[2] K He X Zhang and S Ren ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the Computer vision andpattern recognition IEEE Las Vegas NV USA pp 770ndash778June 2016
[3] K He G Gkioxari P Dollar and R Girshick ldquoMaskR-CNNrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 42 no 2 pp 386ndash397 2020
[4] G Liu B He and S Liu ldquoChassis assembly detection andidentification based on deep learning component instancesegmentationrdquo Symmetry vol 11 no 8 2019
[5] R Manish A Venkatesh and S Denis Ashok ldquoMachinevision based image processing techniques for surface finishand defect inspection in a grinding processrdquoMaterials TodayProceedings vol 5 no 5 pp 12792ndash12802 2018
[6] L Geng Y Wen and F Zhang ldquoMachine vision detectionmethod for surface defects of automobile stamping partsrdquo
American Scientific Research Journal for Engineering Tech-nology and Sciences vol 53 no 1 pp 128ndash144 2019
[7] M M Islam and J Kim ldquoVision-based autonomous crackdetection of concrete structures using a fully convolutionalencoderndashdecoder networkrdquo Sensors vol 19 no 19 2019
[8] S Zhou D Nie E Adeli J Yin J Lian and D Shen ldquoHigh-resolution encoder-decoder networks for low-contrast medicalimage segmentationrdquo IEEE Transactions on Image Processingvol 29 pp 461ndash475 2020
[9] S Liu J Huang and G Liu ldquoTechnology of multi-categorylegal currency identification under multi-light conditionsbased on AleNetrdquo China Measurement ampTest vol 45 no 9pp 118ndash122 2019 in Chinese
[10] H Kang and C Chen ldquoFruit detection and segmentation forapple harvesting using visual sensor in orchardsrdquo Sensorsvol 19 no 20 p 4599 2019
[11] E Pardo J M T Morgado and N Malpica ldquoSemanticsegmentation of mFISH images using convolutional net-worksrdquo Cytometry Part A vol 93 no 6 pp 620ndash627 2018
[12] G Liu S Liu and J Wu ldquoMachine vision object detectionalgorithm based on deep learning and application in banknotedetectionrdquo China Measurement ampTest vol 45 no 5 pp 1ndash92019 in Chinese
[13] H Gao H Yuan and Z Wang ldquoPixel transposed convolu-tional networksrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 42 no 5 pp 1218ndash1227 2019
[14] Q D Vu and J T Kwak ldquoA densemulti-path decoder for tissuesegmentation in histopathology imagesrdquo Computer Methodsand Programs in Biomedicine vol 173 pp 119ndash129 2019
[15] J Huang and G Liu ldquo)e development of CNN-based se-mantic segmentation methodrdquo Laser Journal vol 40 no 5pp 10ndash16 2019 in Chinese
[16] S Nowozin ldquoOptimal decisions from probabilistic modelsthe intersection-over-union caserdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE ColumbusOH USA pp 548ndash555 June 2014
[17] D Hoiem Y Chodpathumwan and Q Dai ldquoDiagnosingerror in object detectorsrdquo in Proceedings of the EuropeanConference on Computer Vision pp 340ndash353 IEEE FlorenceItaly October 2012
[18] K He and J Sun ldquoConvolutional neural networks at con-strained time costrdquo in Proceedings of the Computer Vision andPattern Recognition IEEE Boston MA USA pp 5353ndash5360June 2015
[19] J Long E Shelhamer and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE Boston MAUSA pp 3431ndash3440 June 2015
[20] V Badrinarayanan A Kendall and R Cipolla ldquoSegNet adeep convolutional encoder-decoder architecture for imagesegmentationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 39 no 12 pp 2481ndash2495 2017
[21] O Ronneberger P Fischer and T Brox ldquoU-net convolu-tional networks for biomedical image segmentationrdquo inProceedings of the Medical Image Computing and Computer-Assisted Intervention pp 234ndash241 IEEE Munich GermanyOctober 2015
[22] B Hariharan P Arbelaez R Girshick and J Malik ldquoSi-multaneous detection and segmentationrdquo in Proceedings ofthe European Conference on Computer Vision pp 297ndash312IEEE Zurich Switzerland March 2014
[23] N D Lane and P Warden ldquo)e deep (learning) transfor-mation of mobile and embedded computingrdquo Computervol 51 no 5 pp 12ndash16 2018
12 Mathematical Problems in Engineering
[24] C Qing J Ruan X Xu J Ren and J Zabalza ldquoSpatial-spectral classification of hyperspectral images a deep learningframework with Markov Random fields based modellingrdquo IetImage Processing vol 13 no 2 pp 235ndash245 2019
[25] X Li Z Liu and P Luo ldquoNot all pixels are equal difficulty-aware semantic segmentation via deep layer cascaderdquo inProceedings of the Computer Vision and Pattern RecognitionIEEE Honolulu HI USA pp 6459ndash6468 July 2017
[26] B Zhou H Zhao X Puig et al ldquoSemantic understanding ofscenes through the ADE20K datasetrdquo International Journal ofComputer Vision vol 127 no 3 pp 302ndash321 2019
[27] L-C Chen ldquoRethinking atrous convolution for semanticimage segmentationrdquo 2017 httpsarxivorgabs170605587
[28] H Zhao J Shi and X Qi ldquoPyramid scene parsing networkrdquoin Proceedings of the Computer Vision and Pattern Recogni-tion IEEE Honolulu HI USA pp 6230ndash6239 July 2017
[29] T Xiao Y Liu B Zhou Y Jiang and J Sun ldquoUnifiedperceptual parsing for scene understandingrdquo in Proceedings ofthe European Conference on Computer Vision pp 432ndash448IEEE Munich Germany September 2018
[30] D Kim J Kwon and J Kim ldquoLow-complexity online modelselection with lyapunov control for reward maximization instabilized real-time deep learning platformsrdquo in Proceedings ofthe Systems Man and Cybernetics pp 4363ndash4368 MiyazakiJapan January 2018
[31] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquoCommunications of the Acm vol 60 no 6 pp 84ndash90 2017
[32] M Cordts O Mohamed and S Ramos ldquo)e cityscapesdataset for semantic urban scene understandingrdquo in Pro-ceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR) IEEE Las Vegas NV USApp 3213ndash3223 June 2016
Mathematical Problems in Engineering 13
transforms color images (three 2D arrays) to 2048 2D arrays)e encoder is composed of convolutional layers andpooling layers and it could be trained on large-scale clas-sification datasets such as ImageNet to gain greater featureextraction capability
Modeling of semantic segmentation under a complexbackground using the encoder-decoder network and se-lection of ηmain and ηdecoder
)e encoder network is determined by the backbonenetwork ηmain depth dmain and decoder ηdecoder Segmen-tation time Tseg and accuracy PA depend on ηdecoder ηmainand dmain which can be expressed asPA(ηdecoder ηmain dmain) and Tseg(ηdecoder ηmain dmain)Denoting the minimal segmentation time as Tsegminusmin (therecommended value is 600ms) the mathematical model ofthe optimization for semantic segmentation under a com-plex background based on the encoder-decoder network isas follows
max PA ηdecoder ηmain dmain( 1113857
st Tseg ηdecoder ηmain dmain( 1113857leTsegminusmin
⎧⎨
⎩ (3)
)e parameters of the model to be optimized are dmainηmain and ηdecoder
First dmain ηmain and ηdecoder are combined )en theobject segmentation accuracy PAobj component segmen-tation accuracy PAcomp and Tseg are compared to select therelatively better ηmain and ηdecoder for the basic network
)e ADE20K dataset which has diverse annotations ofscenes objects parts of objects and parts of parts [26] isselected In this paper we denote parts of objects as com-ponent Using a GeForce GTX 1080Ti GPU and the trainingmethod described in [27] we obtained PAobj and PAcomp forimproved FCN [19] PSPNet [28] UPerNet [29] and othermajor encoder-decoder networks for semantic segmentationused in the ADE20K [26] objectcomponent segmentationdataset We evaluated Tseg of different network on theADE20K test set which consist of 3000 different resolutionimages with average image size of 13 million pixels Table 1displays the pixel accuracy and segmentation time of themain network architectures on ADE20K objectcomponentsegmentation tasks where the relatively better indices areindicated by a rectangular contour
From Table 1 the following observation can be made①In all networks PAcomp is less than PAobj by about 30 ②ηmain and dmain are equal in networks 1 2 and 3 PAcomp andPAobj are better in ηdecoder FPN + PPM compared toηdecoder FCN or ηdecoder PPM ③ηmain and ηdecoder areequal in networks 3 and 4 When dmain is doubled PAcompimproves slightly and Tseg improves significantly After acomprehensive consideration we selected the UPerNet [23]encoder-decoder network where ηmain ResNetdmain 50 and ηdecoder PPM + FPN
Figure 7 shows the architecture of semantic segmen-tation under a complex background implemented byUPerNet [29] )e encoder ResNet reduces the featuremap resolution by 12 at each stage )e resolution of the
output feature maps within five stages is respectively re-duced to 12 14 18 116 and 132 )e decoder isPPM+ FPN )rough pooling layers with different stridesthe feature maps are analyzed in a multiscale mannerwithin PPM )rough three transposed convolutionlayers the resolution of the feature maps is increased twotimes to 116 18 and 14 )e upsampling restores thefeature map resolution to 11 )e component analysismodule recognizes the feature map and outputs both theobjectcomponent segmentation results
Figure 8 shows the component analysis module ofUPerNet )e module is composed of the object classifiercomponent classifier and component analysis module)e input of each classifier is a 1 1 feature map )e objectclassifier implements the semantic recognition of NObjkinds of objects and outputs the object probability vectorpuvObj and the object label Cuv
Obj )e component classifierimplements the semantic recognition of NComp kinds ofcomponents and outputs the component probabilityvector puv
Comp and the component label CuvComp According
to CuvObj and the component object set CObjminusThings the
component analysis module only segments the CuvComp that
satisfies CuvObj isin CObjminusThings and outputs the valid component
label 1113954Cuv
Comp UPerNet outputs the object segmentation result(the object labelCuv
Obj) and the component segmentation result(the valid component label 1113954C
uv
Comp))e component analysis module of UPerNet can be
expressed as follows
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThings1113872 1113873
1 times C
uvComp C
uvObj isin CObjminusThings
0 times CuvComp C
uvObj notin CObjminusThings
⎧⎨
⎩
(4)
A greater PAcomp of 1113954Cuv
Comp leads to a higher componentsegmentation efficiency
Equation (4) outputs 1113954Cuv
Comp that satisfiesCuvObj isin CObjminusThings By identifying deviations of Cuv
Obj due tothe relationship between Cuv
Comp and CuvObj the optimized
component analysis module can improve the efficiency ofcomponent segmentation it both meets the requirement ofTsegminusmin and improves PAcomp
Improvements of UPerNet for semantic segmentationunder a complex background based on the componentanalysis module
In this subsection we describe the derivation of thecomponent analysis module the optimization of the func-tion expression of the module and the construction of thearchitecture of the component analysis module
As shown in Figure 8 the component classifier recog-nizes Ncomp component semantics and outputs the com-ponent labels Cuv
Comp of the pixel with image position (u v)
and the probability vector puvComp corresponding to the
various component labels )e relationship between CuvComp
and puvComp [31] is as follows
CuvComp argmaxkpCompminusk k 1 2 NComp (5)
From equation (4) and (5) we obtain
6 Mathematical Problems in Engineering
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThings1113872 1113873
1 times argmaxkpCompminusk C
uvObj isin CObjminusThings
0 times argmaxkpCompminusk CuvObj notin CObjminusThings
⎧⎨
⎩ k 1 2 NComp(6)
where pObjminusj is the probability of CuvComp Weighting pObjminusj
over pCompminusk to get 1113954pCompminusk instead of 1 times argmaxkpCompminuskreducing the weight of low-probability object labels andincreasing PAcomp With Cuv
Obj isin CObjminusThings if
CuvComp notin C
CObjCompminusObj letting 1113954pCompminusk 0 can increase the
detection rate of background pixels )erefore the modulecan be expressed as follows
DecoderEncoder ηmain = ResNet
CS3 CS4 CS5CS1 CS2
Poolingpyramid module
PPM
Cup-1 Cup-2 Cup-3 Fusion
Object classifier
Pixel accuracyPA
Ups
ampl
e
Component segmentation
Objectsegmentation
Component classifier
Feature map pramid network FPN
Component analysis module
Componentanalysis model
Inputimage
ηdecoder = PPM + FPN
Segmentation time Tseg (ηmain dmain ηdecoder)
Depth dmain = 50
Figure 7 Flowchart of semantic segmentation under a complex background implemented by UPerNet
Componentclassifier
Validcomponent
label
Object classifier
Componentanalysis model1 1
Featuremap
Object label Cuv
Comp label Cuv
Valid object setℂObj-Things
Object segmentation
Componentsegmentation
CompfO (Cuv
CuvObj
Obj
Comp)CompC uv
Figure 8 Flowchart of component analysis module of UPerNet
Table 1 Pixel accuracy and segmentation time of the main network architectures on ADE20K objectcomponent segmentation task )erectangular contour indicates the best indices
Network Backboneηmain
Backbonedepth dmain
Decoderηdecoder
Object segmentationaccuracy () PAobj
Component segmentationaccuracy () PAcomp
Segmentation time(ms) Tseg
1 FCN [30] ResNet 50 FCN 7132 4081 333
2 PSPNet[28] ResNet 50 PPM 8004 4723 483
3 UPerNet[29] ResNet 50 PPM+FPN 8023 4830 496
4 UPerNet[29] ResNet 101 PPM+FPN 8101 4871 604
Mathematical Problems in Engineering 7
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThingsC
CObj
CompminusObj1113874 1113875 argmaxk1113954pCompminusk k 1 2 NComp
1113954pCompminusk
1113936
jisinCObjminusThingsandkisinCj
CompminusObj
j1pObjminusj
⎛⎜⎜⎝ ⎞⎟⎟⎠ kltNComp
1 minus 1113936
jnotinCObjminusThingsork notin Cj
CompminusObj
j1pObjminusj
⎛⎜⎜⎝ ⎞⎟⎟⎠pCompminusk k NComp
⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎪⎩
j 1 2 Nobj
⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩
(7)
which is the component analysis module yielded by replacing1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and considering
CuvComp notin C
CObjCompminusObj
)e optimized architecture of the UPerNet componentanalysis module is proposed based on equation (7)Figures 9(a)ndash9(c) show the optimized architecture obtainedby replacing 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk by
considering CuvComp notin C
CObjCompminusObj and by both replacing 1 times
argmaxkpCompminusk with argmaxk1113954pCompminusk and considering
CuvComp notin C
CObjCompminusObj in the component analysis module
respectively
31 Experimental Results
311 ADE20K Component Segmentation Task For theUPerNet model the backbone network of the encoder wasResNet dmain 50 and the decoders arePPM+FPN+component analysis modules (beforeaftermodification) We trained each network on the objectcom-ponent segmentation task dataset ADE20K [26] to demonstratethe pixel accuracy PA1113955Part
and segmentation time Tseg )eexperiments were run on a GeForce GTX 1080Ti GPU
Table 2 reports PA1113955Partand Tseg of the UPerNet obtained
with different component analysis modules in ADE20Kcomponent segmentation task From the results the fol-lowing observations can be made
(i) )e pixel accuracy of ResNet(dmain 50)+PPM+FPN+ the proposed modifiedcomponent analysis modules with different settingsincreased from 4830 (without component analysismodules) to 5403 5513 and 5562 while thesegmentation time lengthened marginally from 483to 492 486 and 496ms respectively
)e UPerNet with modified component analysis mod-ules showed significantly high segmentation performanceBoth PA1113955Part
and Tseg outperformed the UPerNet with adeeper dmain PA1113955Part
and Tseg of the architecture (dmain 50)are 5562 and 496ms while those of the architectures withno modification with dmain 101 and 152 were 4871 and598ms and 4889 and 721ms respectively as shown inFigure 9(c)
312 CITYSCAPES Instance-Level Semantic Labeling TaskWe trained each UPerNet (withwithout component anal-ysis module) on the instance-level semantic labeling task of
the CITYSCAPES dataset [32] To assess the instance-levelperformance CITYSCAPES uses themean average precisionAP and average precision AP05 [32] We also report thesegmentation time of each network run on a GeForce GTX1080Ti GPU and an Intel i7-5960X CPU Table 3 presents theperformances of different methods on a CITYSCAPES in-stance-level semantic labeling task Table 4 presents the meanaverage precision AP on class-level of the UPerNet withwithout the component analysis module in the CITYSCAPESinstance-level semantic labeling task From the table it can beseen that the modified component analysis modules effec-tively improved the performance of the UPerNet With thecomponent analysis module both AP and AP05 are im-proved and the segmentation time Tseg increased slightlyfrom 447 to 451msMost of the UPerNet AP on class-level areimproved Figure 10 shows someCITYSCAPES instance-levelsemantic labeling results obtained with the UPerNet withwithout component analysis module
Taking banknote detection as an example we set up thesemantic segmentation model by the component analysismodules (beforeafter modification) to vision-based detec-tion of 2019 Chinese Yuan (CNY) feature in the backlight todemonstrate the segmentation performance of the proposedmethod
)e vision-based detection system consisted of an MV-CA013-10GC industrial camera an MVL-HF2528M-6MPlens and a LED strip light )e field of view was 1833deg andthe resolution was 1280times1024 Under the backlight wecollected 25 CNY images of various denomination frontsand backs at random angles )en we marked four types oflight-transmitting anticounterfeiting features namely se-curity lines pattern watermarks denomination watermarksand Yin-Yang denominations All four features were de-tected in the CNY images to generate our dataset (200images) We trained the model with different componentanalysis modules from our dataset to demonstrate PA1113955Partand Tseg Table 3 presents the pixel accuracy and segmen-tation time of UPerNet with different component analysismodules for CNY anticounterfeit features via vision-baseddetection and Figure 11 shows the segmentation results ofthe anticounterfeiting features detected by UPerNet withwithout the component analysis module
From Table 5 it can be seen that the proposed methodimproved PA1113955Part
from 9038 to 9529 Tseg from 490 to496ms Moreover APIoUT05 increased from 961 to 100detecting all the light transmission anti-counterfeiting fea-tures without false detection missing detection or repeateddetection
8 Mathematical Problems in Engineering
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Comp responsepuv
Component segmentation
Validcomplabel
Validcomp
response
Comp analysismodel
fOp (Cuv Cuv
Obj
Obj
Obj
Comp) puv Comp C uv
Comp
Object label Cuv
Object responsepuv
Valid object setℂObj-Things
Comp
(a)
11Feature
map
Object classifier
Componentclassifier
Object segmentation
Componentsegmentation
Relation between comp and obj
ℂCObjComp-Obj
ObjObject label Cuv
CompComp label Cuv
Validcomplabel
Comp analysismodel
Valid object setℂObj-Things
fO (Cuv Cuv
Obj Comp) C uv Comp
(b)
Relation between comp and obj
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Componentsegmentation
Comp response Validcomplabel
Validcomp
response
Object responsepuv
Valid object setℂObj-Things
Comp analysismodel
fOp (Cuv Cuv
Obj Comp) p uv Comp C uv
Comp
ObjObject label Cuv
Obj
Comppuv
ℂCObjComp-Obj
(c)
Figure 9 Optimized architecture with the component analysis module (a) Replace 1 times argmaxkpCompminusk with argmaxk1113954pCompminusk to optimize
the module (b) Analyze CuvComp notin C
CObjCompminusObj to optimize the module (c) Replace 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and analyzeCuvComp notin C
CObjCompminusObj to optimize the module
Table 2 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAMs) on ADE20K componentsegmentation task
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
1 ResNet 50 PPM+FPN mdash 4830 4832 ResNet 101 PPM+FPN mdash 4871 5983 ResNet 152 PPM+FPN mdash 4889 7214 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5362 4905 ResNet 101 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5396 6046 ResNet 152 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5418 726
Mathematical Problems in Engineering 9
UPerNet UPerNet with the component analysis module
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar
CarCar
Person
PersonPerson
PersonCarCar
CarPerson
PersonPerson
Person
Figure 10 CITYSCAPES instance-level semantic labeling by UPerNet
Table 4 Mean average precision AP on class-level of the UPerNet withwithout CAM in CITYSCAPES instance-level semantic labelingtask
Method Person () Rider () Car () Truck () Bus () Train () Motorcycle () Bicycle ()UperNet 360 288 516 300 387 273 239 194UperNet +CAM 360 288 530 343 570 375 223 238
Table 2 Continued
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
7 ResNet 50 PPM+FPN+CAM argmaxk1113954pCompminusk 5403 492
8 ResNet 50 PPM+FPN+CAM CuvComp notin C
CObj
CompminusObj 5513 4869 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj 5562 496
Table 3 Performances of different methods on CITYSCAPES instance-level semantic labeling task
Method AP () AP050 () Segmentation time (ms)
SegNet 295 556 mdashMask R-CNN 320 581 mdashUperNet 320 573 447UperNet +CAM 365 622 451CAM Component Analysis Module
10 Mathematical Problems in Engineering
4 Conclusions
In this study we performed semantic segmentation under acomplex background using the encoder-decoder network tosolve the issue of the mutually exclusive relationship be-tween the semantic response value and the semantics ofobjectcomponent in the semantic segmentation under acomplex background for online machine vision detection)e following conclusions can be drawn from this study
(i) Considering the mutually exclusive relationshipbetween the semantic response value and the se-mantics of objectcomponent we selected themathematical model of semantic segmentationunder a complex background based on the encoder-decoder network for optimization It was found that
ηmain ResNet dmain 50 is the best encoder andηdecoder PPM + FPN is the best selected decoder
(ii) We replaced 1 times argmaxkpCompminusk withargmaxk
1113954pCompminusk )e component analysis module
of CuvComp notin C
CObjCompminusObj and UPerNet are considered
to improve the performance of the encoder-decodernetwork
(iii) )e experimental results show that the componentanalysis module improves the performance of se-mantic segmentation under a complex backgroundBoth PA1113955Part
and Tseg of the proposed model werebetter than those of the UPerNet with deeper dmainSpecifically the accuracy improved from 4889 to5562 and Tseg from 721 to 496ms By performing
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
Pattern
Watermark
Serial numberWatermark Watermark
SafelineChinese
OVMI
Safeline
OVMI
Pattern
PatternDenomination
Serial number
Denomination
Watermark
Watermark Watermark
SafelineSafeline
Watermark
Watermark Watermark
SafelineSafeline
Watermark
WatermarkWatermark
SafelineSafeline
UPerNet UPerNet with the component analysis module
Figure 11 Anticounterfeiting features detected by the UPerNet withwithout the component analysis module
Table 5 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAM) for CNY anticounterfeitfeatures via vision-based detection
Backbone ηmain Depth dmain Decoder ηdecoder Component analysis module PA1113955Part() APIoUT05() Tseg (ms)
1 ResNet 50 PPM+FPN mdash 8850 853 4832 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 9038 961 4903 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj
9529 100 496
Mathematical Problems in Engineering 11
vision-based detection with the 2019 CNY featureswe showed that the proposed method improvedPA1113955Part
from 9038 to 9529 while Tseg increasedonly slightly from 490 to 496ms APIoUT01 alsoincreased from 961 to 100 detecting all the lighttransmission anticounterfeiting features withoutfalse detection missing detection or repeateddetection
)e model in which 1 times argmaxkpPartminusk was replacedwith argmaxk
1113954pPartminusk and the corresponding componentanalysis module improved the performance of the UPerNetencoder-decoder network However the efficiency im-provement is affected by the accuracy of object segmenta-tion In our next study we will investigate the applicability ofmachine learning to the component analysis module toachieve a higher performance in different applications
Data Availability
)e ADE20K Dataset used to support the findings of thisstudy is available at httpgroupscsailmiteduvisiondatasets )e CITYSCAPES Dataset used to support thefindings of this study is available at httpswwwcityscapes-datasetcom Its pretrained models and code are released athttpsgithubcomCSAILVisionsemantic-segmentation325pytorch
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
)is research was funded by the Key-Area Research andDevelopment Program of Guangdong Province (Grant no2019B010154003) and the Guangzhou Science and Tech-nology Plan Project (Grant no 201802030006)
References
[1] C Szegedy W Liu and Y Jia ldquoGoing deeper with convo-lutionsrdquo in Proceedings of the Computer Vision and PatternRecognition IEEE Boston MA USA pp 1ndash9 June 2015
[2] K He X Zhang and S Ren ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the Computer vision andpattern recognition IEEE Las Vegas NV USA pp 770ndash778June 2016
[3] K He G Gkioxari P Dollar and R Girshick ldquoMaskR-CNNrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 42 no 2 pp 386ndash397 2020
[4] G Liu B He and S Liu ldquoChassis assembly detection andidentification based on deep learning component instancesegmentationrdquo Symmetry vol 11 no 8 2019
[5] R Manish A Venkatesh and S Denis Ashok ldquoMachinevision based image processing techniques for surface finishand defect inspection in a grinding processrdquoMaterials TodayProceedings vol 5 no 5 pp 12792ndash12802 2018
[6] L Geng Y Wen and F Zhang ldquoMachine vision detectionmethod for surface defects of automobile stamping partsrdquo
American Scientific Research Journal for Engineering Tech-nology and Sciences vol 53 no 1 pp 128ndash144 2019
[7] M M Islam and J Kim ldquoVision-based autonomous crackdetection of concrete structures using a fully convolutionalencoderndashdecoder networkrdquo Sensors vol 19 no 19 2019
[8] S Zhou D Nie E Adeli J Yin J Lian and D Shen ldquoHigh-resolution encoder-decoder networks for low-contrast medicalimage segmentationrdquo IEEE Transactions on Image Processingvol 29 pp 461ndash475 2020
[9] S Liu J Huang and G Liu ldquoTechnology of multi-categorylegal currency identification under multi-light conditionsbased on AleNetrdquo China Measurement ampTest vol 45 no 9pp 118ndash122 2019 in Chinese
[10] H Kang and C Chen ldquoFruit detection and segmentation forapple harvesting using visual sensor in orchardsrdquo Sensorsvol 19 no 20 p 4599 2019
[11] E Pardo J M T Morgado and N Malpica ldquoSemanticsegmentation of mFISH images using convolutional net-worksrdquo Cytometry Part A vol 93 no 6 pp 620ndash627 2018
[12] G Liu S Liu and J Wu ldquoMachine vision object detectionalgorithm based on deep learning and application in banknotedetectionrdquo China Measurement ampTest vol 45 no 5 pp 1ndash92019 in Chinese
[13] H Gao H Yuan and Z Wang ldquoPixel transposed convolu-tional networksrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 42 no 5 pp 1218ndash1227 2019
[14] Q D Vu and J T Kwak ldquoA densemulti-path decoder for tissuesegmentation in histopathology imagesrdquo Computer Methodsand Programs in Biomedicine vol 173 pp 119ndash129 2019
[15] J Huang and G Liu ldquo)e development of CNN-based se-mantic segmentation methodrdquo Laser Journal vol 40 no 5pp 10ndash16 2019 in Chinese
[16] S Nowozin ldquoOptimal decisions from probabilistic modelsthe intersection-over-union caserdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE ColumbusOH USA pp 548ndash555 June 2014
[17] D Hoiem Y Chodpathumwan and Q Dai ldquoDiagnosingerror in object detectorsrdquo in Proceedings of the EuropeanConference on Computer Vision pp 340ndash353 IEEE FlorenceItaly October 2012
[18] K He and J Sun ldquoConvolutional neural networks at con-strained time costrdquo in Proceedings of the Computer Vision andPattern Recognition IEEE Boston MA USA pp 5353ndash5360June 2015
[19] J Long E Shelhamer and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE Boston MAUSA pp 3431ndash3440 June 2015
[20] V Badrinarayanan A Kendall and R Cipolla ldquoSegNet adeep convolutional encoder-decoder architecture for imagesegmentationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 39 no 12 pp 2481ndash2495 2017
[21] O Ronneberger P Fischer and T Brox ldquoU-net convolu-tional networks for biomedical image segmentationrdquo inProceedings of the Medical Image Computing and Computer-Assisted Intervention pp 234ndash241 IEEE Munich GermanyOctober 2015
[22] B Hariharan P Arbelaez R Girshick and J Malik ldquoSi-multaneous detection and segmentationrdquo in Proceedings ofthe European Conference on Computer Vision pp 297ndash312IEEE Zurich Switzerland March 2014
[23] N D Lane and P Warden ldquo)e deep (learning) transfor-mation of mobile and embedded computingrdquo Computervol 51 no 5 pp 12ndash16 2018
12 Mathematical Problems in Engineering
[24] C Qing J Ruan X Xu J Ren and J Zabalza ldquoSpatial-spectral classification of hyperspectral images a deep learningframework with Markov Random fields based modellingrdquo IetImage Processing vol 13 no 2 pp 235ndash245 2019
[25] X Li Z Liu and P Luo ldquoNot all pixels are equal difficulty-aware semantic segmentation via deep layer cascaderdquo inProceedings of the Computer Vision and Pattern RecognitionIEEE Honolulu HI USA pp 6459ndash6468 July 2017
[26] B Zhou H Zhao X Puig et al ldquoSemantic understanding ofscenes through the ADE20K datasetrdquo International Journal ofComputer Vision vol 127 no 3 pp 302ndash321 2019
[27] L-C Chen ldquoRethinking atrous convolution for semanticimage segmentationrdquo 2017 httpsarxivorgabs170605587
[28] H Zhao J Shi and X Qi ldquoPyramid scene parsing networkrdquoin Proceedings of the Computer Vision and Pattern Recogni-tion IEEE Honolulu HI USA pp 6230ndash6239 July 2017
[29] T Xiao Y Liu B Zhou Y Jiang and J Sun ldquoUnifiedperceptual parsing for scene understandingrdquo in Proceedings ofthe European Conference on Computer Vision pp 432ndash448IEEE Munich Germany September 2018
[30] D Kim J Kwon and J Kim ldquoLow-complexity online modelselection with lyapunov control for reward maximization instabilized real-time deep learning platformsrdquo in Proceedings ofthe Systems Man and Cybernetics pp 4363ndash4368 MiyazakiJapan January 2018
[31] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquoCommunications of the Acm vol 60 no 6 pp 84ndash90 2017
[32] M Cordts O Mohamed and S Ramos ldquo)e cityscapesdataset for semantic urban scene understandingrdquo in Pro-ceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR) IEEE Las Vegas NV USApp 3213ndash3223 June 2016
Mathematical Problems in Engineering 13
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThings1113872 1113873
1 times argmaxkpCompminusk C
uvObj isin CObjminusThings
0 times argmaxkpCompminusk CuvObj notin CObjminusThings
⎧⎨
⎩ k 1 2 NComp(6)
where pObjminusj is the probability of CuvComp Weighting pObjminusj
over pCompminusk to get 1113954pCompminusk instead of 1 times argmaxkpCompminuskreducing the weight of low-probability object labels andincreasing PAcomp With Cuv
Obj isin CObjminusThings if
CuvComp notin C
CObjCompminusObj letting 1113954pCompminusk 0 can increase the
detection rate of background pixels )erefore the modulecan be expressed as follows
DecoderEncoder ηmain = ResNet
CS3 CS4 CS5CS1 CS2
Poolingpyramid module
PPM
Cup-1 Cup-2 Cup-3 Fusion
Object classifier
Pixel accuracyPA
Ups
ampl
e
Component segmentation
Objectsegmentation
Component classifier
Feature map pramid network FPN
Component analysis module
Componentanalysis model
Inputimage
ηdecoder = PPM + FPN
Segmentation time Tseg (ηmain dmain ηdecoder)
Depth dmain = 50
Figure 7 Flowchart of semantic segmentation under a complex background implemented by UPerNet
Componentclassifier
Validcomponent
label
Object classifier
Componentanalysis model1 1
Featuremap
Object label Cuv
Comp label Cuv
Valid object setℂObj-Things
Object segmentation
Componentsegmentation
CompfO (Cuv
CuvObj
Obj
Comp)CompC uv
Figure 8 Flowchart of component analysis module of UPerNet
Table 1 Pixel accuracy and segmentation time of the main network architectures on ADE20K objectcomponent segmentation task )erectangular contour indicates the best indices
Network Backboneηmain
Backbonedepth dmain
Decoderηdecoder
Object segmentationaccuracy () PAobj
Component segmentationaccuracy () PAcomp
Segmentation time(ms) Tseg
1 FCN [30] ResNet 50 FCN 7132 4081 333
2 PSPNet[28] ResNet 50 PPM 8004 4723 483
3 UPerNet[29] ResNet 50 PPM+FPN 8023 4830 496
4 UPerNet[29] ResNet 101 PPM+FPN 8101 4871 604
Mathematical Problems in Engineering 7
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThingsC
CObj
CompminusObj1113874 1113875 argmaxk1113954pCompminusk k 1 2 NComp
1113954pCompminusk
1113936
jisinCObjminusThingsandkisinCj
CompminusObj
j1pObjminusj
⎛⎜⎜⎝ ⎞⎟⎟⎠ kltNComp
1 minus 1113936
jnotinCObjminusThingsork notin Cj
CompminusObj
j1pObjminusj
⎛⎜⎜⎝ ⎞⎟⎟⎠pCompminusk k NComp
⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎪⎩
j 1 2 Nobj
⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩
(7)
which is the component analysis module yielded by replacing1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and considering
CuvComp notin C
CObjCompminusObj
)e optimized architecture of the UPerNet componentanalysis module is proposed based on equation (7)Figures 9(a)ndash9(c) show the optimized architecture obtainedby replacing 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk by
considering CuvComp notin C
CObjCompminusObj and by both replacing 1 times
argmaxkpCompminusk with argmaxk1113954pCompminusk and considering
CuvComp notin C
CObjCompminusObj in the component analysis module
respectively
31 Experimental Results
311 ADE20K Component Segmentation Task For theUPerNet model the backbone network of the encoder wasResNet dmain 50 and the decoders arePPM+FPN+component analysis modules (beforeaftermodification) We trained each network on the objectcom-ponent segmentation task dataset ADE20K [26] to demonstratethe pixel accuracy PA1113955Part
and segmentation time Tseg )eexperiments were run on a GeForce GTX 1080Ti GPU
Table 2 reports PA1113955Partand Tseg of the UPerNet obtained
with different component analysis modules in ADE20Kcomponent segmentation task From the results the fol-lowing observations can be made
(i) )e pixel accuracy of ResNet(dmain 50)+PPM+FPN+ the proposed modifiedcomponent analysis modules with different settingsincreased from 4830 (without component analysismodules) to 5403 5513 and 5562 while thesegmentation time lengthened marginally from 483to 492 486 and 496ms respectively
)e UPerNet with modified component analysis mod-ules showed significantly high segmentation performanceBoth PA1113955Part
and Tseg outperformed the UPerNet with adeeper dmain PA1113955Part
and Tseg of the architecture (dmain 50)are 5562 and 496ms while those of the architectures withno modification with dmain 101 and 152 were 4871 and598ms and 4889 and 721ms respectively as shown inFigure 9(c)
312 CITYSCAPES Instance-Level Semantic Labeling TaskWe trained each UPerNet (withwithout component anal-ysis module) on the instance-level semantic labeling task of
the CITYSCAPES dataset [32] To assess the instance-levelperformance CITYSCAPES uses themean average precisionAP and average precision AP05 [32] We also report thesegmentation time of each network run on a GeForce GTX1080Ti GPU and an Intel i7-5960X CPU Table 3 presents theperformances of different methods on a CITYSCAPES in-stance-level semantic labeling task Table 4 presents the meanaverage precision AP on class-level of the UPerNet withwithout the component analysis module in the CITYSCAPESinstance-level semantic labeling task From the table it can beseen that the modified component analysis modules effec-tively improved the performance of the UPerNet With thecomponent analysis module both AP and AP05 are im-proved and the segmentation time Tseg increased slightlyfrom 447 to 451msMost of the UPerNet AP on class-level areimproved Figure 10 shows someCITYSCAPES instance-levelsemantic labeling results obtained with the UPerNet withwithout component analysis module
Taking banknote detection as an example we set up thesemantic segmentation model by the component analysismodules (beforeafter modification) to vision-based detec-tion of 2019 Chinese Yuan (CNY) feature in the backlight todemonstrate the segmentation performance of the proposedmethod
)e vision-based detection system consisted of an MV-CA013-10GC industrial camera an MVL-HF2528M-6MPlens and a LED strip light )e field of view was 1833deg andthe resolution was 1280times1024 Under the backlight wecollected 25 CNY images of various denomination frontsand backs at random angles )en we marked four types oflight-transmitting anticounterfeiting features namely se-curity lines pattern watermarks denomination watermarksand Yin-Yang denominations All four features were de-tected in the CNY images to generate our dataset (200images) We trained the model with different componentanalysis modules from our dataset to demonstrate PA1113955Partand Tseg Table 3 presents the pixel accuracy and segmen-tation time of UPerNet with different component analysismodules for CNY anticounterfeit features via vision-baseddetection and Figure 11 shows the segmentation results ofthe anticounterfeiting features detected by UPerNet withwithout the component analysis module
From Table 5 it can be seen that the proposed methodimproved PA1113955Part
from 9038 to 9529 Tseg from 490 to496ms Moreover APIoUT05 increased from 961 to 100detecting all the light transmission anti-counterfeiting fea-tures without false detection missing detection or repeateddetection
8 Mathematical Problems in Engineering
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Comp responsepuv
Component segmentation
Validcomplabel
Validcomp
response
Comp analysismodel
fOp (Cuv Cuv
Obj
Obj
Obj
Comp) puv Comp C uv
Comp
Object label Cuv
Object responsepuv
Valid object setℂObj-Things
Comp
(a)
11Feature
map
Object classifier
Componentclassifier
Object segmentation
Componentsegmentation
Relation between comp and obj
ℂCObjComp-Obj
ObjObject label Cuv
CompComp label Cuv
Validcomplabel
Comp analysismodel
Valid object setℂObj-Things
fO (Cuv Cuv
Obj Comp) C uv Comp
(b)
Relation between comp and obj
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Componentsegmentation
Comp response Validcomplabel
Validcomp
response
Object responsepuv
Valid object setℂObj-Things
Comp analysismodel
fOp (Cuv Cuv
Obj Comp) p uv Comp C uv
Comp
ObjObject label Cuv
Obj
Comppuv
ℂCObjComp-Obj
(c)
Figure 9 Optimized architecture with the component analysis module (a) Replace 1 times argmaxkpCompminusk with argmaxk1113954pCompminusk to optimize
the module (b) Analyze CuvComp notin C
CObjCompminusObj to optimize the module (c) Replace 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and analyzeCuvComp notin C
CObjCompminusObj to optimize the module
Table 2 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAMs) on ADE20K componentsegmentation task
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
1 ResNet 50 PPM+FPN mdash 4830 4832 ResNet 101 PPM+FPN mdash 4871 5983 ResNet 152 PPM+FPN mdash 4889 7214 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5362 4905 ResNet 101 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5396 6046 ResNet 152 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5418 726
Mathematical Problems in Engineering 9
UPerNet UPerNet with the component analysis module
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar
CarCar
Person
PersonPerson
PersonCarCar
CarPerson
PersonPerson
Person
Figure 10 CITYSCAPES instance-level semantic labeling by UPerNet
Table 4 Mean average precision AP on class-level of the UPerNet withwithout CAM in CITYSCAPES instance-level semantic labelingtask
Method Person () Rider () Car () Truck () Bus () Train () Motorcycle () Bicycle ()UperNet 360 288 516 300 387 273 239 194UperNet +CAM 360 288 530 343 570 375 223 238
Table 2 Continued
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
7 ResNet 50 PPM+FPN+CAM argmaxk1113954pCompminusk 5403 492
8 ResNet 50 PPM+FPN+CAM CuvComp notin C
CObj
CompminusObj 5513 4869 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj 5562 496
Table 3 Performances of different methods on CITYSCAPES instance-level semantic labeling task
Method AP () AP050 () Segmentation time (ms)
SegNet 295 556 mdashMask R-CNN 320 581 mdashUperNet 320 573 447UperNet +CAM 365 622 451CAM Component Analysis Module
10 Mathematical Problems in Engineering
4 Conclusions
In this study we performed semantic segmentation under acomplex background using the encoder-decoder network tosolve the issue of the mutually exclusive relationship be-tween the semantic response value and the semantics ofobjectcomponent in the semantic segmentation under acomplex background for online machine vision detection)e following conclusions can be drawn from this study
(i) Considering the mutually exclusive relationshipbetween the semantic response value and the se-mantics of objectcomponent we selected themathematical model of semantic segmentationunder a complex background based on the encoder-decoder network for optimization It was found that
ηmain ResNet dmain 50 is the best encoder andηdecoder PPM + FPN is the best selected decoder
(ii) We replaced 1 times argmaxkpCompminusk withargmaxk
1113954pCompminusk )e component analysis module
of CuvComp notin C
CObjCompminusObj and UPerNet are considered
to improve the performance of the encoder-decodernetwork
(iii) )e experimental results show that the componentanalysis module improves the performance of se-mantic segmentation under a complex backgroundBoth PA1113955Part
and Tseg of the proposed model werebetter than those of the UPerNet with deeper dmainSpecifically the accuracy improved from 4889 to5562 and Tseg from 721 to 496ms By performing
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
Pattern
Watermark
Serial numberWatermark Watermark
SafelineChinese
OVMI
Safeline
OVMI
Pattern
PatternDenomination
Serial number
Denomination
Watermark
Watermark Watermark
SafelineSafeline
Watermark
Watermark Watermark
SafelineSafeline
Watermark
WatermarkWatermark
SafelineSafeline
UPerNet UPerNet with the component analysis module
Figure 11 Anticounterfeiting features detected by the UPerNet withwithout the component analysis module
Table 5 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAM) for CNY anticounterfeitfeatures via vision-based detection
Backbone ηmain Depth dmain Decoder ηdecoder Component analysis module PA1113955Part() APIoUT05() Tseg (ms)
1 ResNet 50 PPM+FPN mdash 8850 853 4832 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 9038 961 4903 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj
9529 100 496
Mathematical Problems in Engineering 11
vision-based detection with the 2019 CNY featureswe showed that the proposed method improvedPA1113955Part
from 9038 to 9529 while Tseg increasedonly slightly from 490 to 496ms APIoUT01 alsoincreased from 961 to 100 detecting all the lighttransmission anticounterfeiting features withoutfalse detection missing detection or repeateddetection
)e model in which 1 times argmaxkpPartminusk was replacedwith argmaxk
1113954pPartminusk and the corresponding componentanalysis module improved the performance of the UPerNetencoder-decoder network However the efficiency im-provement is affected by the accuracy of object segmenta-tion In our next study we will investigate the applicability ofmachine learning to the component analysis module toachieve a higher performance in different applications
Data Availability
)e ADE20K Dataset used to support the findings of thisstudy is available at httpgroupscsailmiteduvisiondatasets )e CITYSCAPES Dataset used to support thefindings of this study is available at httpswwwcityscapes-datasetcom Its pretrained models and code are released athttpsgithubcomCSAILVisionsemantic-segmentation325pytorch
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
)is research was funded by the Key-Area Research andDevelopment Program of Guangdong Province (Grant no2019B010154003) and the Guangzhou Science and Tech-nology Plan Project (Grant no 201802030006)
References
[1] C Szegedy W Liu and Y Jia ldquoGoing deeper with convo-lutionsrdquo in Proceedings of the Computer Vision and PatternRecognition IEEE Boston MA USA pp 1ndash9 June 2015
[2] K He X Zhang and S Ren ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the Computer vision andpattern recognition IEEE Las Vegas NV USA pp 770ndash778June 2016
[3] K He G Gkioxari P Dollar and R Girshick ldquoMaskR-CNNrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 42 no 2 pp 386ndash397 2020
[4] G Liu B He and S Liu ldquoChassis assembly detection andidentification based on deep learning component instancesegmentationrdquo Symmetry vol 11 no 8 2019
[5] R Manish A Venkatesh and S Denis Ashok ldquoMachinevision based image processing techniques for surface finishand defect inspection in a grinding processrdquoMaterials TodayProceedings vol 5 no 5 pp 12792ndash12802 2018
[6] L Geng Y Wen and F Zhang ldquoMachine vision detectionmethod for surface defects of automobile stamping partsrdquo
American Scientific Research Journal for Engineering Tech-nology and Sciences vol 53 no 1 pp 128ndash144 2019
[7] M M Islam and J Kim ldquoVision-based autonomous crackdetection of concrete structures using a fully convolutionalencoderndashdecoder networkrdquo Sensors vol 19 no 19 2019
[8] S Zhou D Nie E Adeli J Yin J Lian and D Shen ldquoHigh-resolution encoder-decoder networks for low-contrast medicalimage segmentationrdquo IEEE Transactions on Image Processingvol 29 pp 461ndash475 2020
[9] S Liu J Huang and G Liu ldquoTechnology of multi-categorylegal currency identification under multi-light conditionsbased on AleNetrdquo China Measurement ampTest vol 45 no 9pp 118ndash122 2019 in Chinese
[10] H Kang and C Chen ldquoFruit detection and segmentation forapple harvesting using visual sensor in orchardsrdquo Sensorsvol 19 no 20 p 4599 2019
[11] E Pardo J M T Morgado and N Malpica ldquoSemanticsegmentation of mFISH images using convolutional net-worksrdquo Cytometry Part A vol 93 no 6 pp 620ndash627 2018
[12] G Liu S Liu and J Wu ldquoMachine vision object detectionalgorithm based on deep learning and application in banknotedetectionrdquo China Measurement ampTest vol 45 no 5 pp 1ndash92019 in Chinese
[13] H Gao H Yuan and Z Wang ldquoPixel transposed convolu-tional networksrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 42 no 5 pp 1218ndash1227 2019
[14] Q D Vu and J T Kwak ldquoA densemulti-path decoder for tissuesegmentation in histopathology imagesrdquo Computer Methodsand Programs in Biomedicine vol 173 pp 119ndash129 2019
[15] J Huang and G Liu ldquo)e development of CNN-based se-mantic segmentation methodrdquo Laser Journal vol 40 no 5pp 10ndash16 2019 in Chinese
[16] S Nowozin ldquoOptimal decisions from probabilistic modelsthe intersection-over-union caserdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE ColumbusOH USA pp 548ndash555 June 2014
[17] D Hoiem Y Chodpathumwan and Q Dai ldquoDiagnosingerror in object detectorsrdquo in Proceedings of the EuropeanConference on Computer Vision pp 340ndash353 IEEE FlorenceItaly October 2012
[18] K He and J Sun ldquoConvolutional neural networks at con-strained time costrdquo in Proceedings of the Computer Vision andPattern Recognition IEEE Boston MA USA pp 5353ndash5360June 2015
[19] J Long E Shelhamer and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE Boston MAUSA pp 3431ndash3440 June 2015
[20] V Badrinarayanan A Kendall and R Cipolla ldquoSegNet adeep convolutional encoder-decoder architecture for imagesegmentationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 39 no 12 pp 2481ndash2495 2017
[21] O Ronneberger P Fischer and T Brox ldquoU-net convolu-tional networks for biomedical image segmentationrdquo inProceedings of the Medical Image Computing and Computer-Assisted Intervention pp 234ndash241 IEEE Munich GermanyOctober 2015
[22] B Hariharan P Arbelaez R Girshick and J Malik ldquoSi-multaneous detection and segmentationrdquo in Proceedings ofthe European Conference on Computer Vision pp 297ndash312IEEE Zurich Switzerland March 2014
[23] N D Lane and P Warden ldquo)e deep (learning) transfor-mation of mobile and embedded computingrdquo Computervol 51 no 5 pp 12ndash16 2018
12 Mathematical Problems in Engineering
[24] C Qing J Ruan X Xu J Ren and J Zabalza ldquoSpatial-spectral classification of hyperspectral images a deep learningframework with Markov Random fields based modellingrdquo IetImage Processing vol 13 no 2 pp 235ndash245 2019
[25] X Li Z Liu and P Luo ldquoNot all pixels are equal difficulty-aware semantic segmentation via deep layer cascaderdquo inProceedings of the Computer Vision and Pattern RecognitionIEEE Honolulu HI USA pp 6459ndash6468 July 2017
[26] B Zhou H Zhao X Puig et al ldquoSemantic understanding ofscenes through the ADE20K datasetrdquo International Journal ofComputer Vision vol 127 no 3 pp 302ndash321 2019
[27] L-C Chen ldquoRethinking atrous convolution for semanticimage segmentationrdquo 2017 httpsarxivorgabs170605587
[28] H Zhao J Shi and X Qi ldquoPyramid scene parsing networkrdquoin Proceedings of the Computer Vision and Pattern Recogni-tion IEEE Honolulu HI USA pp 6230ndash6239 July 2017
[29] T Xiao Y Liu B Zhou Y Jiang and J Sun ldquoUnifiedperceptual parsing for scene understandingrdquo in Proceedings ofthe European Conference on Computer Vision pp 432ndash448IEEE Munich Germany September 2018
[30] D Kim J Kwon and J Kim ldquoLow-complexity online modelselection with lyapunov control for reward maximization instabilized real-time deep learning platformsrdquo in Proceedings ofthe Systems Man and Cybernetics pp 4363ndash4368 MiyazakiJapan January 2018
[31] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquoCommunications of the Acm vol 60 no 6 pp 84ndash90 2017
[32] M Cordts O Mohamed and S Ramos ldquo)e cityscapesdataset for semantic urban scene understandingrdquo in Pro-ceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR) IEEE Las Vegas NV USApp 3213ndash3223 June 2016
Mathematical Problems in Engineering 13
1113954Cuv
Comp fOp CuvObj C
uvCompCObjminusThingsC
CObj
CompminusObj1113874 1113875 argmaxk1113954pCompminusk k 1 2 NComp
1113954pCompminusk
1113936
jisinCObjminusThingsandkisinCj
CompminusObj
j1pObjminusj
⎛⎜⎜⎝ ⎞⎟⎟⎠ kltNComp
1 minus 1113936
jnotinCObjminusThingsork notin Cj
CompminusObj
j1pObjminusj
⎛⎜⎜⎝ ⎞⎟⎟⎠pCompminusk k NComp
⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎪⎩
j 1 2 Nobj
⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨
⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩
(7)
which is the component analysis module yielded by replacing1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and considering
CuvComp notin C
CObjCompminusObj
)e optimized architecture of the UPerNet componentanalysis module is proposed based on equation (7)Figures 9(a)ndash9(c) show the optimized architecture obtainedby replacing 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk by
considering CuvComp notin C
CObjCompminusObj and by both replacing 1 times
argmaxkpCompminusk with argmaxk1113954pCompminusk and considering
CuvComp notin C
CObjCompminusObj in the component analysis module
respectively
31 Experimental Results
311 ADE20K Component Segmentation Task For theUPerNet model the backbone network of the encoder wasResNet dmain 50 and the decoders arePPM+FPN+component analysis modules (beforeaftermodification) We trained each network on the objectcom-ponent segmentation task dataset ADE20K [26] to demonstratethe pixel accuracy PA1113955Part
and segmentation time Tseg )eexperiments were run on a GeForce GTX 1080Ti GPU
Table 2 reports PA1113955Partand Tseg of the UPerNet obtained
with different component analysis modules in ADE20Kcomponent segmentation task From the results the fol-lowing observations can be made
(i) )e pixel accuracy of ResNet(dmain 50)+PPM+FPN+ the proposed modifiedcomponent analysis modules with different settingsincreased from 4830 (without component analysismodules) to 5403 5513 and 5562 while thesegmentation time lengthened marginally from 483to 492 486 and 496ms respectively
)e UPerNet with modified component analysis mod-ules showed significantly high segmentation performanceBoth PA1113955Part
and Tseg outperformed the UPerNet with adeeper dmain PA1113955Part
and Tseg of the architecture (dmain 50)are 5562 and 496ms while those of the architectures withno modification with dmain 101 and 152 were 4871 and598ms and 4889 and 721ms respectively as shown inFigure 9(c)
312 CITYSCAPES Instance-Level Semantic Labeling TaskWe trained each UPerNet (withwithout component anal-ysis module) on the instance-level semantic labeling task of
the CITYSCAPES dataset [32] To assess the instance-levelperformance CITYSCAPES uses themean average precisionAP and average precision AP05 [32] We also report thesegmentation time of each network run on a GeForce GTX1080Ti GPU and an Intel i7-5960X CPU Table 3 presents theperformances of different methods on a CITYSCAPES in-stance-level semantic labeling task Table 4 presents the meanaverage precision AP on class-level of the UPerNet withwithout the component analysis module in the CITYSCAPESinstance-level semantic labeling task From the table it can beseen that the modified component analysis modules effec-tively improved the performance of the UPerNet With thecomponent analysis module both AP and AP05 are im-proved and the segmentation time Tseg increased slightlyfrom 447 to 451msMost of the UPerNet AP on class-level areimproved Figure 10 shows someCITYSCAPES instance-levelsemantic labeling results obtained with the UPerNet withwithout component analysis module
Taking banknote detection as an example we set up thesemantic segmentation model by the component analysismodules (beforeafter modification) to vision-based detec-tion of 2019 Chinese Yuan (CNY) feature in the backlight todemonstrate the segmentation performance of the proposedmethod
)e vision-based detection system consisted of an MV-CA013-10GC industrial camera an MVL-HF2528M-6MPlens and a LED strip light )e field of view was 1833deg andthe resolution was 1280times1024 Under the backlight wecollected 25 CNY images of various denomination frontsand backs at random angles )en we marked four types oflight-transmitting anticounterfeiting features namely se-curity lines pattern watermarks denomination watermarksand Yin-Yang denominations All four features were de-tected in the CNY images to generate our dataset (200images) We trained the model with different componentanalysis modules from our dataset to demonstrate PA1113955Partand Tseg Table 3 presents the pixel accuracy and segmen-tation time of UPerNet with different component analysismodules for CNY anticounterfeit features via vision-baseddetection and Figure 11 shows the segmentation results ofthe anticounterfeiting features detected by UPerNet withwithout the component analysis module
From Table 5 it can be seen that the proposed methodimproved PA1113955Part
from 9038 to 9529 Tseg from 490 to496ms Moreover APIoUT05 increased from 961 to 100detecting all the light transmission anti-counterfeiting fea-tures without false detection missing detection or repeateddetection
8 Mathematical Problems in Engineering
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Comp responsepuv
Component segmentation
Validcomplabel
Validcomp
response
Comp analysismodel
fOp (Cuv Cuv
Obj
Obj
Obj
Comp) puv Comp C uv
Comp
Object label Cuv
Object responsepuv
Valid object setℂObj-Things
Comp
(a)
11Feature
map
Object classifier
Componentclassifier
Object segmentation
Componentsegmentation
Relation between comp and obj
ℂCObjComp-Obj
ObjObject label Cuv
CompComp label Cuv
Validcomplabel
Comp analysismodel
Valid object setℂObj-Things
fO (Cuv Cuv
Obj Comp) C uv Comp
(b)
Relation between comp and obj
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Componentsegmentation
Comp response Validcomplabel
Validcomp
response
Object responsepuv
Valid object setℂObj-Things
Comp analysismodel
fOp (Cuv Cuv
Obj Comp) p uv Comp C uv
Comp
ObjObject label Cuv
Obj
Comppuv
ℂCObjComp-Obj
(c)
Figure 9 Optimized architecture with the component analysis module (a) Replace 1 times argmaxkpCompminusk with argmaxk1113954pCompminusk to optimize
the module (b) Analyze CuvComp notin C
CObjCompminusObj to optimize the module (c) Replace 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and analyzeCuvComp notin C
CObjCompminusObj to optimize the module
Table 2 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAMs) on ADE20K componentsegmentation task
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
1 ResNet 50 PPM+FPN mdash 4830 4832 ResNet 101 PPM+FPN mdash 4871 5983 ResNet 152 PPM+FPN mdash 4889 7214 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5362 4905 ResNet 101 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5396 6046 ResNet 152 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5418 726
Mathematical Problems in Engineering 9
UPerNet UPerNet with the component analysis module
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar
CarCar
Person
PersonPerson
PersonCarCar
CarPerson
PersonPerson
Person
Figure 10 CITYSCAPES instance-level semantic labeling by UPerNet
Table 4 Mean average precision AP on class-level of the UPerNet withwithout CAM in CITYSCAPES instance-level semantic labelingtask
Method Person () Rider () Car () Truck () Bus () Train () Motorcycle () Bicycle ()UperNet 360 288 516 300 387 273 239 194UperNet +CAM 360 288 530 343 570 375 223 238
Table 2 Continued
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
7 ResNet 50 PPM+FPN+CAM argmaxk1113954pCompminusk 5403 492
8 ResNet 50 PPM+FPN+CAM CuvComp notin C
CObj
CompminusObj 5513 4869 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj 5562 496
Table 3 Performances of different methods on CITYSCAPES instance-level semantic labeling task
Method AP () AP050 () Segmentation time (ms)
SegNet 295 556 mdashMask R-CNN 320 581 mdashUperNet 320 573 447UperNet +CAM 365 622 451CAM Component Analysis Module
10 Mathematical Problems in Engineering
4 Conclusions
In this study we performed semantic segmentation under acomplex background using the encoder-decoder network tosolve the issue of the mutually exclusive relationship be-tween the semantic response value and the semantics ofobjectcomponent in the semantic segmentation under acomplex background for online machine vision detection)e following conclusions can be drawn from this study
(i) Considering the mutually exclusive relationshipbetween the semantic response value and the se-mantics of objectcomponent we selected themathematical model of semantic segmentationunder a complex background based on the encoder-decoder network for optimization It was found that
ηmain ResNet dmain 50 is the best encoder andηdecoder PPM + FPN is the best selected decoder
(ii) We replaced 1 times argmaxkpCompminusk withargmaxk
1113954pCompminusk )e component analysis module
of CuvComp notin C
CObjCompminusObj and UPerNet are considered
to improve the performance of the encoder-decodernetwork
(iii) )e experimental results show that the componentanalysis module improves the performance of se-mantic segmentation under a complex backgroundBoth PA1113955Part
and Tseg of the proposed model werebetter than those of the UPerNet with deeper dmainSpecifically the accuracy improved from 4889 to5562 and Tseg from 721 to 496ms By performing
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
Pattern
Watermark
Serial numberWatermark Watermark
SafelineChinese
OVMI
Safeline
OVMI
Pattern
PatternDenomination
Serial number
Denomination
Watermark
Watermark Watermark
SafelineSafeline
Watermark
Watermark Watermark
SafelineSafeline
Watermark
WatermarkWatermark
SafelineSafeline
UPerNet UPerNet with the component analysis module
Figure 11 Anticounterfeiting features detected by the UPerNet withwithout the component analysis module
Table 5 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAM) for CNY anticounterfeitfeatures via vision-based detection
Backbone ηmain Depth dmain Decoder ηdecoder Component analysis module PA1113955Part() APIoUT05() Tseg (ms)
1 ResNet 50 PPM+FPN mdash 8850 853 4832 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 9038 961 4903 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj
9529 100 496
Mathematical Problems in Engineering 11
vision-based detection with the 2019 CNY featureswe showed that the proposed method improvedPA1113955Part
from 9038 to 9529 while Tseg increasedonly slightly from 490 to 496ms APIoUT01 alsoincreased from 961 to 100 detecting all the lighttransmission anticounterfeiting features withoutfalse detection missing detection or repeateddetection
)e model in which 1 times argmaxkpPartminusk was replacedwith argmaxk
1113954pPartminusk and the corresponding componentanalysis module improved the performance of the UPerNetencoder-decoder network However the efficiency im-provement is affected by the accuracy of object segmenta-tion In our next study we will investigate the applicability ofmachine learning to the component analysis module toachieve a higher performance in different applications
Data Availability
)e ADE20K Dataset used to support the findings of thisstudy is available at httpgroupscsailmiteduvisiondatasets )e CITYSCAPES Dataset used to support thefindings of this study is available at httpswwwcityscapes-datasetcom Its pretrained models and code are released athttpsgithubcomCSAILVisionsemantic-segmentation325pytorch
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
)is research was funded by the Key-Area Research andDevelopment Program of Guangdong Province (Grant no2019B010154003) and the Guangzhou Science and Tech-nology Plan Project (Grant no 201802030006)
References
[1] C Szegedy W Liu and Y Jia ldquoGoing deeper with convo-lutionsrdquo in Proceedings of the Computer Vision and PatternRecognition IEEE Boston MA USA pp 1ndash9 June 2015
[2] K He X Zhang and S Ren ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the Computer vision andpattern recognition IEEE Las Vegas NV USA pp 770ndash778June 2016
[3] K He G Gkioxari P Dollar and R Girshick ldquoMaskR-CNNrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 42 no 2 pp 386ndash397 2020
[4] G Liu B He and S Liu ldquoChassis assembly detection andidentification based on deep learning component instancesegmentationrdquo Symmetry vol 11 no 8 2019
[5] R Manish A Venkatesh and S Denis Ashok ldquoMachinevision based image processing techniques for surface finishand defect inspection in a grinding processrdquoMaterials TodayProceedings vol 5 no 5 pp 12792ndash12802 2018
[6] L Geng Y Wen and F Zhang ldquoMachine vision detectionmethod for surface defects of automobile stamping partsrdquo
American Scientific Research Journal for Engineering Tech-nology and Sciences vol 53 no 1 pp 128ndash144 2019
[7] M M Islam and J Kim ldquoVision-based autonomous crackdetection of concrete structures using a fully convolutionalencoderndashdecoder networkrdquo Sensors vol 19 no 19 2019
[8] S Zhou D Nie E Adeli J Yin J Lian and D Shen ldquoHigh-resolution encoder-decoder networks for low-contrast medicalimage segmentationrdquo IEEE Transactions on Image Processingvol 29 pp 461ndash475 2020
[9] S Liu J Huang and G Liu ldquoTechnology of multi-categorylegal currency identification under multi-light conditionsbased on AleNetrdquo China Measurement ampTest vol 45 no 9pp 118ndash122 2019 in Chinese
[10] H Kang and C Chen ldquoFruit detection and segmentation forapple harvesting using visual sensor in orchardsrdquo Sensorsvol 19 no 20 p 4599 2019
[11] E Pardo J M T Morgado and N Malpica ldquoSemanticsegmentation of mFISH images using convolutional net-worksrdquo Cytometry Part A vol 93 no 6 pp 620ndash627 2018
[12] G Liu S Liu and J Wu ldquoMachine vision object detectionalgorithm based on deep learning and application in banknotedetectionrdquo China Measurement ampTest vol 45 no 5 pp 1ndash92019 in Chinese
[13] H Gao H Yuan and Z Wang ldquoPixel transposed convolu-tional networksrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 42 no 5 pp 1218ndash1227 2019
[14] Q D Vu and J T Kwak ldquoA densemulti-path decoder for tissuesegmentation in histopathology imagesrdquo Computer Methodsand Programs in Biomedicine vol 173 pp 119ndash129 2019
[15] J Huang and G Liu ldquo)e development of CNN-based se-mantic segmentation methodrdquo Laser Journal vol 40 no 5pp 10ndash16 2019 in Chinese
[16] S Nowozin ldquoOptimal decisions from probabilistic modelsthe intersection-over-union caserdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE ColumbusOH USA pp 548ndash555 June 2014
[17] D Hoiem Y Chodpathumwan and Q Dai ldquoDiagnosingerror in object detectorsrdquo in Proceedings of the EuropeanConference on Computer Vision pp 340ndash353 IEEE FlorenceItaly October 2012
[18] K He and J Sun ldquoConvolutional neural networks at con-strained time costrdquo in Proceedings of the Computer Vision andPattern Recognition IEEE Boston MA USA pp 5353ndash5360June 2015
[19] J Long E Shelhamer and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE Boston MAUSA pp 3431ndash3440 June 2015
[20] V Badrinarayanan A Kendall and R Cipolla ldquoSegNet adeep convolutional encoder-decoder architecture for imagesegmentationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 39 no 12 pp 2481ndash2495 2017
[21] O Ronneberger P Fischer and T Brox ldquoU-net convolu-tional networks for biomedical image segmentationrdquo inProceedings of the Medical Image Computing and Computer-Assisted Intervention pp 234ndash241 IEEE Munich GermanyOctober 2015
[22] B Hariharan P Arbelaez R Girshick and J Malik ldquoSi-multaneous detection and segmentationrdquo in Proceedings ofthe European Conference on Computer Vision pp 297ndash312IEEE Zurich Switzerland March 2014
[23] N D Lane and P Warden ldquo)e deep (learning) transfor-mation of mobile and embedded computingrdquo Computervol 51 no 5 pp 12ndash16 2018
12 Mathematical Problems in Engineering
[24] C Qing J Ruan X Xu J Ren and J Zabalza ldquoSpatial-spectral classification of hyperspectral images a deep learningframework with Markov Random fields based modellingrdquo IetImage Processing vol 13 no 2 pp 235ndash245 2019
[25] X Li Z Liu and P Luo ldquoNot all pixels are equal difficulty-aware semantic segmentation via deep layer cascaderdquo inProceedings of the Computer Vision and Pattern RecognitionIEEE Honolulu HI USA pp 6459ndash6468 July 2017
[26] B Zhou H Zhao X Puig et al ldquoSemantic understanding ofscenes through the ADE20K datasetrdquo International Journal ofComputer Vision vol 127 no 3 pp 302ndash321 2019
[27] L-C Chen ldquoRethinking atrous convolution for semanticimage segmentationrdquo 2017 httpsarxivorgabs170605587
[28] H Zhao J Shi and X Qi ldquoPyramid scene parsing networkrdquoin Proceedings of the Computer Vision and Pattern Recogni-tion IEEE Honolulu HI USA pp 6230ndash6239 July 2017
[29] T Xiao Y Liu B Zhou Y Jiang and J Sun ldquoUnifiedperceptual parsing for scene understandingrdquo in Proceedings ofthe European Conference on Computer Vision pp 432ndash448IEEE Munich Germany September 2018
[30] D Kim J Kwon and J Kim ldquoLow-complexity online modelselection with lyapunov control for reward maximization instabilized real-time deep learning platformsrdquo in Proceedings ofthe Systems Man and Cybernetics pp 4363ndash4368 MiyazakiJapan January 2018
[31] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquoCommunications of the Acm vol 60 no 6 pp 84ndash90 2017
[32] M Cordts O Mohamed and S Ramos ldquo)e cityscapesdataset for semantic urban scene understandingrdquo in Pro-ceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR) IEEE Las Vegas NV USApp 3213ndash3223 June 2016
Mathematical Problems in Engineering 13
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Comp responsepuv
Component segmentation
Validcomplabel
Validcomp
response
Comp analysismodel
fOp (Cuv Cuv
Obj
Obj
Obj
Comp) puv Comp C uv
Comp
Object label Cuv
Object responsepuv
Valid object setℂObj-Things
Comp
(a)
11Feature
map
Object classifier
Componentclassifier
Object segmentation
Componentsegmentation
Relation between comp and obj
ℂCObjComp-Obj
ObjObject label Cuv
CompComp label Cuv
Validcomplabel
Comp analysismodel
Valid object setℂObj-Things
fO (Cuv Cuv
Obj Comp) C uv Comp
(b)
Relation between comp and obj
Object segmentation
11Feature
map
Object classifier
Componentclassifier
Componentsegmentation
Comp response Validcomplabel
Validcomp
response
Object responsepuv
Valid object setℂObj-Things
Comp analysismodel
fOp (Cuv Cuv
Obj Comp) p uv Comp C uv
Comp
ObjObject label Cuv
Obj
Comppuv
ℂCObjComp-Obj
(c)
Figure 9 Optimized architecture with the component analysis module (a) Replace 1 times argmaxkpCompminusk with argmaxk1113954pCompminusk to optimize
the module (b) Analyze CuvComp notin C
CObjCompminusObj to optimize the module (c) Replace 1 times argmaxkpCompminusk with argmaxk
1113954pCompminusk and analyzeCuvComp notin C
CObjCompminusObj to optimize the module
Table 2 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAMs) on ADE20K componentsegmentation task
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
1 ResNet 50 PPM+FPN mdash 4830 4832 ResNet 101 PPM+FPN mdash 4871 5983 ResNet 152 PPM+FPN mdash 4889 7214 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5362 4905 ResNet 101 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5396 6046 ResNet 152 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5418 726
Mathematical Problems in Engineering 9
UPerNet UPerNet with the component analysis module
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar
CarCar
Person
PersonPerson
PersonCarCar
CarPerson
PersonPerson
Person
Figure 10 CITYSCAPES instance-level semantic labeling by UPerNet
Table 4 Mean average precision AP on class-level of the UPerNet withwithout CAM in CITYSCAPES instance-level semantic labelingtask
Method Person () Rider () Car () Truck () Bus () Train () Motorcycle () Bicycle ()UperNet 360 288 516 300 387 273 239 194UperNet +CAM 360 288 530 343 570 375 223 238
Table 2 Continued
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
7 ResNet 50 PPM+FPN+CAM argmaxk1113954pCompminusk 5403 492
8 ResNet 50 PPM+FPN+CAM CuvComp notin C
CObj
CompminusObj 5513 4869 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj 5562 496
Table 3 Performances of different methods on CITYSCAPES instance-level semantic labeling task
Method AP () AP050 () Segmentation time (ms)
SegNet 295 556 mdashMask R-CNN 320 581 mdashUperNet 320 573 447UperNet +CAM 365 622 451CAM Component Analysis Module
10 Mathematical Problems in Engineering
4 Conclusions
In this study we performed semantic segmentation under acomplex background using the encoder-decoder network tosolve the issue of the mutually exclusive relationship be-tween the semantic response value and the semantics ofobjectcomponent in the semantic segmentation under acomplex background for online machine vision detection)e following conclusions can be drawn from this study
(i) Considering the mutually exclusive relationshipbetween the semantic response value and the se-mantics of objectcomponent we selected themathematical model of semantic segmentationunder a complex background based on the encoder-decoder network for optimization It was found that
ηmain ResNet dmain 50 is the best encoder andηdecoder PPM + FPN is the best selected decoder
(ii) We replaced 1 times argmaxkpCompminusk withargmaxk
1113954pCompminusk )e component analysis module
of CuvComp notin C
CObjCompminusObj and UPerNet are considered
to improve the performance of the encoder-decodernetwork
(iii) )e experimental results show that the componentanalysis module improves the performance of se-mantic segmentation under a complex backgroundBoth PA1113955Part
and Tseg of the proposed model werebetter than those of the UPerNet with deeper dmainSpecifically the accuracy improved from 4889 to5562 and Tseg from 721 to 496ms By performing
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
Pattern
Watermark
Serial numberWatermark Watermark
SafelineChinese
OVMI
Safeline
OVMI
Pattern
PatternDenomination
Serial number
Denomination
Watermark
Watermark Watermark
SafelineSafeline
Watermark
Watermark Watermark
SafelineSafeline
Watermark
WatermarkWatermark
SafelineSafeline
UPerNet UPerNet with the component analysis module
Figure 11 Anticounterfeiting features detected by the UPerNet withwithout the component analysis module
Table 5 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAM) for CNY anticounterfeitfeatures via vision-based detection
Backbone ηmain Depth dmain Decoder ηdecoder Component analysis module PA1113955Part() APIoUT05() Tseg (ms)
1 ResNet 50 PPM+FPN mdash 8850 853 4832 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 9038 961 4903 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj
9529 100 496
Mathematical Problems in Engineering 11
vision-based detection with the 2019 CNY featureswe showed that the proposed method improvedPA1113955Part
from 9038 to 9529 while Tseg increasedonly slightly from 490 to 496ms APIoUT01 alsoincreased from 961 to 100 detecting all the lighttransmission anticounterfeiting features withoutfalse detection missing detection or repeateddetection
)e model in which 1 times argmaxkpPartminusk was replacedwith argmaxk
1113954pPartminusk and the corresponding componentanalysis module improved the performance of the UPerNetencoder-decoder network However the efficiency im-provement is affected by the accuracy of object segmenta-tion In our next study we will investigate the applicability ofmachine learning to the component analysis module toachieve a higher performance in different applications
Data Availability
)e ADE20K Dataset used to support the findings of thisstudy is available at httpgroupscsailmiteduvisiondatasets )e CITYSCAPES Dataset used to support thefindings of this study is available at httpswwwcityscapes-datasetcom Its pretrained models and code are released athttpsgithubcomCSAILVisionsemantic-segmentation325pytorch
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
)is research was funded by the Key-Area Research andDevelopment Program of Guangdong Province (Grant no2019B010154003) and the Guangzhou Science and Tech-nology Plan Project (Grant no 201802030006)
References
[1] C Szegedy W Liu and Y Jia ldquoGoing deeper with convo-lutionsrdquo in Proceedings of the Computer Vision and PatternRecognition IEEE Boston MA USA pp 1ndash9 June 2015
[2] K He X Zhang and S Ren ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the Computer vision andpattern recognition IEEE Las Vegas NV USA pp 770ndash778June 2016
[3] K He G Gkioxari P Dollar and R Girshick ldquoMaskR-CNNrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 42 no 2 pp 386ndash397 2020
[4] G Liu B He and S Liu ldquoChassis assembly detection andidentification based on deep learning component instancesegmentationrdquo Symmetry vol 11 no 8 2019
[5] R Manish A Venkatesh and S Denis Ashok ldquoMachinevision based image processing techniques for surface finishand defect inspection in a grinding processrdquoMaterials TodayProceedings vol 5 no 5 pp 12792ndash12802 2018
[6] L Geng Y Wen and F Zhang ldquoMachine vision detectionmethod for surface defects of automobile stamping partsrdquo
American Scientific Research Journal for Engineering Tech-nology and Sciences vol 53 no 1 pp 128ndash144 2019
[7] M M Islam and J Kim ldquoVision-based autonomous crackdetection of concrete structures using a fully convolutionalencoderndashdecoder networkrdquo Sensors vol 19 no 19 2019
[8] S Zhou D Nie E Adeli J Yin J Lian and D Shen ldquoHigh-resolution encoder-decoder networks for low-contrast medicalimage segmentationrdquo IEEE Transactions on Image Processingvol 29 pp 461ndash475 2020
[9] S Liu J Huang and G Liu ldquoTechnology of multi-categorylegal currency identification under multi-light conditionsbased on AleNetrdquo China Measurement ampTest vol 45 no 9pp 118ndash122 2019 in Chinese
[10] H Kang and C Chen ldquoFruit detection and segmentation forapple harvesting using visual sensor in orchardsrdquo Sensorsvol 19 no 20 p 4599 2019
[11] E Pardo J M T Morgado and N Malpica ldquoSemanticsegmentation of mFISH images using convolutional net-worksrdquo Cytometry Part A vol 93 no 6 pp 620ndash627 2018
[12] G Liu S Liu and J Wu ldquoMachine vision object detectionalgorithm based on deep learning and application in banknotedetectionrdquo China Measurement ampTest vol 45 no 5 pp 1ndash92019 in Chinese
[13] H Gao H Yuan and Z Wang ldquoPixel transposed convolu-tional networksrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 42 no 5 pp 1218ndash1227 2019
[14] Q D Vu and J T Kwak ldquoA densemulti-path decoder for tissuesegmentation in histopathology imagesrdquo Computer Methodsand Programs in Biomedicine vol 173 pp 119ndash129 2019
[15] J Huang and G Liu ldquo)e development of CNN-based se-mantic segmentation methodrdquo Laser Journal vol 40 no 5pp 10ndash16 2019 in Chinese
[16] S Nowozin ldquoOptimal decisions from probabilistic modelsthe intersection-over-union caserdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE ColumbusOH USA pp 548ndash555 June 2014
[17] D Hoiem Y Chodpathumwan and Q Dai ldquoDiagnosingerror in object detectorsrdquo in Proceedings of the EuropeanConference on Computer Vision pp 340ndash353 IEEE FlorenceItaly October 2012
[18] K He and J Sun ldquoConvolutional neural networks at con-strained time costrdquo in Proceedings of the Computer Vision andPattern Recognition IEEE Boston MA USA pp 5353ndash5360June 2015
[19] J Long E Shelhamer and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE Boston MAUSA pp 3431ndash3440 June 2015
[20] V Badrinarayanan A Kendall and R Cipolla ldquoSegNet adeep convolutional encoder-decoder architecture for imagesegmentationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 39 no 12 pp 2481ndash2495 2017
[21] O Ronneberger P Fischer and T Brox ldquoU-net convolu-tional networks for biomedical image segmentationrdquo inProceedings of the Medical Image Computing and Computer-Assisted Intervention pp 234ndash241 IEEE Munich GermanyOctober 2015
[22] B Hariharan P Arbelaez R Girshick and J Malik ldquoSi-multaneous detection and segmentationrdquo in Proceedings ofthe European Conference on Computer Vision pp 297ndash312IEEE Zurich Switzerland March 2014
[23] N D Lane and P Warden ldquo)e deep (learning) transfor-mation of mobile and embedded computingrdquo Computervol 51 no 5 pp 12ndash16 2018
12 Mathematical Problems in Engineering
[24] C Qing J Ruan X Xu J Ren and J Zabalza ldquoSpatial-spectral classification of hyperspectral images a deep learningframework with Markov Random fields based modellingrdquo IetImage Processing vol 13 no 2 pp 235ndash245 2019
[25] X Li Z Liu and P Luo ldquoNot all pixels are equal difficulty-aware semantic segmentation via deep layer cascaderdquo inProceedings of the Computer Vision and Pattern RecognitionIEEE Honolulu HI USA pp 6459ndash6468 July 2017
[26] B Zhou H Zhao X Puig et al ldquoSemantic understanding ofscenes through the ADE20K datasetrdquo International Journal ofComputer Vision vol 127 no 3 pp 302ndash321 2019
[27] L-C Chen ldquoRethinking atrous convolution for semanticimage segmentationrdquo 2017 httpsarxivorgabs170605587
[28] H Zhao J Shi and X Qi ldquoPyramid scene parsing networkrdquoin Proceedings of the Computer Vision and Pattern Recogni-tion IEEE Honolulu HI USA pp 6230ndash6239 July 2017
[29] T Xiao Y Liu B Zhou Y Jiang and J Sun ldquoUnifiedperceptual parsing for scene understandingrdquo in Proceedings ofthe European Conference on Computer Vision pp 432ndash448IEEE Munich Germany September 2018
[30] D Kim J Kwon and J Kim ldquoLow-complexity online modelselection with lyapunov control for reward maximization instabilized real-time deep learning platformsrdquo in Proceedings ofthe Systems Man and Cybernetics pp 4363ndash4368 MiyazakiJapan January 2018
[31] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquoCommunications of the Acm vol 60 no 6 pp 84ndash90 2017
[32] M Cordts O Mohamed and S Ramos ldquo)e cityscapesdataset for semantic urban scene understandingrdquo in Pro-ceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR) IEEE Las Vegas NV USApp 3213ndash3223 June 2016
Mathematical Problems in Engineering 13
UPerNet UPerNet with the component analysis module
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCarCar Car CarCar Car
CarCarCarCar CarCar Car
Car
Car
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar Car Car CarCar
Car
CarCar
PersonPerson
CarCar
CarCar
Person
PersonPerson
PersonCarCar
CarPerson
PersonPerson
Person
Figure 10 CITYSCAPES instance-level semantic labeling by UPerNet
Table 4 Mean average precision AP on class-level of the UPerNet withwithout CAM in CITYSCAPES instance-level semantic labelingtask
Method Person () Rider () Car () Truck () Bus () Train () Motorcycle () Bicycle ()UperNet 360 288 516 300 387 273 239 194UperNet +CAM 360 288 530 343 570 375 223 238
Table 2 Continued
Backboneηmain
Backbonedepth dmain
Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part
()Segmentation time
Tseg (ms)
7 ResNet 50 PPM+FPN+CAM argmaxk1113954pCompminusk 5403 492
8 ResNet 50 PPM+FPN+CAM CuvComp notin C
CObj
CompminusObj 5513 4869 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj 5562 496
Table 3 Performances of different methods on CITYSCAPES instance-level semantic labeling task
Method AP () AP050 () Segmentation time (ms)
SegNet 295 556 mdashMask R-CNN 320 581 mdashUperNet 320 573 447UperNet +CAM 365 622 451CAM Component Analysis Module
10 Mathematical Problems in Engineering
4 Conclusions
In this study we performed semantic segmentation under acomplex background using the encoder-decoder network tosolve the issue of the mutually exclusive relationship be-tween the semantic response value and the semantics ofobjectcomponent in the semantic segmentation under acomplex background for online machine vision detection)e following conclusions can be drawn from this study
(i) Considering the mutually exclusive relationshipbetween the semantic response value and the se-mantics of objectcomponent we selected themathematical model of semantic segmentationunder a complex background based on the encoder-decoder network for optimization It was found that
ηmain ResNet dmain 50 is the best encoder andηdecoder PPM + FPN is the best selected decoder
(ii) We replaced 1 times argmaxkpCompminusk withargmaxk
1113954pCompminusk )e component analysis module
of CuvComp notin C
CObjCompminusObj and UPerNet are considered
to improve the performance of the encoder-decodernetwork
(iii) )e experimental results show that the componentanalysis module improves the performance of se-mantic segmentation under a complex backgroundBoth PA1113955Part
and Tseg of the proposed model werebetter than those of the UPerNet with deeper dmainSpecifically the accuracy improved from 4889 to5562 and Tseg from 721 to 496ms By performing
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
Pattern
Watermark
Serial numberWatermark Watermark
SafelineChinese
OVMI
Safeline
OVMI
Pattern
PatternDenomination
Serial number
Denomination
Watermark
Watermark Watermark
SafelineSafeline
Watermark
Watermark Watermark
SafelineSafeline
Watermark
WatermarkWatermark
SafelineSafeline
UPerNet UPerNet with the component analysis module
Figure 11 Anticounterfeiting features detected by the UPerNet withwithout the component analysis module
Table 5 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAM) for CNY anticounterfeitfeatures via vision-based detection
Backbone ηmain Depth dmain Decoder ηdecoder Component analysis module PA1113955Part() APIoUT05() Tseg (ms)
1 ResNet 50 PPM+FPN mdash 8850 853 4832 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 9038 961 4903 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj
9529 100 496
Mathematical Problems in Engineering 11
vision-based detection with the 2019 CNY featureswe showed that the proposed method improvedPA1113955Part
from 9038 to 9529 while Tseg increasedonly slightly from 490 to 496ms APIoUT01 alsoincreased from 961 to 100 detecting all the lighttransmission anticounterfeiting features withoutfalse detection missing detection or repeateddetection
)e model in which 1 times argmaxkpPartminusk was replacedwith argmaxk
1113954pPartminusk and the corresponding componentanalysis module improved the performance of the UPerNetencoder-decoder network However the efficiency im-provement is affected by the accuracy of object segmenta-tion In our next study we will investigate the applicability ofmachine learning to the component analysis module toachieve a higher performance in different applications
Data Availability
)e ADE20K Dataset used to support the findings of thisstudy is available at httpgroupscsailmiteduvisiondatasets )e CITYSCAPES Dataset used to support thefindings of this study is available at httpswwwcityscapes-datasetcom Its pretrained models and code are released athttpsgithubcomCSAILVisionsemantic-segmentation325pytorch
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
)is research was funded by the Key-Area Research andDevelopment Program of Guangdong Province (Grant no2019B010154003) and the Guangzhou Science and Tech-nology Plan Project (Grant no 201802030006)
References
[1] C Szegedy W Liu and Y Jia ldquoGoing deeper with convo-lutionsrdquo in Proceedings of the Computer Vision and PatternRecognition IEEE Boston MA USA pp 1ndash9 June 2015
[2] K He X Zhang and S Ren ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the Computer vision andpattern recognition IEEE Las Vegas NV USA pp 770ndash778June 2016
[3] K He G Gkioxari P Dollar and R Girshick ldquoMaskR-CNNrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 42 no 2 pp 386ndash397 2020
[4] G Liu B He and S Liu ldquoChassis assembly detection andidentification based on deep learning component instancesegmentationrdquo Symmetry vol 11 no 8 2019
[5] R Manish A Venkatesh and S Denis Ashok ldquoMachinevision based image processing techniques for surface finishand defect inspection in a grinding processrdquoMaterials TodayProceedings vol 5 no 5 pp 12792ndash12802 2018
[6] L Geng Y Wen and F Zhang ldquoMachine vision detectionmethod for surface defects of automobile stamping partsrdquo
American Scientific Research Journal for Engineering Tech-nology and Sciences vol 53 no 1 pp 128ndash144 2019
[7] M M Islam and J Kim ldquoVision-based autonomous crackdetection of concrete structures using a fully convolutionalencoderndashdecoder networkrdquo Sensors vol 19 no 19 2019
[8] S Zhou D Nie E Adeli J Yin J Lian and D Shen ldquoHigh-resolution encoder-decoder networks for low-contrast medicalimage segmentationrdquo IEEE Transactions on Image Processingvol 29 pp 461ndash475 2020
[9] S Liu J Huang and G Liu ldquoTechnology of multi-categorylegal currency identification under multi-light conditionsbased on AleNetrdquo China Measurement ampTest vol 45 no 9pp 118ndash122 2019 in Chinese
[10] H Kang and C Chen ldquoFruit detection and segmentation forapple harvesting using visual sensor in orchardsrdquo Sensorsvol 19 no 20 p 4599 2019
[11] E Pardo J M T Morgado and N Malpica ldquoSemanticsegmentation of mFISH images using convolutional net-worksrdquo Cytometry Part A vol 93 no 6 pp 620ndash627 2018
[12] G Liu S Liu and J Wu ldquoMachine vision object detectionalgorithm based on deep learning and application in banknotedetectionrdquo China Measurement ampTest vol 45 no 5 pp 1ndash92019 in Chinese
[13] H Gao H Yuan and Z Wang ldquoPixel transposed convolu-tional networksrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 42 no 5 pp 1218ndash1227 2019
[14] Q D Vu and J T Kwak ldquoA densemulti-path decoder for tissuesegmentation in histopathology imagesrdquo Computer Methodsand Programs in Biomedicine vol 173 pp 119ndash129 2019
[15] J Huang and G Liu ldquo)e development of CNN-based se-mantic segmentation methodrdquo Laser Journal vol 40 no 5pp 10ndash16 2019 in Chinese
[16] S Nowozin ldquoOptimal decisions from probabilistic modelsthe intersection-over-union caserdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE ColumbusOH USA pp 548ndash555 June 2014
[17] D Hoiem Y Chodpathumwan and Q Dai ldquoDiagnosingerror in object detectorsrdquo in Proceedings of the EuropeanConference on Computer Vision pp 340ndash353 IEEE FlorenceItaly October 2012
[18] K He and J Sun ldquoConvolutional neural networks at con-strained time costrdquo in Proceedings of the Computer Vision andPattern Recognition IEEE Boston MA USA pp 5353ndash5360June 2015
[19] J Long E Shelhamer and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE Boston MAUSA pp 3431ndash3440 June 2015
[20] V Badrinarayanan A Kendall and R Cipolla ldquoSegNet adeep convolutional encoder-decoder architecture for imagesegmentationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 39 no 12 pp 2481ndash2495 2017
[21] O Ronneberger P Fischer and T Brox ldquoU-net convolu-tional networks for biomedical image segmentationrdquo inProceedings of the Medical Image Computing and Computer-Assisted Intervention pp 234ndash241 IEEE Munich GermanyOctober 2015
[22] B Hariharan P Arbelaez R Girshick and J Malik ldquoSi-multaneous detection and segmentationrdquo in Proceedings ofthe European Conference on Computer Vision pp 297ndash312IEEE Zurich Switzerland March 2014
[23] N D Lane and P Warden ldquo)e deep (learning) transfor-mation of mobile and embedded computingrdquo Computervol 51 no 5 pp 12ndash16 2018
12 Mathematical Problems in Engineering
[24] C Qing J Ruan X Xu J Ren and J Zabalza ldquoSpatial-spectral classification of hyperspectral images a deep learningframework with Markov Random fields based modellingrdquo IetImage Processing vol 13 no 2 pp 235ndash245 2019
[25] X Li Z Liu and P Luo ldquoNot all pixels are equal difficulty-aware semantic segmentation via deep layer cascaderdquo inProceedings of the Computer Vision and Pattern RecognitionIEEE Honolulu HI USA pp 6459ndash6468 July 2017
[26] B Zhou H Zhao X Puig et al ldquoSemantic understanding ofscenes through the ADE20K datasetrdquo International Journal ofComputer Vision vol 127 no 3 pp 302ndash321 2019
[27] L-C Chen ldquoRethinking atrous convolution for semanticimage segmentationrdquo 2017 httpsarxivorgabs170605587
[28] H Zhao J Shi and X Qi ldquoPyramid scene parsing networkrdquoin Proceedings of the Computer Vision and Pattern Recogni-tion IEEE Honolulu HI USA pp 6230ndash6239 July 2017
[29] T Xiao Y Liu B Zhou Y Jiang and J Sun ldquoUnifiedperceptual parsing for scene understandingrdquo in Proceedings ofthe European Conference on Computer Vision pp 432ndash448IEEE Munich Germany September 2018
[30] D Kim J Kwon and J Kim ldquoLow-complexity online modelselection with lyapunov control for reward maximization instabilized real-time deep learning platformsrdquo in Proceedings ofthe Systems Man and Cybernetics pp 4363ndash4368 MiyazakiJapan January 2018
[31] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquoCommunications of the Acm vol 60 no 6 pp 84ndash90 2017
[32] M Cordts O Mohamed and S Ramos ldquo)e cityscapesdataset for semantic urban scene understandingrdquo in Pro-ceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR) IEEE Las Vegas NV USApp 3213ndash3223 June 2016
Mathematical Problems in Engineering 13
4 Conclusions
In this study we performed semantic segmentation under acomplex background using the encoder-decoder network tosolve the issue of the mutually exclusive relationship be-tween the semantic response value and the semantics ofobjectcomponent in the semantic segmentation under acomplex background for online machine vision detection)e following conclusions can be drawn from this study
(i) Considering the mutually exclusive relationshipbetween the semantic response value and the se-mantics of objectcomponent we selected themathematical model of semantic segmentationunder a complex background based on the encoder-decoder network for optimization It was found that
ηmain ResNet dmain 50 is the best encoder andηdecoder PPM + FPN is the best selected decoder
(ii) We replaced 1 times argmaxkpCompminusk withargmaxk
1113954pCompminusk )e component analysis module
of CuvComp notin C
CObjCompminusObj and UPerNet are considered
to improve the performance of the encoder-decodernetwork
(iii) )e experimental results show that the componentanalysis module improves the performance of se-mantic segmentation under a complex backgroundBoth PA1113955Part
and Tseg of the proposed model werebetter than those of the UPerNet with deeper dmainSpecifically the accuracy improved from 4889 to5562 and Tseg from 721 to 496ms By performing
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
DenominationPattern
Safeline SafelineChinese
ChineseChinese
ChineseChinese
Denomination
Denomination
Watermark
WatermarkWatermark
PatternVersionDenomination
Serial number
Pattern
Watermark
Serial numberWatermark Watermark
SafelineChinese
OVMI
Safeline
OVMI
Pattern
PatternDenomination
Serial number
Denomination
Watermark
Watermark Watermark
SafelineSafeline
Watermark
Watermark Watermark
SafelineSafeline
Watermark
WatermarkWatermark
SafelineSafeline
UPerNet UPerNet with the component analysis module
Figure 11 Anticounterfeiting features detected by the UPerNet withwithout the component analysis module
Table 5 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAM) for CNY anticounterfeitfeatures via vision-based detection
Backbone ηmain Depth dmain Decoder ηdecoder Component analysis module PA1113955Part() APIoUT05() Tseg (ms)
1 ResNet 50 PPM+FPN mdash 8850 853 4832 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 9038 961 4903 ResNet 50 PPM+FPN+CAM argmaxk
1113954pCompminusk + CuvComp notin C
CObjCompminusObj
9529 100 496
Mathematical Problems in Engineering 11
vision-based detection with the 2019 CNY featureswe showed that the proposed method improvedPA1113955Part
from 9038 to 9529 while Tseg increasedonly slightly from 490 to 496ms APIoUT01 alsoincreased from 961 to 100 detecting all the lighttransmission anticounterfeiting features withoutfalse detection missing detection or repeateddetection
)e model in which 1 times argmaxkpPartminusk was replacedwith argmaxk
1113954pPartminusk and the corresponding componentanalysis module improved the performance of the UPerNetencoder-decoder network However the efficiency im-provement is affected by the accuracy of object segmenta-tion In our next study we will investigate the applicability ofmachine learning to the component analysis module toachieve a higher performance in different applications
Data Availability
)e ADE20K Dataset used to support the findings of thisstudy is available at httpgroupscsailmiteduvisiondatasets )e CITYSCAPES Dataset used to support thefindings of this study is available at httpswwwcityscapes-datasetcom Its pretrained models and code are released athttpsgithubcomCSAILVisionsemantic-segmentation325pytorch
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
)is research was funded by the Key-Area Research andDevelopment Program of Guangdong Province (Grant no2019B010154003) and the Guangzhou Science and Tech-nology Plan Project (Grant no 201802030006)
References
[1] C Szegedy W Liu and Y Jia ldquoGoing deeper with convo-lutionsrdquo in Proceedings of the Computer Vision and PatternRecognition IEEE Boston MA USA pp 1ndash9 June 2015
[2] K He X Zhang and S Ren ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the Computer vision andpattern recognition IEEE Las Vegas NV USA pp 770ndash778June 2016
[3] K He G Gkioxari P Dollar and R Girshick ldquoMaskR-CNNrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 42 no 2 pp 386ndash397 2020
[4] G Liu B He and S Liu ldquoChassis assembly detection andidentification based on deep learning component instancesegmentationrdquo Symmetry vol 11 no 8 2019
[5] R Manish A Venkatesh and S Denis Ashok ldquoMachinevision based image processing techniques for surface finishand defect inspection in a grinding processrdquoMaterials TodayProceedings vol 5 no 5 pp 12792ndash12802 2018
[6] L Geng Y Wen and F Zhang ldquoMachine vision detectionmethod for surface defects of automobile stamping partsrdquo
American Scientific Research Journal for Engineering Tech-nology and Sciences vol 53 no 1 pp 128ndash144 2019
[7] M M Islam and J Kim ldquoVision-based autonomous crackdetection of concrete structures using a fully convolutionalencoderndashdecoder networkrdquo Sensors vol 19 no 19 2019
[8] S Zhou D Nie E Adeli J Yin J Lian and D Shen ldquoHigh-resolution encoder-decoder networks for low-contrast medicalimage segmentationrdquo IEEE Transactions on Image Processingvol 29 pp 461ndash475 2020
[9] S Liu J Huang and G Liu ldquoTechnology of multi-categorylegal currency identification under multi-light conditionsbased on AleNetrdquo China Measurement ampTest vol 45 no 9pp 118ndash122 2019 in Chinese
[10] H Kang and C Chen ldquoFruit detection and segmentation forapple harvesting using visual sensor in orchardsrdquo Sensorsvol 19 no 20 p 4599 2019
[11] E Pardo J M T Morgado and N Malpica ldquoSemanticsegmentation of mFISH images using convolutional net-worksrdquo Cytometry Part A vol 93 no 6 pp 620ndash627 2018
[12] G Liu S Liu and J Wu ldquoMachine vision object detectionalgorithm based on deep learning and application in banknotedetectionrdquo China Measurement ampTest vol 45 no 5 pp 1ndash92019 in Chinese
[13] H Gao H Yuan and Z Wang ldquoPixel transposed convolu-tional networksrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 42 no 5 pp 1218ndash1227 2019
[14] Q D Vu and J T Kwak ldquoA densemulti-path decoder for tissuesegmentation in histopathology imagesrdquo Computer Methodsand Programs in Biomedicine vol 173 pp 119ndash129 2019
[15] J Huang and G Liu ldquo)e development of CNN-based se-mantic segmentation methodrdquo Laser Journal vol 40 no 5pp 10ndash16 2019 in Chinese
[16] S Nowozin ldquoOptimal decisions from probabilistic modelsthe intersection-over-union caserdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE ColumbusOH USA pp 548ndash555 June 2014
[17] D Hoiem Y Chodpathumwan and Q Dai ldquoDiagnosingerror in object detectorsrdquo in Proceedings of the EuropeanConference on Computer Vision pp 340ndash353 IEEE FlorenceItaly October 2012
[18] K He and J Sun ldquoConvolutional neural networks at con-strained time costrdquo in Proceedings of the Computer Vision andPattern Recognition IEEE Boston MA USA pp 5353ndash5360June 2015
[19] J Long E Shelhamer and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE Boston MAUSA pp 3431ndash3440 June 2015
[20] V Badrinarayanan A Kendall and R Cipolla ldquoSegNet adeep convolutional encoder-decoder architecture for imagesegmentationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 39 no 12 pp 2481ndash2495 2017
[21] O Ronneberger P Fischer and T Brox ldquoU-net convolu-tional networks for biomedical image segmentationrdquo inProceedings of the Medical Image Computing and Computer-Assisted Intervention pp 234ndash241 IEEE Munich GermanyOctober 2015
[22] B Hariharan P Arbelaez R Girshick and J Malik ldquoSi-multaneous detection and segmentationrdquo in Proceedings ofthe European Conference on Computer Vision pp 297ndash312IEEE Zurich Switzerland March 2014
[23] N D Lane and P Warden ldquo)e deep (learning) transfor-mation of mobile and embedded computingrdquo Computervol 51 no 5 pp 12ndash16 2018
12 Mathematical Problems in Engineering
[24] C Qing J Ruan X Xu J Ren and J Zabalza ldquoSpatial-spectral classification of hyperspectral images a deep learningframework with Markov Random fields based modellingrdquo IetImage Processing vol 13 no 2 pp 235ndash245 2019
[25] X Li Z Liu and P Luo ldquoNot all pixels are equal difficulty-aware semantic segmentation via deep layer cascaderdquo inProceedings of the Computer Vision and Pattern RecognitionIEEE Honolulu HI USA pp 6459ndash6468 July 2017
[26] B Zhou H Zhao X Puig et al ldquoSemantic understanding ofscenes through the ADE20K datasetrdquo International Journal ofComputer Vision vol 127 no 3 pp 302ndash321 2019
[27] L-C Chen ldquoRethinking atrous convolution for semanticimage segmentationrdquo 2017 httpsarxivorgabs170605587
[28] H Zhao J Shi and X Qi ldquoPyramid scene parsing networkrdquoin Proceedings of the Computer Vision and Pattern Recogni-tion IEEE Honolulu HI USA pp 6230ndash6239 July 2017
[29] T Xiao Y Liu B Zhou Y Jiang and J Sun ldquoUnifiedperceptual parsing for scene understandingrdquo in Proceedings ofthe European Conference on Computer Vision pp 432ndash448IEEE Munich Germany September 2018
[30] D Kim J Kwon and J Kim ldquoLow-complexity online modelselection with lyapunov control for reward maximization instabilized real-time deep learning platformsrdquo in Proceedings ofthe Systems Man and Cybernetics pp 4363ndash4368 MiyazakiJapan January 2018
[31] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquoCommunications of the Acm vol 60 no 6 pp 84ndash90 2017
[32] M Cordts O Mohamed and S Ramos ldquo)e cityscapesdataset for semantic urban scene understandingrdquo in Pro-ceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR) IEEE Las Vegas NV USApp 3213ndash3223 June 2016
Mathematical Problems in Engineering 13
vision-based detection with the 2019 CNY featureswe showed that the proposed method improvedPA1113955Part
from 9038 to 9529 while Tseg increasedonly slightly from 490 to 496ms APIoUT01 alsoincreased from 961 to 100 detecting all the lighttransmission anticounterfeiting features withoutfalse detection missing detection or repeateddetection
)e model in which 1 times argmaxkpPartminusk was replacedwith argmaxk
1113954pPartminusk and the corresponding componentanalysis module improved the performance of the UPerNetencoder-decoder network However the efficiency im-provement is affected by the accuracy of object segmenta-tion In our next study we will investigate the applicability ofmachine learning to the component analysis module toachieve a higher performance in different applications
Data Availability
)e ADE20K Dataset used to support the findings of thisstudy is available at httpgroupscsailmiteduvisiondatasets )e CITYSCAPES Dataset used to support thefindings of this study is available at httpswwwcityscapes-datasetcom Its pretrained models and code are released athttpsgithubcomCSAILVisionsemantic-segmentation325pytorch
Conflicts of Interest
)e authors declare that there are no conflicts of interestregarding the publication of this paper
Acknowledgments
)is research was funded by the Key-Area Research andDevelopment Program of Guangdong Province (Grant no2019B010154003) and the Guangzhou Science and Tech-nology Plan Project (Grant no 201802030006)
References
[1] C Szegedy W Liu and Y Jia ldquoGoing deeper with convo-lutionsrdquo in Proceedings of the Computer Vision and PatternRecognition IEEE Boston MA USA pp 1ndash9 June 2015
[2] K He X Zhang and S Ren ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the Computer vision andpattern recognition IEEE Las Vegas NV USA pp 770ndash778June 2016
[3] K He G Gkioxari P Dollar and R Girshick ldquoMaskR-CNNrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 42 no 2 pp 386ndash397 2020
[4] G Liu B He and S Liu ldquoChassis assembly detection andidentification based on deep learning component instancesegmentationrdquo Symmetry vol 11 no 8 2019
[5] R Manish A Venkatesh and S Denis Ashok ldquoMachinevision based image processing techniques for surface finishand defect inspection in a grinding processrdquoMaterials TodayProceedings vol 5 no 5 pp 12792ndash12802 2018
[6] L Geng Y Wen and F Zhang ldquoMachine vision detectionmethod for surface defects of automobile stamping partsrdquo
American Scientific Research Journal for Engineering Tech-nology and Sciences vol 53 no 1 pp 128ndash144 2019
[7] M M Islam and J Kim ldquoVision-based autonomous crackdetection of concrete structures using a fully convolutionalencoderndashdecoder networkrdquo Sensors vol 19 no 19 2019
[8] S Zhou D Nie E Adeli J Yin J Lian and D Shen ldquoHigh-resolution encoder-decoder networks for low-contrast medicalimage segmentationrdquo IEEE Transactions on Image Processingvol 29 pp 461ndash475 2020
[9] S Liu J Huang and G Liu ldquoTechnology of multi-categorylegal currency identification under multi-light conditionsbased on AleNetrdquo China Measurement ampTest vol 45 no 9pp 118ndash122 2019 in Chinese
[10] H Kang and C Chen ldquoFruit detection and segmentation forapple harvesting using visual sensor in orchardsrdquo Sensorsvol 19 no 20 p 4599 2019
[11] E Pardo J M T Morgado and N Malpica ldquoSemanticsegmentation of mFISH images using convolutional net-worksrdquo Cytometry Part A vol 93 no 6 pp 620ndash627 2018
[12] G Liu S Liu and J Wu ldquoMachine vision object detectionalgorithm based on deep learning and application in banknotedetectionrdquo China Measurement ampTest vol 45 no 5 pp 1ndash92019 in Chinese
[13] H Gao H Yuan and Z Wang ldquoPixel transposed convolu-tional networksrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 42 no 5 pp 1218ndash1227 2019
[14] Q D Vu and J T Kwak ldquoA densemulti-path decoder for tissuesegmentation in histopathology imagesrdquo Computer Methodsand Programs in Biomedicine vol 173 pp 119ndash129 2019
[15] J Huang and G Liu ldquo)e development of CNN-based se-mantic segmentation methodrdquo Laser Journal vol 40 no 5pp 10ndash16 2019 in Chinese
[16] S Nowozin ldquoOptimal decisions from probabilistic modelsthe intersection-over-union caserdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE ColumbusOH USA pp 548ndash555 June 2014
[17] D Hoiem Y Chodpathumwan and Q Dai ldquoDiagnosingerror in object detectorsrdquo in Proceedings of the EuropeanConference on Computer Vision pp 340ndash353 IEEE FlorenceItaly October 2012
[18] K He and J Sun ldquoConvolutional neural networks at con-strained time costrdquo in Proceedings of the Computer Vision andPattern Recognition IEEE Boston MA USA pp 5353ndash5360June 2015
[19] J Long E Shelhamer and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE Boston MAUSA pp 3431ndash3440 June 2015
[20] V Badrinarayanan A Kendall and R Cipolla ldquoSegNet adeep convolutional encoder-decoder architecture for imagesegmentationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 39 no 12 pp 2481ndash2495 2017
[21] O Ronneberger P Fischer and T Brox ldquoU-net convolu-tional networks for biomedical image segmentationrdquo inProceedings of the Medical Image Computing and Computer-Assisted Intervention pp 234ndash241 IEEE Munich GermanyOctober 2015
[22] B Hariharan P Arbelaez R Girshick and J Malik ldquoSi-multaneous detection and segmentationrdquo in Proceedings ofthe European Conference on Computer Vision pp 297ndash312IEEE Zurich Switzerland March 2014
[23] N D Lane and P Warden ldquo)e deep (learning) transfor-mation of mobile and embedded computingrdquo Computervol 51 no 5 pp 12ndash16 2018
12 Mathematical Problems in Engineering
[24] C Qing J Ruan X Xu J Ren and J Zabalza ldquoSpatial-spectral classification of hyperspectral images a deep learningframework with Markov Random fields based modellingrdquo IetImage Processing vol 13 no 2 pp 235ndash245 2019
[25] X Li Z Liu and P Luo ldquoNot all pixels are equal difficulty-aware semantic segmentation via deep layer cascaderdquo inProceedings of the Computer Vision and Pattern RecognitionIEEE Honolulu HI USA pp 6459ndash6468 July 2017
[26] B Zhou H Zhao X Puig et al ldquoSemantic understanding ofscenes through the ADE20K datasetrdquo International Journal ofComputer Vision vol 127 no 3 pp 302ndash321 2019
[27] L-C Chen ldquoRethinking atrous convolution for semanticimage segmentationrdquo 2017 httpsarxivorgabs170605587
[28] H Zhao J Shi and X Qi ldquoPyramid scene parsing networkrdquoin Proceedings of the Computer Vision and Pattern Recogni-tion IEEE Honolulu HI USA pp 6230ndash6239 July 2017
[29] T Xiao Y Liu B Zhou Y Jiang and J Sun ldquoUnifiedperceptual parsing for scene understandingrdquo in Proceedings ofthe European Conference on Computer Vision pp 432ndash448IEEE Munich Germany September 2018
[30] D Kim J Kwon and J Kim ldquoLow-complexity online modelselection with lyapunov control for reward maximization instabilized real-time deep learning platformsrdquo in Proceedings ofthe Systems Man and Cybernetics pp 4363ndash4368 MiyazakiJapan January 2018
[31] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquoCommunications of the Acm vol 60 no 6 pp 84ndash90 2017
[32] M Cordts O Mohamed and S Ramos ldquo)e cityscapesdataset for semantic urban scene understandingrdquo in Pro-ceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR) IEEE Las Vegas NV USApp 3213ndash3223 June 2016
Mathematical Problems in Engineering 13
[24] C Qing J Ruan X Xu J Ren and J Zabalza ldquoSpatial-spectral classification of hyperspectral images a deep learningframework with Markov Random fields based modellingrdquo IetImage Processing vol 13 no 2 pp 235ndash245 2019
[25] X Li Z Liu and P Luo ldquoNot all pixels are equal difficulty-aware semantic segmentation via deep layer cascaderdquo inProceedings of the Computer Vision and Pattern RecognitionIEEE Honolulu HI USA pp 6459ndash6468 July 2017
[26] B Zhou H Zhao X Puig et al ldquoSemantic understanding ofscenes through the ADE20K datasetrdquo International Journal ofComputer Vision vol 127 no 3 pp 302ndash321 2019
[27] L-C Chen ldquoRethinking atrous convolution for semanticimage segmentationrdquo 2017 httpsarxivorgabs170605587
[28] H Zhao J Shi and X Qi ldquoPyramid scene parsing networkrdquoin Proceedings of the Computer Vision and Pattern Recogni-tion IEEE Honolulu HI USA pp 6230ndash6239 July 2017
[29] T Xiao Y Liu B Zhou Y Jiang and J Sun ldquoUnifiedperceptual parsing for scene understandingrdquo in Proceedings ofthe European Conference on Computer Vision pp 432ndash448IEEE Munich Germany September 2018
[30] D Kim J Kwon and J Kim ldquoLow-complexity online modelselection with lyapunov control for reward maximization instabilized real-time deep learning platformsrdquo in Proceedings ofthe Systems Man and Cybernetics pp 4363ndash4368 MiyazakiJapan January 2018
[31] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquoCommunications of the Acm vol 60 no 6 pp 84ndash90 2017
[32] M Cordts O Mohamed and S Ramos ldquo)e cityscapesdataset for semantic urban scene understandingrdquo in Pro-ceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR) IEEE Las Vegas NV USApp 3213ndash3223 June 2016
Mathematical Problems in Engineering 13