SemanticSegmentationunderaComplexBackgroundforMachine ... · 2020. 5. 2. · analysis module to...

Research ArticleSemanticSegmentationunderaComplexBackgroundforMachineVision Detection Based on Modified UPerNet with ComponentAnalysis Modules

Jian Huang Guixiong Liu and Bodi Wang

School of Mechanical and Automotive Engineering South China University of Technology Guangzhou 510640 China

Correspondence should be addressed to Guixiong Liu megxliuscuteducn

Received 2 May 2020 Revised 31 July 2020 Accepted 17 August 2020 Published 12 September 2020

Academic Editor Yang Li

Copyright copy 2020 Jian Huang et al )is is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Semantic segmentation with convolutional neural networks under a complex background using the encoder-decoder networkincreases the overall performance of online machine vision detection and identification To maximize the accuracy of semanticsegmentation under a complex background it is necessary to consider the semantic response values of objects and componentsand their mutually exclusive relationship In this study we attempt to improve the low accuracy of component segmentation )ebasic network of the encoder is selected for the semantic segmentation and the UPerNet is modified based on the componentanalysis module )e experimental results show that the accuracy of the proposed method improves from 4889 to 5562 andthe segmentation time decreases from 721 to 496ms )e method also shows good performance in vision-based detection of 2019Chinese Yuan features

1 Introduction

As one of the primary tasks of machine vision semanticsegmentation differs from image classification and objectdetection )e image classification process involved therecognition of the the type of object but cannot provideposition information [1] whereas object detection can beused to detect the boundary and type of the object but cannotprovide the actual boundary information [2] On the otherhand semantic segmentation can recognize the type of theobject and divide the actual area at the pixel level as well asimplement certain machine vision detection functions suchas positioning and recognition [3] As we start from imageclassification move to object detection and finally reachsemantic segmentation the accuracy of the output range andposition information improves [4] In the same manner therecognition precision increases from the image-level to thepixel-level Semantic segmentation achieves the best rec-ognition accuracy therefore it is useful in (1) distinguishingthe entity from the background (2) obtaining the positioninformation (centroid) clearly physically defined by indirect

calculation and (3) performing machine vision detectionand identification organization which require high spatialresolution and reliability [5 6]

)e online semantic segmentation with convolutionalneural networks (CNNs) under a complex background iseffective for improving the overall performance of onlinemachine vision detection and identification [7] whenmaintaining the same architecture of the encoder-decodernetwork and convolutional and pooling layer and equiv-alently transforming the fully connected layer thusyielding broad generalization In recent years ResNet hasbeen used to replace the shallow CNN to optimize semanticsegmentation results significantly [8] For machine visiondetection and identification under a random-texturecomplex background it is necessary to eliminate therandom-texture complex background to extract the objectwithout affecting the original features of the object [9] )edifficulty lies in the randomness of the textured back-ground which makes it difficult to employ typical periodictexture elimination techniques such as the frequencydomain filtering and image matrix methods [10 11] On the

HindawiMathematical Problems in EngineeringVolume 2020 Article ID 6903130 13 pageshttpsdoiorg10115520206903130

contrary the encoder-decoder semantic segmentationnetwork ultimately retains the classification components inthe network backbone thus exhibiting larger receptivefields and better pixel recognition ability [12 13] asdepicted in Figure 1 Unreasonably selected and conse-quently incorrectly used component analysis modules willlead to an excessively small foreground range resulting inthe misjudgment of component pixels If the componentanalysis module is too sensitive the foreground range willbe too broad thus it would be difficult to remove mis-judged pixels [14] )erefore in the process of semanticsegmentation under the complex background it is nec-essary to consider objects the contradiction between thecomponent semantic response values and their mutualexclusion relationship while maximizing the accuracy ofthe semantic segmentation under the complex backgroundusing the encoder-decoder network

Figure 2 shows a flowchart of the semantic segmen-tation under the complex background using the encoder-decoder network )e process can be described as followsthe component classifier of the encoder-decoder networkrecognizes the pixel-level semantics and response of thepixels in the image the object classifier recognizes thepixel-level object semantics and the response and extractsmisjudged pixels of the foreground object in semanticsegmentation finally the mutually exclusive relationshipbetween component semantics and object semantics isconsidered and non-background-independent semanticsare determined to achieve effective semantic segmentationunder a complex background to improve the model ac-curacy [15]

In this study we focus on online semantic segmentationunder a complex background using the encoder-decodernetwork to solve the above described mutual exclusionrelationship problem between component semantics andobject semantics )e main contributions of this study arethreefold

(i) We attempted to improve the low accuracy ofcomponent segmentation and selected the superiorbasic encoder-decoder network according to theperformance

(ii) We modified the UPerNet based on the componentanalysis module to maximize the accuracy of thesemantic segmentation under a complex back-ground using the encoder-decoder network whilemaintaining an appropriate segmentation time

(iii) We show that the proposed method is superior toprevious encoder-decoder network and has satis-factory accuracy and segmentation time We alsoshow the application of the proposedmethod in bill-note anticounterfeiting identification

)e rest of this paper is organized as follows In Section1 we outline related works In Section 2 we introduce amethod for semantic segmentation under a complexbackground using the encoder-decoder network In Section3 we verify the proposed method In Section 4 we presentthe conclusions

2 Related Work

21 Evaluation of the Semantic Segmentation PerformanceWe can generally evaluate the CNN semantic segmentationperformance from the accuracy and running speed )eaccuracy indicators usually include the pixel accuracy [16]mean intersection over union [16] and mean averageprecision [17] )e pixel accuracy PA is defined as thenumber of pixels segmented correctly accounting for thetotal number of image pixels the mean intersection overunion IoU is defined as the degree of coincidence betweenthe segmentation results and their ground-truth the meanaverage precision APIoUT is the mean of average precisionscores for segmentation results whose intersection overunion no less than IoUT for each classes

If the object detected by machine vision has k categoriesthe semantic segmentation model requires the label of thek + 1 categories denoted as L l0 l1 lk1113864 1113865 including thebackground Denoting the number of pixels of li mis-rec-ognized as the pixel of lj and li(ine j) as pij and pii re-spectively the numbers of detected objects of li mis-recognized as lj and li(ine j) as Nij and Nii respectively thepixel accuracy can be calculated as follows

PA 1113944

k

i0pii

1113944k

i01113944k

j0pij

IoU 1113944K

i0pii

1113944k

j0pij + 1113944k

j0pji minus pii

(1)

APIoUT 1113944

k

i0Nii

1113944k

i01113944k

j0Nij

(2)

)e running speed of CNN semantic segmentation canbe measured by indicators including the segmentation timeTseg [18] which is defined as the time needed to segment theimage by running the algorithm )e theoretically shortestpossible time required to segment the image is also labeled asthe theoretical segmentation time Tsegminust and the time re-quired for the algorithm to actually segment the image isknown as the actual segmentation time Tsegminusa If not oth-erwise specified Tsegminusa is denoted as Tseg

22 End-To-End Encoder-Decoder Semantic SegmentationFramework Although CNN semantic segmentation per-forms as a single-step end-to-end process which is notfurther divided into multiple modules to deal with theconnection of numerous modules directly affects the CNN)e end-to-end semantic segmentation framework using theencoder-decoder enables the CNN to detect images with anyresolution and output prediction map results with constantresolution Typical networks include fully convolutionalnetworks (FCN) [19] SegNet [20] and U-Net [21]

Figure 3 shows a schematic of the FCN model )e FCNis an end-to-end semantic segmentation framework

2 Mathematical Problems in Engineering

proposed by Jonathan Long et al (University of CaliforniaBerkeley) in 2014 )e main idea is as follows the operationof a fully connected layer is equivalent to the convolution ofa feature map and a kernel function of identical size )efully connected layer is converted into a convolution layerwhich converts the CNN into a full convolution operationnetwork consisting of a complete convolution layer (con-volution operation) and pooling layer (convolution opera-tion) to process images of any resolution In this manner thelimitation of the fully connected layer is overcome ieimages with different resolutions can be processed )eoriginal resolution is restored after eight times bilinearupsampling by taking the pooling layer as an encoder de-signing a cross-layer superimposed architecture as a de-coder yielding the final output feature map of the networkby upsampling and adding to the output feature map of eachpooling layer (namely the encoder) to obtain a feature mapwith higher resolution )e CNN can perform end-to-endsemantic segmentation through a fully convolutional andcross-layer superimposed architecture therefore various

CNNs are capable of achieving end-to-end semantic seg-mentation Using the framework described the IoU reached622 in the VOC2012 semantic segmentation testing setwhich is 106 higher than the classic methods and 122(its IoU is 500) higher than the SDS [22] further seg-mented by CNN object detection and classical method

)e ResNet proposed by the Amazon Artificial Intelli-gence Laboratory serves as a basic network for constructingFCNs for semantic segmentation the IoU in VOC2012reaches 86 [23] )e prediction results of the FCN ap-plication are obtained by eight-fold bilinear interpolation ofthe feature map including the problems of detail losssmoothing of complex boundaries and poor detectionsensitivity of small objects)e results ignore the global scaleof the image possibly exhibiting regional discontinuity forlarge objects that exceed the receptive field Incorporatingfull connection and upsampling increase the size of thenetwork and introduces a large number of parameters to belearned

Figure 4 shows a schematic of the SegNet model which isan efficient real-time end-to-end semantic segmentationnetwork proposed by Alex Kendall et al (Cambridge Uni-versity) in 2015 )e idea is that the encoder and the decoderhave a one-to-one correspondence and the network appliesthe pooled index in the encoderrsquos maximum pooling toperform nonlinear upsampling thus forming a sparse featuremap then it performs convolution to generate a dense featuremap SegNet defines the basic network of the encoder-de-coder and deletes the fully connected layer to generate globalsemantic information )e decoder utilizes the encoder in-formation without training while the required amount oftraining parameters is 217 of that of the FCN For theprediction of the results SegNet and FCN occupy a GPUmemory of 1052 and 1806MB respectively and the GPUmemory occupancy on GPU GTX 980 (video memory4096MB) is 2568 and 4409 respectively )erefore theoccupancy of SegNet is 1841 lower than that of FCN In[20] the design of SegNet on ResNet was described and theIoU in VOC2012 reached 804 [24] )e IoU of SegNettested in VOC2012 was reported to be 599 and the

Object response maps

Componentsegmentation

Objectsegmentation

Valid component segmentation

Relation between component and object

Object regionand label

Component response maps

Component regionand label

Figure 1 Flowchart of semantic segmentation under the complex background using encoder-decoder network

Start

The encoder-decoder network transforms images to feature maps and the component classifier recognizes the

part response map and generates part segment results

The object classifier recognizes the object response map and generates object segmentation results with the

pixels of the foreground object extracted

End

The relationship between the component and objectis analyzed to improve the accuracy of part segmentation

Figure 2 Flowchart of the semantic segmentation under a complexbackground using the encoder-decoder network

Mathematical Problems in Engineering 3

efficiency was found to be 23 lower than that of FCNfurthermore there was the problem of false detection at theboundary

Figure 5 shows a schematic of the U-Net model whichwas proposed by Olaf Ronneberger (University of FreiburgGermany) in 2015 )e idea was to design a basic networkthat can be trained by semantic segmentation images andmodify the FCN cross-layer overlay architecture with thehigh-resolution feature map channels retained in theupsampling section and then connect it to the decoderoutput feature map in the third dimension Furthermore atiling strategy without limited by GPU memory was pro-posed with this strategy a seamless semantic segmentationof arbitrary high-resolution images was achieved WithU-Net a IoU of 920 and 776 was achieved in the

grayscale image semantic segmentation datasets PhC-U373and DIC-HeLa respectively )e skip connection was usedin the ResNet framework to improve U-Net and a IoU of827 was achieved in the VOC2012 [25] )ere are two keyproblems with the application of U-Net the basic networkneeds to be trained and it can only be applied to specifictask ie it has poor universality

Figure 6 shows a schematic of the UPerNet model whichwas proposed by Tete Xiao (Peking University China) in2018 In the UPerNet framework a feature pyramid network(FPN) with a pyramid pooling module (PPM) is appendedon the last layer of the backbone network before feeding itinto the top-down branch of the FPN Object and part headsare attached on the feature map and are fused by all thelayers put out by the FPN

Decoderskip connect

Encoder (Backbone)

Pooling layersConvolutional layers

2times upsample

2times upsample

8times upsample sum sum

Figure 3 Schematic of FCN model

Decoder

Encoder (Backbone)

IndexIndexIndexIndexIndex

Pooling layersConvolutional layersDepool layers

Figure 4 Schematic of SegNet model


3 Material and Methods

)e semantic segmentation under a complex backgroundbased on the encoder-decoder network will establish anoptimized mathematical model with minimal segmentationtime Tsegminusmin segmentation time Tseg and accuracy PAUnder the encoder-decoder network the backbone network

ηmain the depth dmain and the decoder ηdecoder are obtainedto form an encoder By selecting the relatively better ηmainand ηdecoder of the basic network the component analysismodule to improve the optimized architecture is proposedand the encoder-decoder network with optimized PA forsemantic segmentation under a complex background isobtained In the encoder-decoder network the encoder

Decoder

Encoder (Backbone)

ConcatConcatConcatConcat

Pooling layersConvolutional layersDeconv layers

Figure 5 Schematic of U-Net model

PPM head

Fuse

Head

Head

Head

Fused feature map

Head

Material

Object

Part

Scene

No grad

Head

Texture(Testing)

orImage

(~450 times 720)

Image for texture(~48 times 48)

= Marbled

Feature Pyramid Network

Conv 3 times 3 GlobalAvgPooling Classifier

Scene head

Conv 3 times 3 Classifier

ObjectPartMaterial head

Conv 3 times 3 Conv 3 times 3 Classifier

Texture head

4x Conv (128 channels)

(Training)

Living room

14

18

116

132

14

14

18

116

132

Figure 6 Flowchart of UPerNet model


transforms color images (three 2D arrays) to 2048 2D arrays)e encoder is composed of convolutional layers andpooling layers and it could be trained on large-scale clas-sification datasets such as ImageNet to gain greater featureextraction capability

Modeling of semantic segmentation under a complexbackground using the encoder-decoder network and se-lection of ηmain and ηdecoder

)e encoder network is determined by the backbonenetwork ηmain depth dmain and decoder ηdecoder Segmen-tation time Tseg and accuracy PA depend on ηdecoder ηmainand dmain which can be expressed asPA(ηdecoder ηmain dmain) and Tseg(ηdecoder ηmain dmain)Denoting the minimal segmentation time as Tsegminusmin (therecommended value is 600ms) the mathematical model ofthe optimization for semantic segmentation under a com-plex background based on the encoder-decoder network isas follows

max PA ηdecoder ηmain dmain( 1113857

st Tseg ηdecoder ηmain dmain( 1113857leTsegminusmin

⎧⎨

⎩ (3)

)e parameters of the model to be optimized are dmainηmain and ηdecoder

First dmain ηmain and ηdecoder are combined )en theobject segmentation accuracy PAobj component segmen-tation accuracy PAcomp and Tseg are compared to select therelatively better ηmain and ηdecoder for the basic network

)e ADE20K dataset which has diverse annotations ofscenes objects parts of objects and parts of parts [26] isselected In this paper we denote parts of objects as com-ponent Using a GeForce GTX 1080Ti GPU and the trainingmethod described in [27] we obtained PAobj and PAcomp forimproved FCN [19] PSPNet [28] UPerNet [29] and othermajor encoder-decoder networks for semantic segmentationused in the ADE20K [26] objectcomponent segmentationdataset We evaluated Tseg of different network on theADE20K test set which consist of 3000 different resolutionimages with average image size of 13 million pixels Table 1displays the pixel accuracy and segmentation time of themain network architectures on ADE20K objectcomponentsegmentation tasks where the relatively better indices areindicated by a rectangular contour

From Table 1 the following observation can be made①In all networks PAcomp is less than PAobj by about 30 ②ηmain and dmain are equal in networks 1 2 and 3 PAcomp andPAobj are better in ηdecoder FPN + PPM compared toηdecoder FCN or ηdecoder PPM ③ηmain and ηdecoder areequal in networks 3 and 4 When dmain is doubled PAcompimproves slightly and Tseg improves significantly After acomprehensive consideration we selected the UPerNet [23]encoder-decoder network where ηmain ResNetdmain 50 and ηdecoder PPM + FPN

Figure 7 shows the architecture of semantic segmen-tation under a complex background implemented byUPerNet [29] )e encoder ResNet reduces the featuremap resolution by 12 at each stage )e resolution of the

output feature maps within five stages is respectively re-duced to 12 14 18 116 and 132 )e decoder isPPM+ FPN )rough pooling layers with different stridesthe feature maps are analyzed in a multiscale mannerwithin PPM )rough three transposed convolutionlayers the resolution of the feature maps is increased twotimes to 116 18 and 14 )e upsampling restores thefeature map resolution to 11 )e component analysismodule recognizes the feature map and outputs both theobjectcomponent segmentation results

Figure 8 shows the component analysis module ofUPerNet )e module is composed of the object classifiercomponent classifier and component analysis module)e input of each classifier is a 1 1 feature map )e objectclassifier implements the semantic recognition of NObjkinds of objects and outputs the object probability vectorpuvObj and the object label Cuv

Obj )e component classifierimplements the semantic recognition of NComp kinds ofcomponents and outputs the component probabilityvector puv

Comp and the component label CuvComp According

to CuvObj and the component object set CObjminusThings the

component analysis module only segments the CuvComp that

satisfies CuvObj isin CObjminusThings and outputs the valid component

label 1113954Cuv

Comp UPerNet outputs the object segmentation result(the object labelCuv

Obj) and the component segmentation result(the valid component label 1113954C

uv

Comp))e component analysis module of UPerNet can be

expressed as follows

1113954Cuv

Comp fOp CuvObj C

uvCompCObjminusThings1113872 1113873

1 times C

uvComp C

uvObj isin CObjminusThings

0 times CuvComp C

uvObj notin CObjminusThings

⎧⎨

⎩

(4)

A greater PAcomp of 1113954Cuv

Comp leads to a higher componentsegmentation efficiency

Equation (4) outputs 1113954Cuv

Comp that satisfiesCuvObj isin CObjminusThings By identifying deviations of Cuv

Obj due tothe relationship between Cuv

Comp and CuvObj the optimized

component analysis module can improve the efficiency ofcomponent segmentation it both meets the requirement ofTsegminusmin and improves PAcomp

Improvements of UPerNet for semantic segmentationunder a complex background based on the componentanalysis module

In this subsection we describe the derivation of thecomponent analysis module the optimization of the func-tion expression of the module and the construction of thearchitecture of the component analysis module

As shown in Figure 8 the component classifier recog-nizes Ncomp component semantics and outputs the com-ponent labels Cuv

Comp of the pixel with image position (u v)

and the probability vector puvComp corresponding to the

various component labels )e relationship between CuvComp

and puvComp [31] is as follows

CuvComp argmaxkpCompminusk k 1 2 NComp (5)

From equation (4) and (5) we obtain


1113954Cuv

Comp fOp CuvObj C


1 times argmaxkpCompminusk C


0 times argmaxkpCompminusk CuvObj notin CObjminusThings

⎧⎨

⎩ k 1 2 NComp(6)

where pObjminusj is the probability of CuvComp Weighting pObjminusj

over pCompminusk to get 1113954pCompminusk instead of 1 times argmaxkpCompminuskreducing the weight of low-probability object labels andincreasing PAcomp With Cuv

Obj isin CObjminusThings if

CuvComp notin C

CObjCompminusObj letting 1113954pCompminusk 0 can increase the

detection rate of background pixels )erefore the modulecan be expressed as follows

DecoderEncoder ηmain = ResNet

CS3 CS4 CS5CS1 CS2

Poolingpyramid module

PPM

Cup-1 Cup-2 Cup-3 Fusion

Object classifier

Pixel accuracyPA

Ups

ampl

e

Component segmentation

Objectsegmentation

Component classifier

Feature map pramid network FPN

Component analysis module

Componentanalysis model

Inputimage

ηdecoder = PPM + FPN

Segmentation time Tseg (ηmain dmain ηdecoder)

Depth dmain = 50

Figure 7 Flowchart of semantic segmentation under a complex background implemented by UPerNet

Componentclassifier

Validcomponent

label

Object classifier

Componentanalysis model1 1

Featuremap

Object label Cuv

Comp label Cuv

Valid object setℂObj-Things

Object segmentation


CompfO (Cuv

CuvObj

Obj

Comp)CompC uv

Figure 8 Flowchart of component analysis module of UPerNet

Table 1 Pixel accuracy and segmentation time of the main network architectures on ADE20K objectcomponent segmentation task )erectangular contour indicates the best indices

Network Backboneηmain

Backbonedepth dmain

Decoderηdecoder

Object segmentationaccuracy () PAobj

Component segmentationaccuracy () PAcomp

Segmentation time(ms) Tseg

1 FCN [30] ResNet 50 FCN 7132 4081 333

2 PSPNet[28] ResNet 50 PPM 8004 4723 483

3 UPerNet[29] ResNet 50 PPM+FPN 8023 4830 496



1113954Cuv

Comp fOp CuvObj C

uvCompCObjminusThingsC

CObj

CompminusObj1113874 1113875 argmaxk1113954pCompminusk k 1 2 NComp

1113954pCompminusk

1113936

jisinCObjminusThingsandkisinCj

CompminusObj

j1pObjminusj

⎛⎜⎜⎝ ⎞⎟⎟⎠ kltNComp

1 minus 1113936

jnotinCObjminusThingsork notin Cj

CompminusObj

j1pObjminusj

⎛⎜⎜⎝ ⎞⎟⎟⎠pCompminusk k NComp

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎩

j 1 2 Nobj

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

(7)

which is the component analysis module yielded by replacing1 times argmaxkpCompminusk with argmaxk

1113954pCompminusk and considering

CuvComp notin C

CObjCompminusObj

)e optimized architecture of the UPerNet componentanalysis module is proposed based on equation (7)Figures 9(a)ndash9(c) show the optimized architecture obtainedby replacing 1 times argmaxkpCompminusk with argmaxk

1113954pCompminusk by

considering CuvComp notin C

CObjCompminusObj and by both replacing 1 times

argmaxkpCompminusk with argmaxk1113954pCompminusk and considering

CuvComp notin C

CObjCompminusObj in the component analysis module

respectively

31 Experimental Results

311 ADE20K Component Segmentation Task For theUPerNet model the backbone network of the encoder wasResNet dmain 50 and the decoders arePPM+FPN+component analysis modules (beforeaftermodification) We trained each network on the objectcom-ponent segmentation task dataset ADE20K [26] to demonstratethe pixel accuracy PA1113955Part

and segmentation time Tseg )eexperiments were run on a GeForce GTX 1080Ti GPU

Table 2 reports PA1113955Partand Tseg of the UPerNet obtained

with different component analysis modules in ADE20Kcomponent segmentation task From the results the fol-lowing observations can be made

(i) )e pixel accuracy of ResNet(dmain 50)+PPM+FPN+ the proposed modifiedcomponent analysis modules with different settingsincreased from 4830 (without component analysismodules) to 5403 5513 and 5562 while thesegmentation time lengthened marginally from 483to 492 486 and 496ms respectively

)e UPerNet with modified component analysis mod-ules showed significantly high segmentation performanceBoth PA1113955Part

and Tseg outperformed the UPerNet with adeeper dmain PA1113955Part

and Tseg of the architecture (dmain 50)are 5562 and 496ms while those of the architectures withno modification with dmain 101 and 152 were 4871 and598ms and 4889 and 721ms respectively as shown inFigure 9(c)

312 CITYSCAPES Instance-Level Semantic Labeling TaskWe trained each UPerNet (withwithout component anal-ysis module) on the instance-level semantic labeling task of

the CITYSCAPES dataset [32] To assess the instance-levelperformance CITYSCAPES uses themean average precisionAP and average precision AP05 [32] We also report thesegmentation time of each network run on a GeForce GTX1080Ti GPU and an Intel i7-5960X CPU Table 3 presents theperformances of different methods on a CITYSCAPES in-stance-level semantic labeling task Table 4 presents the meanaverage precision AP on class-level of the UPerNet withwithout the component analysis module in the CITYSCAPESinstance-level semantic labeling task From the table it can beseen that the modified component analysis modules effec-tively improved the performance of the UPerNet With thecomponent analysis module both AP and AP05 are im-proved and the segmentation time Tseg increased slightlyfrom 447 to 451msMost of the UPerNet AP on class-level areimproved Figure 10 shows someCITYSCAPES instance-levelsemantic labeling results obtained with the UPerNet withwithout component analysis module

Taking banknote detection as an example we set up thesemantic segmentation model by the component analysismodules (beforeafter modification) to vision-based detec-tion of 2019 Chinese Yuan (CNY) feature in the backlight todemonstrate the segmentation performance of the proposedmethod

)e vision-based detection system consisted of an MV-CA013-10GC industrial camera an MVL-HF2528M-6MPlens and a LED strip light )e field of view was 1833deg andthe resolution was 1280times1024 Under the backlight wecollected 25 CNY images of various denomination frontsand backs at random angles )en we marked four types oflight-transmitting anticounterfeiting features namely se-curity lines pattern watermarks denomination watermarksand Yin-Yang denominations All four features were de-tected in the CNY images to generate our dataset (200images) We trained the model with different componentanalysis modules from our dataset to demonstrate PA1113955Partand Tseg Table 3 presents the pixel accuracy and segmen-tation time of UPerNet with different component analysismodules for CNY anticounterfeit features via vision-baseddetection and Figure 11 shows the segmentation results ofthe anticounterfeiting features detected by UPerNet withwithout the component analysis module

From Table 5 it can be seen that the proposed methodimproved PA1113955Part

from 9038 to 9529 Tseg from 490 to496ms Moreover APIoUT05 increased from 961 to 100detecting all the light transmission anti-counterfeiting fea-tures without false detection missing detection or repeateddetection


Object segmentation

11Feature

map

Object classifier

Componentclassifier

Comp responsepuv


Validcomplabel

Validcomp

response

Comp analysismodel

fOp (Cuv Cuv

Obj

Obj

Obj

Comp) puv Comp C uv

Comp

Object label Cuv

Object responsepuv


Comp

(a)

11Feature

map

Object classifier

Componentclassifier

Object segmentation


Relation between comp and obj

ℂCObjComp-Obj

ObjObject label Cuv

CompComp label Cuv

Validcomplabel

Comp analysismodel


fO (Cuv Cuv

Obj Comp) C uv Comp

(b)


Object segmentation

11Feature

map

Object classifier

Componentclassifier


Comp response Validcomplabel

Validcomp

response

Object responsepuv


Comp analysismodel

fOp (Cuv Cuv

Obj Comp) p uv Comp C uv

Comp

ObjObject label Cuv

Obj

Comppuv

ℂCObjComp-Obj

(c)

Figure 9 Optimized architecture with the component analysis module (a) Replace 1 times argmaxkpCompminusk with argmaxk1113954pCompminusk to optimize

the module (b) Analyze CuvComp notin C

CObjCompminusObj to optimize the module (c) Replace 1 times argmaxkpCompminusk with argmaxk

1113954pCompminusk and analyzeCuvComp notin C

CObjCompminusObj to optimize the module

Table 2 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAMs) on ADE20K componentsegmentation task

Backboneηmain

Backbonedepth dmain

Decoder ηdecoder Comp Analysis model Comp Segmentationaccuracy PA1113955Part

()Segmentation time

Tseg (ms)

1 ResNet 50 PPM+FPN mdash 4830 4832 ResNet 101 PPM+FPN mdash 4871 5983 ResNet 152 PPM+FPN mdash 4889 7214 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5362 4905 ResNet 101 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5396 6046 ResNet 152 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 5418 726


UPerNet UPerNet with the component analysis module

CarCarCar Car CarCar Car

CarCarCarCar CarCar Car

Car

Car



Car

Car

CarCar Car Car CarCar

Car

CarCar

PersonPerson


Car

CarCar

PersonPerson

CarCar

CarCar

Person

PersonPerson

PersonCarCar

CarPerson

PersonPerson

Person

Figure 10 CITYSCAPES instance-level semantic labeling by UPerNet

Table 4 Mean average precision AP on class-level of the UPerNet withwithout CAM in CITYSCAPES instance-level semantic labelingtask

Method Person () Rider () Car () Truck () Bus () Train () Motorcycle () Bicycle ()UperNet 360 288 516 300 387 273 239 194UperNet +CAM 360 288 530 343 570 375 223 238

Table 2 Continued

Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)

7 ResNet 50 PPM+FPN+CAM argmaxk1113954pCompminusk 5403 492

8 ResNet 50 PPM+FPN+CAM CuvComp notin C

CObj

CompminusObj 5513 4869 ResNet 50 PPM+FPN+CAM argmaxk

1113954pCompminusk + CuvComp notin C

CObjCompminusObj 5562 496

Table 3 Performances of different methods on CITYSCAPES instance-level semantic labeling task

Method AP () AP050 () Segmentation time (ms)

SegNet 295 556 mdashMask R-CNN 320 581 mdashUperNet 320 573 447UperNet +CAM 365 622 451CAM Component Analysis Module


4 Conclusions

In this study we performed semantic segmentation under acomplex background using the encoder-decoder network tosolve the issue of the mutually exclusive relationship be-tween the semantic response value and the semantics ofobjectcomponent in the semantic segmentation under acomplex background for online machine vision detection)e following conclusions can be drawn from this study

(i) Considering the mutually exclusive relationshipbetween the semantic response value and the se-mantics of objectcomponent we selected themathematical model of semantic segmentationunder a complex background based on the encoder-decoder network for optimization It was found that

ηmain ResNet dmain 50 is the best encoder andηdecoder PPM + FPN is the best selected decoder

(ii) We replaced 1 times argmaxkpCompminusk withargmaxk

1113954pCompminusk )e component analysis module

of CuvComp notin C

CObjCompminusObj and UPerNet are considered

to improve the performance of the encoder-decodernetwork

(iii) )e experimental results show that the componentanalysis module improves the performance of se-mantic segmentation under a complex backgroundBoth PA1113955Part

and Tseg of the proposed model werebetter than those of the UPerNet with deeper dmainSpecifically the accuracy improved from 4889 to5562 and Tseg from 721 to 496ms By performing

DenominationPattern

Safeline SafelineChinese

ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark

PatternVersionDenomination

Serial number

DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

Pattern

Watermark

Serial numberWatermark Watermark

SafelineChinese

OVMI

Safeline

OVMI

Pattern

PatternDenomination

Serial number

Denomination

Watermark

Watermark Watermark

SafelineSafeline

Watermark

Watermark Watermark

SafelineSafeline

Watermark

WatermarkWatermark

SafelineSafeline


Figure 11 Anticounterfeiting features detected by the UPerNet withwithout the component analysis module

Table 5 Pixel accuracy and segmentation time of UPerNet with different component analysis modules (CAM) for CNY anticounterfeitfeatures via vision-based detection

Backbone ηmain Depth dmain Decoder ηdecoder Component analysis module PA1113955Part() APIoUT05() Tseg (ms)

1 ResNet 50 PPM+FPN mdash 8850 853 4832 ResNet 50 PPM+FPN+CAM 1 times argmaxkpCompminusk [29] 9038 961 4903 ResNet 50 PPM+FPN+CAM argmaxk


CObjCompminusObj

9529 100 496


vision-based detection with the 2019 CNY featureswe showed that the proposed method improvedPA1113955Part

from 9038 to 9529 while Tseg increasedonly slightly from 490 to 496ms APIoUT01 alsoincreased from 961 to 100 detecting all the lighttransmission anticounterfeiting features withoutfalse detection missing detection or repeateddetection

)e model in which 1 times argmaxkpPartminusk was replacedwith argmaxk

1113954pPartminusk and the corresponding componentanalysis module improved the performance of the UPerNetencoder-decoder network However the efficiency im-provement is affected by the accuracy of object segmenta-tion In our next study we will investigate the applicability ofmachine learning to the component analysis module toachieve a higher performance in different applications

Data Availability

)e ADE20K Dataset used to support the findings of thisstudy is available at httpgroupscsailmiteduvisiondatasets )e CITYSCAPES Dataset used to support thefindings of this study is available at httpswwwcityscapes-datasetcom Its pretrained models and code are released athttpsgithubcomCSAILVisionsemantic-segmentation325pytorch

Conflicts of Interest

)e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

)is research was funded by the Key-Area Research andDevelopment Program of Guangdong Province (Grant no2019B010154003) and the Guangzhou Science and Tech-nology Plan Project (Grant no 201802030006)

References

[1] C Szegedy W Liu and Y Jia ldquoGoing deeper with convo-lutionsrdquo in Proceedings of the Computer Vision and PatternRecognition IEEE Boston MA USA pp 1ndash9 June 2015

[2] K He X Zhang and S Ren ldquoDeep residual learning forimage recognitionrdquo in Proceedings of the Computer vision andpattern recognition IEEE Las Vegas NV USA pp 770ndash778June 2016

[3] K He G Gkioxari P Dollar and R Girshick ldquoMaskR-CNNrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 42 no 2 pp 386ndash397 2020

[4] G Liu B He and S Liu ldquoChassis assembly detection andidentification based on deep learning component instancesegmentationrdquo Symmetry vol 11 no 8 2019

[5] R Manish A Venkatesh and S Denis Ashok ldquoMachinevision based image processing techniques for surface finishand defect inspection in a grinding processrdquoMaterials TodayProceedings vol 5 no 5 pp 12792ndash12802 2018

[6] L Geng Y Wen and F Zhang ldquoMachine vision detectionmethod for surface defects of automobile stamping partsrdquo

American Scientific Research Journal for Engineering Tech-nology and Sciences vol 53 no 1 pp 128ndash144 2019

[7] M M Islam and J Kim ldquoVision-based autonomous crackdetection of concrete structures using a fully convolutionalencoderndashdecoder networkrdquo Sensors vol 19 no 19 2019

[8] S Zhou D Nie E Adeli J Yin J Lian and D Shen ldquoHigh-resolution encoder-decoder networks for low-contrast medicalimage segmentationrdquo IEEE Transactions on Image Processingvol 29 pp 461ndash475 2020

[9] S Liu J Huang and G Liu ldquoTechnology of multi-categorylegal currency identification under multi-light conditionsbased on AleNetrdquo China Measurement ampTest vol 45 no 9pp 118ndash122 2019 in Chinese

[10] H Kang and C Chen ldquoFruit detection and segmentation forapple harvesting using visual sensor in orchardsrdquo Sensorsvol 19 no 20 p 4599 2019

[11] E Pardo J M T Morgado and N Malpica ldquoSemanticsegmentation of mFISH images using convolutional net-worksrdquo Cytometry Part A vol 93 no 6 pp 620ndash627 2018

[12] G Liu S Liu and J Wu ldquoMachine vision object detectionalgorithm based on deep learning and application in banknotedetectionrdquo China Measurement ampTest vol 45 no 5 pp 1ndash92019 in Chinese

[13] H Gao H Yuan and Z Wang ldquoPixel transposed convolu-tional networksrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 42 no 5 pp 1218ndash1227 2019

[14] Q D Vu and J T Kwak ldquoA densemulti-path decoder for tissuesegmentation in histopathology imagesrdquo Computer Methodsand Programs in Biomedicine vol 173 pp 119ndash129 2019

[15] J Huang and G Liu ldquo)e development of CNN-based se-mantic segmentation methodrdquo Laser Journal vol 40 no 5pp 10ndash16 2019 in Chinese

[16] S Nowozin ldquoOptimal decisions from probabilistic modelsthe intersection-over-union caserdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE ColumbusOH USA pp 548ndash555 June 2014

[17] D Hoiem Y Chodpathumwan and Q Dai ldquoDiagnosingerror in object detectorsrdquo in Proceedings of the EuropeanConference on Computer Vision pp 340ndash353 IEEE FlorenceItaly October 2012

[18] K He and J Sun ldquoConvolutional neural networks at con-strained time costrdquo in Proceedings of the Computer Vision andPattern Recognition IEEE Boston MA USA pp 5353ndash5360June 2015

[19] J Long E Shelhamer and T Darrell ldquoFully convolutionalnetworks for semantic segmentationrdquo in Proceedings of theComputer Vision and Pattern Recognition IEEE Boston MAUSA pp 3431ndash3440 June 2015

[20] V Badrinarayanan A Kendall and R Cipolla ldquoSegNet adeep convolutional encoder-decoder architecture for imagesegmentationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 39 no 12 pp 2481ndash2495 2017

[21] O Ronneberger P Fischer and T Brox ldquoU-net convolu-tional networks for biomedical image segmentationrdquo inProceedings of the Medical Image Computing and Computer-Assisted Intervention pp 234ndash241 IEEE Munich GermanyOctober 2015

[22] B Hariharan P Arbelaez R Girshick and J Malik ldquoSi-multaneous detection and segmentationrdquo in Proceedings ofthe European Conference on Computer Vision pp 297ndash312IEEE Zurich Switzerland March 2014

[23] N D Lane and P Warden ldquo)e deep (learning) transfor-mation of mobile and embedded computingrdquo Computervol 51 no 5 pp 12ndash16 2018


[24] C Qing J Ruan X Xu J Ren and J Zabalza ldquoSpatial-spectral classification of hyperspectral images a deep learningframework with Markov Random fields based modellingrdquo IetImage Processing vol 13 no 2 pp 235ndash245 2019

[25] X Li Z Liu and P Luo ldquoNot all pixels are equal difficulty-aware semantic segmentation via deep layer cascaderdquo inProceedings of the Computer Vision and Pattern RecognitionIEEE Honolulu HI USA pp 6459ndash6468 July 2017

[26] B Zhou H Zhao X Puig et al ldquoSemantic understanding ofscenes through the ADE20K datasetrdquo International Journal ofComputer Vision vol 127 no 3 pp 302ndash321 2019

[27] L-C Chen ldquoRethinking atrous convolution for semanticimage segmentationrdquo 2017 httpsarxivorgabs170605587

[28] H Zhao J Shi and X Qi ldquoPyramid scene parsing networkrdquoin Proceedings of the Computer Vision and Pattern Recogni-tion IEEE Honolulu HI USA pp 6230ndash6239 July 2017

[29] T Xiao Y Liu B Zhou Y Jiang and J Sun ldquoUnifiedperceptual parsing for scene understandingrdquo in Proceedings ofthe European Conference on Computer Vision pp 432ndash448IEEE Munich Germany September 2018

[30] D Kim J Kwon and J Kim ldquoLow-complexity online modelselection with lyapunov control for reward maximization instabilized real-time deep learning platformsrdquo in Proceedings ofthe Systems Man and Cybernetics pp 4363ndash4368 MiyazakiJapan January 2018

[31] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquoCommunications of the Acm vol 60 no 6 pp 84ndash90 2017

[32] M Cordts O Mohamed and S Ramos ldquo)e cityscapesdataset for semantic urban scene understandingrdquo in Pro-ceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR) IEEE Las Vegas NV USApp 3213ndash3223 June 2016


contrary the encoder-decoder semantic segmentationnetwork ultimately retains the classification components inthe network backbone thus exhibiting larger receptivefields and better pixel recognition ability [12 13] asdepicted in Figure 1 Unreasonably selected and conse-quently incorrectly used component analysis modules willlead to an excessively small foreground range resulting inthe misjudgment of component pixels If the componentanalysis module is too sensitive the foreground range willbe too broad thus it would be difficult to remove mis-judged pixels [14] )erefore in the process of semanticsegmentation under the complex background it is nec-essary to consider objects the contradiction between thecomponent semantic response values and their mutualexclusion relationship while maximizing the accuracy ofthe semantic segmentation under the complex backgroundusing the encoder-decoder network

Figure 2 shows a flowchart of the semantic segmen-tation under the complex background using the encoder-decoder network )e process can be described as followsthe component classifier of the encoder-decoder networkrecognizes the pixel-level semantics and response of thepixels in the image the object classifier recognizes thepixel-level object semantics and the response and extractsmisjudged pixels of the foreground object in semanticsegmentation finally the mutually exclusive relationshipbetween component semantics and object semantics isconsidered and non-background-independent semanticsare determined to achieve effective semantic segmentationunder a complex background to improve the model ac-curacy [15]

In this study we focus on online semantic segmentationunder a complex background using the encoder-decodernetwork to solve the above described mutual exclusionrelationship problem between component semantics andobject semantics )e main contributions of this study arethreefold

(i) We attempted to improve the low accuracy ofcomponent segmentation and selected the superiorbasic encoder-decoder network according to theperformance

(ii) We modified the UPerNet based on the componentanalysis module to maximize the accuracy of thesemantic segmentation under a complex back-ground using the encoder-decoder network whilemaintaining an appropriate segmentation time

(iii) We show that the proposed method is superior toprevious encoder-decoder network and has satis-factory accuracy and segmentation time We alsoshow the application of the proposedmethod in bill-note anticounterfeiting identification

)e rest of this paper is organized as follows In Section1 we outline related works In Section 2 we introduce amethod for semantic segmentation under a complexbackground using the encoder-decoder network In Section3 we verify the proposed method In Section 4 we presentthe conclusions

2 Related Work

21 Evaluation of the Semantic Segmentation PerformanceWe can generally evaluate the CNN semantic segmentationperformance from the accuracy and running speed )eaccuracy indicators usually include the pixel accuracy [16]mean intersection over union [16] and mean averageprecision [17] )e pixel accuracy PA is defined as thenumber of pixels segmented correctly accounting for thetotal number of image pixels the mean intersection overunion IoU is defined as the degree of coincidence betweenthe segmentation results and their ground-truth the meanaverage precision APIoUT is the mean of average precisionscores for segmentation results whose intersection overunion no less than IoUT for each classes

If the object detected by machine vision has k categoriesthe semantic segmentation model requires the label of thek + 1 categories denoted as L l0 l1 lk1113864 1113865 including thebackground Denoting the number of pixels of li mis-rec-ognized as the pixel of lj and li(ine j) as pij and pii re-spectively the numbers of detected objects of li mis-recognized as lj and li(ine j) as Nij and Nii respectively thepixel accuracy can be calculated as follows

PA 1113944

k

i0pii

1113944k

i01113944k

j0pij

IoU 1113944K

i0pii

1113944k

j0pij + 1113944k

j0pji minus pii

(1)

APIoUT 1113944

k

i0Nii

1113944k

i01113944k

j0Nij

(2)

)e running speed of CNN semantic segmentation canbe measured by indicators including the segmentation timeTseg [18] which is defined as the time needed to segment theimage by running the algorithm )e theoretically shortestpossible time required to segment the image is also labeled asthe theoretical segmentation time Tsegminust and the time re-quired for the algorithm to actually segment the image isknown as the actual segmentation time Tsegminusa If not oth-erwise specified Tsegminusa is denoted as Tseg

22 End-To-End Encoder-Decoder Semantic SegmentationFramework Although CNN semantic segmentation per-forms as a single-step end-to-end process which is notfurther divided into multiple modules to deal with theconnection of numerous modules directly affects the CNN)e end-to-end semantic segmentation framework using theencoder-decoder enables the CNN to detect images with anyresolution and output prediction map results with constantresolution Typical networks include fully convolutionalnetworks (FCN) [19] SegNet [20] and U-Net [21]

Figure 3 shows a schematic of the FCN model )e FCNis an end-to-end semantic segmentation framework








Objectsegmentation







Start





End








Decoderskip connect

Encoder (Backbone)


2times upsample

2times upsample



Decoder

Encoder (Backbone)








Decoder

Encoder (Backbone)




PPM head

Fuse

Head

Head

Head

Fused feature map

Head

Material

Object

Part

Scene

No grad

Head

Texture(Testing)

orImage

(~450 times 720)


= Marbled



Scene head




Texture head


(Training)

Living room

14

18

116

132

14

14

18

116

132








⎧⎨

⎩ (3)













label 1113954Cuv



uv



1113954Cuv

Comp fOp CuvObj C


1 times C

uvComp C


0 times CuvComp C


⎧⎨

⎩

(4)


















1113954Cuv

Comp fOp CuvObj C





⎧⎨

⎩ k 1 2 NComp(6)




CuvComp notin C




CS3 CS4 CS5CS1 CS2


PPM


Object classifier

Pixel accuracyPA

Ups

ampl

e


Objectsegmentation





Inputimage



Depth dmain = 50


Componentclassifier

Validcomponent

label

Object classifier


Featuremap

Object label Cuv

Comp label Cuv


Object segmentation


CompfO (Cuv

CuvObj

Obj

Comp)CompC uv




Backbonedepth dmain

Decoderηdecoder




1 FCN [30] ResNet 50 FCN 7132 4081 333





1113954Cuv

Comp fOp CuvObj C


CObj


1113954pCompminusk

1113936


CompminusObj

j1pObjminusj


1 minus 1113936


CompminusObj

j1pObjminusj


⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎩

j 1 2 Nobj

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

(7)



CuvComp notin C

CObjCompminusObj






CuvComp notin C


respectively

















Object segmentation

11Feature

map

Object classifier

Componentclassifier

Comp responsepuv


Validcomplabel

Validcomp

response

Comp analysismodel

fOp (Cuv Cuv

Obj

Obj

Obj

Comp) puv Comp C uv

Comp

Object label Cuv

Object responsepuv


Comp

(a)

11Feature

map

Object classifier

Componentclassifier

Object segmentation



ℂCObjComp-Obj

ObjObject label Cuv

CompComp label Cuv

Validcomplabel

Comp analysismodel


fO (Cuv Cuv

Obj Comp) C uv Comp

(b)


Object segmentation

11Feature

map

Object classifier

Componentclassifier



Validcomp

response

Object responsepuv


Comp analysismodel

fOp (Cuv Cuv


Comp

ObjObject label Cuv

Obj

Comppuv

ℂCObjComp-Obj

(c)







Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)






Car

Car



Car

Car


Car

CarCar

PersonPerson


Car

CarCar

PersonPerson

CarCar

CarCar

Person

PersonPerson

PersonCarCar

CarPerson

PersonPerson

Person




Table 2 Continued

Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)



CObj








4 Conclusions






of CuvComp notin C





DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

Pattern

Watermark


SafelineChinese

OVMI

Safeline

OVMI

Pattern

PatternDenomination

Serial number

Denomination

Watermark

Watermark Watermark

SafelineSafeline

Watermark

Watermark Watermark

SafelineSafeline

Watermark

WatermarkWatermark

SafelineSafeline







CObjCompminusObj

9529 100 496






Data Availability




Acknowledgments


References










































Objectsegmentation







Start





End








Decoderskip connect

Encoder (Backbone)


2times upsample

2times upsample



Decoder

Encoder (Backbone)








Decoder

Encoder (Backbone)




PPM head

Fuse

Head

Head

Head

Fused feature map

Head

Material

Object

Part

Scene

No grad

Head

Texture(Testing)

orImage

(~450 times 720)


= Marbled



Scene head




Texture head


(Training)

Living room

14

18

116

132

14

14

18

116

132








⎧⎨

⎩ (3)













label 1113954Cuv



uv



1113954Cuv

Comp fOp CuvObj C


1 times C

uvComp C


0 times CuvComp C


⎧⎨

⎩

(4)


















1113954Cuv

Comp fOp CuvObj C





⎧⎨

⎩ k 1 2 NComp(6)




CuvComp notin C




CS3 CS4 CS5CS1 CS2


PPM


Object classifier

Pixel accuracyPA

Ups

ampl

e


Objectsegmentation





Inputimage



Depth dmain = 50


Componentclassifier

Validcomponent

label

Object classifier


Featuremap

Object label Cuv

Comp label Cuv


Object segmentation


CompfO (Cuv

CuvObj

Obj

Comp)CompC uv




Backbonedepth dmain

Decoderηdecoder




1 FCN [30] ResNet 50 FCN 7132 4081 333





1113954Cuv

Comp fOp CuvObj C


CObj


1113954pCompminusk

1113936


CompminusObj

j1pObjminusj


1 minus 1113936


CompminusObj

j1pObjminusj


⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎩

j 1 2 Nobj

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

(7)



CuvComp notin C

CObjCompminusObj






CuvComp notin C


respectively

















Object segmentation

11Feature

map

Object classifier

Componentclassifier

Comp responsepuv


Validcomplabel

Validcomp

response

Comp analysismodel

fOp (Cuv Cuv

Obj

Obj

Obj

Comp) puv Comp C uv

Comp

Object label Cuv

Object responsepuv


Comp

(a)

11Feature

map

Object classifier

Componentclassifier

Object segmentation



ℂCObjComp-Obj

ObjObject label Cuv

CompComp label Cuv

Validcomplabel

Comp analysismodel


fO (Cuv Cuv

Obj Comp) C uv Comp

(b)


Object segmentation

11Feature

map

Object classifier

Componentclassifier



Validcomp

response

Object responsepuv


Comp analysismodel

fOp (Cuv Cuv


Comp

ObjObject label Cuv

Obj

Comppuv

ℂCObjComp-Obj

(c)







Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)






Car

Car



Car

Car


Car

CarCar

PersonPerson


Car

CarCar

PersonPerson

CarCar

CarCar

Person

PersonPerson

PersonCarCar

CarPerson

PersonPerson

Person




Table 2 Continued

Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)



CObj








4 Conclusions






of CuvComp notin C





DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

Pattern

Watermark


SafelineChinese

OVMI

Safeline

OVMI

Pattern

PatternDenomination

Serial number

Denomination

Watermark

Watermark Watermark

SafelineSafeline

Watermark

Watermark Watermark

SafelineSafeline

Watermark

WatermarkWatermark

SafelineSafeline







CObjCompminusObj

9529 100 496






Data Availability




Acknowledgments


References








































Decoderskip connect

Encoder (Backbone)


2times upsample

2times upsample



Decoder

Encoder (Backbone)








Decoder

Encoder (Backbone)




PPM head

Fuse

Head

Head

Head

Fused feature map

Head

Material

Object

Part

Scene

No grad

Head

Texture(Testing)

orImage

(~450 times 720)


= Marbled



Scene head




Texture head


(Training)

Living room

14

18

116

132

14

14

18

116

132








⎧⎨

⎩ (3)













label 1113954Cuv



uv



1113954Cuv

Comp fOp CuvObj C


1 times C

uvComp C


0 times CuvComp C


⎧⎨

⎩

(4)


















1113954Cuv

Comp fOp CuvObj C





⎧⎨

⎩ k 1 2 NComp(6)




CuvComp notin C




CS3 CS4 CS5CS1 CS2


PPM


Object classifier

Pixel accuracyPA

Ups

ampl

e


Objectsegmentation





Inputimage



Depth dmain = 50


Componentclassifier

Validcomponent

label

Object classifier


Featuremap

Object label Cuv

Comp label Cuv


Object segmentation


CompfO (Cuv

CuvObj

Obj

Comp)CompC uv




Backbonedepth dmain

Decoderηdecoder




1 FCN [30] ResNet 50 FCN 7132 4081 333





1113954Cuv

Comp fOp CuvObj C


CObj


1113954pCompminusk

1113936


CompminusObj

j1pObjminusj


1 minus 1113936


CompminusObj

j1pObjminusj


⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎩

j 1 2 Nobj

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

(7)



CuvComp notin C

CObjCompminusObj






CuvComp notin C


respectively

















Object segmentation

11Feature

map

Object classifier

Componentclassifier

Comp responsepuv


Validcomplabel

Validcomp

response

Comp analysismodel

fOp (Cuv Cuv

Obj

Obj

Obj

Comp) puv Comp C uv

Comp

Object label Cuv

Object responsepuv


Comp

(a)

11Feature

map

Object classifier

Componentclassifier

Object segmentation



ℂCObjComp-Obj

ObjObject label Cuv

CompComp label Cuv

Validcomplabel

Comp analysismodel


fO (Cuv Cuv

Obj Comp) C uv Comp

(b)


Object segmentation

11Feature

map

Object classifier

Componentclassifier



Validcomp

response

Object responsepuv


Comp analysismodel

fOp (Cuv Cuv


Comp

ObjObject label Cuv

Obj

Comppuv

ℂCObjComp-Obj

(c)







Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)






Car

Car



Car

Car


Car

CarCar

PersonPerson


Car

CarCar

PersonPerson

CarCar

CarCar

Person

PersonPerson

PersonCarCar

CarPerson

PersonPerson

Person




Table 2 Continued

Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)



CObj








4 Conclusions






of CuvComp notin C





DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

Pattern

Watermark


SafelineChinese

OVMI

Safeline

OVMI

Pattern

PatternDenomination

Serial number

Denomination

Watermark

Watermark Watermark

SafelineSafeline

Watermark

Watermark Watermark

SafelineSafeline

Watermark

WatermarkWatermark

SafelineSafeline







CObjCompminusObj

9529 100 496






Data Availability




Acknowledgments


References







































Decoder

Encoder (Backbone)




PPM head

Fuse

Head

Head

Head

Fused feature map

Head

Material

Object

Part

Scene

No grad

Head

Texture(Testing)

orImage

(~450 times 720)


= Marbled



Scene head




Texture head


(Training)

Living room

14

18

116

132

14

14

18

116

132








⎧⎨

⎩ (3)













label 1113954Cuv



uv



1113954Cuv

Comp fOp CuvObj C


1 times C

uvComp C


0 times CuvComp C


⎧⎨

⎩

(4)


















1113954Cuv

Comp fOp CuvObj C





⎧⎨

⎩ k 1 2 NComp(6)




CuvComp notin C




CS3 CS4 CS5CS1 CS2


PPM


Object classifier

Pixel accuracyPA

Ups

ampl

e


Objectsegmentation





Inputimage



Depth dmain = 50


Componentclassifier

Validcomponent

label

Object classifier


Featuremap

Object label Cuv

Comp label Cuv


Object segmentation


CompfO (Cuv

CuvObj

Obj

Comp)CompC uv




Backbonedepth dmain

Decoderηdecoder




1 FCN [30] ResNet 50 FCN 7132 4081 333





1113954Cuv

Comp fOp CuvObj C


CObj


1113954pCompminusk

1113936


CompminusObj

j1pObjminusj


1 minus 1113936


CompminusObj

j1pObjminusj


⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎩

j 1 2 Nobj

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

(7)



CuvComp notin C

CObjCompminusObj






CuvComp notin C


respectively

















Object segmentation

11Feature

map

Object classifier

Componentclassifier

Comp responsepuv


Validcomplabel

Validcomp

response

Comp analysismodel

fOp (Cuv Cuv

Obj

Obj

Obj

Comp) puv Comp C uv

Comp

Object label Cuv

Object responsepuv


Comp

(a)

11Feature

map

Object classifier

Componentclassifier

Object segmentation



ℂCObjComp-Obj

ObjObject label Cuv

CompComp label Cuv

Validcomplabel

Comp analysismodel


fO (Cuv Cuv

Obj Comp) C uv Comp

(b)


Object segmentation

11Feature

map

Object classifier

Componentclassifier



Validcomp

response

Object responsepuv


Comp analysismodel

fOp (Cuv Cuv


Comp

ObjObject label Cuv

Obj

Comppuv

ℂCObjComp-Obj

(c)







Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)






Car

Car



Car

Car


Car

CarCar

PersonPerson


Car

CarCar

PersonPerson

CarCar

CarCar

Person

PersonPerson

PersonCarCar

CarPerson

PersonPerson

Person




Table 2 Continued

Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)



CObj








4 Conclusions






of CuvComp notin C





DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

Pattern

Watermark


SafelineChinese

OVMI

Safeline

OVMI

Pattern

PatternDenomination

Serial number

Denomination

Watermark

Watermark Watermark

SafelineSafeline

Watermark

Watermark Watermark

SafelineSafeline

Watermark

WatermarkWatermark

SafelineSafeline







CObjCompminusObj

9529 100 496






Data Availability




Acknowledgments


References









































⎧⎨

⎩ (3)













label 1113954Cuv



uv



1113954Cuv

Comp fOp CuvObj C


1 times C

uvComp C


0 times CuvComp C


⎧⎨

⎩

(4)


















1113954Cuv

Comp fOp CuvObj C





⎧⎨

⎩ k 1 2 NComp(6)




CuvComp notin C




CS3 CS4 CS5CS1 CS2


PPM


Object classifier

Pixel accuracyPA

Ups

ampl

e


Objectsegmentation





Inputimage



Depth dmain = 50


Componentclassifier

Validcomponent

label

Object classifier


Featuremap

Object label Cuv

Comp label Cuv


Object segmentation


CompfO (Cuv

CuvObj

Obj

Comp)CompC uv




Backbonedepth dmain

Decoderηdecoder




1 FCN [30] ResNet 50 FCN 7132 4081 333





1113954Cuv

Comp fOp CuvObj C


CObj


1113954pCompminusk

1113936


CompminusObj

j1pObjminusj


1 minus 1113936


CompminusObj

j1pObjminusj


⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎩

j 1 2 Nobj

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

(7)



CuvComp notin C

CObjCompminusObj






CuvComp notin C


respectively

















Object segmentation

11Feature

map

Object classifier

Componentclassifier

Comp responsepuv


Validcomplabel

Validcomp

response

Comp analysismodel

fOp (Cuv Cuv

Obj

Obj

Obj

Comp) puv Comp C uv

Comp

Object label Cuv

Object responsepuv


Comp

(a)

11Feature

map

Object classifier

Componentclassifier

Object segmentation



ℂCObjComp-Obj

ObjObject label Cuv

CompComp label Cuv

Validcomplabel

Comp analysismodel


fO (Cuv Cuv

Obj Comp) C uv Comp

(b)


Object segmentation

11Feature

map

Object classifier

Componentclassifier



Validcomp

response

Object responsepuv


Comp analysismodel

fOp (Cuv Cuv


Comp

ObjObject label Cuv

Obj

Comppuv

ℂCObjComp-Obj

(c)







Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)






Car

Car



Car

Car


Car

CarCar

PersonPerson


Car

CarCar

PersonPerson

CarCar

CarCar

Person

PersonPerson

PersonCarCar

CarPerson

PersonPerson

Person




Table 2 Continued

Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)



CObj








4 Conclusions






of CuvComp notin C





DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

Pattern

Watermark


SafelineChinese

OVMI

Safeline

OVMI

Pattern

PatternDenomination

Serial number

Denomination

Watermark

Watermark Watermark

SafelineSafeline

Watermark

Watermark Watermark

SafelineSafeline

Watermark

WatermarkWatermark

SafelineSafeline







CObjCompminusObj

9529 100 496






Data Availability




Acknowledgments


References




































1113954Cuv

Comp fOp CuvObj C





⎧⎨

⎩ k 1 2 NComp(6)




CuvComp notin C




CS3 CS4 CS5CS1 CS2


PPM


Object classifier

Pixel accuracyPA

Ups

ampl

e


Objectsegmentation





Inputimage



Depth dmain = 50


Componentclassifier

Validcomponent

label

Object classifier


Featuremap

Object label Cuv

Comp label Cuv


Object segmentation


CompfO (Cuv

CuvObj

Obj

Comp)CompC uv




Backbonedepth dmain

Decoderηdecoder




1 FCN [30] ResNet 50 FCN 7132 4081 333





1113954Cuv

Comp fOp CuvObj C


CObj


1113954pCompminusk

1113936


CompminusObj

j1pObjminusj


1 minus 1113936


CompminusObj

j1pObjminusj


⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎩

j 1 2 Nobj

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

(7)



CuvComp notin C

CObjCompminusObj






CuvComp notin C


respectively

















Object segmentation

11Feature

map

Object classifier

Componentclassifier

Comp responsepuv


Validcomplabel

Validcomp

response

Comp analysismodel

fOp (Cuv Cuv

Obj

Obj

Obj

Comp) puv Comp C uv

Comp

Object label Cuv

Object responsepuv


Comp

(a)

11Feature

map

Object classifier

Componentclassifier

Object segmentation



ℂCObjComp-Obj

ObjObject label Cuv

CompComp label Cuv

Validcomplabel

Comp analysismodel


fO (Cuv Cuv

Obj Comp) C uv Comp

(b)


Object segmentation

11Feature

map

Object classifier

Componentclassifier



Validcomp

response

Object responsepuv


Comp analysismodel

fOp (Cuv Cuv


Comp

ObjObject label Cuv

Obj

Comppuv

ℂCObjComp-Obj

(c)







Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)






Car

Car



Car

Car


Car

CarCar

PersonPerson


Car

CarCar

PersonPerson

CarCar

CarCar

Person

PersonPerson

PersonCarCar

CarPerson

PersonPerson

Person




Table 2 Continued

Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)



CObj








4 Conclusions






of CuvComp notin C





DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

Pattern

Watermark


SafelineChinese

OVMI

Safeline

OVMI

Pattern

PatternDenomination

Serial number

Denomination

Watermark

Watermark Watermark

SafelineSafeline

Watermark

Watermark Watermark

SafelineSafeline

Watermark

WatermarkWatermark

SafelineSafeline







CObjCompminusObj

9529 100 496






Data Availability




Acknowledgments


References




































1113954Cuv

Comp fOp CuvObj C


CObj


1113954pCompminusk

1113936


CompminusObj

j1pObjminusj


1 minus 1113936


CompminusObj

j1pObjminusj


⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎩

j 1 2 Nobj

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨

⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

(7)



CuvComp notin C

CObjCompminusObj






CuvComp notin C


respectively

















Object segmentation

11Feature

map

Object classifier

Componentclassifier

Comp responsepuv


Validcomplabel

Validcomp

response

Comp analysismodel

fOp (Cuv Cuv

Obj

Obj

Obj

Comp) puv Comp C uv

Comp

Object label Cuv

Object responsepuv


Comp

(a)

11Feature

map

Object classifier

Componentclassifier

Object segmentation



ℂCObjComp-Obj

ObjObject label Cuv

CompComp label Cuv

Validcomplabel

Comp analysismodel


fO (Cuv Cuv

Obj Comp) C uv Comp

(b)


Object segmentation

11Feature

map

Object classifier

Componentclassifier



Validcomp

response

Object responsepuv


Comp analysismodel

fOp (Cuv Cuv


Comp

ObjObject label Cuv

Obj

Comppuv

ℂCObjComp-Obj

(c)







Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)






Car

Car



Car

Car


Car

CarCar

PersonPerson


Car

CarCar

PersonPerson

CarCar

CarCar

Person

PersonPerson

PersonCarCar

CarPerson

PersonPerson

Person




Table 2 Continued

Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)



CObj








4 Conclusions






of CuvComp notin C





DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

Pattern

Watermark


SafelineChinese

OVMI

Safeline

OVMI

Pattern

PatternDenomination

Serial number

Denomination

Watermark

Watermark Watermark

SafelineSafeline

Watermark

Watermark Watermark

SafelineSafeline

Watermark

WatermarkWatermark

SafelineSafeline







CObjCompminusObj

9529 100 496






Data Availability




Acknowledgments


References




































Object segmentation

11Feature

map

Object classifier

Componentclassifier

Comp responsepuv


Validcomplabel

Validcomp

response

Comp analysismodel

fOp (Cuv Cuv

Obj

Obj

Obj

Comp) puv Comp C uv

Comp

Object label Cuv

Object responsepuv


Comp

(a)

11Feature

map

Object classifier

Componentclassifier

Object segmentation



ℂCObjComp-Obj

ObjObject label Cuv

CompComp label Cuv

Validcomplabel

Comp analysismodel


fO (Cuv Cuv

Obj Comp) C uv Comp

(b)


Object segmentation

11Feature

map

Object classifier

Componentclassifier



Validcomp

response

Object responsepuv


Comp analysismodel

fOp (Cuv Cuv


Comp

ObjObject label Cuv

Obj

Comppuv

ℂCObjComp-Obj

(c)







Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)






Car

Car



Car

Car


Car

CarCar

PersonPerson


Car

CarCar

PersonPerson

CarCar

CarCar

Person

PersonPerson

PersonCarCar

CarPerson

PersonPerson

Person




Table 2 Continued

Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)



CObj








4 Conclusions






of CuvComp notin C





DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

Pattern

Watermark


SafelineChinese

OVMI

Safeline

OVMI

Pattern

PatternDenomination

Serial number

Denomination

Watermark

Watermark Watermark

SafelineSafeline

Watermark

Watermark Watermark

SafelineSafeline

Watermark

WatermarkWatermark

SafelineSafeline







CObjCompminusObj

9529 100 496






Data Availability




Acknowledgments


References







































Car

Car



Car

Car


Car

CarCar

PersonPerson


Car

CarCar

PersonPerson

CarCar

CarCar

Person

PersonPerson

PersonCarCar

CarPerson

PersonPerson

Person




Table 2 Continued

Backboneηmain

Backbonedepth dmain


()Segmentation time

Tseg (ms)



CObj








4 Conclusions






of CuvComp notin C





DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

Pattern

Watermark


SafelineChinese

OVMI

Safeline

OVMI

Pattern

PatternDenomination

Serial number

Denomination

Watermark

Watermark Watermark

SafelineSafeline

Watermark

Watermark Watermark

SafelineSafeline

Watermark

WatermarkWatermark

SafelineSafeline







CObjCompminusObj

9529 100 496






Data Availability




Acknowledgments


References




































4 Conclusions






of CuvComp notin C





DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

DenominationPattern


ChineseChinese

ChineseChinese

Denomination

Denomination

Watermark

WatermarkWatermark


Serial number

Pattern

Watermark


SafelineChinese

OVMI

Safeline

OVMI

Pattern

PatternDenomination

Serial number

Denomination

Watermark

Watermark Watermark

SafelineSafeline

Watermark

Watermark Watermark

SafelineSafeline

Watermark

WatermarkWatermark

SafelineSafeline







CObjCompminusObj

9529 100 496






Data Availability




Acknowledgments


References








































Data Availability




Acknowledgments


References




































SemanticSegmentationunderaComplexBackgroundforMachine ... · 2020. 5. 2. · analysis module to...

Documents

Transcript of SemanticSegmentationunderaComplexBackgroundforMachine ... · 2020. 5. 2. · analysis module to...