Deblurring Face Images using Uncertainty Guided …1 Deblurring Face Images using Uncertainty Guided...

11
1 Deblurring Face Images using Uncertainty Guided Multi-Stream Semantic Networks Rajeev Yasarla, Student Member, IEEE, Federico Perazzi, Member, IEEE and Vishal M. Patel, Senior Member, IEEE Abstract—We propose a novel multi-stream architecture and training methodology that exploits semantic labels for facial image deblurring. The proposed Uncertainty Guided Multi- Stream Semantic Network (UMSN) processes regions belonging to each semantic class independently and learns to combine their outputs into the final deblurred result. Pixel-wise semantic labels are obtained using a segmentation network. A predicted confidence measure is used during training to guide the net- work towards challenging regions of the human face such as the eyes and nose. The entire network is trained in an end- to-end fashion. Comprehensive experiments on three different face datasets demonstrate that the proposed method achieves significant improvements over the recent state-of-the-art face deblurring methods. Code is available at: https://github.com/ rajeevyasarla/UMSN-Face-Deblurring Index Terms—Facial image deblurring, semantic masks, con- fidence scores. I. I NTRODUCTION Image deblurring entails the recovery of an unknown true image from a blurry image. Similar to the other image enhancement tasks, image deblurring is experiencing a re- naissance as a result of convolutional networks (CNNs) es- tablishing themselves as powerful generative models. Image deblurring is an ill-posed problem and therefore it is crucial to leverage additional properties of the data to successfully recover the lost facial details in the deblurred image. Priors such as sparsity [1], [2], low-rank [3] and patch similarity [4] have been used in the literature to obtain a regularized solution. In recent years, deep learning-based methods have also gained some traction [5], [6], [7], [8]. The inherent semantic structure of natural images such as faces is an important information that can be exploited to improve the deblurring results. Few techniques [10], [9] make use of such prior information in the form of semantic labels. These methods do not account for the class imbalance of semantic maps corresponding to faces. Interior parts of a face like eyes, nose, and mouth are less represented as compared to face skin, hair and background labels. Depending on the pose of the face, some of the interior parts may even disappear. Without re-weighting the importance of less represented semantic regions, the method proposed by Ziyi Rajeev Yasarla is with the Whiting School of Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218-2608, e-mail: [email protected] Federico Perazzi is with Adobe Inc., e-mail: [email protected] Vishal M. Patel is with the Whiting School of Engineering, Johns Hopkins University, e-mail: [email protected] Manuscript received... (a) (b) (c) (d) Fig. 1: Sample deblurring results: (a) Blurry image; (b) results corresponding to Ziyi et al. [9] and to Song et al.[10] (last row); (c) Results corresponding to our proposed Uncertainty Guided Multi-Stream Semantic Network (UMSN); (d) ground- truth. Our approach recovers more details and better preserves fine structures like eyes and hair. et al.[9] fails to reconstruct the eyes and the mouth regions as shown in Fig. 1. Similar observations can also be made regarding the method proposed in [10] (Fig. 1) which uses semantic priors to obtain the intermediate outputs and then performs post-processing using k-nearest neighbor algorithm. To address the imbalance of different semantic classes, we propose a novel CNN architecture, called Uncertainty guided Multi-stream Semantic Networks (UMSN), which learns class- specific features independently and combine them to deblur the whole face image. Class-specific features are learned by sub- networks trained to reconstruct a single semantic class. We use nested residual learning paths to improve the propagation of semantic features. Additionally, we propose a class-based confidence measure to train the network. The confidence measure describes how well the network is likely to deblur each semantic class. This measure is incorporated in the loss to train the network. We evaluate the proposed network by conducting experiments on three face datasets - Helen [11], CelebA [12] and PuBFig [10]. Extensive experiments demonstrate the effectiveness of our approach compared to a number of well established face deblurring methods as well as other competitive approaches. Fig. 1 shows sample results arXiv:1907.13106v1 [cs.CV] 30 Jul 2019

Transcript of Deblurring Face Images using Uncertainty Guided …1 Deblurring Face Images using Uncertainty Guided...

Page 1: Deblurring Face Images using Uncertainty Guided …1 Deblurring Face Images using Uncertainty Guided Multi-Stream Semantic Networks Rajeev Yasarla, Student Member, IEEE, Federico Perazzi,

1

Deblurring Face Images using Uncertainty GuidedMulti-Stream Semantic Networks

Rajeev Yasarla Student Member IEEE Federico Perazzi Member IEEE and Vishal M Patel SeniorMember IEEE

AbstractmdashWe propose a novel multi-stream architecture andtraining methodology that exploits semantic labels for facialimage deblurring The proposed Uncertainty Guided Multi-Stream Semantic Network (UMSN) processes regions belongingto each semantic class independently and learns to combinetheir outputs into the final deblurred result Pixel-wise semanticlabels are obtained using a segmentation network A predictedconfidence measure is used during training to guide the net-work towards challenging regions of the human face such asthe eyes and nose The entire network is trained in an end-to-end fashion Comprehensive experiments on three differentface datasets demonstrate that the proposed method achievessignificant improvements over the recent state-of-the-art facedeblurring methods Code is available at httpsgithubcomrajeevyasarlaUMSN-Face-Deblurring

Index TermsmdashFacial image deblurring semantic masks con-fidence scores

I INTRODUCTION

Image deblurring entails the recovery of an unknown trueimage from a blurry image Similar to the other imageenhancement tasks image deblurring is experiencing a re-naissance as a result of convolutional networks (CNNs) es-tablishing themselves as powerful generative models Imagedeblurring is an ill-posed problem and therefore it is crucialto leverage additional properties of the data to successfullyrecover the lost facial details in the deblurred image Priorssuch as sparsity [1] [2] low-rank [3] and patch similarity [4]have been used in the literature to obtain a regularized solutionIn recent years deep learning-based methods have also gainedsome traction [5] [6] [7] [8]

The inherent semantic structure of natural images suchas faces is an important information that can be exploitedto improve the deblurring results Few techniques [10] [9]make use of such prior information in the form of semanticlabels These methods do not account for the class imbalanceof semantic maps corresponding to faces Interior parts ofa face like eyes nose and mouth are less represented ascompared to face skin hair and background labels Dependingon the pose of the face some of the interior parts mayeven disappear Without re-weighting the importance of lessrepresented semantic regions the method proposed by Ziyi

Rajeev Yasarla is with the Whiting School of Engineering Johns HopkinsUniversity 3400 North Charles Street Baltimore MD 21218-2608 e-mailryasarl1jhuedu

Federico Perazzi is with Adobe Inc e-mail perazziadobecomVishal M Patel is with the Whiting School of Engineering Johns Hopkins

University e-mail vpatel36jhueduManuscript received

(a) (b) (c) (d)

Fig 1 Sample deblurring results (a) Blurry image (b) resultscorresponding to Ziyi et al [9] and to Song et al[10] (lastrow) (c) Results corresponding to our proposed UncertaintyGuided Multi-Stream Semantic Network (UMSN) (d) ground-truth Our approach recovers more details and better preservesfine structures like eyes and hair

et al[9] fails to reconstruct the eyes and the mouth regionsas shown in Fig 1 Similar observations can also be maderegarding the method proposed in [10] (Fig 1) which usessemantic priors to obtain the intermediate outputs and thenperforms post-processing using k-nearest neighbor algorithm

To address the imbalance of different semantic classes wepropose a novel CNN architecture called Uncertainty guidedMulti-stream Semantic Networks (UMSN) which learns class-specific features independently and combine them to deblur thewhole face image Class-specific features are learned by sub-networks trained to reconstruct a single semantic class Weuse nested residual learning paths to improve the propagationof semantic features Additionally we propose a class-basedconfidence measure to train the network The confidencemeasure describes how well the network is likely to deblureach semantic class This measure is incorporated in theloss to train the network We evaluate the proposed networkby conducting experiments on three face datasets - Helen[11] CelebA [12] and PuBFig [10] Extensive experimentsdemonstrate the effectiveness of our approach compared to anumber of well established face deblurring methods as wellas other competitive approaches Fig 1 shows sample results

arX

iv1

907

1310

6v1

[cs

CV

] 3

0 Ju

l 201

9

2

from our UMSN network where one can clearly see thatUMSN is able to provide better results as compared to thestate-of-the-art techniques [9] [10] An ablation study is alsoconducted to demonstrate the effectiveness of different partsof the proposed network To summarize this paper makes thefollowing contributions

bull A novel multi-stream architecture called UncertaintyGuided Multi-Stream Semantic (UMSN) is proposedwhich learns class-specific features and uses them toreconstruct the final deblurred image

bull We propose a novel method of computing confidencemeasure for the reconstruction of every class in thedeblurred image which is to rebalance the importanceof semantic classes during training

Rest of the paper is organized as follows In Section II wereview a few related works Details of the proposed methodare given in Section III Experimental results are presented inSection IV and finally Section V concludes the paper with abrief summary

II BACKGROUND AND RELATED WORK

Single image deblurring techniques can be divided intotwo main categories - blind and non-blind In the non-blinddeblurring problem the blur kernel is assumed to be knownwhile in the blind deblurring problem the blur kernel needsto be estimated Since the proposed approach does not assumeany knowledge of the blur kernel it is a blind image deblurringapproach Hence in this section we review some recent blindimage deblurring techniques proposed in the literature

Classical image deblurring methods estimate the blur kernelgiven a blurry image and then apply deconvolution to get thedeblurred image To calculate the blur kernel some techniquesassume prior information about the image and formulatemaximum a-posertior (MAP) to obtain the deblurred imageDifferent priors such sparsity L0 gradient prior patch priormanifold Prior and low-rank have been proposed to obtainregularized reconstructions [13] [1] [14] [3] [2] [15] [4][16] [17] Chakrabarti et al[6] estimate the Fourier coeffi-cients of the blur kernel to deblur the image Nimisha et al[5]learn the latent features of the blurry images using the latentfeatures of the clean images to estimate the deblurred imageThese CNN-based methods do not perform well comparedto the state-of-the-art MAP-based methods for large motionkernels Recent non-blind image deblurring methods like [18][19] [2] [20] [21] assume and use some knowledge about theblur kernel Given the latent blurry input Subeesh et al[21]estimate multiple latent images corresponding to different priorstrengths to estimate the final deblurred image CNN-basedtechniques like [22] and [23] estimate the blur kernel andaddress dynamic deblurring

The usage of semantic information for image restoration isrelatively unexplored While semantic information has beenused for different object classes [24] [25] [26] [27] asubstantial body of literature has focused their attention tohuman faces [28] [9] [29] [30] Pan et al [29] extract theedges of face parts and estimate exemplar face images whichare further used as the global prior to estimate the blur kernel

This approach is complex and computationally expensive inestimating the blur kernel Recently Ziyi et al [9] proposedto use the semantic maps of a face to deblur the imageFurthermore they introduced the content loss to improve thequality of eyes nose and mouth regions of the face In contrastto these methods we learn a multi-stream network whichreconstructs the deblurred images corresponding to differentclasses in a facial semantic map Furthermore we propose anew loss to train the network

III PROPOSED METHOD

A blurry face image y can be modeled as the convolutionof a clean image x with a blur kernel k as

y = k lowast x+ η

where lowast denotes the convolution operation and η isnoise Given y in blind deblurring our objective isto estimate the underlying clean face image x Wegroup 11 semantic face labels into 4 classes as followsm1 = background m2 = face skin m3 =left eyebrow right eyebrow left eye right eyenose upper lip lower lip teeth and m4 = hairThus the semantic class mask of a clean image x is the unionm = m1 cupm2 cupm3 cupm4 Similarly we define the semanticclass masks of a blurry image m

Semantic class masks for blurry image m are generatedusing the semantic segmentation network (SN) (Fig 3) andgiven together with the blurry image as input to the deblurringnetwork UMSN This is important in face deblurring assome parts like face skin and hair are easy to reconstructwhile face parts like eyes nose and mouth are difficult toreconstruct and require special attention while deblurring aface image This is mainly due to the fact that parts like eyesnose and mouth are small in size and contain high frequencyelements compared to the other components Different from[29] that uses edge information and [9] that feed the semanticmap to a single-stream deblurring network we address thisproblem by proposing a multi-stream semantic network inwhich individual branches F-net-i learn to reconstruct differentparts of the face image separately Fig 2 gives an overviewof the proposed UMSN method

As can be seen from Fig 2 the proposed UMSN networkconsists of two stages We generate the semantic class mapsm of a blurry face image using the SN network The semanticmaps are used as the global masks to guide each stream of thefirst stage network These semantic class maps m are furtherused to learn class specific residual feature maps with nestedresidual learning paths (NRL) In the first stage of our networkthe weights are learned to deblur the corresponding class ofthe face image In the second stage of the network the outputsfrom the first stage are fused to learn the residual maps that areadded to the blurry image to obtain the final deblurred imageWe train the proposed network with a confidence guided class-based loss

A Semantic Segmentation Network (SN)The semantic class maps mi of a face are extracted using

the SN network as shown in Fig 3 We use residual blocks

3

Fig 2 An overview of the proposed UMSN network First stage semantic networks consist of F-Net-i Second stage isconstructed using the base network (BN) where outputs of the F-Net-irsquos are concatenated with the output of the first ResBlocklayer in BN

(ResBlock) as our building module for the segmentationnetwork A ResBlock consist of a 1 times 1 convolution layera 3 times 3 convolution layer and two 3 times 3 convolution layerswith dilation factor of 2 as shown in Fig 4

B Base Network (BN)

We construct our base network using a combination ofUNet [31] and DenseNet [32] architectures with the ResBlockas our basic building block To increase the receptive field sizewe introduce smoothed dilation convolutions in the ResBlockas shown in Fig 4 BN is a sequence of eight ResBlockssimilar to the first stage semantic network as shown in Fig 5Note that all convolutional layers are densely connected [32]We follow residual-based learning in estimating the deblurredimage for our base network as shown in Fig 5

Fig 3 An overview of the segmentation networkConv l times l (p q) contains instance normalization [33]Rectified Linear Unit (ReLU) Conv (l times l) - convolutionallayer with kernel of size ltimes l where p and q are the numberof input and output channels respectively

Fig 4 An overview of the ResBlock Conv ltimes l(p q) containsInstance Normalization [33] ReLU - Rectified Linear UnitsConv (l times l) - convolutional layer with kernel of size l timesl where p and q are number of input and output channelsrespectively In the right side of the figure we show smootheddilation convolutions introduced in ResBlock which is similarto [34]

C UMSN Network

The UMSN network is a two-stage network The first stagenetwork is designed to obtain deblurred outputs from thesemantic class-wise blurry inputs These outputs are furtherprocessed by the second stage network to obtain the finaldeblurred image The first stage semantic network containsa sequence of five ResBlocks with residual connections asshown in Fig 5 We call the set of all convolution layers of thefirst stage network excluding the last ResBlock and Conv3times3as F-Net

The blurry image y and the semantic masks mi are fed to F-Net-i to obtain the corresponding class-specific deblurred fea-tures which are concatenated with the output of the first layer(ResBlock-Avgpool) in Base Network(BN) for constructingthe UMSN network We also use the Nested Residual Learning(NRL) in UMSN network where class specific residual featuremaps are learned and further used in estimating the residual

4

Fig 5 An overview of the first stage semantic network (F-Net)

feature maps that are added to the blurry image for obtainingthe final output For example we can observe a residualconnection between the last layer of UMSN and class-specificfeature maps obtained from Conv 1times1 using y and m Outputof this residual connection is further processed as the input tothe residual connection with the input blurry image y In thisway we define our NRL and obtain the final deblurred imageWe propose a class-based loss function to train the UMSNnetwork

D Loss for UMSN

The network parameters Θ are learned by minimizing a lossL as follows

Θ = argminΘ

L(fΘ(y m) x) = argminΘ

L(x x) (1)

where fΘ() represents the UMSN network x is the deblurredresult m is the semantic map obtained from SN We definethe reconstruction loss as L = x minus x1 A face image canbe expressed as the sum of masked images using the semanticmaps as

x =

Msumi=1

mi x

where is the element-wise multiplication and M is the totalnumber of semantic maps As the masks are independent ofone another Eq (1) can be re-written as

Θ = argminΘ

Msumi=1

L(mi xmi x) (2)

In other words the loss is calculated for every class indepen-dent and summed up in order to obtain the overall loss asfollows

L(x x) =

Msumi=1

L(mi xmi x) (3)

Uncertainty Guidance

We introduce a confidence measure for every class anduse it to re-weight the contribution of the loss from eachclass to the total loss By introducing a confidence measureand re-weighting the loss we benefit in two ways If thenetwork is giving less importance to a particular class bynot learning appropriate features of it then the ConfidenceNetwork (CN) helps UMSN to learn those class featuresby estimating low confidence values and higher gradients

for those classes through CN network Additionally by re-weighting the contribution of loss from each class it countersfor the imbalances in the error estimation from differentclasses The loss function can be written as

Lc(x x) =

Msumi=1

CiL(mi xmi x)minus λ log(Ci) (4)

where log(Ci) acts as a regularizer that prevents the valueof Ci going to zero and λ is a constant We estimate theconfidence measure Ci for each class by passing mi xmix as inputs to CN as shown in Fig 6 Ci represents howconfident UMSN is in deblurring the ith class components ofthe face image Note that Ci(isin [0 1]) confidence measure isused only in the loss function while training the weights ofUMSN and it is not used (or estimated) during inference

Fig 6 An overview of the Confidence Network (CN) xis ground truth image x is deblurred image obtained fromUMSN m semantic maps of x

Inspired by the benefits of the perceptual loss in styletransfer [35] [36] and image super-resolution [37] we use it totrain our network Let Φ() denote the features obtained usingthe VGG16 model [38] then the perceptual loss is defined asfollows

Lp = Φ(x)minus Φ(x)22 (5)

The features from layer relu1 2 of a pretrained VGG-16network [38] are used to compute the perceptual loss Thetotal loss used to train UMSN is as follows

Ltotal = Lc + λ1Lp (6)

where λ1 is a constant

IV EXPERIMENTAL RESULTS

We evaluate the results using the images provided by theauthors of [9] which consists of 8000 blurry images gener-ated using the Helen dataset [11] and 8000 blurry imagesgenerated using the CelebA dataset [12]1 Furthermore wetest our network on a test dataset called PubFig providedby the authors of [10] which contains 192 blurry imagesWe quantitatively evaluate the results using the Peak-Signal-to-Noise Ratio (PSNR) and the Structural Similarity index(SSIM) [39]

1We follow the same training and testing protocols defined by [9]

5

Methodclass-1 class-2 class-3 class-4

PSNR SSIM PSNR SSIM PSNR SSIM PSNR SSIM

CZiyi et al 2786 088 3027 093 3333 092 3179 091

UMSN 2927 091 3075 095 3383 096 3310 094

HZiyi et al 2913 085 3120 088 3513 092 3396 088

UMSN 3083 087 3176 090 3567 093 3481 091

TABLE I PSNR and SSIM values for each deblurred semanticclass on the CelebA dataset (C) and Hellen dataset (H)

A Implementation Details

1) Segmentation Network We initially train the SN net-work using the Helen dataset [11] which contains 2000 cleanimages with corresponding semantic labels provided by Smithet al[40] SN is trained using the cross entropy loss with theAdam optimizer and the learning rate is set equal to 00002We train SN for 60000 iterations on clean images As can beseen from Fig 7 and Table II SN trained on clean images onlyis not able to produce good results on the blurry images Thuswe fine-tune SN using the blurry images and the correspondingsemantic maps generated using the Helen dataset [11] In thiscase SN is fine-tuned for 30000 iterations with the learningrate of 000001 We test our SN network on 330 clean Testimages provided by the Helen dataset as well as 8000 blurryimages Results are shown in Fig 7 and Table II in terms ofF-scores As can be seen from these results our method isable to produce reasonable segmentation results from blurryimages

Class Clean Images Blurry ImagesSN trained onclean images

SN trained onclean images

SN fine-tuned wblurry images

class-1 0910 0889 0901class-2 0903 0877 0884class-3 0856 0775 0814class-4 0625 0538 0566

TABLE II F-score values of semantic masks for clean andblurry images from the Helen dataset

2) UMSN Network Training images for UMSN are gen-erated using 2000 images from the Helen dataset [11] andrandomly selected 25000 images from the CelebA dataset [12]We generate 25000 blur kernels sizes ranging from 13 times 13to 29times 29 using 3D camera trajectories [41] Patches of size128times 128 are exacted from those images and convolved with25000 blur kernels randomly to generated about 17 millionpairs of clean-blurry data We added Gaussian noise withσ = 003 to the blurry images

a) First Stage Semantic Network Every F-Net in the firststage network is trained with clean-blurry paired data whereonly the corresponding class components of the face imagesare blurred The first stage semantic network is trained with acombination of the L1-norm and the perceptual losses usingthe Adam optimizer with learning rate of 00002 The firststage network is trained for 50 000 iterations To illustratewhat kind of outputs are observed from the F-Nets in Fig 8we display semantic masks (first row) the correspondingblurry images (second row) and the final estimated deblurred

(a) (b) (c)

Fig 7 Semantic maps generated by SN on a blurry imagefrom the Helen dataset (a) Blurry image (b) Semantic masksobtained from SN trained on clean images (c) Semantic masksobtained from SN fine tuned with the blurry images

outputs from F-Nets (last row) As it can be seen from thelast row F-Nets are able to remove most of the blur fromcorresponding class in the image

(a) (b) (c) (d)

Fig 8 First row semantic masks Second row blurry imagesThird row deblurred images using F-Net-i for classes (a) m1(b) m2 (c) m3 (d) m4

b) Overall UMSN Network UMSN is trained usingLtotal with the Adam optimizer and batchsize of 16 Thelearning rate is set equal to 00002 Note that semanticmaps mi4i=1 of the ground truth clean images in Ltotal aregenerated using SN since we do not have the actual semanticmaps for them λ and λ1 values are set equal to 001 and 10respectively UMSN is trained for 01 million iterations

B Evaluation

a) PSNR and SSIM We compare the performance of ourmethod with the following state-of-the-art algorithms MAP-based methods [14] [13] [42] [43] [44] face deblurringexemplar-based method [29] and CNN-based methods [7]

6

Deblurring MethodHelen CelebA

PSNR SSIM PSNR SSIMKrishnan et al[14] (CVPRrsquo11) 1930 0670 1838 0672Pan et al[29] (ECCV 2014) 2093 0727 1859 0677Shan et al[42] (SIGGRAPHrsquo08) 1957 0670 1843 0644Xu et al[13] (CVPRrsquo13) 2011 0711 1893 0685Cho et al[43] (SIGGRAPHrsquo09) 1682 0574 1303 0445Zhong et al[44] (CVPRrsquo13) 1641 0614 1726 0695Nah et al[7] (CVPRrsquo17) 2412 0823 2243 0832Ziyi et al[9] (CVPRrsquo18) wGAN 2558 0861 2434 0860Ziyi et al[9] (CVPRrsquo18) 2599 0871 2505 0879UMSN (ours wLtotal) 2693 0897 2590 0906

TABLE III PSNR and SSIM comparision of UMSN againststate-of-the-art methods

[9] Results are shown in Table III As can be seen from thistable UMSN outperforms state-of-the-art methods includingthe methods that make use of semantic maps for image de-blurring [9] Furthermore we evaluate UMSNrsquos performancein reconstructing individual semantic classes against [9] Asit can be seen from the Table I our methodrsquos performance inreconstructing individual classes is better than the sate-of-the-art method [9] In addition we compare the performance ofour method against CNN-based face deblurring method thatuses post processing [10] on the PubFig dataset consisting of192 images released by the authors of [10] The PSNR andSSIM values achieved by [10] are 3021 and 084 whereasour method achieved 3051 and 086 respectively

b) VGG-Face distance We compare the performanceof our method with the following state-of-the-art algorithmsusing Feature distance( ie L2minusnorm distance between outputfeature map from network) between deblurred image and theground truth image We use the outputs of pool5 layer from theVGG-Face [45] to compute the VGG-Face distance( dV GG)dV GG is a better perceptual metric to compare as it computesthe L2minusnorm distance in the feature space using VGG-Face [45] network Table IV clearly shows that our methodUMSN out-performs the existing state of the art methods usingthe perceptual measures like dV GG

Deblurring MethodHelen CelebAdV GG dV GG

Xu et al[13] (CVPRrsquo13) 1015 1138Zhong et al[44] (CVPRrsquo13) 1228 1472Nah et al[7] (CVPRrsquo17) 746 892Ziyi et al[9] (CVPRrsquo18) 587 649UMSN (ours wLtotal) 523 557

TABLE IV comparision of UMSN against state-of-the-artmethods using distance of feature from VGG-Face( dV GG)lower is better

c) Face recognition In order to show the significance ofdifferent face deblurring methods we perform face recognitionon the deblurred images We use the CelebA dataset to performthis comparison where we use 100 different identities asthe probe set and for each identity we select 9 additional

clean face images as the gallery set 8000 blurry imagesare generated using the 100 images in the probe set Givena blurry image the deblurred image is computed using thecorresponding deblurring algorithm The most similar face forthis deblurred image is selected from the gallery set to checkwhether they belong to same identity or not

To perform face recognition we use OpenFace toolbox [46]to compute the similarity distance between the deblurredimage and all the gallery images We select the Top-K nearestmatches for each deblurred image to compute the accuracyof the deblurring method Table V clearly shows that thedeblurred images by our method have better recognition accu-racies than the other state-of-the-art methods This experimentclearly shows that our method is able to retain the importantparts of a face while performing deblurring This in turn helpsin achieving better face recognition compared to the othermethods

Deblurring Method Top-1 Top-3 Top-5Clear images 71 84 89Blurred images 31 46 53Pan et al[29] (ECCV 2014) 44 57 64Xu et al[13] (CVPRrsquo13) 43 57 64Zhong et al[44] (CVPRrsquo13) 30 44 51Nah et al[7] (CVPRrsquo17) 42 59 65Ziyi et al[9] (CVPRrsquo18) 54 68 74UMSN (ours wLtotal) 59 72 80

TABLE V Top-1 Top-3 and Top-5 face recognition accuracieson the CelebA dataset

d) Qualitative Results The qualitative performance ofdifferent methods on several images from the Helen andCelebA test images are shown in Fig 9 We can clearly ob-serve that UMSN produces sharp images without any artifactswhen compared to the other methods [13] [9] [44]

In addition we conducted experiments on 100 real blurryimages provided by the authors of [13] [9] [44] Resultscorresponding to different methods are shown in Fig 10 Ascan be seen from this figure UMSN produces sharper andclear images when compared to the state-of-the-art methodsFor example methods [13] [44] produce results which containartifacts or blurry images Ziyi et al[9] is not able reconstructeyes nose and mouth as shown in the fourth row of Fig 10On the other hand eyes nose and mouth regions are clearlyvisible in the images corresponding to the UMSN method

C Ablation Study

We conduct experiments on two datasets to study theperformance contribution of each component in UMSN Westart with the base network (BN) and progressively addcomponents to establish their significance to estimate thefinal deblurred image The corresponding results are shownin Table VI It is important to note that BN and UMSNare configured to have the same number of learnable pa-rameters by setting appropriate number of output channelsin the intermediate convolutional layers of the ResBlock

7

Blurry Xu et al[13] Zhong et al[44] Ziyi et al[9] Ours Ground TruthImage (CVPRrsquo13) (CVPRrsquo13) (CVPRrsquo18) Final Image

Fig 9 Sample results from the Helen and CelebA datasets As can be seen from this figure MAP-based methods [44] [13]are adding some artifacts on the final deblurred images especially near the eyes nose and mouth regions Global prior-basedmethod [9] produces very smooth outputs and some parts of the deblurred images are deformed This can be clearly seenby looking at the reconstructions in the fourth column corresponding to [9] where the right eye is added in the face imagethe fingers are removed the mustache is removed and the mouth is deformed On the other hand by using class-specificmulti-stream networks and Ltotal our method is able to reconstruct all the classes of a face perfectly and produces sharperface images For example UMSN is able to reconstruct eyes nose mustache and mouth clearly Additionally UMSN doesnot add parts like eyes when they are occluded (last row) Note that the method proposed in [9] uses a generative adversarialnetwork (GAN) to reconstruct sharper images In contrast our method is able to reconstruct sharper images even without usinga GAN

8

Blurry Xu et al[13] Zhong et al[44] Ziyi et al[9] OursImage (CVPRrsquo13) (CVPRrsquo13) (CVPRrsquo18) Final

Fig 10 Sample results on real blurry images

9

(a) (b) (c) (d) (e) (f) (g)

Fig 11 Ablation study (a) Blurry image (b) BN (c) BN + semantic maps (d) BN + semantic maps + NRL (e) UMSNwithout Lc (f) UMSN (g) ground Truth

The ablation study demonstrates that each of the componentsimprove the quality of the deblurred image In particularthe introduction of the multi-stream architecture improved theperformance by 05dB When UMSN is trained using Ltotal

the performance improved by an additional 025dB Finallythe combination of all the components of UMSN produce thebest results with 165dB gain compared to BN

Deblurring MethodHelen CelebA

PSNR SSIM PSNR SSIMbaseline (BN) 2512 082 2438 081+ semantic maps 2555 084 2472 082+ nested residuals 2608 085 2526 084UMSN without Lc 2661 087 2567 087UMSN 2693 0897 2590 090

TABLE VI PSNR and SSIM results of the ablation study

Sample reconstructions corresponding to the ablation studyusing the Helen and CelebA datasets are shown in Fig 11As can be seen from this figure BN and BN+semantic mapsproduce reconstructions that are still blurry and they fail toreconstruct the facial parts like eyes nose and mouth well Onthe other hand BN+semantic maps+ NRL is able to improvethe reconstruction quality but visually it is still not comparableto the state-of-the-art method As shown in Fig 12 NRL isable to estimate the class-specific residual feature maps whichare further used for estimating the final deblurred image FromFig 12 we can see what different residual feature maps arelearning for different classes In particular the use of F-Netshelps UMSN to produce qualitatively good results where allthe parts of a face are clearly visible Furthermore trainingUMSN using Ltotal results in sharper images

We also compare our method qualitatively with postprocessing-based method [10] on the PubFig dataset As canbe seen from the results shown in Fig 13 even after applyingsome post processing [10] tends to produce results that aresmooth In comparison UMSN is able to produce sharper andhigh-quality images without any post-processing

(a) (b) (c) (d)

Fig 12 (a) Blurry image (b) UMSN (c) Intermediate residualfeature map learned for the eyes and the hair (d) Intermediateresidual feature map learned for the face skin in NRL

(a) (b) (c) (d)

Fig 13 Sample results from the PubFig dataset (a) Blurryimage (b) Ground Truth (c) Song et al[10](IJCVrsquo19) (d)UMSN

V CONCLUSION

We proposed a new method called Uncertainty guidedMulti-stream Semantic Network (UMSN) to address thesingle image blind deblurring of face image problem thatentails the use of facial sementic information In our approachwe introduced a different way to use the ouput featuresfrom the sub-networks which are trained for the individualsemantic class-wise deblurring Additionally we introduced

10

novel techniques such as nested residual learning and class-based confidence loss to improve the performance of UMSNIn comparison to the state-of-the-art single image blind de-blurring methods the proposed approach is able to achievesignificant improvements when evaluated on three popular facedatasets

APPENDIX AUMSN NETWORK SEQUENCE

The BaseNetwork is similar to the first stage semanticnetwork as shown in Fig 5 with the following sequence oflayersResBlock(364)-Avgpool-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-Upsample-ResBlock(6416)-Conv 3times 3(163)F-Net-i is a a sequence of five ResBlocks with denseconnectionsResBlock(316)-Avgpool-ResBlock(1616)-ResBlock(1616)-ResBlock(1616)-ResBlock(168)where ResBlock(mn) indicates m input channels and noutput channels for ResBlock

F-Net-irsquos are joined to form the first stage of UMSNfurther these outputs re concatenated to output of first layer ofBase Network(BN) for constructing the UMSN network ThusUMSN is constructed by combining F-Net-irsquos and BN whereF-Net-irsquos acts as first stage of UMSN and BN acts as secondstage of UMSN

APPENDIX BCONFIDENCE SCORES

In this section we show some example test outputs ofUMSN(trained with Ltotal) at different time instances oftraining and their corresponding confidence values to showhow confidence measure is helping the network Confidencemeasure helps the UMSN in deblurring different parts of theface For example as shown in the Fig 14 and Table VIIthe output of UMSN after 50000 iterations are blurry at eyesnose and mouth which was reflected in low confidence scorefor C3 when compared to other classes Eventually UMSNlearnt to reconstruct eyes nose and mouth which was reflectedin high confidence score for C3 and as shown in the Figure14 the output deblurred images are sharp with eyes nose andmouth reconstructed perfectly

Blurry Image Iteration C1 C2 C3 C4

Image13000 0573 0576 0610 058450000 0676 0902 0789 0847100000 0858 0917 0989 0962

Image23000 0519 0528 0493 061150000 0728 0883 0817 0872100000 0896 0919 0966 0944

TABLE VII Confidence values for the Image1 and Images2first row and second row images in Fig 14 respectively

(a) (b) (c) (d)

Fig 14 Sample deblurring results of UMSN at different timeinstances of training (a) Blurry image (b) trained for 3000iterations (c) 50000 iterations (d) 01 million iterations

ACKNOWLEDGMENT

This research is based upon work supported by the Officeof the Director of National Intelligence (ODNI) IntelligenceAdvanced Research Projects Activity (IARPA) via IARPARampD Contract No 2014-14071600012 The views and conclu-sions contained herein are those of the authors and should notbe interpreted as necessarily representing the official policiesor endorsements either expressed or implied of the ODNIIARPA or the US Government The US Government is au-thorized to reproduce and distribute reprints for Governmentalpurposes notwithstanding any copyright annotation thereon

REFERENCES

[1] R Fergus B Singh A Hertzmann S T Roweis and W T FreemanldquoRemoving camera shake from a single photographrdquo ACM transactionson graphics (TOG) vol 25 no 3 pp 787ndash794 2006

[2] C J Schuler H Christopher Burger S Harmeling and B ScholkopfldquoA machine learning approach for non-blind image deconvolutionrdquo inProceedings of the IEEE Conference on Computer Vision and PatternRecognition 2013 pp 1067ndash1074

[3] W Ren X Cao J Pan X Guo W Zuo and M-H Yang ldquoImagedeblurring via enhanced low-rank priorrdquo IEEE Transactions on ImageProcessing vol 25 no 7 pp 3426ndash3437 2016

[4] L Sun S Cho J Wang and J Hays ldquoEdge-based blur kernel estimationusing patch priorsrdquo in IEEE International Conference on ComputationalPhotography (ICCP) IEEE 2013 pp 1ndash8

[5] T M Nimisha A Kumar Singh and A N Rajagopalan ldquoBlur-invariant deep learning for blind-deblurringrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4752ndash4760

[6] L Xu J S Ren C Liu and J Jia ldquoDeep convolutional neural networkfor image deconvolutionrdquo in Advances in neural information processingsystems 2014 pp 1790ndash1798

[7] S Nah T Hyun Kim and K Mu Lee ldquoDeep multi-scale convolutionalneural network for dynamic scene deblurringrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2017pp 3883ndash3891

[8] S Zhang X Shen Z Lin R Mech J P Costeira and J M MouraldquoLearning to understand image blurrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2018 pp6586ndash6595

[9] Z Shen W-S Lai T Xu J Kautz and M-H Yang ldquoDeep semanticface deblurringrdquo Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 8260ndash8269 2018

[10] Y Song J Zhang L Gong S He L Bao J Pan Q Yang and M-HYang ldquoJoint face hallucination and deblurring via structure generationand detail enhancementrdquo International Journal of Computer Vision vol127 no 6-7 pp 785ndash800 2019

[11] V Le J Brandt Z Lin L Bourdev and T S Huang ldquoInteractivefacial feature localizationrdquo in European conference on computer visionSpringer 2012 pp 679ndash692

11

[12] Z Liu P Luo X Wang and X Tang ldquoDeep learning face attributesin the wildrdquo in Proceedings of the IEEE international conference oncomputer vision 2015 pp 3730ndash3738

[13] L Xu S Zheng and J Jia ldquoUnnatural l0 sparse representation fornatural image deblurringrdquo Proceedings of the IEEE conference oncomputer vision and pattern recognition pp 1107ndash1114 2013

[14] D Krishnan T Tay and R Fergus ldquoBlind deconvolution using anormalized sparsity measurerdquo CVPR 2011 pp 233ndash240 2011

[15] J Pan D Sun H Pfister and M-H Yang ldquoBlind image deblurringusing dark channel priorrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition 2016 pp 1628ndash1636

[16] J Ni P Turaga V M Patel and R Chellappa ldquoExample-drivenmanifold priors for image deconvolutionrdquo IEEE Transactions on ImageProcessing vol 20 no 11 pp 3086ndash3096 Nov 2011

[17] V M Patel G R Easley and D M Healy ldquoShearlet-based deconvo-lutionrdquo IEEE Transactions on Image Processing vol 18 no 12 pp2673ndash2685 Dec 2009

[18] J Kruse C Rother and U Schmidt ldquoLearning to push the limits ofefficient fft-based image deconvolutionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4586ndash4594

[19] U Schmidt C Rother S Nowozin J Jancsary and S Roth ldquoDiscrim-inative non-blind deblurringrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition 2013 pp 604ndash611

[20] U Schmidt and S Roth ldquoShrinkage fields for effective image restora-tionrdquo in Proceedings of the IEEE Conference on Computer Vision andPattern Recognition 2014 pp 2774ndash2781

[21] S Vasu V Reddy Maligireddy and A Rajagopalan ldquoNon-blind de-blurring Handling kernel uncertainty with cnnsrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2018pp 3272ndash3281

[22] C J Schuler M Hirsch S Harmeling and B Scholkopf ldquoLearning todeblurrdquo vol 38 no 7 IEEE 2015 pp 1439ndash1451

[23] A Chakrabarti ldquoDeep convolutional neural network for image decon-volutionrdquo 2014

[24] Y Tsai X Shen Z Lin K Sunkavalli and M Yang ldquoSky is not thelimit semantic-aware sky replacementrdquo ACM Trans Graph vol 35no 4 2016

[25] D Eigen D Krishnan and R Fergus ldquoRestoring an image takenthrough a window covered with dirt or rainrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2013 pp 633ndash640

[26] P Svoboda M Hradis L Marsık and P Zemcık ldquoCnn for license platemotion deblurringrdquo in 2016 IEEE International Conference on ImageProcessing (ICIP) IEEE 2016 pp 3832ndash3836

[27] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring text images via l0-regularized intensity and gradient priorrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2014 pp2901ndash2908

[28] S Baker and T Kanade ldquoLimits on super-resolution and how to breakthemrdquo IEEE Trans Pattern Anal Mach Intell vol 24 no 9 2002

[29] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring face images withexemplarsrdquo pp 47ndash62 2014

[30] N Joshi W Matusik E H Adelson and D J Kriegman ldquoPersonalphoto enhancement using example imagesrdquo ACM Trans Graph vol 29no 2 2010

[31] O Ronneberger PFischer and T Broxrdquo ldquoU-net Convolutionalnetworks for biomedical image segmentationrdquordquo in Medical ImageComputing and Computer-Assisted Intervention (MICCAI)rdquo serrdquoLNCSrdquo vol rdquo9351rdquo rdquo2015rdquo pp rdquo234ndash241rdquo [Online] Availablerdquohttplmbinformatikuni-freiburgdePublications2015RFB15ardquo

[32] G Huang Z Liu L Van Der Maaten and K Q Weinberger ldquoDenselyconnected convolutional networksrdquo in Proceedings of the IEEE confer-ence on computer vision and pattern recognition 2017 pp 4700ndash4708

[33] D Ulyanov A Vedaldi and V Lempitsky ldquoInstance normalization Themissing ingredient for fast stylizationrdquo arXiv preprint arXiv1607080222016

[34] Z Wang and S Ji ldquoSmoothed dilated convolutions for improved densepredictionrdquo in Proceedings of the 24th ACM SIGKDD InternationalConference on Knowledge Discovery amp Data Mining ACM 2018 pp2486ndash2495

[35] J Johnson A Alahi and L Fei-Fei ldquoPerceptual losses for real-timestyle transfer and super-resolutionrdquo ECCV 2016

[36] H Zhang and K Dana ldquoMulti-style generative network for real-timetransferrdquo arXiv preprint arXiv170306953 2017

[37] C Ledig L Theis F Huszar J Caballero A Cunningham A AcostaA Aitken A Tejani J Totz Z Wang et al ldquoPhoto-realistic singleimage super-resolution using a generative adversarial networkrdquo pp4681ndash4690 2017

[38] K Simonyan and A Zisserman ldquoVery deep convolutional networks forlarge-scale image recognitionrdquo arXiv14091556 2014

[39] Z Wang A C Bovik H R Sheikh and E P Simoncelli ldquoImagequality assessment from error visibility to structural similarityrdquo IEEETransactions on Image Processing vol 13 no 4 pp 600ndash612 April2004

[40] B M Smith L Zhang J Brandt Z Lin and J Yang ldquoExemplar-basedface parsingrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition 2013 pp 3484ndash3491

[41] G Boracchi and A Foi ldquoModeling the performance of image restorationfrom motion blurrdquo IEEE Transactions on Image Processing vol 21no 8 pp 3502ndash3517 2012

[42] Q Shan J Jia and A Agarwala ldquoHigh-quality motion deblurring froma single imagerdquo Acm transactions on graphics (tog) vol 27 no 3p 73 2008

[43] S Cho and S Lee ldquoFast motion deblurringrdquo ACM Transactions ongraphics (TOG) vol 28 no 5 p 145 2009

[44] L Zhong S Cho D Metaxas S Paris and J Wang ldquoHandling noise insingle image deblurring using directional filtersrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2013pp 612ndash619

[45] O M Parkhi A Vedaldi and A Zisserman ldquoDeep face recognitionrdquoin British Machine Vision Conference 2015

[46] B Amos B Ludwiczuk and M Satyanarayanan ldquoOpenface A general-purpose face recognition library with mobile applicationsrdquo CMU-CS-16-118 CMU School of Computer Science Tech Rep 2016

Page 2: Deblurring Face Images using Uncertainty Guided …1 Deblurring Face Images using Uncertainty Guided Multi-Stream Semantic Networks Rajeev Yasarla, Student Member, IEEE, Federico Perazzi,

2

from our UMSN network where one can clearly see thatUMSN is able to provide better results as compared to thestate-of-the-art techniques [9] [10] An ablation study is alsoconducted to demonstrate the effectiveness of different partsof the proposed network To summarize this paper makes thefollowing contributions

bull A novel multi-stream architecture called UncertaintyGuided Multi-Stream Semantic (UMSN) is proposedwhich learns class-specific features and uses them toreconstruct the final deblurred image

bull We propose a novel method of computing confidencemeasure for the reconstruction of every class in thedeblurred image which is to rebalance the importanceof semantic classes during training

Rest of the paper is organized as follows In Section II wereview a few related works Details of the proposed methodare given in Section III Experimental results are presented inSection IV and finally Section V concludes the paper with abrief summary

II BACKGROUND AND RELATED WORK

Single image deblurring techniques can be divided intotwo main categories - blind and non-blind In the non-blinddeblurring problem the blur kernel is assumed to be knownwhile in the blind deblurring problem the blur kernel needsto be estimated Since the proposed approach does not assumeany knowledge of the blur kernel it is a blind image deblurringapproach Hence in this section we review some recent blindimage deblurring techniques proposed in the literature

Classical image deblurring methods estimate the blur kernelgiven a blurry image and then apply deconvolution to get thedeblurred image To calculate the blur kernel some techniquesassume prior information about the image and formulatemaximum a-posertior (MAP) to obtain the deblurred imageDifferent priors such sparsity L0 gradient prior patch priormanifold Prior and low-rank have been proposed to obtainregularized reconstructions [13] [1] [14] [3] [2] [15] [4][16] [17] Chakrabarti et al[6] estimate the Fourier coeffi-cients of the blur kernel to deblur the image Nimisha et al[5]learn the latent features of the blurry images using the latentfeatures of the clean images to estimate the deblurred imageThese CNN-based methods do not perform well comparedto the state-of-the-art MAP-based methods for large motionkernels Recent non-blind image deblurring methods like [18][19] [2] [20] [21] assume and use some knowledge about theblur kernel Given the latent blurry input Subeesh et al[21]estimate multiple latent images corresponding to different priorstrengths to estimate the final deblurred image CNN-basedtechniques like [22] and [23] estimate the blur kernel andaddress dynamic deblurring

The usage of semantic information for image restoration isrelatively unexplored While semantic information has beenused for different object classes [24] [25] [26] [27] asubstantial body of literature has focused their attention tohuman faces [28] [9] [29] [30] Pan et al [29] extract theedges of face parts and estimate exemplar face images whichare further used as the global prior to estimate the blur kernel

This approach is complex and computationally expensive inestimating the blur kernel Recently Ziyi et al [9] proposedto use the semantic maps of a face to deblur the imageFurthermore they introduced the content loss to improve thequality of eyes nose and mouth regions of the face In contrastto these methods we learn a multi-stream network whichreconstructs the deblurred images corresponding to differentclasses in a facial semantic map Furthermore we propose anew loss to train the network

III PROPOSED METHOD

A blurry face image y can be modeled as the convolutionof a clean image x with a blur kernel k as

y = k lowast x+ η

where lowast denotes the convolution operation and η isnoise Given y in blind deblurring our objective isto estimate the underlying clean face image x Wegroup 11 semantic face labels into 4 classes as followsm1 = background m2 = face skin m3 =left eyebrow right eyebrow left eye right eyenose upper lip lower lip teeth and m4 = hairThus the semantic class mask of a clean image x is the unionm = m1 cupm2 cupm3 cupm4 Similarly we define the semanticclass masks of a blurry image m

Semantic class masks for blurry image m are generatedusing the semantic segmentation network (SN) (Fig 3) andgiven together with the blurry image as input to the deblurringnetwork UMSN This is important in face deblurring assome parts like face skin and hair are easy to reconstructwhile face parts like eyes nose and mouth are difficult toreconstruct and require special attention while deblurring aface image This is mainly due to the fact that parts like eyesnose and mouth are small in size and contain high frequencyelements compared to the other components Different from[29] that uses edge information and [9] that feed the semanticmap to a single-stream deblurring network we address thisproblem by proposing a multi-stream semantic network inwhich individual branches F-net-i learn to reconstruct differentparts of the face image separately Fig 2 gives an overviewof the proposed UMSN method

As can be seen from Fig 2 the proposed UMSN networkconsists of two stages We generate the semantic class mapsm of a blurry face image using the SN network The semanticmaps are used as the global masks to guide each stream of thefirst stage network These semantic class maps m are furtherused to learn class specific residual feature maps with nestedresidual learning paths (NRL) In the first stage of our networkthe weights are learned to deblur the corresponding class ofthe face image In the second stage of the network the outputsfrom the first stage are fused to learn the residual maps that areadded to the blurry image to obtain the final deblurred imageWe train the proposed network with a confidence guided class-based loss

A Semantic Segmentation Network (SN)The semantic class maps mi of a face are extracted using

the SN network as shown in Fig 3 We use residual blocks

3

Fig 2 An overview of the proposed UMSN network First stage semantic networks consist of F-Net-i Second stage isconstructed using the base network (BN) where outputs of the F-Net-irsquos are concatenated with the output of the first ResBlocklayer in BN

(ResBlock) as our building module for the segmentationnetwork A ResBlock consist of a 1 times 1 convolution layera 3 times 3 convolution layer and two 3 times 3 convolution layerswith dilation factor of 2 as shown in Fig 4

B Base Network (BN)

We construct our base network using a combination ofUNet [31] and DenseNet [32] architectures with the ResBlockas our basic building block To increase the receptive field sizewe introduce smoothed dilation convolutions in the ResBlockas shown in Fig 4 BN is a sequence of eight ResBlockssimilar to the first stage semantic network as shown in Fig 5Note that all convolutional layers are densely connected [32]We follow residual-based learning in estimating the deblurredimage for our base network as shown in Fig 5

Fig 3 An overview of the segmentation networkConv l times l (p q) contains instance normalization [33]Rectified Linear Unit (ReLU) Conv (l times l) - convolutionallayer with kernel of size ltimes l where p and q are the numberof input and output channels respectively

Fig 4 An overview of the ResBlock Conv ltimes l(p q) containsInstance Normalization [33] ReLU - Rectified Linear UnitsConv (l times l) - convolutional layer with kernel of size l timesl where p and q are number of input and output channelsrespectively In the right side of the figure we show smootheddilation convolutions introduced in ResBlock which is similarto [34]

C UMSN Network

The UMSN network is a two-stage network The first stagenetwork is designed to obtain deblurred outputs from thesemantic class-wise blurry inputs These outputs are furtherprocessed by the second stage network to obtain the finaldeblurred image The first stage semantic network containsa sequence of five ResBlocks with residual connections asshown in Fig 5 We call the set of all convolution layers of thefirst stage network excluding the last ResBlock and Conv3times3as F-Net

The blurry image y and the semantic masks mi are fed to F-Net-i to obtain the corresponding class-specific deblurred fea-tures which are concatenated with the output of the first layer(ResBlock-Avgpool) in Base Network(BN) for constructingthe UMSN network We also use the Nested Residual Learning(NRL) in UMSN network where class specific residual featuremaps are learned and further used in estimating the residual

4

Fig 5 An overview of the first stage semantic network (F-Net)

feature maps that are added to the blurry image for obtainingthe final output For example we can observe a residualconnection between the last layer of UMSN and class-specificfeature maps obtained from Conv 1times1 using y and m Outputof this residual connection is further processed as the input tothe residual connection with the input blurry image y In thisway we define our NRL and obtain the final deblurred imageWe propose a class-based loss function to train the UMSNnetwork

D Loss for UMSN

The network parameters Θ are learned by minimizing a lossL as follows

Θ = argminΘ

L(fΘ(y m) x) = argminΘ

L(x x) (1)

where fΘ() represents the UMSN network x is the deblurredresult m is the semantic map obtained from SN We definethe reconstruction loss as L = x minus x1 A face image canbe expressed as the sum of masked images using the semanticmaps as

x =

Msumi=1

mi x

where is the element-wise multiplication and M is the totalnumber of semantic maps As the masks are independent ofone another Eq (1) can be re-written as

Θ = argminΘ

Msumi=1

L(mi xmi x) (2)

In other words the loss is calculated for every class indepen-dent and summed up in order to obtain the overall loss asfollows

L(x x) =

Msumi=1

L(mi xmi x) (3)

Uncertainty Guidance

We introduce a confidence measure for every class anduse it to re-weight the contribution of the loss from eachclass to the total loss By introducing a confidence measureand re-weighting the loss we benefit in two ways If thenetwork is giving less importance to a particular class bynot learning appropriate features of it then the ConfidenceNetwork (CN) helps UMSN to learn those class featuresby estimating low confidence values and higher gradients

for those classes through CN network Additionally by re-weighting the contribution of loss from each class it countersfor the imbalances in the error estimation from differentclasses The loss function can be written as

Lc(x x) =

Msumi=1

CiL(mi xmi x)minus λ log(Ci) (4)

where log(Ci) acts as a regularizer that prevents the valueof Ci going to zero and λ is a constant We estimate theconfidence measure Ci for each class by passing mi xmix as inputs to CN as shown in Fig 6 Ci represents howconfident UMSN is in deblurring the ith class components ofthe face image Note that Ci(isin [0 1]) confidence measure isused only in the loss function while training the weights ofUMSN and it is not used (or estimated) during inference

Fig 6 An overview of the Confidence Network (CN) xis ground truth image x is deblurred image obtained fromUMSN m semantic maps of x

Inspired by the benefits of the perceptual loss in styletransfer [35] [36] and image super-resolution [37] we use it totrain our network Let Φ() denote the features obtained usingthe VGG16 model [38] then the perceptual loss is defined asfollows

Lp = Φ(x)minus Φ(x)22 (5)

The features from layer relu1 2 of a pretrained VGG-16network [38] are used to compute the perceptual loss Thetotal loss used to train UMSN is as follows

Ltotal = Lc + λ1Lp (6)

where λ1 is a constant

IV EXPERIMENTAL RESULTS

We evaluate the results using the images provided by theauthors of [9] which consists of 8000 blurry images gener-ated using the Helen dataset [11] and 8000 blurry imagesgenerated using the CelebA dataset [12]1 Furthermore wetest our network on a test dataset called PubFig providedby the authors of [10] which contains 192 blurry imagesWe quantitatively evaluate the results using the Peak-Signal-to-Noise Ratio (PSNR) and the Structural Similarity index(SSIM) [39]

1We follow the same training and testing protocols defined by [9]

5

Methodclass-1 class-2 class-3 class-4

PSNR SSIM PSNR SSIM PSNR SSIM PSNR SSIM

CZiyi et al 2786 088 3027 093 3333 092 3179 091

UMSN 2927 091 3075 095 3383 096 3310 094

HZiyi et al 2913 085 3120 088 3513 092 3396 088

UMSN 3083 087 3176 090 3567 093 3481 091

TABLE I PSNR and SSIM values for each deblurred semanticclass on the CelebA dataset (C) and Hellen dataset (H)

A Implementation Details

1) Segmentation Network We initially train the SN net-work using the Helen dataset [11] which contains 2000 cleanimages with corresponding semantic labels provided by Smithet al[40] SN is trained using the cross entropy loss with theAdam optimizer and the learning rate is set equal to 00002We train SN for 60000 iterations on clean images As can beseen from Fig 7 and Table II SN trained on clean images onlyis not able to produce good results on the blurry images Thuswe fine-tune SN using the blurry images and the correspondingsemantic maps generated using the Helen dataset [11] In thiscase SN is fine-tuned for 30000 iterations with the learningrate of 000001 We test our SN network on 330 clean Testimages provided by the Helen dataset as well as 8000 blurryimages Results are shown in Fig 7 and Table II in terms ofF-scores As can be seen from these results our method isable to produce reasonable segmentation results from blurryimages

Class Clean Images Blurry ImagesSN trained onclean images

SN trained onclean images

SN fine-tuned wblurry images

class-1 0910 0889 0901class-2 0903 0877 0884class-3 0856 0775 0814class-4 0625 0538 0566

TABLE II F-score values of semantic masks for clean andblurry images from the Helen dataset

2) UMSN Network Training images for UMSN are gen-erated using 2000 images from the Helen dataset [11] andrandomly selected 25000 images from the CelebA dataset [12]We generate 25000 blur kernels sizes ranging from 13 times 13to 29times 29 using 3D camera trajectories [41] Patches of size128times 128 are exacted from those images and convolved with25000 blur kernels randomly to generated about 17 millionpairs of clean-blurry data We added Gaussian noise withσ = 003 to the blurry images

a) First Stage Semantic Network Every F-Net in the firststage network is trained with clean-blurry paired data whereonly the corresponding class components of the face imagesare blurred The first stage semantic network is trained with acombination of the L1-norm and the perceptual losses usingthe Adam optimizer with learning rate of 00002 The firststage network is trained for 50 000 iterations To illustratewhat kind of outputs are observed from the F-Nets in Fig 8we display semantic masks (first row) the correspondingblurry images (second row) and the final estimated deblurred

(a) (b) (c)

Fig 7 Semantic maps generated by SN on a blurry imagefrom the Helen dataset (a) Blurry image (b) Semantic masksobtained from SN trained on clean images (c) Semantic masksobtained from SN fine tuned with the blurry images

outputs from F-Nets (last row) As it can be seen from thelast row F-Nets are able to remove most of the blur fromcorresponding class in the image

(a) (b) (c) (d)

Fig 8 First row semantic masks Second row blurry imagesThird row deblurred images using F-Net-i for classes (a) m1(b) m2 (c) m3 (d) m4

b) Overall UMSN Network UMSN is trained usingLtotal with the Adam optimizer and batchsize of 16 Thelearning rate is set equal to 00002 Note that semanticmaps mi4i=1 of the ground truth clean images in Ltotal aregenerated using SN since we do not have the actual semanticmaps for them λ and λ1 values are set equal to 001 and 10respectively UMSN is trained for 01 million iterations

B Evaluation

a) PSNR and SSIM We compare the performance of ourmethod with the following state-of-the-art algorithms MAP-based methods [14] [13] [42] [43] [44] face deblurringexemplar-based method [29] and CNN-based methods [7]

6

Deblurring MethodHelen CelebA

PSNR SSIM PSNR SSIMKrishnan et al[14] (CVPRrsquo11) 1930 0670 1838 0672Pan et al[29] (ECCV 2014) 2093 0727 1859 0677Shan et al[42] (SIGGRAPHrsquo08) 1957 0670 1843 0644Xu et al[13] (CVPRrsquo13) 2011 0711 1893 0685Cho et al[43] (SIGGRAPHrsquo09) 1682 0574 1303 0445Zhong et al[44] (CVPRrsquo13) 1641 0614 1726 0695Nah et al[7] (CVPRrsquo17) 2412 0823 2243 0832Ziyi et al[9] (CVPRrsquo18) wGAN 2558 0861 2434 0860Ziyi et al[9] (CVPRrsquo18) 2599 0871 2505 0879UMSN (ours wLtotal) 2693 0897 2590 0906

TABLE III PSNR and SSIM comparision of UMSN againststate-of-the-art methods

[9] Results are shown in Table III As can be seen from thistable UMSN outperforms state-of-the-art methods includingthe methods that make use of semantic maps for image de-blurring [9] Furthermore we evaluate UMSNrsquos performancein reconstructing individual semantic classes against [9] Asit can be seen from the Table I our methodrsquos performance inreconstructing individual classes is better than the sate-of-the-art method [9] In addition we compare the performance ofour method against CNN-based face deblurring method thatuses post processing [10] on the PubFig dataset consisting of192 images released by the authors of [10] The PSNR andSSIM values achieved by [10] are 3021 and 084 whereasour method achieved 3051 and 086 respectively

b) VGG-Face distance We compare the performanceof our method with the following state-of-the-art algorithmsusing Feature distance( ie L2minusnorm distance between outputfeature map from network) between deblurred image and theground truth image We use the outputs of pool5 layer from theVGG-Face [45] to compute the VGG-Face distance( dV GG)dV GG is a better perceptual metric to compare as it computesthe L2minusnorm distance in the feature space using VGG-Face [45] network Table IV clearly shows that our methodUMSN out-performs the existing state of the art methods usingthe perceptual measures like dV GG

Deblurring MethodHelen CelebAdV GG dV GG

Xu et al[13] (CVPRrsquo13) 1015 1138Zhong et al[44] (CVPRrsquo13) 1228 1472Nah et al[7] (CVPRrsquo17) 746 892Ziyi et al[9] (CVPRrsquo18) 587 649UMSN (ours wLtotal) 523 557

TABLE IV comparision of UMSN against state-of-the-artmethods using distance of feature from VGG-Face( dV GG)lower is better

c) Face recognition In order to show the significance ofdifferent face deblurring methods we perform face recognitionon the deblurred images We use the CelebA dataset to performthis comparison where we use 100 different identities asthe probe set and for each identity we select 9 additional

clean face images as the gallery set 8000 blurry imagesare generated using the 100 images in the probe set Givena blurry image the deblurred image is computed using thecorresponding deblurring algorithm The most similar face forthis deblurred image is selected from the gallery set to checkwhether they belong to same identity or not

To perform face recognition we use OpenFace toolbox [46]to compute the similarity distance between the deblurredimage and all the gallery images We select the Top-K nearestmatches for each deblurred image to compute the accuracyof the deblurring method Table V clearly shows that thedeblurred images by our method have better recognition accu-racies than the other state-of-the-art methods This experimentclearly shows that our method is able to retain the importantparts of a face while performing deblurring This in turn helpsin achieving better face recognition compared to the othermethods

Deblurring Method Top-1 Top-3 Top-5Clear images 71 84 89Blurred images 31 46 53Pan et al[29] (ECCV 2014) 44 57 64Xu et al[13] (CVPRrsquo13) 43 57 64Zhong et al[44] (CVPRrsquo13) 30 44 51Nah et al[7] (CVPRrsquo17) 42 59 65Ziyi et al[9] (CVPRrsquo18) 54 68 74UMSN (ours wLtotal) 59 72 80

TABLE V Top-1 Top-3 and Top-5 face recognition accuracieson the CelebA dataset

d) Qualitative Results The qualitative performance ofdifferent methods on several images from the Helen andCelebA test images are shown in Fig 9 We can clearly ob-serve that UMSN produces sharp images without any artifactswhen compared to the other methods [13] [9] [44]

In addition we conducted experiments on 100 real blurryimages provided by the authors of [13] [9] [44] Resultscorresponding to different methods are shown in Fig 10 Ascan be seen from this figure UMSN produces sharper andclear images when compared to the state-of-the-art methodsFor example methods [13] [44] produce results which containartifacts or blurry images Ziyi et al[9] is not able reconstructeyes nose and mouth as shown in the fourth row of Fig 10On the other hand eyes nose and mouth regions are clearlyvisible in the images corresponding to the UMSN method

C Ablation Study

We conduct experiments on two datasets to study theperformance contribution of each component in UMSN Westart with the base network (BN) and progressively addcomponents to establish their significance to estimate thefinal deblurred image The corresponding results are shownin Table VI It is important to note that BN and UMSNare configured to have the same number of learnable pa-rameters by setting appropriate number of output channelsin the intermediate convolutional layers of the ResBlock

7

Blurry Xu et al[13] Zhong et al[44] Ziyi et al[9] Ours Ground TruthImage (CVPRrsquo13) (CVPRrsquo13) (CVPRrsquo18) Final Image

Fig 9 Sample results from the Helen and CelebA datasets As can be seen from this figure MAP-based methods [44] [13]are adding some artifacts on the final deblurred images especially near the eyes nose and mouth regions Global prior-basedmethod [9] produces very smooth outputs and some parts of the deblurred images are deformed This can be clearly seenby looking at the reconstructions in the fourth column corresponding to [9] where the right eye is added in the face imagethe fingers are removed the mustache is removed and the mouth is deformed On the other hand by using class-specificmulti-stream networks and Ltotal our method is able to reconstruct all the classes of a face perfectly and produces sharperface images For example UMSN is able to reconstruct eyes nose mustache and mouth clearly Additionally UMSN doesnot add parts like eyes when they are occluded (last row) Note that the method proposed in [9] uses a generative adversarialnetwork (GAN) to reconstruct sharper images In contrast our method is able to reconstruct sharper images even without usinga GAN

8

Blurry Xu et al[13] Zhong et al[44] Ziyi et al[9] OursImage (CVPRrsquo13) (CVPRrsquo13) (CVPRrsquo18) Final

Fig 10 Sample results on real blurry images

9

(a) (b) (c) (d) (e) (f) (g)

Fig 11 Ablation study (a) Blurry image (b) BN (c) BN + semantic maps (d) BN + semantic maps + NRL (e) UMSNwithout Lc (f) UMSN (g) ground Truth

The ablation study demonstrates that each of the componentsimprove the quality of the deblurred image In particularthe introduction of the multi-stream architecture improved theperformance by 05dB When UMSN is trained using Ltotal

the performance improved by an additional 025dB Finallythe combination of all the components of UMSN produce thebest results with 165dB gain compared to BN

Deblurring MethodHelen CelebA

PSNR SSIM PSNR SSIMbaseline (BN) 2512 082 2438 081+ semantic maps 2555 084 2472 082+ nested residuals 2608 085 2526 084UMSN without Lc 2661 087 2567 087UMSN 2693 0897 2590 090

TABLE VI PSNR and SSIM results of the ablation study

Sample reconstructions corresponding to the ablation studyusing the Helen and CelebA datasets are shown in Fig 11As can be seen from this figure BN and BN+semantic mapsproduce reconstructions that are still blurry and they fail toreconstruct the facial parts like eyes nose and mouth well Onthe other hand BN+semantic maps+ NRL is able to improvethe reconstruction quality but visually it is still not comparableto the state-of-the-art method As shown in Fig 12 NRL isable to estimate the class-specific residual feature maps whichare further used for estimating the final deblurred image FromFig 12 we can see what different residual feature maps arelearning for different classes In particular the use of F-Netshelps UMSN to produce qualitatively good results where allthe parts of a face are clearly visible Furthermore trainingUMSN using Ltotal results in sharper images

We also compare our method qualitatively with postprocessing-based method [10] on the PubFig dataset As canbe seen from the results shown in Fig 13 even after applyingsome post processing [10] tends to produce results that aresmooth In comparison UMSN is able to produce sharper andhigh-quality images without any post-processing

(a) (b) (c) (d)

Fig 12 (a) Blurry image (b) UMSN (c) Intermediate residualfeature map learned for the eyes and the hair (d) Intermediateresidual feature map learned for the face skin in NRL

(a) (b) (c) (d)

Fig 13 Sample results from the PubFig dataset (a) Blurryimage (b) Ground Truth (c) Song et al[10](IJCVrsquo19) (d)UMSN

V CONCLUSION

We proposed a new method called Uncertainty guidedMulti-stream Semantic Network (UMSN) to address thesingle image blind deblurring of face image problem thatentails the use of facial sementic information In our approachwe introduced a different way to use the ouput featuresfrom the sub-networks which are trained for the individualsemantic class-wise deblurring Additionally we introduced

10

novel techniques such as nested residual learning and class-based confidence loss to improve the performance of UMSNIn comparison to the state-of-the-art single image blind de-blurring methods the proposed approach is able to achievesignificant improvements when evaluated on three popular facedatasets

APPENDIX AUMSN NETWORK SEQUENCE

The BaseNetwork is similar to the first stage semanticnetwork as shown in Fig 5 with the following sequence oflayersResBlock(364)-Avgpool-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-Upsample-ResBlock(6416)-Conv 3times 3(163)F-Net-i is a a sequence of five ResBlocks with denseconnectionsResBlock(316)-Avgpool-ResBlock(1616)-ResBlock(1616)-ResBlock(1616)-ResBlock(168)where ResBlock(mn) indicates m input channels and noutput channels for ResBlock

F-Net-irsquos are joined to form the first stage of UMSNfurther these outputs re concatenated to output of first layer ofBase Network(BN) for constructing the UMSN network ThusUMSN is constructed by combining F-Net-irsquos and BN whereF-Net-irsquos acts as first stage of UMSN and BN acts as secondstage of UMSN

APPENDIX BCONFIDENCE SCORES

In this section we show some example test outputs ofUMSN(trained with Ltotal) at different time instances oftraining and their corresponding confidence values to showhow confidence measure is helping the network Confidencemeasure helps the UMSN in deblurring different parts of theface For example as shown in the Fig 14 and Table VIIthe output of UMSN after 50000 iterations are blurry at eyesnose and mouth which was reflected in low confidence scorefor C3 when compared to other classes Eventually UMSNlearnt to reconstruct eyes nose and mouth which was reflectedin high confidence score for C3 and as shown in the Figure14 the output deblurred images are sharp with eyes nose andmouth reconstructed perfectly

Blurry Image Iteration C1 C2 C3 C4

Image13000 0573 0576 0610 058450000 0676 0902 0789 0847100000 0858 0917 0989 0962

Image23000 0519 0528 0493 061150000 0728 0883 0817 0872100000 0896 0919 0966 0944

TABLE VII Confidence values for the Image1 and Images2first row and second row images in Fig 14 respectively

(a) (b) (c) (d)

Fig 14 Sample deblurring results of UMSN at different timeinstances of training (a) Blurry image (b) trained for 3000iterations (c) 50000 iterations (d) 01 million iterations

ACKNOWLEDGMENT

This research is based upon work supported by the Officeof the Director of National Intelligence (ODNI) IntelligenceAdvanced Research Projects Activity (IARPA) via IARPARampD Contract No 2014-14071600012 The views and conclu-sions contained herein are those of the authors and should notbe interpreted as necessarily representing the official policiesor endorsements either expressed or implied of the ODNIIARPA or the US Government The US Government is au-thorized to reproduce and distribute reprints for Governmentalpurposes notwithstanding any copyright annotation thereon

REFERENCES

[1] R Fergus B Singh A Hertzmann S T Roweis and W T FreemanldquoRemoving camera shake from a single photographrdquo ACM transactionson graphics (TOG) vol 25 no 3 pp 787ndash794 2006

[2] C J Schuler H Christopher Burger S Harmeling and B ScholkopfldquoA machine learning approach for non-blind image deconvolutionrdquo inProceedings of the IEEE Conference on Computer Vision and PatternRecognition 2013 pp 1067ndash1074

[3] W Ren X Cao J Pan X Guo W Zuo and M-H Yang ldquoImagedeblurring via enhanced low-rank priorrdquo IEEE Transactions on ImageProcessing vol 25 no 7 pp 3426ndash3437 2016

[4] L Sun S Cho J Wang and J Hays ldquoEdge-based blur kernel estimationusing patch priorsrdquo in IEEE International Conference on ComputationalPhotography (ICCP) IEEE 2013 pp 1ndash8

[5] T M Nimisha A Kumar Singh and A N Rajagopalan ldquoBlur-invariant deep learning for blind-deblurringrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4752ndash4760

[6] L Xu J S Ren C Liu and J Jia ldquoDeep convolutional neural networkfor image deconvolutionrdquo in Advances in neural information processingsystems 2014 pp 1790ndash1798

[7] S Nah T Hyun Kim and K Mu Lee ldquoDeep multi-scale convolutionalneural network for dynamic scene deblurringrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2017pp 3883ndash3891

[8] S Zhang X Shen Z Lin R Mech J P Costeira and J M MouraldquoLearning to understand image blurrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2018 pp6586ndash6595

[9] Z Shen W-S Lai T Xu J Kautz and M-H Yang ldquoDeep semanticface deblurringrdquo Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 8260ndash8269 2018

[10] Y Song J Zhang L Gong S He L Bao J Pan Q Yang and M-HYang ldquoJoint face hallucination and deblurring via structure generationand detail enhancementrdquo International Journal of Computer Vision vol127 no 6-7 pp 785ndash800 2019

[11] V Le J Brandt Z Lin L Bourdev and T S Huang ldquoInteractivefacial feature localizationrdquo in European conference on computer visionSpringer 2012 pp 679ndash692

11

[12] Z Liu P Luo X Wang and X Tang ldquoDeep learning face attributesin the wildrdquo in Proceedings of the IEEE international conference oncomputer vision 2015 pp 3730ndash3738

[13] L Xu S Zheng and J Jia ldquoUnnatural l0 sparse representation fornatural image deblurringrdquo Proceedings of the IEEE conference oncomputer vision and pattern recognition pp 1107ndash1114 2013

[14] D Krishnan T Tay and R Fergus ldquoBlind deconvolution using anormalized sparsity measurerdquo CVPR 2011 pp 233ndash240 2011

[15] J Pan D Sun H Pfister and M-H Yang ldquoBlind image deblurringusing dark channel priorrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition 2016 pp 1628ndash1636

[16] J Ni P Turaga V M Patel and R Chellappa ldquoExample-drivenmanifold priors for image deconvolutionrdquo IEEE Transactions on ImageProcessing vol 20 no 11 pp 3086ndash3096 Nov 2011

[17] V M Patel G R Easley and D M Healy ldquoShearlet-based deconvo-lutionrdquo IEEE Transactions on Image Processing vol 18 no 12 pp2673ndash2685 Dec 2009

[18] J Kruse C Rother and U Schmidt ldquoLearning to push the limits ofefficient fft-based image deconvolutionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4586ndash4594

[19] U Schmidt C Rother S Nowozin J Jancsary and S Roth ldquoDiscrim-inative non-blind deblurringrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition 2013 pp 604ndash611

[20] U Schmidt and S Roth ldquoShrinkage fields for effective image restora-tionrdquo in Proceedings of the IEEE Conference on Computer Vision andPattern Recognition 2014 pp 2774ndash2781

[21] S Vasu V Reddy Maligireddy and A Rajagopalan ldquoNon-blind de-blurring Handling kernel uncertainty with cnnsrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2018pp 3272ndash3281

[22] C J Schuler M Hirsch S Harmeling and B Scholkopf ldquoLearning todeblurrdquo vol 38 no 7 IEEE 2015 pp 1439ndash1451

[23] A Chakrabarti ldquoDeep convolutional neural network for image decon-volutionrdquo 2014

[24] Y Tsai X Shen Z Lin K Sunkavalli and M Yang ldquoSky is not thelimit semantic-aware sky replacementrdquo ACM Trans Graph vol 35no 4 2016

[25] D Eigen D Krishnan and R Fergus ldquoRestoring an image takenthrough a window covered with dirt or rainrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2013 pp 633ndash640

[26] P Svoboda M Hradis L Marsık and P Zemcık ldquoCnn for license platemotion deblurringrdquo in 2016 IEEE International Conference on ImageProcessing (ICIP) IEEE 2016 pp 3832ndash3836

[27] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring text images via l0-regularized intensity and gradient priorrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2014 pp2901ndash2908

[28] S Baker and T Kanade ldquoLimits on super-resolution and how to breakthemrdquo IEEE Trans Pattern Anal Mach Intell vol 24 no 9 2002

[29] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring face images withexemplarsrdquo pp 47ndash62 2014

[30] N Joshi W Matusik E H Adelson and D J Kriegman ldquoPersonalphoto enhancement using example imagesrdquo ACM Trans Graph vol 29no 2 2010

[31] O Ronneberger PFischer and T Broxrdquo ldquoU-net Convolutionalnetworks for biomedical image segmentationrdquordquo in Medical ImageComputing and Computer-Assisted Intervention (MICCAI)rdquo serrdquoLNCSrdquo vol rdquo9351rdquo rdquo2015rdquo pp rdquo234ndash241rdquo [Online] Availablerdquohttplmbinformatikuni-freiburgdePublications2015RFB15ardquo

[32] G Huang Z Liu L Van Der Maaten and K Q Weinberger ldquoDenselyconnected convolutional networksrdquo in Proceedings of the IEEE confer-ence on computer vision and pattern recognition 2017 pp 4700ndash4708

[33] D Ulyanov A Vedaldi and V Lempitsky ldquoInstance normalization Themissing ingredient for fast stylizationrdquo arXiv preprint arXiv1607080222016

[34] Z Wang and S Ji ldquoSmoothed dilated convolutions for improved densepredictionrdquo in Proceedings of the 24th ACM SIGKDD InternationalConference on Knowledge Discovery amp Data Mining ACM 2018 pp2486ndash2495

[35] J Johnson A Alahi and L Fei-Fei ldquoPerceptual losses for real-timestyle transfer and super-resolutionrdquo ECCV 2016

[36] H Zhang and K Dana ldquoMulti-style generative network for real-timetransferrdquo arXiv preprint arXiv170306953 2017

[37] C Ledig L Theis F Huszar J Caballero A Cunningham A AcostaA Aitken A Tejani J Totz Z Wang et al ldquoPhoto-realistic singleimage super-resolution using a generative adversarial networkrdquo pp4681ndash4690 2017

[38] K Simonyan and A Zisserman ldquoVery deep convolutional networks forlarge-scale image recognitionrdquo arXiv14091556 2014

[39] Z Wang A C Bovik H R Sheikh and E P Simoncelli ldquoImagequality assessment from error visibility to structural similarityrdquo IEEETransactions on Image Processing vol 13 no 4 pp 600ndash612 April2004

[40] B M Smith L Zhang J Brandt Z Lin and J Yang ldquoExemplar-basedface parsingrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition 2013 pp 3484ndash3491

[41] G Boracchi and A Foi ldquoModeling the performance of image restorationfrom motion blurrdquo IEEE Transactions on Image Processing vol 21no 8 pp 3502ndash3517 2012

[42] Q Shan J Jia and A Agarwala ldquoHigh-quality motion deblurring froma single imagerdquo Acm transactions on graphics (tog) vol 27 no 3p 73 2008

[43] S Cho and S Lee ldquoFast motion deblurringrdquo ACM Transactions ongraphics (TOG) vol 28 no 5 p 145 2009

[44] L Zhong S Cho D Metaxas S Paris and J Wang ldquoHandling noise insingle image deblurring using directional filtersrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2013pp 612ndash619

[45] O M Parkhi A Vedaldi and A Zisserman ldquoDeep face recognitionrdquoin British Machine Vision Conference 2015

[46] B Amos B Ludwiczuk and M Satyanarayanan ldquoOpenface A general-purpose face recognition library with mobile applicationsrdquo CMU-CS-16-118 CMU School of Computer Science Tech Rep 2016

Page 3: Deblurring Face Images using Uncertainty Guided …1 Deblurring Face Images using Uncertainty Guided Multi-Stream Semantic Networks Rajeev Yasarla, Student Member, IEEE, Federico Perazzi,

3

Fig 2 An overview of the proposed UMSN network First stage semantic networks consist of F-Net-i Second stage isconstructed using the base network (BN) where outputs of the F-Net-irsquos are concatenated with the output of the first ResBlocklayer in BN

(ResBlock) as our building module for the segmentationnetwork A ResBlock consist of a 1 times 1 convolution layera 3 times 3 convolution layer and two 3 times 3 convolution layerswith dilation factor of 2 as shown in Fig 4

B Base Network (BN)

We construct our base network using a combination ofUNet [31] and DenseNet [32] architectures with the ResBlockas our basic building block To increase the receptive field sizewe introduce smoothed dilation convolutions in the ResBlockas shown in Fig 4 BN is a sequence of eight ResBlockssimilar to the first stage semantic network as shown in Fig 5Note that all convolutional layers are densely connected [32]We follow residual-based learning in estimating the deblurredimage for our base network as shown in Fig 5

Fig 3 An overview of the segmentation networkConv l times l (p q) contains instance normalization [33]Rectified Linear Unit (ReLU) Conv (l times l) - convolutionallayer with kernel of size ltimes l where p and q are the numberof input and output channels respectively

Fig 4 An overview of the ResBlock Conv ltimes l(p q) containsInstance Normalization [33] ReLU - Rectified Linear UnitsConv (l times l) - convolutional layer with kernel of size l timesl where p and q are number of input and output channelsrespectively In the right side of the figure we show smootheddilation convolutions introduced in ResBlock which is similarto [34]

C UMSN Network

The UMSN network is a two-stage network The first stagenetwork is designed to obtain deblurred outputs from thesemantic class-wise blurry inputs These outputs are furtherprocessed by the second stage network to obtain the finaldeblurred image The first stage semantic network containsa sequence of five ResBlocks with residual connections asshown in Fig 5 We call the set of all convolution layers of thefirst stage network excluding the last ResBlock and Conv3times3as F-Net

The blurry image y and the semantic masks mi are fed to F-Net-i to obtain the corresponding class-specific deblurred fea-tures which are concatenated with the output of the first layer(ResBlock-Avgpool) in Base Network(BN) for constructingthe UMSN network We also use the Nested Residual Learning(NRL) in UMSN network where class specific residual featuremaps are learned and further used in estimating the residual

4

Fig 5 An overview of the first stage semantic network (F-Net)

feature maps that are added to the blurry image for obtainingthe final output For example we can observe a residualconnection between the last layer of UMSN and class-specificfeature maps obtained from Conv 1times1 using y and m Outputof this residual connection is further processed as the input tothe residual connection with the input blurry image y In thisway we define our NRL and obtain the final deblurred imageWe propose a class-based loss function to train the UMSNnetwork

D Loss for UMSN

The network parameters Θ are learned by minimizing a lossL as follows

Θ = argminΘ

L(fΘ(y m) x) = argminΘ

L(x x) (1)

where fΘ() represents the UMSN network x is the deblurredresult m is the semantic map obtained from SN We definethe reconstruction loss as L = x minus x1 A face image canbe expressed as the sum of masked images using the semanticmaps as

x =

Msumi=1

mi x

where is the element-wise multiplication and M is the totalnumber of semantic maps As the masks are independent ofone another Eq (1) can be re-written as

Θ = argminΘ

Msumi=1

L(mi xmi x) (2)

In other words the loss is calculated for every class indepen-dent and summed up in order to obtain the overall loss asfollows

L(x x) =

Msumi=1

L(mi xmi x) (3)

Uncertainty Guidance

We introduce a confidence measure for every class anduse it to re-weight the contribution of the loss from eachclass to the total loss By introducing a confidence measureand re-weighting the loss we benefit in two ways If thenetwork is giving less importance to a particular class bynot learning appropriate features of it then the ConfidenceNetwork (CN) helps UMSN to learn those class featuresby estimating low confidence values and higher gradients

for those classes through CN network Additionally by re-weighting the contribution of loss from each class it countersfor the imbalances in the error estimation from differentclasses The loss function can be written as

Lc(x x) =

Msumi=1

CiL(mi xmi x)minus λ log(Ci) (4)

where log(Ci) acts as a regularizer that prevents the valueof Ci going to zero and λ is a constant We estimate theconfidence measure Ci for each class by passing mi xmix as inputs to CN as shown in Fig 6 Ci represents howconfident UMSN is in deblurring the ith class components ofthe face image Note that Ci(isin [0 1]) confidence measure isused only in the loss function while training the weights ofUMSN and it is not used (or estimated) during inference

Fig 6 An overview of the Confidence Network (CN) xis ground truth image x is deblurred image obtained fromUMSN m semantic maps of x

Inspired by the benefits of the perceptual loss in styletransfer [35] [36] and image super-resolution [37] we use it totrain our network Let Φ() denote the features obtained usingthe VGG16 model [38] then the perceptual loss is defined asfollows

Lp = Φ(x)minus Φ(x)22 (5)

The features from layer relu1 2 of a pretrained VGG-16network [38] are used to compute the perceptual loss Thetotal loss used to train UMSN is as follows

Ltotal = Lc + λ1Lp (6)

where λ1 is a constant

IV EXPERIMENTAL RESULTS

We evaluate the results using the images provided by theauthors of [9] which consists of 8000 blurry images gener-ated using the Helen dataset [11] and 8000 blurry imagesgenerated using the CelebA dataset [12]1 Furthermore wetest our network on a test dataset called PubFig providedby the authors of [10] which contains 192 blurry imagesWe quantitatively evaluate the results using the Peak-Signal-to-Noise Ratio (PSNR) and the Structural Similarity index(SSIM) [39]

1We follow the same training and testing protocols defined by [9]

5

Methodclass-1 class-2 class-3 class-4

PSNR SSIM PSNR SSIM PSNR SSIM PSNR SSIM

CZiyi et al 2786 088 3027 093 3333 092 3179 091

UMSN 2927 091 3075 095 3383 096 3310 094

HZiyi et al 2913 085 3120 088 3513 092 3396 088

UMSN 3083 087 3176 090 3567 093 3481 091

TABLE I PSNR and SSIM values for each deblurred semanticclass on the CelebA dataset (C) and Hellen dataset (H)

A Implementation Details

1) Segmentation Network We initially train the SN net-work using the Helen dataset [11] which contains 2000 cleanimages with corresponding semantic labels provided by Smithet al[40] SN is trained using the cross entropy loss with theAdam optimizer and the learning rate is set equal to 00002We train SN for 60000 iterations on clean images As can beseen from Fig 7 and Table II SN trained on clean images onlyis not able to produce good results on the blurry images Thuswe fine-tune SN using the blurry images and the correspondingsemantic maps generated using the Helen dataset [11] In thiscase SN is fine-tuned for 30000 iterations with the learningrate of 000001 We test our SN network on 330 clean Testimages provided by the Helen dataset as well as 8000 blurryimages Results are shown in Fig 7 and Table II in terms ofF-scores As can be seen from these results our method isable to produce reasonable segmentation results from blurryimages

Class Clean Images Blurry ImagesSN trained onclean images

SN trained onclean images

SN fine-tuned wblurry images

class-1 0910 0889 0901class-2 0903 0877 0884class-3 0856 0775 0814class-4 0625 0538 0566

TABLE II F-score values of semantic masks for clean andblurry images from the Helen dataset

2) UMSN Network Training images for UMSN are gen-erated using 2000 images from the Helen dataset [11] andrandomly selected 25000 images from the CelebA dataset [12]We generate 25000 blur kernels sizes ranging from 13 times 13to 29times 29 using 3D camera trajectories [41] Patches of size128times 128 are exacted from those images and convolved with25000 blur kernels randomly to generated about 17 millionpairs of clean-blurry data We added Gaussian noise withσ = 003 to the blurry images

a) First Stage Semantic Network Every F-Net in the firststage network is trained with clean-blurry paired data whereonly the corresponding class components of the face imagesare blurred The first stage semantic network is trained with acombination of the L1-norm and the perceptual losses usingthe Adam optimizer with learning rate of 00002 The firststage network is trained for 50 000 iterations To illustratewhat kind of outputs are observed from the F-Nets in Fig 8we display semantic masks (first row) the correspondingblurry images (second row) and the final estimated deblurred

(a) (b) (c)

Fig 7 Semantic maps generated by SN on a blurry imagefrom the Helen dataset (a) Blurry image (b) Semantic masksobtained from SN trained on clean images (c) Semantic masksobtained from SN fine tuned with the blurry images

outputs from F-Nets (last row) As it can be seen from thelast row F-Nets are able to remove most of the blur fromcorresponding class in the image

(a) (b) (c) (d)

Fig 8 First row semantic masks Second row blurry imagesThird row deblurred images using F-Net-i for classes (a) m1(b) m2 (c) m3 (d) m4

b) Overall UMSN Network UMSN is trained usingLtotal with the Adam optimizer and batchsize of 16 Thelearning rate is set equal to 00002 Note that semanticmaps mi4i=1 of the ground truth clean images in Ltotal aregenerated using SN since we do not have the actual semanticmaps for them λ and λ1 values are set equal to 001 and 10respectively UMSN is trained for 01 million iterations

B Evaluation

a) PSNR and SSIM We compare the performance of ourmethod with the following state-of-the-art algorithms MAP-based methods [14] [13] [42] [43] [44] face deblurringexemplar-based method [29] and CNN-based methods [7]

6

Deblurring MethodHelen CelebA

PSNR SSIM PSNR SSIMKrishnan et al[14] (CVPRrsquo11) 1930 0670 1838 0672Pan et al[29] (ECCV 2014) 2093 0727 1859 0677Shan et al[42] (SIGGRAPHrsquo08) 1957 0670 1843 0644Xu et al[13] (CVPRrsquo13) 2011 0711 1893 0685Cho et al[43] (SIGGRAPHrsquo09) 1682 0574 1303 0445Zhong et al[44] (CVPRrsquo13) 1641 0614 1726 0695Nah et al[7] (CVPRrsquo17) 2412 0823 2243 0832Ziyi et al[9] (CVPRrsquo18) wGAN 2558 0861 2434 0860Ziyi et al[9] (CVPRrsquo18) 2599 0871 2505 0879UMSN (ours wLtotal) 2693 0897 2590 0906

TABLE III PSNR and SSIM comparision of UMSN againststate-of-the-art methods

[9] Results are shown in Table III As can be seen from thistable UMSN outperforms state-of-the-art methods includingthe methods that make use of semantic maps for image de-blurring [9] Furthermore we evaluate UMSNrsquos performancein reconstructing individual semantic classes against [9] Asit can be seen from the Table I our methodrsquos performance inreconstructing individual classes is better than the sate-of-the-art method [9] In addition we compare the performance ofour method against CNN-based face deblurring method thatuses post processing [10] on the PubFig dataset consisting of192 images released by the authors of [10] The PSNR andSSIM values achieved by [10] are 3021 and 084 whereasour method achieved 3051 and 086 respectively

b) VGG-Face distance We compare the performanceof our method with the following state-of-the-art algorithmsusing Feature distance( ie L2minusnorm distance between outputfeature map from network) between deblurred image and theground truth image We use the outputs of pool5 layer from theVGG-Face [45] to compute the VGG-Face distance( dV GG)dV GG is a better perceptual metric to compare as it computesthe L2minusnorm distance in the feature space using VGG-Face [45] network Table IV clearly shows that our methodUMSN out-performs the existing state of the art methods usingthe perceptual measures like dV GG

Deblurring MethodHelen CelebAdV GG dV GG

Xu et al[13] (CVPRrsquo13) 1015 1138Zhong et al[44] (CVPRrsquo13) 1228 1472Nah et al[7] (CVPRrsquo17) 746 892Ziyi et al[9] (CVPRrsquo18) 587 649UMSN (ours wLtotal) 523 557

TABLE IV comparision of UMSN against state-of-the-artmethods using distance of feature from VGG-Face( dV GG)lower is better

c) Face recognition In order to show the significance ofdifferent face deblurring methods we perform face recognitionon the deblurred images We use the CelebA dataset to performthis comparison where we use 100 different identities asthe probe set and for each identity we select 9 additional

clean face images as the gallery set 8000 blurry imagesare generated using the 100 images in the probe set Givena blurry image the deblurred image is computed using thecorresponding deblurring algorithm The most similar face forthis deblurred image is selected from the gallery set to checkwhether they belong to same identity or not

To perform face recognition we use OpenFace toolbox [46]to compute the similarity distance between the deblurredimage and all the gallery images We select the Top-K nearestmatches for each deblurred image to compute the accuracyof the deblurring method Table V clearly shows that thedeblurred images by our method have better recognition accu-racies than the other state-of-the-art methods This experimentclearly shows that our method is able to retain the importantparts of a face while performing deblurring This in turn helpsin achieving better face recognition compared to the othermethods

Deblurring Method Top-1 Top-3 Top-5Clear images 71 84 89Blurred images 31 46 53Pan et al[29] (ECCV 2014) 44 57 64Xu et al[13] (CVPRrsquo13) 43 57 64Zhong et al[44] (CVPRrsquo13) 30 44 51Nah et al[7] (CVPRrsquo17) 42 59 65Ziyi et al[9] (CVPRrsquo18) 54 68 74UMSN (ours wLtotal) 59 72 80

TABLE V Top-1 Top-3 and Top-5 face recognition accuracieson the CelebA dataset

d) Qualitative Results The qualitative performance ofdifferent methods on several images from the Helen andCelebA test images are shown in Fig 9 We can clearly ob-serve that UMSN produces sharp images without any artifactswhen compared to the other methods [13] [9] [44]

In addition we conducted experiments on 100 real blurryimages provided by the authors of [13] [9] [44] Resultscorresponding to different methods are shown in Fig 10 Ascan be seen from this figure UMSN produces sharper andclear images when compared to the state-of-the-art methodsFor example methods [13] [44] produce results which containartifacts or blurry images Ziyi et al[9] is not able reconstructeyes nose and mouth as shown in the fourth row of Fig 10On the other hand eyes nose and mouth regions are clearlyvisible in the images corresponding to the UMSN method

C Ablation Study

We conduct experiments on two datasets to study theperformance contribution of each component in UMSN Westart with the base network (BN) and progressively addcomponents to establish their significance to estimate thefinal deblurred image The corresponding results are shownin Table VI It is important to note that BN and UMSNare configured to have the same number of learnable pa-rameters by setting appropriate number of output channelsin the intermediate convolutional layers of the ResBlock

7

Blurry Xu et al[13] Zhong et al[44] Ziyi et al[9] Ours Ground TruthImage (CVPRrsquo13) (CVPRrsquo13) (CVPRrsquo18) Final Image

Fig 9 Sample results from the Helen and CelebA datasets As can be seen from this figure MAP-based methods [44] [13]are adding some artifacts on the final deblurred images especially near the eyes nose and mouth regions Global prior-basedmethod [9] produces very smooth outputs and some parts of the deblurred images are deformed This can be clearly seenby looking at the reconstructions in the fourth column corresponding to [9] where the right eye is added in the face imagethe fingers are removed the mustache is removed and the mouth is deformed On the other hand by using class-specificmulti-stream networks and Ltotal our method is able to reconstruct all the classes of a face perfectly and produces sharperface images For example UMSN is able to reconstruct eyes nose mustache and mouth clearly Additionally UMSN doesnot add parts like eyes when they are occluded (last row) Note that the method proposed in [9] uses a generative adversarialnetwork (GAN) to reconstruct sharper images In contrast our method is able to reconstruct sharper images even without usinga GAN

8

Blurry Xu et al[13] Zhong et al[44] Ziyi et al[9] OursImage (CVPRrsquo13) (CVPRrsquo13) (CVPRrsquo18) Final

Fig 10 Sample results on real blurry images

9

(a) (b) (c) (d) (e) (f) (g)

Fig 11 Ablation study (a) Blurry image (b) BN (c) BN + semantic maps (d) BN + semantic maps + NRL (e) UMSNwithout Lc (f) UMSN (g) ground Truth

The ablation study demonstrates that each of the componentsimprove the quality of the deblurred image In particularthe introduction of the multi-stream architecture improved theperformance by 05dB When UMSN is trained using Ltotal

the performance improved by an additional 025dB Finallythe combination of all the components of UMSN produce thebest results with 165dB gain compared to BN

Deblurring MethodHelen CelebA

PSNR SSIM PSNR SSIMbaseline (BN) 2512 082 2438 081+ semantic maps 2555 084 2472 082+ nested residuals 2608 085 2526 084UMSN without Lc 2661 087 2567 087UMSN 2693 0897 2590 090

TABLE VI PSNR and SSIM results of the ablation study

Sample reconstructions corresponding to the ablation studyusing the Helen and CelebA datasets are shown in Fig 11As can be seen from this figure BN and BN+semantic mapsproduce reconstructions that are still blurry and they fail toreconstruct the facial parts like eyes nose and mouth well Onthe other hand BN+semantic maps+ NRL is able to improvethe reconstruction quality but visually it is still not comparableto the state-of-the-art method As shown in Fig 12 NRL isable to estimate the class-specific residual feature maps whichare further used for estimating the final deblurred image FromFig 12 we can see what different residual feature maps arelearning for different classes In particular the use of F-Netshelps UMSN to produce qualitatively good results where allthe parts of a face are clearly visible Furthermore trainingUMSN using Ltotal results in sharper images

We also compare our method qualitatively with postprocessing-based method [10] on the PubFig dataset As canbe seen from the results shown in Fig 13 even after applyingsome post processing [10] tends to produce results that aresmooth In comparison UMSN is able to produce sharper andhigh-quality images without any post-processing

(a) (b) (c) (d)

Fig 12 (a) Blurry image (b) UMSN (c) Intermediate residualfeature map learned for the eyes and the hair (d) Intermediateresidual feature map learned for the face skin in NRL

(a) (b) (c) (d)

Fig 13 Sample results from the PubFig dataset (a) Blurryimage (b) Ground Truth (c) Song et al[10](IJCVrsquo19) (d)UMSN

V CONCLUSION

We proposed a new method called Uncertainty guidedMulti-stream Semantic Network (UMSN) to address thesingle image blind deblurring of face image problem thatentails the use of facial sementic information In our approachwe introduced a different way to use the ouput featuresfrom the sub-networks which are trained for the individualsemantic class-wise deblurring Additionally we introduced

10

novel techniques such as nested residual learning and class-based confidence loss to improve the performance of UMSNIn comparison to the state-of-the-art single image blind de-blurring methods the proposed approach is able to achievesignificant improvements when evaluated on three popular facedatasets

APPENDIX AUMSN NETWORK SEQUENCE

The BaseNetwork is similar to the first stage semanticnetwork as shown in Fig 5 with the following sequence oflayersResBlock(364)-Avgpool-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-Upsample-ResBlock(6416)-Conv 3times 3(163)F-Net-i is a a sequence of five ResBlocks with denseconnectionsResBlock(316)-Avgpool-ResBlock(1616)-ResBlock(1616)-ResBlock(1616)-ResBlock(168)where ResBlock(mn) indicates m input channels and noutput channels for ResBlock

F-Net-irsquos are joined to form the first stage of UMSNfurther these outputs re concatenated to output of first layer ofBase Network(BN) for constructing the UMSN network ThusUMSN is constructed by combining F-Net-irsquos and BN whereF-Net-irsquos acts as first stage of UMSN and BN acts as secondstage of UMSN

APPENDIX BCONFIDENCE SCORES

In this section we show some example test outputs ofUMSN(trained with Ltotal) at different time instances oftraining and their corresponding confidence values to showhow confidence measure is helping the network Confidencemeasure helps the UMSN in deblurring different parts of theface For example as shown in the Fig 14 and Table VIIthe output of UMSN after 50000 iterations are blurry at eyesnose and mouth which was reflected in low confidence scorefor C3 when compared to other classes Eventually UMSNlearnt to reconstruct eyes nose and mouth which was reflectedin high confidence score for C3 and as shown in the Figure14 the output deblurred images are sharp with eyes nose andmouth reconstructed perfectly

Blurry Image Iteration C1 C2 C3 C4

Image13000 0573 0576 0610 058450000 0676 0902 0789 0847100000 0858 0917 0989 0962

Image23000 0519 0528 0493 061150000 0728 0883 0817 0872100000 0896 0919 0966 0944

TABLE VII Confidence values for the Image1 and Images2first row and second row images in Fig 14 respectively

(a) (b) (c) (d)

Fig 14 Sample deblurring results of UMSN at different timeinstances of training (a) Blurry image (b) trained for 3000iterations (c) 50000 iterations (d) 01 million iterations

ACKNOWLEDGMENT

This research is based upon work supported by the Officeof the Director of National Intelligence (ODNI) IntelligenceAdvanced Research Projects Activity (IARPA) via IARPARampD Contract No 2014-14071600012 The views and conclu-sions contained herein are those of the authors and should notbe interpreted as necessarily representing the official policiesor endorsements either expressed or implied of the ODNIIARPA or the US Government The US Government is au-thorized to reproduce and distribute reprints for Governmentalpurposes notwithstanding any copyright annotation thereon

REFERENCES

[1] R Fergus B Singh A Hertzmann S T Roweis and W T FreemanldquoRemoving camera shake from a single photographrdquo ACM transactionson graphics (TOG) vol 25 no 3 pp 787ndash794 2006

[2] C J Schuler H Christopher Burger S Harmeling and B ScholkopfldquoA machine learning approach for non-blind image deconvolutionrdquo inProceedings of the IEEE Conference on Computer Vision and PatternRecognition 2013 pp 1067ndash1074

[3] W Ren X Cao J Pan X Guo W Zuo and M-H Yang ldquoImagedeblurring via enhanced low-rank priorrdquo IEEE Transactions on ImageProcessing vol 25 no 7 pp 3426ndash3437 2016

[4] L Sun S Cho J Wang and J Hays ldquoEdge-based blur kernel estimationusing patch priorsrdquo in IEEE International Conference on ComputationalPhotography (ICCP) IEEE 2013 pp 1ndash8

[5] T M Nimisha A Kumar Singh and A N Rajagopalan ldquoBlur-invariant deep learning for blind-deblurringrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4752ndash4760

[6] L Xu J S Ren C Liu and J Jia ldquoDeep convolutional neural networkfor image deconvolutionrdquo in Advances in neural information processingsystems 2014 pp 1790ndash1798

[7] S Nah T Hyun Kim and K Mu Lee ldquoDeep multi-scale convolutionalneural network for dynamic scene deblurringrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2017pp 3883ndash3891

[8] S Zhang X Shen Z Lin R Mech J P Costeira and J M MouraldquoLearning to understand image blurrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2018 pp6586ndash6595

[9] Z Shen W-S Lai T Xu J Kautz and M-H Yang ldquoDeep semanticface deblurringrdquo Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 8260ndash8269 2018

[10] Y Song J Zhang L Gong S He L Bao J Pan Q Yang and M-HYang ldquoJoint face hallucination and deblurring via structure generationand detail enhancementrdquo International Journal of Computer Vision vol127 no 6-7 pp 785ndash800 2019

[11] V Le J Brandt Z Lin L Bourdev and T S Huang ldquoInteractivefacial feature localizationrdquo in European conference on computer visionSpringer 2012 pp 679ndash692

11

[12] Z Liu P Luo X Wang and X Tang ldquoDeep learning face attributesin the wildrdquo in Proceedings of the IEEE international conference oncomputer vision 2015 pp 3730ndash3738

[13] L Xu S Zheng and J Jia ldquoUnnatural l0 sparse representation fornatural image deblurringrdquo Proceedings of the IEEE conference oncomputer vision and pattern recognition pp 1107ndash1114 2013

[14] D Krishnan T Tay and R Fergus ldquoBlind deconvolution using anormalized sparsity measurerdquo CVPR 2011 pp 233ndash240 2011

[15] J Pan D Sun H Pfister and M-H Yang ldquoBlind image deblurringusing dark channel priorrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition 2016 pp 1628ndash1636

[16] J Ni P Turaga V M Patel and R Chellappa ldquoExample-drivenmanifold priors for image deconvolutionrdquo IEEE Transactions on ImageProcessing vol 20 no 11 pp 3086ndash3096 Nov 2011

[17] V M Patel G R Easley and D M Healy ldquoShearlet-based deconvo-lutionrdquo IEEE Transactions on Image Processing vol 18 no 12 pp2673ndash2685 Dec 2009

[18] J Kruse C Rother and U Schmidt ldquoLearning to push the limits ofefficient fft-based image deconvolutionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4586ndash4594

[19] U Schmidt C Rother S Nowozin J Jancsary and S Roth ldquoDiscrim-inative non-blind deblurringrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition 2013 pp 604ndash611

[20] U Schmidt and S Roth ldquoShrinkage fields for effective image restora-tionrdquo in Proceedings of the IEEE Conference on Computer Vision andPattern Recognition 2014 pp 2774ndash2781

[21] S Vasu V Reddy Maligireddy and A Rajagopalan ldquoNon-blind de-blurring Handling kernel uncertainty with cnnsrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2018pp 3272ndash3281

[22] C J Schuler M Hirsch S Harmeling and B Scholkopf ldquoLearning todeblurrdquo vol 38 no 7 IEEE 2015 pp 1439ndash1451

[23] A Chakrabarti ldquoDeep convolutional neural network for image decon-volutionrdquo 2014

[24] Y Tsai X Shen Z Lin K Sunkavalli and M Yang ldquoSky is not thelimit semantic-aware sky replacementrdquo ACM Trans Graph vol 35no 4 2016

[25] D Eigen D Krishnan and R Fergus ldquoRestoring an image takenthrough a window covered with dirt or rainrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2013 pp 633ndash640

[26] P Svoboda M Hradis L Marsık and P Zemcık ldquoCnn for license platemotion deblurringrdquo in 2016 IEEE International Conference on ImageProcessing (ICIP) IEEE 2016 pp 3832ndash3836

[27] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring text images via l0-regularized intensity and gradient priorrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2014 pp2901ndash2908

[28] S Baker and T Kanade ldquoLimits on super-resolution and how to breakthemrdquo IEEE Trans Pattern Anal Mach Intell vol 24 no 9 2002

[29] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring face images withexemplarsrdquo pp 47ndash62 2014

[30] N Joshi W Matusik E H Adelson and D J Kriegman ldquoPersonalphoto enhancement using example imagesrdquo ACM Trans Graph vol 29no 2 2010

[31] O Ronneberger PFischer and T Broxrdquo ldquoU-net Convolutionalnetworks for biomedical image segmentationrdquordquo in Medical ImageComputing and Computer-Assisted Intervention (MICCAI)rdquo serrdquoLNCSrdquo vol rdquo9351rdquo rdquo2015rdquo pp rdquo234ndash241rdquo [Online] Availablerdquohttplmbinformatikuni-freiburgdePublications2015RFB15ardquo

[32] G Huang Z Liu L Van Der Maaten and K Q Weinberger ldquoDenselyconnected convolutional networksrdquo in Proceedings of the IEEE confer-ence on computer vision and pattern recognition 2017 pp 4700ndash4708

[33] D Ulyanov A Vedaldi and V Lempitsky ldquoInstance normalization Themissing ingredient for fast stylizationrdquo arXiv preprint arXiv1607080222016

[34] Z Wang and S Ji ldquoSmoothed dilated convolutions for improved densepredictionrdquo in Proceedings of the 24th ACM SIGKDD InternationalConference on Knowledge Discovery amp Data Mining ACM 2018 pp2486ndash2495

[35] J Johnson A Alahi and L Fei-Fei ldquoPerceptual losses for real-timestyle transfer and super-resolutionrdquo ECCV 2016

[36] H Zhang and K Dana ldquoMulti-style generative network for real-timetransferrdquo arXiv preprint arXiv170306953 2017

[37] C Ledig L Theis F Huszar J Caballero A Cunningham A AcostaA Aitken A Tejani J Totz Z Wang et al ldquoPhoto-realistic singleimage super-resolution using a generative adversarial networkrdquo pp4681ndash4690 2017

[38] K Simonyan and A Zisserman ldquoVery deep convolutional networks forlarge-scale image recognitionrdquo arXiv14091556 2014

[39] Z Wang A C Bovik H R Sheikh and E P Simoncelli ldquoImagequality assessment from error visibility to structural similarityrdquo IEEETransactions on Image Processing vol 13 no 4 pp 600ndash612 April2004

[40] B M Smith L Zhang J Brandt Z Lin and J Yang ldquoExemplar-basedface parsingrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition 2013 pp 3484ndash3491

[41] G Boracchi and A Foi ldquoModeling the performance of image restorationfrom motion blurrdquo IEEE Transactions on Image Processing vol 21no 8 pp 3502ndash3517 2012

[42] Q Shan J Jia and A Agarwala ldquoHigh-quality motion deblurring froma single imagerdquo Acm transactions on graphics (tog) vol 27 no 3p 73 2008

[43] S Cho and S Lee ldquoFast motion deblurringrdquo ACM Transactions ongraphics (TOG) vol 28 no 5 p 145 2009

[44] L Zhong S Cho D Metaxas S Paris and J Wang ldquoHandling noise insingle image deblurring using directional filtersrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2013pp 612ndash619

[45] O M Parkhi A Vedaldi and A Zisserman ldquoDeep face recognitionrdquoin British Machine Vision Conference 2015

[46] B Amos B Ludwiczuk and M Satyanarayanan ldquoOpenface A general-purpose face recognition library with mobile applicationsrdquo CMU-CS-16-118 CMU School of Computer Science Tech Rep 2016

Page 4: Deblurring Face Images using Uncertainty Guided …1 Deblurring Face Images using Uncertainty Guided Multi-Stream Semantic Networks Rajeev Yasarla, Student Member, IEEE, Federico Perazzi,

4

Fig 5 An overview of the first stage semantic network (F-Net)

feature maps that are added to the blurry image for obtainingthe final output For example we can observe a residualconnection between the last layer of UMSN and class-specificfeature maps obtained from Conv 1times1 using y and m Outputof this residual connection is further processed as the input tothe residual connection with the input blurry image y In thisway we define our NRL and obtain the final deblurred imageWe propose a class-based loss function to train the UMSNnetwork

D Loss for UMSN

The network parameters Θ are learned by minimizing a lossL as follows

Θ = argminΘ

L(fΘ(y m) x) = argminΘ

L(x x) (1)

where fΘ() represents the UMSN network x is the deblurredresult m is the semantic map obtained from SN We definethe reconstruction loss as L = x minus x1 A face image canbe expressed as the sum of masked images using the semanticmaps as

x =

Msumi=1

mi x

where is the element-wise multiplication and M is the totalnumber of semantic maps As the masks are independent ofone another Eq (1) can be re-written as

Θ = argminΘ

Msumi=1

L(mi xmi x) (2)

In other words the loss is calculated for every class indepen-dent and summed up in order to obtain the overall loss asfollows

L(x x) =

Msumi=1

L(mi xmi x) (3)

Uncertainty Guidance

We introduce a confidence measure for every class anduse it to re-weight the contribution of the loss from eachclass to the total loss By introducing a confidence measureand re-weighting the loss we benefit in two ways If thenetwork is giving less importance to a particular class bynot learning appropriate features of it then the ConfidenceNetwork (CN) helps UMSN to learn those class featuresby estimating low confidence values and higher gradients

for those classes through CN network Additionally by re-weighting the contribution of loss from each class it countersfor the imbalances in the error estimation from differentclasses The loss function can be written as

Lc(x x) =

Msumi=1

CiL(mi xmi x)minus λ log(Ci) (4)

where log(Ci) acts as a regularizer that prevents the valueof Ci going to zero and λ is a constant We estimate theconfidence measure Ci for each class by passing mi xmix as inputs to CN as shown in Fig 6 Ci represents howconfident UMSN is in deblurring the ith class components ofthe face image Note that Ci(isin [0 1]) confidence measure isused only in the loss function while training the weights ofUMSN and it is not used (or estimated) during inference

Fig 6 An overview of the Confidence Network (CN) xis ground truth image x is deblurred image obtained fromUMSN m semantic maps of x

Inspired by the benefits of the perceptual loss in styletransfer [35] [36] and image super-resolution [37] we use it totrain our network Let Φ() denote the features obtained usingthe VGG16 model [38] then the perceptual loss is defined asfollows

Lp = Φ(x)minus Φ(x)22 (5)

The features from layer relu1 2 of a pretrained VGG-16network [38] are used to compute the perceptual loss Thetotal loss used to train UMSN is as follows

Ltotal = Lc + λ1Lp (6)

where λ1 is a constant

IV EXPERIMENTAL RESULTS

We evaluate the results using the images provided by theauthors of [9] which consists of 8000 blurry images gener-ated using the Helen dataset [11] and 8000 blurry imagesgenerated using the CelebA dataset [12]1 Furthermore wetest our network on a test dataset called PubFig providedby the authors of [10] which contains 192 blurry imagesWe quantitatively evaluate the results using the Peak-Signal-to-Noise Ratio (PSNR) and the Structural Similarity index(SSIM) [39]

1We follow the same training and testing protocols defined by [9]

5

Methodclass-1 class-2 class-3 class-4

PSNR SSIM PSNR SSIM PSNR SSIM PSNR SSIM

CZiyi et al 2786 088 3027 093 3333 092 3179 091

UMSN 2927 091 3075 095 3383 096 3310 094

HZiyi et al 2913 085 3120 088 3513 092 3396 088

UMSN 3083 087 3176 090 3567 093 3481 091

TABLE I PSNR and SSIM values for each deblurred semanticclass on the CelebA dataset (C) and Hellen dataset (H)

A Implementation Details

1) Segmentation Network We initially train the SN net-work using the Helen dataset [11] which contains 2000 cleanimages with corresponding semantic labels provided by Smithet al[40] SN is trained using the cross entropy loss with theAdam optimizer and the learning rate is set equal to 00002We train SN for 60000 iterations on clean images As can beseen from Fig 7 and Table II SN trained on clean images onlyis not able to produce good results on the blurry images Thuswe fine-tune SN using the blurry images and the correspondingsemantic maps generated using the Helen dataset [11] In thiscase SN is fine-tuned for 30000 iterations with the learningrate of 000001 We test our SN network on 330 clean Testimages provided by the Helen dataset as well as 8000 blurryimages Results are shown in Fig 7 and Table II in terms ofF-scores As can be seen from these results our method isable to produce reasonable segmentation results from blurryimages

Class Clean Images Blurry ImagesSN trained onclean images

SN trained onclean images

SN fine-tuned wblurry images

class-1 0910 0889 0901class-2 0903 0877 0884class-3 0856 0775 0814class-4 0625 0538 0566

TABLE II F-score values of semantic masks for clean andblurry images from the Helen dataset

2) UMSN Network Training images for UMSN are gen-erated using 2000 images from the Helen dataset [11] andrandomly selected 25000 images from the CelebA dataset [12]We generate 25000 blur kernels sizes ranging from 13 times 13to 29times 29 using 3D camera trajectories [41] Patches of size128times 128 are exacted from those images and convolved with25000 blur kernels randomly to generated about 17 millionpairs of clean-blurry data We added Gaussian noise withσ = 003 to the blurry images

a) First Stage Semantic Network Every F-Net in the firststage network is trained with clean-blurry paired data whereonly the corresponding class components of the face imagesare blurred The first stage semantic network is trained with acombination of the L1-norm and the perceptual losses usingthe Adam optimizer with learning rate of 00002 The firststage network is trained for 50 000 iterations To illustratewhat kind of outputs are observed from the F-Nets in Fig 8we display semantic masks (first row) the correspondingblurry images (second row) and the final estimated deblurred

(a) (b) (c)

Fig 7 Semantic maps generated by SN on a blurry imagefrom the Helen dataset (a) Blurry image (b) Semantic masksobtained from SN trained on clean images (c) Semantic masksobtained from SN fine tuned with the blurry images

outputs from F-Nets (last row) As it can be seen from thelast row F-Nets are able to remove most of the blur fromcorresponding class in the image

(a) (b) (c) (d)

Fig 8 First row semantic masks Second row blurry imagesThird row deblurred images using F-Net-i for classes (a) m1(b) m2 (c) m3 (d) m4

b) Overall UMSN Network UMSN is trained usingLtotal with the Adam optimizer and batchsize of 16 Thelearning rate is set equal to 00002 Note that semanticmaps mi4i=1 of the ground truth clean images in Ltotal aregenerated using SN since we do not have the actual semanticmaps for them λ and λ1 values are set equal to 001 and 10respectively UMSN is trained for 01 million iterations

B Evaluation

a) PSNR and SSIM We compare the performance of ourmethod with the following state-of-the-art algorithms MAP-based methods [14] [13] [42] [43] [44] face deblurringexemplar-based method [29] and CNN-based methods [7]

6

Deblurring MethodHelen CelebA

PSNR SSIM PSNR SSIMKrishnan et al[14] (CVPRrsquo11) 1930 0670 1838 0672Pan et al[29] (ECCV 2014) 2093 0727 1859 0677Shan et al[42] (SIGGRAPHrsquo08) 1957 0670 1843 0644Xu et al[13] (CVPRrsquo13) 2011 0711 1893 0685Cho et al[43] (SIGGRAPHrsquo09) 1682 0574 1303 0445Zhong et al[44] (CVPRrsquo13) 1641 0614 1726 0695Nah et al[7] (CVPRrsquo17) 2412 0823 2243 0832Ziyi et al[9] (CVPRrsquo18) wGAN 2558 0861 2434 0860Ziyi et al[9] (CVPRrsquo18) 2599 0871 2505 0879UMSN (ours wLtotal) 2693 0897 2590 0906

TABLE III PSNR and SSIM comparision of UMSN againststate-of-the-art methods

[9] Results are shown in Table III As can be seen from thistable UMSN outperforms state-of-the-art methods includingthe methods that make use of semantic maps for image de-blurring [9] Furthermore we evaluate UMSNrsquos performancein reconstructing individual semantic classes against [9] Asit can be seen from the Table I our methodrsquos performance inreconstructing individual classes is better than the sate-of-the-art method [9] In addition we compare the performance ofour method against CNN-based face deblurring method thatuses post processing [10] on the PubFig dataset consisting of192 images released by the authors of [10] The PSNR andSSIM values achieved by [10] are 3021 and 084 whereasour method achieved 3051 and 086 respectively

b) VGG-Face distance We compare the performanceof our method with the following state-of-the-art algorithmsusing Feature distance( ie L2minusnorm distance between outputfeature map from network) between deblurred image and theground truth image We use the outputs of pool5 layer from theVGG-Face [45] to compute the VGG-Face distance( dV GG)dV GG is a better perceptual metric to compare as it computesthe L2minusnorm distance in the feature space using VGG-Face [45] network Table IV clearly shows that our methodUMSN out-performs the existing state of the art methods usingthe perceptual measures like dV GG

Deblurring MethodHelen CelebAdV GG dV GG

Xu et al[13] (CVPRrsquo13) 1015 1138Zhong et al[44] (CVPRrsquo13) 1228 1472Nah et al[7] (CVPRrsquo17) 746 892Ziyi et al[9] (CVPRrsquo18) 587 649UMSN (ours wLtotal) 523 557

TABLE IV comparision of UMSN against state-of-the-artmethods using distance of feature from VGG-Face( dV GG)lower is better

c) Face recognition In order to show the significance ofdifferent face deblurring methods we perform face recognitionon the deblurred images We use the CelebA dataset to performthis comparison where we use 100 different identities asthe probe set and for each identity we select 9 additional

clean face images as the gallery set 8000 blurry imagesare generated using the 100 images in the probe set Givena blurry image the deblurred image is computed using thecorresponding deblurring algorithm The most similar face forthis deblurred image is selected from the gallery set to checkwhether they belong to same identity or not

To perform face recognition we use OpenFace toolbox [46]to compute the similarity distance between the deblurredimage and all the gallery images We select the Top-K nearestmatches for each deblurred image to compute the accuracyof the deblurring method Table V clearly shows that thedeblurred images by our method have better recognition accu-racies than the other state-of-the-art methods This experimentclearly shows that our method is able to retain the importantparts of a face while performing deblurring This in turn helpsin achieving better face recognition compared to the othermethods

Deblurring Method Top-1 Top-3 Top-5Clear images 71 84 89Blurred images 31 46 53Pan et al[29] (ECCV 2014) 44 57 64Xu et al[13] (CVPRrsquo13) 43 57 64Zhong et al[44] (CVPRrsquo13) 30 44 51Nah et al[7] (CVPRrsquo17) 42 59 65Ziyi et al[9] (CVPRrsquo18) 54 68 74UMSN (ours wLtotal) 59 72 80

TABLE V Top-1 Top-3 and Top-5 face recognition accuracieson the CelebA dataset

d) Qualitative Results The qualitative performance ofdifferent methods on several images from the Helen andCelebA test images are shown in Fig 9 We can clearly ob-serve that UMSN produces sharp images without any artifactswhen compared to the other methods [13] [9] [44]

In addition we conducted experiments on 100 real blurryimages provided by the authors of [13] [9] [44] Resultscorresponding to different methods are shown in Fig 10 Ascan be seen from this figure UMSN produces sharper andclear images when compared to the state-of-the-art methodsFor example methods [13] [44] produce results which containartifacts or blurry images Ziyi et al[9] is not able reconstructeyes nose and mouth as shown in the fourth row of Fig 10On the other hand eyes nose and mouth regions are clearlyvisible in the images corresponding to the UMSN method

C Ablation Study

We conduct experiments on two datasets to study theperformance contribution of each component in UMSN Westart with the base network (BN) and progressively addcomponents to establish their significance to estimate thefinal deblurred image The corresponding results are shownin Table VI It is important to note that BN and UMSNare configured to have the same number of learnable pa-rameters by setting appropriate number of output channelsin the intermediate convolutional layers of the ResBlock

7

Blurry Xu et al[13] Zhong et al[44] Ziyi et al[9] Ours Ground TruthImage (CVPRrsquo13) (CVPRrsquo13) (CVPRrsquo18) Final Image

Fig 9 Sample results from the Helen and CelebA datasets As can be seen from this figure MAP-based methods [44] [13]are adding some artifacts on the final deblurred images especially near the eyes nose and mouth regions Global prior-basedmethod [9] produces very smooth outputs and some parts of the deblurred images are deformed This can be clearly seenby looking at the reconstructions in the fourth column corresponding to [9] where the right eye is added in the face imagethe fingers are removed the mustache is removed and the mouth is deformed On the other hand by using class-specificmulti-stream networks and Ltotal our method is able to reconstruct all the classes of a face perfectly and produces sharperface images For example UMSN is able to reconstruct eyes nose mustache and mouth clearly Additionally UMSN doesnot add parts like eyes when they are occluded (last row) Note that the method proposed in [9] uses a generative adversarialnetwork (GAN) to reconstruct sharper images In contrast our method is able to reconstruct sharper images even without usinga GAN

8

Blurry Xu et al[13] Zhong et al[44] Ziyi et al[9] OursImage (CVPRrsquo13) (CVPRrsquo13) (CVPRrsquo18) Final

Fig 10 Sample results on real blurry images

9

(a) (b) (c) (d) (e) (f) (g)

Fig 11 Ablation study (a) Blurry image (b) BN (c) BN + semantic maps (d) BN + semantic maps + NRL (e) UMSNwithout Lc (f) UMSN (g) ground Truth

The ablation study demonstrates that each of the componentsimprove the quality of the deblurred image In particularthe introduction of the multi-stream architecture improved theperformance by 05dB When UMSN is trained using Ltotal

the performance improved by an additional 025dB Finallythe combination of all the components of UMSN produce thebest results with 165dB gain compared to BN

Deblurring MethodHelen CelebA

PSNR SSIM PSNR SSIMbaseline (BN) 2512 082 2438 081+ semantic maps 2555 084 2472 082+ nested residuals 2608 085 2526 084UMSN without Lc 2661 087 2567 087UMSN 2693 0897 2590 090

TABLE VI PSNR and SSIM results of the ablation study

Sample reconstructions corresponding to the ablation studyusing the Helen and CelebA datasets are shown in Fig 11As can be seen from this figure BN and BN+semantic mapsproduce reconstructions that are still blurry and they fail toreconstruct the facial parts like eyes nose and mouth well Onthe other hand BN+semantic maps+ NRL is able to improvethe reconstruction quality but visually it is still not comparableto the state-of-the-art method As shown in Fig 12 NRL isable to estimate the class-specific residual feature maps whichare further used for estimating the final deblurred image FromFig 12 we can see what different residual feature maps arelearning for different classes In particular the use of F-Netshelps UMSN to produce qualitatively good results where allthe parts of a face are clearly visible Furthermore trainingUMSN using Ltotal results in sharper images

We also compare our method qualitatively with postprocessing-based method [10] on the PubFig dataset As canbe seen from the results shown in Fig 13 even after applyingsome post processing [10] tends to produce results that aresmooth In comparison UMSN is able to produce sharper andhigh-quality images without any post-processing

(a) (b) (c) (d)

Fig 12 (a) Blurry image (b) UMSN (c) Intermediate residualfeature map learned for the eyes and the hair (d) Intermediateresidual feature map learned for the face skin in NRL

(a) (b) (c) (d)

Fig 13 Sample results from the PubFig dataset (a) Blurryimage (b) Ground Truth (c) Song et al[10](IJCVrsquo19) (d)UMSN

V CONCLUSION

We proposed a new method called Uncertainty guidedMulti-stream Semantic Network (UMSN) to address thesingle image blind deblurring of face image problem thatentails the use of facial sementic information In our approachwe introduced a different way to use the ouput featuresfrom the sub-networks which are trained for the individualsemantic class-wise deblurring Additionally we introduced

10

novel techniques such as nested residual learning and class-based confidence loss to improve the performance of UMSNIn comparison to the state-of-the-art single image blind de-blurring methods the proposed approach is able to achievesignificant improvements when evaluated on three popular facedatasets

APPENDIX AUMSN NETWORK SEQUENCE

The BaseNetwork is similar to the first stage semanticnetwork as shown in Fig 5 with the following sequence oflayersResBlock(364)-Avgpool-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-Upsample-ResBlock(6416)-Conv 3times 3(163)F-Net-i is a a sequence of five ResBlocks with denseconnectionsResBlock(316)-Avgpool-ResBlock(1616)-ResBlock(1616)-ResBlock(1616)-ResBlock(168)where ResBlock(mn) indicates m input channels and noutput channels for ResBlock

F-Net-irsquos are joined to form the first stage of UMSNfurther these outputs re concatenated to output of first layer ofBase Network(BN) for constructing the UMSN network ThusUMSN is constructed by combining F-Net-irsquos and BN whereF-Net-irsquos acts as first stage of UMSN and BN acts as secondstage of UMSN

APPENDIX BCONFIDENCE SCORES

In this section we show some example test outputs ofUMSN(trained with Ltotal) at different time instances oftraining and their corresponding confidence values to showhow confidence measure is helping the network Confidencemeasure helps the UMSN in deblurring different parts of theface For example as shown in the Fig 14 and Table VIIthe output of UMSN after 50000 iterations are blurry at eyesnose and mouth which was reflected in low confidence scorefor C3 when compared to other classes Eventually UMSNlearnt to reconstruct eyes nose and mouth which was reflectedin high confidence score for C3 and as shown in the Figure14 the output deblurred images are sharp with eyes nose andmouth reconstructed perfectly

Blurry Image Iteration C1 C2 C3 C4

Image13000 0573 0576 0610 058450000 0676 0902 0789 0847100000 0858 0917 0989 0962

Image23000 0519 0528 0493 061150000 0728 0883 0817 0872100000 0896 0919 0966 0944

TABLE VII Confidence values for the Image1 and Images2first row and second row images in Fig 14 respectively

(a) (b) (c) (d)

Fig 14 Sample deblurring results of UMSN at different timeinstances of training (a) Blurry image (b) trained for 3000iterations (c) 50000 iterations (d) 01 million iterations

ACKNOWLEDGMENT

This research is based upon work supported by the Officeof the Director of National Intelligence (ODNI) IntelligenceAdvanced Research Projects Activity (IARPA) via IARPARampD Contract No 2014-14071600012 The views and conclu-sions contained herein are those of the authors and should notbe interpreted as necessarily representing the official policiesor endorsements either expressed or implied of the ODNIIARPA or the US Government The US Government is au-thorized to reproduce and distribute reprints for Governmentalpurposes notwithstanding any copyright annotation thereon

REFERENCES

[1] R Fergus B Singh A Hertzmann S T Roweis and W T FreemanldquoRemoving camera shake from a single photographrdquo ACM transactionson graphics (TOG) vol 25 no 3 pp 787ndash794 2006

[2] C J Schuler H Christopher Burger S Harmeling and B ScholkopfldquoA machine learning approach for non-blind image deconvolutionrdquo inProceedings of the IEEE Conference on Computer Vision and PatternRecognition 2013 pp 1067ndash1074

[3] W Ren X Cao J Pan X Guo W Zuo and M-H Yang ldquoImagedeblurring via enhanced low-rank priorrdquo IEEE Transactions on ImageProcessing vol 25 no 7 pp 3426ndash3437 2016

[4] L Sun S Cho J Wang and J Hays ldquoEdge-based blur kernel estimationusing patch priorsrdquo in IEEE International Conference on ComputationalPhotography (ICCP) IEEE 2013 pp 1ndash8

[5] T M Nimisha A Kumar Singh and A N Rajagopalan ldquoBlur-invariant deep learning for blind-deblurringrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4752ndash4760

[6] L Xu J S Ren C Liu and J Jia ldquoDeep convolutional neural networkfor image deconvolutionrdquo in Advances in neural information processingsystems 2014 pp 1790ndash1798

[7] S Nah T Hyun Kim and K Mu Lee ldquoDeep multi-scale convolutionalneural network for dynamic scene deblurringrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2017pp 3883ndash3891

[8] S Zhang X Shen Z Lin R Mech J P Costeira and J M MouraldquoLearning to understand image blurrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2018 pp6586ndash6595

[9] Z Shen W-S Lai T Xu J Kautz and M-H Yang ldquoDeep semanticface deblurringrdquo Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 8260ndash8269 2018

[10] Y Song J Zhang L Gong S He L Bao J Pan Q Yang and M-HYang ldquoJoint face hallucination and deblurring via structure generationand detail enhancementrdquo International Journal of Computer Vision vol127 no 6-7 pp 785ndash800 2019

[11] V Le J Brandt Z Lin L Bourdev and T S Huang ldquoInteractivefacial feature localizationrdquo in European conference on computer visionSpringer 2012 pp 679ndash692

11

[12] Z Liu P Luo X Wang and X Tang ldquoDeep learning face attributesin the wildrdquo in Proceedings of the IEEE international conference oncomputer vision 2015 pp 3730ndash3738

[13] L Xu S Zheng and J Jia ldquoUnnatural l0 sparse representation fornatural image deblurringrdquo Proceedings of the IEEE conference oncomputer vision and pattern recognition pp 1107ndash1114 2013

[14] D Krishnan T Tay and R Fergus ldquoBlind deconvolution using anormalized sparsity measurerdquo CVPR 2011 pp 233ndash240 2011

[15] J Pan D Sun H Pfister and M-H Yang ldquoBlind image deblurringusing dark channel priorrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition 2016 pp 1628ndash1636

[16] J Ni P Turaga V M Patel and R Chellappa ldquoExample-drivenmanifold priors for image deconvolutionrdquo IEEE Transactions on ImageProcessing vol 20 no 11 pp 3086ndash3096 Nov 2011

[17] V M Patel G R Easley and D M Healy ldquoShearlet-based deconvo-lutionrdquo IEEE Transactions on Image Processing vol 18 no 12 pp2673ndash2685 Dec 2009

[18] J Kruse C Rother and U Schmidt ldquoLearning to push the limits ofefficient fft-based image deconvolutionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4586ndash4594

[19] U Schmidt C Rother S Nowozin J Jancsary and S Roth ldquoDiscrim-inative non-blind deblurringrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition 2013 pp 604ndash611

[20] U Schmidt and S Roth ldquoShrinkage fields for effective image restora-tionrdquo in Proceedings of the IEEE Conference on Computer Vision andPattern Recognition 2014 pp 2774ndash2781

[21] S Vasu V Reddy Maligireddy and A Rajagopalan ldquoNon-blind de-blurring Handling kernel uncertainty with cnnsrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2018pp 3272ndash3281

[22] C J Schuler M Hirsch S Harmeling and B Scholkopf ldquoLearning todeblurrdquo vol 38 no 7 IEEE 2015 pp 1439ndash1451

[23] A Chakrabarti ldquoDeep convolutional neural network for image decon-volutionrdquo 2014

[24] Y Tsai X Shen Z Lin K Sunkavalli and M Yang ldquoSky is not thelimit semantic-aware sky replacementrdquo ACM Trans Graph vol 35no 4 2016

[25] D Eigen D Krishnan and R Fergus ldquoRestoring an image takenthrough a window covered with dirt or rainrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2013 pp 633ndash640

[26] P Svoboda M Hradis L Marsık and P Zemcık ldquoCnn for license platemotion deblurringrdquo in 2016 IEEE International Conference on ImageProcessing (ICIP) IEEE 2016 pp 3832ndash3836

[27] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring text images via l0-regularized intensity and gradient priorrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2014 pp2901ndash2908

[28] S Baker and T Kanade ldquoLimits on super-resolution and how to breakthemrdquo IEEE Trans Pattern Anal Mach Intell vol 24 no 9 2002

[29] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring face images withexemplarsrdquo pp 47ndash62 2014

[30] N Joshi W Matusik E H Adelson and D J Kriegman ldquoPersonalphoto enhancement using example imagesrdquo ACM Trans Graph vol 29no 2 2010

[31] O Ronneberger PFischer and T Broxrdquo ldquoU-net Convolutionalnetworks for biomedical image segmentationrdquordquo in Medical ImageComputing and Computer-Assisted Intervention (MICCAI)rdquo serrdquoLNCSrdquo vol rdquo9351rdquo rdquo2015rdquo pp rdquo234ndash241rdquo [Online] Availablerdquohttplmbinformatikuni-freiburgdePublications2015RFB15ardquo

[32] G Huang Z Liu L Van Der Maaten and K Q Weinberger ldquoDenselyconnected convolutional networksrdquo in Proceedings of the IEEE confer-ence on computer vision and pattern recognition 2017 pp 4700ndash4708

[33] D Ulyanov A Vedaldi and V Lempitsky ldquoInstance normalization Themissing ingredient for fast stylizationrdquo arXiv preprint arXiv1607080222016

[34] Z Wang and S Ji ldquoSmoothed dilated convolutions for improved densepredictionrdquo in Proceedings of the 24th ACM SIGKDD InternationalConference on Knowledge Discovery amp Data Mining ACM 2018 pp2486ndash2495

[35] J Johnson A Alahi and L Fei-Fei ldquoPerceptual losses for real-timestyle transfer and super-resolutionrdquo ECCV 2016

[36] H Zhang and K Dana ldquoMulti-style generative network for real-timetransferrdquo arXiv preprint arXiv170306953 2017

[37] C Ledig L Theis F Huszar J Caballero A Cunningham A AcostaA Aitken A Tejani J Totz Z Wang et al ldquoPhoto-realistic singleimage super-resolution using a generative adversarial networkrdquo pp4681ndash4690 2017

[38] K Simonyan and A Zisserman ldquoVery deep convolutional networks forlarge-scale image recognitionrdquo arXiv14091556 2014

[39] Z Wang A C Bovik H R Sheikh and E P Simoncelli ldquoImagequality assessment from error visibility to structural similarityrdquo IEEETransactions on Image Processing vol 13 no 4 pp 600ndash612 April2004

[40] B M Smith L Zhang J Brandt Z Lin and J Yang ldquoExemplar-basedface parsingrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition 2013 pp 3484ndash3491

[41] G Boracchi and A Foi ldquoModeling the performance of image restorationfrom motion blurrdquo IEEE Transactions on Image Processing vol 21no 8 pp 3502ndash3517 2012

[42] Q Shan J Jia and A Agarwala ldquoHigh-quality motion deblurring froma single imagerdquo Acm transactions on graphics (tog) vol 27 no 3p 73 2008

[43] S Cho and S Lee ldquoFast motion deblurringrdquo ACM Transactions ongraphics (TOG) vol 28 no 5 p 145 2009

[44] L Zhong S Cho D Metaxas S Paris and J Wang ldquoHandling noise insingle image deblurring using directional filtersrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2013pp 612ndash619

[45] O M Parkhi A Vedaldi and A Zisserman ldquoDeep face recognitionrdquoin British Machine Vision Conference 2015

[46] B Amos B Ludwiczuk and M Satyanarayanan ldquoOpenface A general-purpose face recognition library with mobile applicationsrdquo CMU-CS-16-118 CMU School of Computer Science Tech Rep 2016

Page 5: Deblurring Face Images using Uncertainty Guided …1 Deblurring Face Images using Uncertainty Guided Multi-Stream Semantic Networks Rajeev Yasarla, Student Member, IEEE, Federico Perazzi,

5

Methodclass-1 class-2 class-3 class-4

PSNR SSIM PSNR SSIM PSNR SSIM PSNR SSIM

CZiyi et al 2786 088 3027 093 3333 092 3179 091

UMSN 2927 091 3075 095 3383 096 3310 094

HZiyi et al 2913 085 3120 088 3513 092 3396 088

UMSN 3083 087 3176 090 3567 093 3481 091

TABLE I PSNR and SSIM values for each deblurred semanticclass on the CelebA dataset (C) and Hellen dataset (H)

A Implementation Details

1) Segmentation Network We initially train the SN net-work using the Helen dataset [11] which contains 2000 cleanimages with corresponding semantic labels provided by Smithet al[40] SN is trained using the cross entropy loss with theAdam optimizer and the learning rate is set equal to 00002We train SN for 60000 iterations on clean images As can beseen from Fig 7 and Table II SN trained on clean images onlyis not able to produce good results on the blurry images Thuswe fine-tune SN using the blurry images and the correspondingsemantic maps generated using the Helen dataset [11] In thiscase SN is fine-tuned for 30000 iterations with the learningrate of 000001 We test our SN network on 330 clean Testimages provided by the Helen dataset as well as 8000 blurryimages Results are shown in Fig 7 and Table II in terms ofF-scores As can be seen from these results our method isable to produce reasonable segmentation results from blurryimages

Class Clean Images Blurry ImagesSN trained onclean images

SN trained onclean images

SN fine-tuned wblurry images

class-1 0910 0889 0901class-2 0903 0877 0884class-3 0856 0775 0814class-4 0625 0538 0566

TABLE II F-score values of semantic masks for clean andblurry images from the Helen dataset

2) UMSN Network Training images for UMSN are gen-erated using 2000 images from the Helen dataset [11] andrandomly selected 25000 images from the CelebA dataset [12]We generate 25000 blur kernels sizes ranging from 13 times 13to 29times 29 using 3D camera trajectories [41] Patches of size128times 128 are exacted from those images and convolved with25000 blur kernels randomly to generated about 17 millionpairs of clean-blurry data We added Gaussian noise withσ = 003 to the blurry images

a) First Stage Semantic Network Every F-Net in the firststage network is trained with clean-blurry paired data whereonly the corresponding class components of the face imagesare blurred The first stage semantic network is trained with acombination of the L1-norm and the perceptual losses usingthe Adam optimizer with learning rate of 00002 The firststage network is trained for 50 000 iterations To illustratewhat kind of outputs are observed from the F-Nets in Fig 8we display semantic masks (first row) the correspondingblurry images (second row) and the final estimated deblurred

(a) (b) (c)

Fig 7 Semantic maps generated by SN on a blurry imagefrom the Helen dataset (a) Blurry image (b) Semantic masksobtained from SN trained on clean images (c) Semantic masksobtained from SN fine tuned with the blurry images

outputs from F-Nets (last row) As it can be seen from thelast row F-Nets are able to remove most of the blur fromcorresponding class in the image

(a) (b) (c) (d)

Fig 8 First row semantic masks Second row blurry imagesThird row deblurred images using F-Net-i for classes (a) m1(b) m2 (c) m3 (d) m4

b) Overall UMSN Network UMSN is trained usingLtotal with the Adam optimizer and batchsize of 16 Thelearning rate is set equal to 00002 Note that semanticmaps mi4i=1 of the ground truth clean images in Ltotal aregenerated using SN since we do not have the actual semanticmaps for them λ and λ1 values are set equal to 001 and 10respectively UMSN is trained for 01 million iterations

B Evaluation

a) PSNR and SSIM We compare the performance of ourmethod with the following state-of-the-art algorithms MAP-based methods [14] [13] [42] [43] [44] face deblurringexemplar-based method [29] and CNN-based methods [7]

6

Deblurring MethodHelen CelebA

PSNR SSIM PSNR SSIMKrishnan et al[14] (CVPRrsquo11) 1930 0670 1838 0672Pan et al[29] (ECCV 2014) 2093 0727 1859 0677Shan et al[42] (SIGGRAPHrsquo08) 1957 0670 1843 0644Xu et al[13] (CVPRrsquo13) 2011 0711 1893 0685Cho et al[43] (SIGGRAPHrsquo09) 1682 0574 1303 0445Zhong et al[44] (CVPRrsquo13) 1641 0614 1726 0695Nah et al[7] (CVPRrsquo17) 2412 0823 2243 0832Ziyi et al[9] (CVPRrsquo18) wGAN 2558 0861 2434 0860Ziyi et al[9] (CVPRrsquo18) 2599 0871 2505 0879UMSN (ours wLtotal) 2693 0897 2590 0906

TABLE III PSNR and SSIM comparision of UMSN againststate-of-the-art methods

[9] Results are shown in Table III As can be seen from thistable UMSN outperforms state-of-the-art methods includingthe methods that make use of semantic maps for image de-blurring [9] Furthermore we evaluate UMSNrsquos performancein reconstructing individual semantic classes against [9] Asit can be seen from the Table I our methodrsquos performance inreconstructing individual classes is better than the sate-of-the-art method [9] In addition we compare the performance ofour method against CNN-based face deblurring method thatuses post processing [10] on the PubFig dataset consisting of192 images released by the authors of [10] The PSNR andSSIM values achieved by [10] are 3021 and 084 whereasour method achieved 3051 and 086 respectively

b) VGG-Face distance We compare the performanceof our method with the following state-of-the-art algorithmsusing Feature distance( ie L2minusnorm distance between outputfeature map from network) between deblurred image and theground truth image We use the outputs of pool5 layer from theVGG-Face [45] to compute the VGG-Face distance( dV GG)dV GG is a better perceptual metric to compare as it computesthe L2minusnorm distance in the feature space using VGG-Face [45] network Table IV clearly shows that our methodUMSN out-performs the existing state of the art methods usingthe perceptual measures like dV GG

Deblurring MethodHelen CelebAdV GG dV GG

Xu et al[13] (CVPRrsquo13) 1015 1138Zhong et al[44] (CVPRrsquo13) 1228 1472Nah et al[7] (CVPRrsquo17) 746 892Ziyi et al[9] (CVPRrsquo18) 587 649UMSN (ours wLtotal) 523 557

TABLE IV comparision of UMSN against state-of-the-artmethods using distance of feature from VGG-Face( dV GG)lower is better

c) Face recognition In order to show the significance ofdifferent face deblurring methods we perform face recognitionon the deblurred images We use the CelebA dataset to performthis comparison where we use 100 different identities asthe probe set and for each identity we select 9 additional

clean face images as the gallery set 8000 blurry imagesare generated using the 100 images in the probe set Givena blurry image the deblurred image is computed using thecorresponding deblurring algorithm The most similar face forthis deblurred image is selected from the gallery set to checkwhether they belong to same identity or not

To perform face recognition we use OpenFace toolbox [46]to compute the similarity distance between the deblurredimage and all the gallery images We select the Top-K nearestmatches for each deblurred image to compute the accuracyof the deblurring method Table V clearly shows that thedeblurred images by our method have better recognition accu-racies than the other state-of-the-art methods This experimentclearly shows that our method is able to retain the importantparts of a face while performing deblurring This in turn helpsin achieving better face recognition compared to the othermethods

Deblurring Method Top-1 Top-3 Top-5Clear images 71 84 89Blurred images 31 46 53Pan et al[29] (ECCV 2014) 44 57 64Xu et al[13] (CVPRrsquo13) 43 57 64Zhong et al[44] (CVPRrsquo13) 30 44 51Nah et al[7] (CVPRrsquo17) 42 59 65Ziyi et al[9] (CVPRrsquo18) 54 68 74UMSN (ours wLtotal) 59 72 80

TABLE V Top-1 Top-3 and Top-5 face recognition accuracieson the CelebA dataset

d) Qualitative Results The qualitative performance ofdifferent methods on several images from the Helen andCelebA test images are shown in Fig 9 We can clearly ob-serve that UMSN produces sharp images without any artifactswhen compared to the other methods [13] [9] [44]

In addition we conducted experiments on 100 real blurryimages provided by the authors of [13] [9] [44] Resultscorresponding to different methods are shown in Fig 10 Ascan be seen from this figure UMSN produces sharper andclear images when compared to the state-of-the-art methodsFor example methods [13] [44] produce results which containartifacts or blurry images Ziyi et al[9] is not able reconstructeyes nose and mouth as shown in the fourth row of Fig 10On the other hand eyes nose and mouth regions are clearlyvisible in the images corresponding to the UMSN method

C Ablation Study

We conduct experiments on two datasets to study theperformance contribution of each component in UMSN Westart with the base network (BN) and progressively addcomponents to establish their significance to estimate thefinal deblurred image The corresponding results are shownin Table VI It is important to note that BN and UMSNare configured to have the same number of learnable pa-rameters by setting appropriate number of output channelsin the intermediate convolutional layers of the ResBlock

7

Blurry Xu et al[13] Zhong et al[44] Ziyi et al[9] Ours Ground TruthImage (CVPRrsquo13) (CVPRrsquo13) (CVPRrsquo18) Final Image

Fig 9 Sample results from the Helen and CelebA datasets As can be seen from this figure MAP-based methods [44] [13]are adding some artifacts on the final deblurred images especially near the eyes nose and mouth regions Global prior-basedmethod [9] produces very smooth outputs and some parts of the deblurred images are deformed This can be clearly seenby looking at the reconstructions in the fourth column corresponding to [9] where the right eye is added in the face imagethe fingers are removed the mustache is removed and the mouth is deformed On the other hand by using class-specificmulti-stream networks and Ltotal our method is able to reconstruct all the classes of a face perfectly and produces sharperface images For example UMSN is able to reconstruct eyes nose mustache and mouth clearly Additionally UMSN doesnot add parts like eyes when they are occluded (last row) Note that the method proposed in [9] uses a generative adversarialnetwork (GAN) to reconstruct sharper images In contrast our method is able to reconstruct sharper images even without usinga GAN

8

Blurry Xu et al[13] Zhong et al[44] Ziyi et al[9] OursImage (CVPRrsquo13) (CVPRrsquo13) (CVPRrsquo18) Final

Fig 10 Sample results on real blurry images

9

(a) (b) (c) (d) (e) (f) (g)

Fig 11 Ablation study (a) Blurry image (b) BN (c) BN + semantic maps (d) BN + semantic maps + NRL (e) UMSNwithout Lc (f) UMSN (g) ground Truth

The ablation study demonstrates that each of the componentsimprove the quality of the deblurred image In particularthe introduction of the multi-stream architecture improved theperformance by 05dB When UMSN is trained using Ltotal

the performance improved by an additional 025dB Finallythe combination of all the components of UMSN produce thebest results with 165dB gain compared to BN

Deblurring MethodHelen CelebA

PSNR SSIM PSNR SSIMbaseline (BN) 2512 082 2438 081+ semantic maps 2555 084 2472 082+ nested residuals 2608 085 2526 084UMSN without Lc 2661 087 2567 087UMSN 2693 0897 2590 090

TABLE VI PSNR and SSIM results of the ablation study

Sample reconstructions corresponding to the ablation studyusing the Helen and CelebA datasets are shown in Fig 11As can be seen from this figure BN and BN+semantic mapsproduce reconstructions that are still blurry and they fail toreconstruct the facial parts like eyes nose and mouth well Onthe other hand BN+semantic maps+ NRL is able to improvethe reconstruction quality but visually it is still not comparableto the state-of-the-art method As shown in Fig 12 NRL isable to estimate the class-specific residual feature maps whichare further used for estimating the final deblurred image FromFig 12 we can see what different residual feature maps arelearning for different classes In particular the use of F-Netshelps UMSN to produce qualitatively good results where allthe parts of a face are clearly visible Furthermore trainingUMSN using Ltotal results in sharper images

We also compare our method qualitatively with postprocessing-based method [10] on the PubFig dataset As canbe seen from the results shown in Fig 13 even after applyingsome post processing [10] tends to produce results that aresmooth In comparison UMSN is able to produce sharper andhigh-quality images without any post-processing

(a) (b) (c) (d)

Fig 12 (a) Blurry image (b) UMSN (c) Intermediate residualfeature map learned for the eyes and the hair (d) Intermediateresidual feature map learned for the face skin in NRL

(a) (b) (c) (d)

Fig 13 Sample results from the PubFig dataset (a) Blurryimage (b) Ground Truth (c) Song et al[10](IJCVrsquo19) (d)UMSN

V CONCLUSION

We proposed a new method called Uncertainty guidedMulti-stream Semantic Network (UMSN) to address thesingle image blind deblurring of face image problem thatentails the use of facial sementic information In our approachwe introduced a different way to use the ouput featuresfrom the sub-networks which are trained for the individualsemantic class-wise deblurring Additionally we introduced

10

novel techniques such as nested residual learning and class-based confidence loss to improve the performance of UMSNIn comparison to the state-of-the-art single image blind de-blurring methods the proposed approach is able to achievesignificant improvements when evaluated on three popular facedatasets

APPENDIX AUMSN NETWORK SEQUENCE

The BaseNetwork is similar to the first stage semanticnetwork as shown in Fig 5 with the following sequence oflayersResBlock(364)-Avgpool-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-Upsample-ResBlock(6416)-Conv 3times 3(163)F-Net-i is a a sequence of five ResBlocks with denseconnectionsResBlock(316)-Avgpool-ResBlock(1616)-ResBlock(1616)-ResBlock(1616)-ResBlock(168)where ResBlock(mn) indicates m input channels and noutput channels for ResBlock

F-Net-irsquos are joined to form the first stage of UMSNfurther these outputs re concatenated to output of first layer ofBase Network(BN) for constructing the UMSN network ThusUMSN is constructed by combining F-Net-irsquos and BN whereF-Net-irsquos acts as first stage of UMSN and BN acts as secondstage of UMSN

APPENDIX BCONFIDENCE SCORES

In this section we show some example test outputs ofUMSN(trained with Ltotal) at different time instances oftraining and their corresponding confidence values to showhow confidence measure is helping the network Confidencemeasure helps the UMSN in deblurring different parts of theface For example as shown in the Fig 14 and Table VIIthe output of UMSN after 50000 iterations are blurry at eyesnose and mouth which was reflected in low confidence scorefor C3 when compared to other classes Eventually UMSNlearnt to reconstruct eyes nose and mouth which was reflectedin high confidence score for C3 and as shown in the Figure14 the output deblurred images are sharp with eyes nose andmouth reconstructed perfectly

Blurry Image Iteration C1 C2 C3 C4

Image13000 0573 0576 0610 058450000 0676 0902 0789 0847100000 0858 0917 0989 0962

Image23000 0519 0528 0493 061150000 0728 0883 0817 0872100000 0896 0919 0966 0944

TABLE VII Confidence values for the Image1 and Images2first row and second row images in Fig 14 respectively

(a) (b) (c) (d)

Fig 14 Sample deblurring results of UMSN at different timeinstances of training (a) Blurry image (b) trained for 3000iterations (c) 50000 iterations (d) 01 million iterations

ACKNOWLEDGMENT

This research is based upon work supported by the Officeof the Director of National Intelligence (ODNI) IntelligenceAdvanced Research Projects Activity (IARPA) via IARPARampD Contract No 2014-14071600012 The views and conclu-sions contained herein are those of the authors and should notbe interpreted as necessarily representing the official policiesor endorsements either expressed or implied of the ODNIIARPA or the US Government The US Government is au-thorized to reproduce and distribute reprints for Governmentalpurposes notwithstanding any copyright annotation thereon

REFERENCES

[1] R Fergus B Singh A Hertzmann S T Roweis and W T FreemanldquoRemoving camera shake from a single photographrdquo ACM transactionson graphics (TOG) vol 25 no 3 pp 787ndash794 2006

[2] C J Schuler H Christopher Burger S Harmeling and B ScholkopfldquoA machine learning approach for non-blind image deconvolutionrdquo inProceedings of the IEEE Conference on Computer Vision and PatternRecognition 2013 pp 1067ndash1074

[3] W Ren X Cao J Pan X Guo W Zuo and M-H Yang ldquoImagedeblurring via enhanced low-rank priorrdquo IEEE Transactions on ImageProcessing vol 25 no 7 pp 3426ndash3437 2016

[4] L Sun S Cho J Wang and J Hays ldquoEdge-based blur kernel estimationusing patch priorsrdquo in IEEE International Conference on ComputationalPhotography (ICCP) IEEE 2013 pp 1ndash8

[5] T M Nimisha A Kumar Singh and A N Rajagopalan ldquoBlur-invariant deep learning for blind-deblurringrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4752ndash4760

[6] L Xu J S Ren C Liu and J Jia ldquoDeep convolutional neural networkfor image deconvolutionrdquo in Advances in neural information processingsystems 2014 pp 1790ndash1798

[7] S Nah T Hyun Kim and K Mu Lee ldquoDeep multi-scale convolutionalneural network for dynamic scene deblurringrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2017pp 3883ndash3891

[8] S Zhang X Shen Z Lin R Mech J P Costeira and J M MouraldquoLearning to understand image blurrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2018 pp6586ndash6595

[9] Z Shen W-S Lai T Xu J Kautz and M-H Yang ldquoDeep semanticface deblurringrdquo Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 8260ndash8269 2018

[10] Y Song J Zhang L Gong S He L Bao J Pan Q Yang and M-HYang ldquoJoint face hallucination and deblurring via structure generationand detail enhancementrdquo International Journal of Computer Vision vol127 no 6-7 pp 785ndash800 2019

[11] V Le J Brandt Z Lin L Bourdev and T S Huang ldquoInteractivefacial feature localizationrdquo in European conference on computer visionSpringer 2012 pp 679ndash692

11

[12] Z Liu P Luo X Wang and X Tang ldquoDeep learning face attributesin the wildrdquo in Proceedings of the IEEE international conference oncomputer vision 2015 pp 3730ndash3738

[13] L Xu S Zheng and J Jia ldquoUnnatural l0 sparse representation fornatural image deblurringrdquo Proceedings of the IEEE conference oncomputer vision and pattern recognition pp 1107ndash1114 2013

[14] D Krishnan T Tay and R Fergus ldquoBlind deconvolution using anormalized sparsity measurerdquo CVPR 2011 pp 233ndash240 2011

[15] J Pan D Sun H Pfister and M-H Yang ldquoBlind image deblurringusing dark channel priorrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition 2016 pp 1628ndash1636

[16] J Ni P Turaga V M Patel and R Chellappa ldquoExample-drivenmanifold priors for image deconvolutionrdquo IEEE Transactions on ImageProcessing vol 20 no 11 pp 3086ndash3096 Nov 2011

[17] V M Patel G R Easley and D M Healy ldquoShearlet-based deconvo-lutionrdquo IEEE Transactions on Image Processing vol 18 no 12 pp2673ndash2685 Dec 2009

[18] J Kruse C Rother and U Schmidt ldquoLearning to push the limits ofefficient fft-based image deconvolutionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4586ndash4594

[19] U Schmidt C Rother S Nowozin J Jancsary and S Roth ldquoDiscrim-inative non-blind deblurringrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition 2013 pp 604ndash611

[20] U Schmidt and S Roth ldquoShrinkage fields for effective image restora-tionrdquo in Proceedings of the IEEE Conference on Computer Vision andPattern Recognition 2014 pp 2774ndash2781

[21] S Vasu V Reddy Maligireddy and A Rajagopalan ldquoNon-blind de-blurring Handling kernel uncertainty with cnnsrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2018pp 3272ndash3281

[22] C J Schuler M Hirsch S Harmeling and B Scholkopf ldquoLearning todeblurrdquo vol 38 no 7 IEEE 2015 pp 1439ndash1451

[23] A Chakrabarti ldquoDeep convolutional neural network for image decon-volutionrdquo 2014

[24] Y Tsai X Shen Z Lin K Sunkavalli and M Yang ldquoSky is not thelimit semantic-aware sky replacementrdquo ACM Trans Graph vol 35no 4 2016

[25] D Eigen D Krishnan and R Fergus ldquoRestoring an image takenthrough a window covered with dirt or rainrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2013 pp 633ndash640

[26] P Svoboda M Hradis L Marsık and P Zemcık ldquoCnn for license platemotion deblurringrdquo in 2016 IEEE International Conference on ImageProcessing (ICIP) IEEE 2016 pp 3832ndash3836

[27] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring text images via l0-regularized intensity and gradient priorrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2014 pp2901ndash2908

[28] S Baker and T Kanade ldquoLimits on super-resolution and how to breakthemrdquo IEEE Trans Pattern Anal Mach Intell vol 24 no 9 2002

[29] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring face images withexemplarsrdquo pp 47ndash62 2014

[30] N Joshi W Matusik E H Adelson and D J Kriegman ldquoPersonalphoto enhancement using example imagesrdquo ACM Trans Graph vol 29no 2 2010

[31] O Ronneberger PFischer and T Broxrdquo ldquoU-net Convolutionalnetworks for biomedical image segmentationrdquordquo in Medical ImageComputing and Computer-Assisted Intervention (MICCAI)rdquo serrdquoLNCSrdquo vol rdquo9351rdquo rdquo2015rdquo pp rdquo234ndash241rdquo [Online] Availablerdquohttplmbinformatikuni-freiburgdePublications2015RFB15ardquo

[32] G Huang Z Liu L Van Der Maaten and K Q Weinberger ldquoDenselyconnected convolutional networksrdquo in Proceedings of the IEEE confer-ence on computer vision and pattern recognition 2017 pp 4700ndash4708

[33] D Ulyanov A Vedaldi and V Lempitsky ldquoInstance normalization Themissing ingredient for fast stylizationrdquo arXiv preprint arXiv1607080222016

[34] Z Wang and S Ji ldquoSmoothed dilated convolutions for improved densepredictionrdquo in Proceedings of the 24th ACM SIGKDD InternationalConference on Knowledge Discovery amp Data Mining ACM 2018 pp2486ndash2495

[35] J Johnson A Alahi and L Fei-Fei ldquoPerceptual losses for real-timestyle transfer and super-resolutionrdquo ECCV 2016

[36] H Zhang and K Dana ldquoMulti-style generative network for real-timetransferrdquo arXiv preprint arXiv170306953 2017

[37] C Ledig L Theis F Huszar J Caballero A Cunningham A AcostaA Aitken A Tejani J Totz Z Wang et al ldquoPhoto-realistic singleimage super-resolution using a generative adversarial networkrdquo pp4681ndash4690 2017

[38] K Simonyan and A Zisserman ldquoVery deep convolutional networks forlarge-scale image recognitionrdquo arXiv14091556 2014

[39] Z Wang A C Bovik H R Sheikh and E P Simoncelli ldquoImagequality assessment from error visibility to structural similarityrdquo IEEETransactions on Image Processing vol 13 no 4 pp 600ndash612 April2004

[40] B M Smith L Zhang J Brandt Z Lin and J Yang ldquoExemplar-basedface parsingrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition 2013 pp 3484ndash3491

[41] G Boracchi and A Foi ldquoModeling the performance of image restorationfrom motion blurrdquo IEEE Transactions on Image Processing vol 21no 8 pp 3502ndash3517 2012

[42] Q Shan J Jia and A Agarwala ldquoHigh-quality motion deblurring froma single imagerdquo Acm transactions on graphics (tog) vol 27 no 3p 73 2008

[43] S Cho and S Lee ldquoFast motion deblurringrdquo ACM Transactions ongraphics (TOG) vol 28 no 5 p 145 2009

[44] L Zhong S Cho D Metaxas S Paris and J Wang ldquoHandling noise insingle image deblurring using directional filtersrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2013pp 612ndash619

[45] O M Parkhi A Vedaldi and A Zisserman ldquoDeep face recognitionrdquoin British Machine Vision Conference 2015

[46] B Amos B Ludwiczuk and M Satyanarayanan ldquoOpenface A general-purpose face recognition library with mobile applicationsrdquo CMU-CS-16-118 CMU School of Computer Science Tech Rep 2016

Page 6: Deblurring Face Images using Uncertainty Guided …1 Deblurring Face Images using Uncertainty Guided Multi-Stream Semantic Networks Rajeev Yasarla, Student Member, IEEE, Federico Perazzi,

6

Deblurring MethodHelen CelebA

PSNR SSIM PSNR SSIMKrishnan et al[14] (CVPRrsquo11) 1930 0670 1838 0672Pan et al[29] (ECCV 2014) 2093 0727 1859 0677Shan et al[42] (SIGGRAPHrsquo08) 1957 0670 1843 0644Xu et al[13] (CVPRrsquo13) 2011 0711 1893 0685Cho et al[43] (SIGGRAPHrsquo09) 1682 0574 1303 0445Zhong et al[44] (CVPRrsquo13) 1641 0614 1726 0695Nah et al[7] (CVPRrsquo17) 2412 0823 2243 0832Ziyi et al[9] (CVPRrsquo18) wGAN 2558 0861 2434 0860Ziyi et al[9] (CVPRrsquo18) 2599 0871 2505 0879UMSN (ours wLtotal) 2693 0897 2590 0906

TABLE III PSNR and SSIM comparision of UMSN againststate-of-the-art methods

[9] Results are shown in Table III As can be seen from thistable UMSN outperforms state-of-the-art methods includingthe methods that make use of semantic maps for image de-blurring [9] Furthermore we evaluate UMSNrsquos performancein reconstructing individual semantic classes against [9] Asit can be seen from the Table I our methodrsquos performance inreconstructing individual classes is better than the sate-of-the-art method [9] In addition we compare the performance ofour method against CNN-based face deblurring method thatuses post processing [10] on the PubFig dataset consisting of192 images released by the authors of [10] The PSNR andSSIM values achieved by [10] are 3021 and 084 whereasour method achieved 3051 and 086 respectively

b) VGG-Face distance We compare the performanceof our method with the following state-of-the-art algorithmsusing Feature distance( ie L2minusnorm distance between outputfeature map from network) between deblurred image and theground truth image We use the outputs of pool5 layer from theVGG-Face [45] to compute the VGG-Face distance( dV GG)dV GG is a better perceptual metric to compare as it computesthe L2minusnorm distance in the feature space using VGG-Face [45] network Table IV clearly shows that our methodUMSN out-performs the existing state of the art methods usingthe perceptual measures like dV GG

Deblurring MethodHelen CelebAdV GG dV GG

Xu et al[13] (CVPRrsquo13) 1015 1138Zhong et al[44] (CVPRrsquo13) 1228 1472Nah et al[7] (CVPRrsquo17) 746 892Ziyi et al[9] (CVPRrsquo18) 587 649UMSN (ours wLtotal) 523 557

TABLE IV comparision of UMSN against state-of-the-artmethods using distance of feature from VGG-Face( dV GG)lower is better

c) Face recognition In order to show the significance ofdifferent face deblurring methods we perform face recognitionon the deblurred images We use the CelebA dataset to performthis comparison where we use 100 different identities asthe probe set and for each identity we select 9 additional

clean face images as the gallery set 8000 blurry imagesare generated using the 100 images in the probe set Givena blurry image the deblurred image is computed using thecorresponding deblurring algorithm The most similar face forthis deblurred image is selected from the gallery set to checkwhether they belong to same identity or not

To perform face recognition we use OpenFace toolbox [46]to compute the similarity distance between the deblurredimage and all the gallery images We select the Top-K nearestmatches for each deblurred image to compute the accuracyof the deblurring method Table V clearly shows that thedeblurred images by our method have better recognition accu-racies than the other state-of-the-art methods This experimentclearly shows that our method is able to retain the importantparts of a face while performing deblurring This in turn helpsin achieving better face recognition compared to the othermethods

Deblurring Method Top-1 Top-3 Top-5Clear images 71 84 89Blurred images 31 46 53Pan et al[29] (ECCV 2014) 44 57 64Xu et al[13] (CVPRrsquo13) 43 57 64Zhong et al[44] (CVPRrsquo13) 30 44 51Nah et al[7] (CVPRrsquo17) 42 59 65Ziyi et al[9] (CVPRrsquo18) 54 68 74UMSN (ours wLtotal) 59 72 80

TABLE V Top-1 Top-3 and Top-5 face recognition accuracieson the CelebA dataset

d) Qualitative Results The qualitative performance ofdifferent methods on several images from the Helen andCelebA test images are shown in Fig 9 We can clearly ob-serve that UMSN produces sharp images without any artifactswhen compared to the other methods [13] [9] [44]

In addition we conducted experiments on 100 real blurryimages provided by the authors of [13] [9] [44] Resultscorresponding to different methods are shown in Fig 10 Ascan be seen from this figure UMSN produces sharper andclear images when compared to the state-of-the-art methodsFor example methods [13] [44] produce results which containartifacts or blurry images Ziyi et al[9] is not able reconstructeyes nose and mouth as shown in the fourth row of Fig 10On the other hand eyes nose and mouth regions are clearlyvisible in the images corresponding to the UMSN method

C Ablation Study

We conduct experiments on two datasets to study theperformance contribution of each component in UMSN Westart with the base network (BN) and progressively addcomponents to establish their significance to estimate thefinal deblurred image The corresponding results are shownin Table VI It is important to note that BN and UMSNare configured to have the same number of learnable pa-rameters by setting appropriate number of output channelsin the intermediate convolutional layers of the ResBlock

7

Blurry Xu et al[13] Zhong et al[44] Ziyi et al[9] Ours Ground TruthImage (CVPRrsquo13) (CVPRrsquo13) (CVPRrsquo18) Final Image

Fig 9 Sample results from the Helen and CelebA datasets As can be seen from this figure MAP-based methods [44] [13]are adding some artifacts on the final deblurred images especially near the eyes nose and mouth regions Global prior-basedmethod [9] produces very smooth outputs and some parts of the deblurred images are deformed This can be clearly seenby looking at the reconstructions in the fourth column corresponding to [9] where the right eye is added in the face imagethe fingers are removed the mustache is removed and the mouth is deformed On the other hand by using class-specificmulti-stream networks and Ltotal our method is able to reconstruct all the classes of a face perfectly and produces sharperface images For example UMSN is able to reconstruct eyes nose mustache and mouth clearly Additionally UMSN doesnot add parts like eyes when they are occluded (last row) Note that the method proposed in [9] uses a generative adversarialnetwork (GAN) to reconstruct sharper images In contrast our method is able to reconstruct sharper images even without usinga GAN

8

Blurry Xu et al[13] Zhong et al[44] Ziyi et al[9] OursImage (CVPRrsquo13) (CVPRrsquo13) (CVPRrsquo18) Final

Fig 10 Sample results on real blurry images

9

(a) (b) (c) (d) (e) (f) (g)

Fig 11 Ablation study (a) Blurry image (b) BN (c) BN + semantic maps (d) BN + semantic maps + NRL (e) UMSNwithout Lc (f) UMSN (g) ground Truth

The ablation study demonstrates that each of the componentsimprove the quality of the deblurred image In particularthe introduction of the multi-stream architecture improved theperformance by 05dB When UMSN is trained using Ltotal

the performance improved by an additional 025dB Finallythe combination of all the components of UMSN produce thebest results with 165dB gain compared to BN

Deblurring MethodHelen CelebA

PSNR SSIM PSNR SSIMbaseline (BN) 2512 082 2438 081+ semantic maps 2555 084 2472 082+ nested residuals 2608 085 2526 084UMSN without Lc 2661 087 2567 087UMSN 2693 0897 2590 090

TABLE VI PSNR and SSIM results of the ablation study

Sample reconstructions corresponding to the ablation studyusing the Helen and CelebA datasets are shown in Fig 11As can be seen from this figure BN and BN+semantic mapsproduce reconstructions that are still blurry and they fail toreconstruct the facial parts like eyes nose and mouth well Onthe other hand BN+semantic maps+ NRL is able to improvethe reconstruction quality but visually it is still not comparableto the state-of-the-art method As shown in Fig 12 NRL isable to estimate the class-specific residual feature maps whichare further used for estimating the final deblurred image FromFig 12 we can see what different residual feature maps arelearning for different classes In particular the use of F-Netshelps UMSN to produce qualitatively good results where allthe parts of a face are clearly visible Furthermore trainingUMSN using Ltotal results in sharper images

We also compare our method qualitatively with postprocessing-based method [10] on the PubFig dataset As canbe seen from the results shown in Fig 13 even after applyingsome post processing [10] tends to produce results that aresmooth In comparison UMSN is able to produce sharper andhigh-quality images without any post-processing

(a) (b) (c) (d)

Fig 12 (a) Blurry image (b) UMSN (c) Intermediate residualfeature map learned for the eyes and the hair (d) Intermediateresidual feature map learned for the face skin in NRL

(a) (b) (c) (d)

Fig 13 Sample results from the PubFig dataset (a) Blurryimage (b) Ground Truth (c) Song et al[10](IJCVrsquo19) (d)UMSN

V CONCLUSION

We proposed a new method called Uncertainty guidedMulti-stream Semantic Network (UMSN) to address thesingle image blind deblurring of face image problem thatentails the use of facial sementic information In our approachwe introduced a different way to use the ouput featuresfrom the sub-networks which are trained for the individualsemantic class-wise deblurring Additionally we introduced

10

novel techniques such as nested residual learning and class-based confidence loss to improve the performance of UMSNIn comparison to the state-of-the-art single image blind de-blurring methods the proposed approach is able to achievesignificant improvements when evaluated on three popular facedatasets

APPENDIX AUMSN NETWORK SEQUENCE

The BaseNetwork is similar to the first stage semanticnetwork as shown in Fig 5 with the following sequence oflayersResBlock(364)-Avgpool-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-Upsample-ResBlock(6416)-Conv 3times 3(163)F-Net-i is a a sequence of five ResBlocks with denseconnectionsResBlock(316)-Avgpool-ResBlock(1616)-ResBlock(1616)-ResBlock(1616)-ResBlock(168)where ResBlock(mn) indicates m input channels and noutput channels for ResBlock

F-Net-irsquos are joined to form the first stage of UMSNfurther these outputs re concatenated to output of first layer ofBase Network(BN) for constructing the UMSN network ThusUMSN is constructed by combining F-Net-irsquos and BN whereF-Net-irsquos acts as first stage of UMSN and BN acts as secondstage of UMSN

APPENDIX BCONFIDENCE SCORES

In this section we show some example test outputs ofUMSN(trained with Ltotal) at different time instances oftraining and their corresponding confidence values to showhow confidence measure is helping the network Confidencemeasure helps the UMSN in deblurring different parts of theface For example as shown in the Fig 14 and Table VIIthe output of UMSN after 50000 iterations are blurry at eyesnose and mouth which was reflected in low confidence scorefor C3 when compared to other classes Eventually UMSNlearnt to reconstruct eyes nose and mouth which was reflectedin high confidence score for C3 and as shown in the Figure14 the output deblurred images are sharp with eyes nose andmouth reconstructed perfectly

Blurry Image Iteration C1 C2 C3 C4

Image13000 0573 0576 0610 058450000 0676 0902 0789 0847100000 0858 0917 0989 0962

Image23000 0519 0528 0493 061150000 0728 0883 0817 0872100000 0896 0919 0966 0944

TABLE VII Confidence values for the Image1 and Images2first row and second row images in Fig 14 respectively

(a) (b) (c) (d)

Fig 14 Sample deblurring results of UMSN at different timeinstances of training (a) Blurry image (b) trained for 3000iterations (c) 50000 iterations (d) 01 million iterations

ACKNOWLEDGMENT

This research is based upon work supported by the Officeof the Director of National Intelligence (ODNI) IntelligenceAdvanced Research Projects Activity (IARPA) via IARPARampD Contract No 2014-14071600012 The views and conclu-sions contained herein are those of the authors and should notbe interpreted as necessarily representing the official policiesor endorsements either expressed or implied of the ODNIIARPA or the US Government The US Government is au-thorized to reproduce and distribute reprints for Governmentalpurposes notwithstanding any copyright annotation thereon

REFERENCES

[1] R Fergus B Singh A Hertzmann S T Roweis and W T FreemanldquoRemoving camera shake from a single photographrdquo ACM transactionson graphics (TOG) vol 25 no 3 pp 787ndash794 2006

[2] C J Schuler H Christopher Burger S Harmeling and B ScholkopfldquoA machine learning approach for non-blind image deconvolutionrdquo inProceedings of the IEEE Conference on Computer Vision and PatternRecognition 2013 pp 1067ndash1074

[3] W Ren X Cao J Pan X Guo W Zuo and M-H Yang ldquoImagedeblurring via enhanced low-rank priorrdquo IEEE Transactions on ImageProcessing vol 25 no 7 pp 3426ndash3437 2016

[4] L Sun S Cho J Wang and J Hays ldquoEdge-based blur kernel estimationusing patch priorsrdquo in IEEE International Conference on ComputationalPhotography (ICCP) IEEE 2013 pp 1ndash8

[5] T M Nimisha A Kumar Singh and A N Rajagopalan ldquoBlur-invariant deep learning for blind-deblurringrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4752ndash4760

[6] L Xu J S Ren C Liu and J Jia ldquoDeep convolutional neural networkfor image deconvolutionrdquo in Advances in neural information processingsystems 2014 pp 1790ndash1798

[7] S Nah T Hyun Kim and K Mu Lee ldquoDeep multi-scale convolutionalneural network for dynamic scene deblurringrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2017pp 3883ndash3891

[8] S Zhang X Shen Z Lin R Mech J P Costeira and J M MouraldquoLearning to understand image blurrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2018 pp6586ndash6595

[9] Z Shen W-S Lai T Xu J Kautz and M-H Yang ldquoDeep semanticface deblurringrdquo Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 8260ndash8269 2018

[10] Y Song J Zhang L Gong S He L Bao J Pan Q Yang and M-HYang ldquoJoint face hallucination and deblurring via structure generationand detail enhancementrdquo International Journal of Computer Vision vol127 no 6-7 pp 785ndash800 2019

[11] V Le J Brandt Z Lin L Bourdev and T S Huang ldquoInteractivefacial feature localizationrdquo in European conference on computer visionSpringer 2012 pp 679ndash692

11

[12] Z Liu P Luo X Wang and X Tang ldquoDeep learning face attributesin the wildrdquo in Proceedings of the IEEE international conference oncomputer vision 2015 pp 3730ndash3738

[13] L Xu S Zheng and J Jia ldquoUnnatural l0 sparse representation fornatural image deblurringrdquo Proceedings of the IEEE conference oncomputer vision and pattern recognition pp 1107ndash1114 2013

[14] D Krishnan T Tay and R Fergus ldquoBlind deconvolution using anormalized sparsity measurerdquo CVPR 2011 pp 233ndash240 2011

[15] J Pan D Sun H Pfister and M-H Yang ldquoBlind image deblurringusing dark channel priorrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition 2016 pp 1628ndash1636

[16] J Ni P Turaga V M Patel and R Chellappa ldquoExample-drivenmanifold priors for image deconvolutionrdquo IEEE Transactions on ImageProcessing vol 20 no 11 pp 3086ndash3096 Nov 2011

[17] V M Patel G R Easley and D M Healy ldquoShearlet-based deconvo-lutionrdquo IEEE Transactions on Image Processing vol 18 no 12 pp2673ndash2685 Dec 2009

[18] J Kruse C Rother and U Schmidt ldquoLearning to push the limits ofefficient fft-based image deconvolutionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4586ndash4594

[19] U Schmidt C Rother S Nowozin J Jancsary and S Roth ldquoDiscrim-inative non-blind deblurringrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition 2013 pp 604ndash611

[20] U Schmidt and S Roth ldquoShrinkage fields for effective image restora-tionrdquo in Proceedings of the IEEE Conference on Computer Vision andPattern Recognition 2014 pp 2774ndash2781

[21] S Vasu V Reddy Maligireddy and A Rajagopalan ldquoNon-blind de-blurring Handling kernel uncertainty with cnnsrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2018pp 3272ndash3281

[22] C J Schuler M Hirsch S Harmeling and B Scholkopf ldquoLearning todeblurrdquo vol 38 no 7 IEEE 2015 pp 1439ndash1451

[23] A Chakrabarti ldquoDeep convolutional neural network for image decon-volutionrdquo 2014

[24] Y Tsai X Shen Z Lin K Sunkavalli and M Yang ldquoSky is not thelimit semantic-aware sky replacementrdquo ACM Trans Graph vol 35no 4 2016

[25] D Eigen D Krishnan and R Fergus ldquoRestoring an image takenthrough a window covered with dirt or rainrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2013 pp 633ndash640

[26] P Svoboda M Hradis L Marsık and P Zemcık ldquoCnn for license platemotion deblurringrdquo in 2016 IEEE International Conference on ImageProcessing (ICIP) IEEE 2016 pp 3832ndash3836

[27] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring text images via l0-regularized intensity and gradient priorrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2014 pp2901ndash2908

[28] S Baker and T Kanade ldquoLimits on super-resolution and how to breakthemrdquo IEEE Trans Pattern Anal Mach Intell vol 24 no 9 2002

[29] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring face images withexemplarsrdquo pp 47ndash62 2014

[30] N Joshi W Matusik E H Adelson and D J Kriegman ldquoPersonalphoto enhancement using example imagesrdquo ACM Trans Graph vol 29no 2 2010

[31] O Ronneberger PFischer and T Broxrdquo ldquoU-net Convolutionalnetworks for biomedical image segmentationrdquordquo in Medical ImageComputing and Computer-Assisted Intervention (MICCAI)rdquo serrdquoLNCSrdquo vol rdquo9351rdquo rdquo2015rdquo pp rdquo234ndash241rdquo [Online] Availablerdquohttplmbinformatikuni-freiburgdePublications2015RFB15ardquo

[32] G Huang Z Liu L Van Der Maaten and K Q Weinberger ldquoDenselyconnected convolutional networksrdquo in Proceedings of the IEEE confer-ence on computer vision and pattern recognition 2017 pp 4700ndash4708

[33] D Ulyanov A Vedaldi and V Lempitsky ldquoInstance normalization Themissing ingredient for fast stylizationrdquo arXiv preprint arXiv1607080222016

[34] Z Wang and S Ji ldquoSmoothed dilated convolutions for improved densepredictionrdquo in Proceedings of the 24th ACM SIGKDD InternationalConference on Knowledge Discovery amp Data Mining ACM 2018 pp2486ndash2495

[35] J Johnson A Alahi and L Fei-Fei ldquoPerceptual losses for real-timestyle transfer and super-resolutionrdquo ECCV 2016

[36] H Zhang and K Dana ldquoMulti-style generative network for real-timetransferrdquo arXiv preprint arXiv170306953 2017

[37] C Ledig L Theis F Huszar J Caballero A Cunningham A AcostaA Aitken A Tejani J Totz Z Wang et al ldquoPhoto-realistic singleimage super-resolution using a generative adversarial networkrdquo pp4681ndash4690 2017

[38] K Simonyan and A Zisserman ldquoVery deep convolutional networks forlarge-scale image recognitionrdquo arXiv14091556 2014

[39] Z Wang A C Bovik H R Sheikh and E P Simoncelli ldquoImagequality assessment from error visibility to structural similarityrdquo IEEETransactions on Image Processing vol 13 no 4 pp 600ndash612 April2004

[40] B M Smith L Zhang J Brandt Z Lin and J Yang ldquoExemplar-basedface parsingrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition 2013 pp 3484ndash3491

[41] G Boracchi and A Foi ldquoModeling the performance of image restorationfrom motion blurrdquo IEEE Transactions on Image Processing vol 21no 8 pp 3502ndash3517 2012

[42] Q Shan J Jia and A Agarwala ldquoHigh-quality motion deblurring froma single imagerdquo Acm transactions on graphics (tog) vol 27 no 3p 73 2008

[43] S Cho and S Lee ldquoFast motion deblurringrdquo ACM Transactions ongraphics (TOG) vol 28 no 5 p 145 2009

[44] L Zhong S Cho D Metaxas S Paris and J Wang ldquoHandling noise insingle image deblurring using directional filtersrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2013pp 612ndash619

[45] O M Parkhi A Vedaldi and A Zisserman ldquoDeep face recognitionrdquoin British Machine Vision Conference 2015

[46] B Amos B Ludwiczuk and M Satyanarayanan ldquoOpenface A general-purpose face recognition library with mobile applicationsrdquo CMU-CS-16-118 CMU School of Computer Science Tech Rep 2016

Page 7: Deblurring Face Images using Uncertainty Guided …1 Deblurring Face Images using Uncertainty Guided Multi-Stream Semantic Networks Rajeev Yasarla, Student Member, IEEE, Federico Perazzi,

7

Blurry Xu et al[13] Zhong et al[44] Ziyi et al[9] Ours Ground TruthImage (CVPRrsquo13) (CVPRrsquo13) (CVPRrsquo18) Final Image

Fig 9 Sample results from the Helen and CelebA datasets As can be seen from this figure MAP-based methods [44] [13]are adding some artifacts on the final deblurred images especially near the eyes nose and mouth regions Global prior-basedmethod [9] produces very smooth outputs and some parts of the deblurred images are deformed This can be clearly seenby looking at the reconstructions in the fourth column corresponding to [9] where the right eye is added in the face imagethe fingers are removed the mustache is removed and the mouth is deformed On the other hand by using class-specificmulti-stream networks and Ltotal our method is able to reconstruct all the classes of a face perfectly and produces sharperface images For example UMSN is able to reconstruct eyes nose mustache and mouth clearly Additionally UMSN doesnot add parts like eyes when they are occluded (last row) Note that the method proposed in [9] uses a generative adversarialnetwork (GAN) to reconstruct sharper images In contrast our method is able to reconstruct sharper images even without usinga GAN

8

Blurry Xu et al[13] Zhong et al[44] Ziyi et al[9] OursImage (CVPRrsquo13) (CVPRrsquo13) (CVPRrsquo18) Final

Fig 10 Sample results on real blurry images

9

(a) (b) (c) (d) (e) (f) (g)

Fig 11 Ablation study (a) Blurry image (b) BN (c) BN + semantic maps (d) BN + semantic maps + NRL (e) UMSNwithout Lc (f) UMSN (g) ground Truth

The ablation study demonstrates that each of the componentsimprove the quality of the deblurred image In particularthe introduction of the multi-stream architecture improved theperformance by 05dB When UMSN is trained using Ltotal

the performance improved by an additional 025dB Finallythe combination of all the components of UMSN produce thebest results with 165dB gain compared to BN

Deblurring MethodHelen CelebA

PSNR SSIM PSNR SSIMbaseline (BN) 2512 082 2438 081+ semantic maps 2555 084 2472 082+ nested residuals 2608 085 2526 084UMSN without Lc 2661 087 2567 087UMSN 2693 0897 2590 090

TABLE VI PSNR and SSIM results of the ablation study

Sample reconstructions corresponding to the ablation studyusing the Helen and CelebA datasets are shown in Fig 11As can be seen from this figure BN and BN+semantic mapsproduce reconstructions that are still blurry and they fail toreconstruct the facial parts like eyes nose and mouth well Onthe other hand BN+semantic maps+ NRL is able to improvethe reconstruction quality but visually it is still not comparableto the state-of-the-art method As shown in Fig 12 NRL isable to estimate the class-specific residual feature maps whichare further used for estimating the final deblurred image FromFig 12 we can see what different residual feature maps arelearning for different classes In particular the use of F-Netshelps UMSN to produce qualitatively good results where allthe parts of a face are clearly visible Furthermore trainingUMSN using Ltotal results in sharper images

We also compare our method qualitatively with postprocessing-based method [10] on the PubFig dataset As canbe seen from the results shown in Fig 13 even after applyingsome post processing [10] tends to produce results that aresmooth In comparison UMSN is able to produce sharper andhigh-quality images without any post-processing

(a) (b) (c) (d)

Fig 12 (a) Blurry image (b) UMSN (c) Intermediate residualfeature map learned for the eyes and the hair (d) Intermediateresidual feature map learned for the face skin in NRL

(a) (b) (c) (d)

Fig 13 Sample results from the PubFig dataset (a) Blurryimage (b) Ground Truth (c) Song et al[10](IJCVrsquo19) (d)UMSN

V CONCLUSION

We proposed a new method called Uncertainty guidedMulti-stream Semantic Network (UMSN) to address thesingle image blind deblurring of face image problem thatentails the use of facial sementic information In our approachwe introduced a different way to use the ouput featuresfrom the sub-networks which are trained for the individualsemantic class-wise deblurring Additionally we introduced

10

novel techniques such as nested residual learning and class-based confidence loss to improve the performance of UMSNIn comparison to the state-of-the-art single image blind de-blurring methods the proposed approach is able to achievesignificant improvements when evaluated on three popular facedatasets

APPENDIX AUMSN NETWORK SEQUENCE

The BaseNetwork is similar to the first stage semanticnetwork as shown in Fig 5 with the following sequence oflayersResBlock(364)-Avgpool-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-Upsample-ResBlock(6416)-Conv 3times 3(163)F-Net-i is a a sequence of five ResBlocks with denseconnectionsResBlock(316)-Avgpool-ResBlock(1616)-ResBlock(1616)-ResBlock(1616)-ResBlock(168)where ResBlock(mn) indicates m input channels and noutput channels for ResBlock

F-Net-irsquos are joined to form the first stage of UMSNfurther these outputs re concatenated to output of first layer ofBase Network(BN) for constructing the UMSN network ThusUMSN is constructed by combining F-Net-irsquos and BN whereF-Net-irsquos acts as first stage of UMSN and BN acts as secondstage of UMSN

APPENDIX BCONFIDENCE SCORES

In this section we show some example test outputs ofUMSN(trained with Ltotal) at different time instances oftraining and their corresponding confidence values to showhow confidence measure is helping the network Confidencemeasure helps the UMSN in deblurring different parts of theface For example as shown in the Fig 14 and Table VIIthe output of UMSN after 50000 iterations are blurry at eyesnose and mouth which was reflected in low confidence scorefor C3 when compared to other classes Eventually UMSNlearnt to reconstruct eyes nose and mouth which was reflectedin high confidence score for C3 and as shown in the Figure14 the output deblurred images are sharp with eyes nose andmouth reconstructed perfectly

Blurry Image Iteration C1 C2 C3 C4

Image13000 0573 0576 0610 058450000 0676 0902 0789 0847100000 0858 0917 0989 0962

Image23000 0519 0528 0493 061150000 0728 0883 0817 0872100000 0896 0919 0966 0944

TABLE VII Confidence values for the Image1 and Images2first row and second row images in Fig 14 respectively

(a) (b) (c) (d)

Fig 14 Sample deblurring results of UMSN at different timeinstances of training (a) Blurry image (b) trained for 3000iterations (c) 50000 iterations (d) 01 million iterations

ACKNOWLEDGMENT

This research is based upon work supported by the Officeof the Director of National Intelligence (ODNI) IntelligenceAdvanced Research Projects Activity (IARPA) via IARPARampD Contract No 2014-14071600012 The views and conclu-sions contained herein are those of the authors and should notbe interpreted as necessarily representing the official policiesor endorsements either expressed or implied of the ODNIIARPA or the US Government The US Government is au-thorized to reproduce and distribute reprints for Governmentalpurposes notwithstanding any copyright annotation thereon

REFERENCES

[1] R Fergus B Singh A Hertzmann S T Roweis and W T FreemanldquoRemoving camera shake from a single photographrdquo ACM transactionson graphics (TOG) vol 25 no 3 pp 787ndash794 2006

[2] C J Schuler H Christopher Burger S Harmeling and B ScholkopfldquoA machine learning approach for non-blind image deconvolutionrdquo inProceedings of the IEEE Conference on Computer Vision and PatternRecognition 2013 pp 1067ndash1074

[3] W Ren X Cao J Pan X Guo W Zuo and M-H Yang ldquoImagedeblurring via enhanced low-rank priorrdquo IEEE Transactions on ImageProcessing vol 25 no 7 pp 3426ndash3437 2016

[4] L Sun S Cho J Wang and J Hays ldquoEdge-based blur kernel estimationusing patch priorsrdquo in IEEE International Conference on ComputationalPhotography (ICCP) IEEE 2013 pp 1ndash8

[5] T M Nimisha A Kumar Singh and A N Rajagopalan ldquoBlur-invariant deep learning for blind-deblurringrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4752ndash4760

[6] L Xu J S Ren C Liu and J Jia ldquoDeep convolutional neural networkfor image deconvolutionrdquo in Advances in neural information processingsystems 2014 pp 1790ndash1798

[7] S Nah T Hyun Kim and K Mu Lee ldquoDeep multi-scale convolutionalneural network for dynamic scene deblurringrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2017pp 3883ndash3891

[8] S Zhang X Shen Z Lin R Mech J P Costeira and J M MouraldquoLearning to understand image blurrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2018 pp6586ndash6595

[9] Z Shen W-S Lai T Xu J Kautz and M-H Yang ldquoDeep semanticface deblurringrdquo Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 8260ndash8269 2018

[10] Y Song J Zhang L Gong S He L Bao J Pan Q Yang and M-HYang ldquoJoint face hallucination and deblurring via structure generationand detail enhancementrdquo International Journal of Computer Vision vol127 no 6-7 pp 785ndash800 2019

[11] V Le J Brandt Z Lin L Bourdev and T S Huang ldquoInteractivefacial feature localizationrdquo in European conference on computer visionSpringer 2012 pp 679ndash692

11

[12] Z Liu P Luo X Wang and X Tang ldquoDeep learning face attributesin the wildrdquo in Proceedings of the IEEE international conference oncomputer vision 2015 pp 3730ndash3738

[13] L Xu S Zheng and J Jia ldquoUnnatural l0 sparse representation fornatural image deblurringrdquo Proceedings of the IEEE conference oncomputer vision and pattern recognition pp 1107ndash1114 2013

[14] D Krishnan T Tay and R Fergus ldquoBlind deconvolution using anormalized sparsity measurerdquo CVPR 2011 pp 233ndash240 2011

[15] J Pan D Sun H Pfister and M-H Yang ldquoBlind image deblurringusing dark channel priorrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition 2016 pp 1628ndash1636

[16] J Ni P Turaga V M Patel and R Chellappa ldquoExample-drivenmanifold priors for image deconvolutionrdquo IEEE Transactions on ImageProcessing vol 20 no 11 pp 3086ndash3096 Nov 2011

[17] V M Patel G R Easley and D M Healy ldquoShearlet-based deconvo-lutionrdquo IEEE Transactions on Image Processing vol 18 no 12 pp2673ndash2685 Dec 2009

[18] J Kruse C Rother and U Schmidt ldquoLearning to push the limits ofefficient fft-based image deconvolutionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4586ndash4594

[19] U Schmidt C Rother S Nowozin J Jancsary and S Roth ldquoDiscrim-inative non-blind deblurringrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition 2013 pp 604ndash611

[20] U Schmidt and S Roth ldquoShrinkage fields for effective image restora-tionrdquo in Proceedings of the IEEE Conference on Computer Vision andPattern Recognition 2014 pp 2774ndash2781

[21] S Vasu V Reddy Maligireddy and A Rajagopalan ldquoNon-blind de-blurring Handling kernel uncertainty with cnnsrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2018pp 3272ndash3281

[22] C J Schuler M Hirsch S Harmeling and B Scholkopf ldquoLearning todeblurrdquo vol 38 no 7 IEEE 2015 pp 1439ndash1451

[23] A Chakrabarti ldquoDeep convolutional neural network for image decon-volutionrdquo 2014

[24] Y Tsai X Shen Z Lin K Sunkavalli and M Yang ldquoSky is not thelimit semantic-aware sky replacementrdquo ACM Trans Graph vol 35no 4 2016

[25] D Eigen D Krishnan and R Fergus ldquoRestoring an image takenthrough a window covered with dirt or rainrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2013 pp 633ndash640

[26] P Svoboda M Hradis L Marsık and P Zemcık ldquoCnn for license platemotion deblurringrdquo in 2016 IEEE International Conference on ImageProcessing (ICIP) IEEE 2016 pp 3832ndash3836

[27] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring text images via l0-regularized intensity and gradient priorrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2014 pp2901ndash2908

[28] S Baker and T Kanade ldquoLimits on super-resolution and how to breakthemrdquo IEEE Trans Pattern Anal Mach Intell vol 24 no 9 2002

[29] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring face images withexemplarsrdquo pp 47ndash62 2014

[30] N Joshi W Matusik E H Adelson and D J Kriegman ldquoPersonalphoto enhancement using example imagesrdquo ACM Trans Graph vol 29no 2 2010

[31] O Ronneberger PFischer and T Broxrdquo ldquoU-net Convolutionalnetworks for biomedical image segmentationrdquordquo in Medical ImageComputing and Computer-Assisted Intervention (MICCAI)rdquo serrdquoLNCSrdquo vol rdquo9351rdquo rdquo2015rdquo pp rdquo234ndash241rdquo [Online] Availablerdquohttplmbinformatikuni-freiburgdePublications2015RFB15ardquo

[32] G Huang Z Liu L Van Der Maaten and K Q Weinberger ldquoDenselyconnected convolutional networksrdquo in Proceedings of the IEEE confer-ence on computer vision and pattern recognition 2017 pp 4700ndash4708

[33] D Ulyanov A Vedaldi and V Lempitsky ldquoInstance normalization Themissing ingredient for fast stylizationrdquo arXiv preprint arXiv1607080222016

[34] Z Wang and S Ji ldquoSmoothed dilated convolutions for improved densepredictionrdquo in Proceedings of the 24th ACM SIGKDD InternationalConference on Knowledge Discovery amp Data Mining ACM 2018 pp2486ndash2495

[35] J Johnson A Alahi and L Fei-Fei ldquoPerceptual losses for real-timestyle transfer and super-resolutionrdquo ECCV 2016

[36] H Zhang and K Dana ldquoMulti-style generative network for real-timetransferrdquo arXiv preprint arXiv170306953 2017

[37] C Ledig L Theis F Huszar J Caballero A Cunningham A AcostaA Aitken A Tejani J Totz Z Wang et al ldquoPhoto-realistic singleimage super-resolution using a generative adversarial networkrdquo pp4681ndash4690 2017

[38] K Simonyan and A Zisserman ldquoVery deep convolutional networks forlarge-scale image recognitionrdquo arXiv14091556 2014

[39] Z Wang A C Bovik H R Sheikh and E P Simoncelli ldquoImagequality assessment from error visibility to structural similarityrdquo IEEETransactions on Image Processing vol 13 no 4 pp 600ndash612 April2004

[40] B M Smith L Zhang J Brandt Z Lin and J Yang ldquoExemplar-basedface parsingrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition 2013 pp 3484ndash3491

[41] G Boracchi and A Foi ldquoModeling the performance of image restorationfrom motion blurrdquo IEEE Transactions on Image Processing vol 21no 8 pp 3502ndash3517 2012

[42] Q Shan J Jia and A Agarwala ldquoHigh-quality motion deblurring froma single imagerdquo Acm transactions on graphics (tog) vol 27 no 3p 73 2008

[43] S Cho and S Lee ldquoFast motion deblurringrdquo ACM Transactions ongraphics (TOG) vol 28 no 5 p 145 2009

[44] L Zhong S Cho D Metaxas S Paris and J Wang ldquoHandling noise insingle image deblurring using directional filtersrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2013pp 612ndash619

[45] O M Parkhi A Vedaldi and A Zisserman ldquoDeep face recognitionrdquoin British Machine Vision Conference 2015

[46] B Amos B Ludwiczuk and M Satyanarayanan ldquoOpenface A general-purpose face recognition library with mobile applicationsrdquo CMU-CS-16-118 CMU School of Computer Science Tech Rep 2016

Page 8: Deblurring Face Images using Uncertainty Guided …1 Deblurring Face Images using Uncertainty Guided Multi-Stream Semantic Networks Rajeev Yasarla, Student Member, IEEE, Federico Perazzi,

8

Blurry Xu et al[13] Zhong et al[44] Ziyi et al[9] OursImage (CVPRrsquo13) (CVPRrsquo13) (CVPRrsquo18) Final

Fig 10 Sample results on real blurry images

9

(a) (b) (c) (d) (e) (f) (g)

Fig 11 Ablation study (a) Blurry image (b) BN (c) BN + semantic maps (d) BN + semantic maps + NRL (e) UMSNwithout Lc (f) UMSN (g) ground Truth

The ablation study demonstrates that each of the componentsimprove the quality of the deblurred image In particularthe introduction of the multi-stream architecture improved theperformance by 05dB When UMSN is trained using Ltotal

the performance improved by an additional 025dB Finallythe combination of all the components of UMSN produce thebest results with 165dB gain compared to BN

Deblurring MethodHelen CelebA

PSNR SSIM PSNR SSIMbaseline (BN) 2512 082 2438 081+ semantic maps 2555 084 2472 082+ nested residuals 2608 085 2526 084UMSN without Lc 2661 087 2567 087UMSN 2693 0897 2590 090

TABLE VI PSNR and SSIM results of the ablation study

Sample reconstructions corresponding to the ablation studyusing the Helen and CelebA datasets are shown in Fig 11As can be seen from this figure BN and BN+semantic mapsproduce reconstructions that are still blurry and they fail toreconstruct the facial parts like eyes nose and mouth well Onthe other hand BN+semantic maps+ NRL is able to improvethe reconstruction quality but visually it is still not comparableto the state-of-the-art method As shown in Fig 12 NRL isable to estimate the class-specific residual feature maps whichare further used for estimating the final deblurred image FromFig 12 we can see what different residual feature maps arelearning for different classes In particular the use of F-Netshelps UMSN to produce qualitatively good results where allthe parts of a face are clearly visible Furthermore trainingUMSN using Ltotal results in sharper images

We also compare our method qualitatively with postprocessing-based method [10] on the PubFig dataset As canbe seen from the results shown in Fig 13 even after applyingsome post processing [10] tends to produce results that aresmooth In comparison UMSN is able to produce sharper andhigh-quality images without any post-processing

(a) (b) (c) (d)

Fig 12 (a) Blurry image (b) UMSN (c) Intermediate residualfeature map learned for the eyes and the hair (d) Intermediateresidual feature map learned for the face skin in NRL

(a) (b) (c) (d)

Fig 13 Sample results from the PubFig dataset (a) Blurryimage (b) Ground Truth (c) Song et al[10](IJCVrsquo19) (d)UMSN

V CONCLUSION

We proposed a new method called Uncertainty guidedMulti-stream Semantic Network (UMSN) to address thesingle image blind deblurring of face image problem thatentails the use of facial sementic information In our approachwe introduced a different way to use the ouput featuresfrom the sub-networks which are trained for the individualsemantic class-wise deblurring Additionally we introduced

10

novel techniques such as nested residual learning and class-based confidence loss to improve the performance of UMSNIn comparison to the state-of-the-art single image blind de-blurring methods the proposed approach is able to achievesignificant improvements when evaluated on three popular facedatasets

APPENDIX AUMSN NETWORK SEQUENCE

The BaseNetwork is similar to the first stage semanticnetwork as shown in Fig 5 with the following sequence oflayersResBlock(364)-Avgpool-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-Upsample-ResBlock(6416)-Conv 3times 3(163)F-Net-i is a a sequence of five ResBlocks with denseconnectionsResBlock(316)-Avgpool-ResBlock(1616)-ResBlock(1616)-ResBlock(1616)-ResBlock(168)where ResBlock(mn) indicates m input channels and noutput channels for ResBlock

F-Net-irsquos are joined to form the first stage of UMSNfurther these outputs re concatenated to output of first layer ofBase Network(BN) for constructing the UMSN network ThusUMSN is constructed by combining F-Net-irsquos and BN whereF-Net-irsquos acts as first stage of UMSN and BN acts as secondstage of UMSN

APPENDIX BCONFIDENCE SCORES

In this section we show some example test outputs ofUMSN(trained with Ltotal) at different time instances oftraining and their corresponding confidence values to showhow confidence measure is helping the network Confidencemeasure helps the UMSN in deblurring different parts of theface For example as shown in the Fig 14 and Table VIIthe output of UMSN after 50000 iterations are blurry at eyesnose and mouth which was reflected in low confidence scorefor C3 when compared to other classes Eventually UMSNlearnt to reconstruct eyes nose and mouth which was reflectedin high confidence score for C3 and as shown in the Figure14 the output deblurred images are sharp with eyes nose andmouth reconstructed perfectly

Blurry Image Iteration C1 C2 C3 C4

Image13000 0573 0576 0610 058450000 0676 0902 0789 0847100000 0858 0917 0989 0962

Image23000 0519 0528 0493 061150000 0728 0883 0817 0872100000 0896 0919 0966 0944

TABLE VII Confidence values for the Image1 and Images2first row and second row images in Fig 14 respectively

(a) (b) (c) (d)

Fig 14 Sample deblurring results of UMSN at different timeinstances of training (a) Blurry image (b) trained for 3000iterations (c) 50000 iterations (d) 01 million iterations

ACKNOWLEDGMENT

This research is based upon work supported by the Officeof the Director of National Intelligence (ODNI) IntelligenceAdvanced Research Projects Activity (IARPA) via IARPARampD Contract No 2014-14071600012 The views and conclu-sions contained herein are those of the authors and should notbe interpreted as necessarily representing the official policiesor endorsements either expressed or implied of the ODNIIARPA or the US Government The US Government is au-thorized to reproduce and distribute reprints for Governmentalpurposes notwithstanding any copyright annotation thereon

REFERENCES

[1] R Fergus B Singh A Hertzmann S T Roweis and W T FreemanldquoRemoving camera shake from a single photographrdquo ACM transactionson graphics (TOG) vol 25 no 3 pp 787ndash794 2006

[2] C J Schuler H Christopher Burger S Harmeling and B ScholkopfldquoA machine learning approach for non-blind image deconvolutionrdquo inProceedings of the IEEE Conference on Computer Vision and PatternRecognition 2013 pp 1067ndash1074

[3] W Ren X Cao J Pan X Guo W Zuo and M-H Yang ldquoImagedeblurring via enhanced low-rank priorrdquo IEEE Transactions on ImageProcessing vol 25 no 7 pp 3426ndash3437 2016

[4] L Sun S Cho J Wang and J Hays ldquoEdge-based blur kernel estimationusing patch priorsrdquo in IEEE International Conference on ComputationalPhotography (ICCP) IEEE 2013 pp 1ndash8

[5] T M Nimisha A Kumar Singh and A N Rajagopalan ldquoBlur-invariant deep learning for blind-deblurringrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4752ndash4760

[6] L Xu J S Ren C Liu and J Jia ldquoDeep convolutional neural networkfor image deconvolutionrdquo in Advances in neural information processingsystems 2014 pp 1790ndash1798

[7] S Nah T Hyun Kim and K Mu Lee ldquoDeep multi-scale convolutionalneural network for dynamic scene deblurringrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2017pp 3883ndash3891

[8] S Zhang X Shen Z Lin R Mech J P Costeira and J M MouraldquoLearning to understand image blurrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2018 pp6586ndash6595

[9] Z Shen W-S Lai T Xu J Kautz and M-H Yang ldquoDeep semanticface deblurringrdquo Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 8260ndash8269 2018

[10] Y Song J Zhang L Gong S He L Bao J Pan Q Yang and M-HYang ldquoJoint face hallucination and deblurring via structure generationand detail enhancementrdquo International Journal of Computer Vision vol127 no 6-7 pp 785ndash800 2019

[11] V Le J Brandt Z Lin L Bourdev and T S Huang ldquoInteractivefacial feature localizationrdquo in European conference on computer visionSpringer 2012 pp 679ndash692

11

[12] Z Liu P Luo X Wang and X Tang ldquoDeep learning face attributesin the wildrdquo in Proceedings of the IEEE international conference oncomputer vision 2015 pp 3730ndash3738

[13] L Xu S Zheng and J Jia ldquoUnnatural l0 sparse representation fornatural image deblurringrdquo Proceedings of the IEEE conference oncomputer vision and pattern recognition pp 1107ndash1114 2013

[14] D Krishnan T Tay and R Fergus ldquoBlind deconvolution using anormalized sparsity measurerdquo CVPR 2011 pp 233ndash240 2011

[15] J Pan D Sun H Pfister and M-H Yang ldquoBlind image deblurringusing dark channel priorrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition 2016 pp 1628ndash1636

[16] J Ni P Turaga V M Patel and R Chellappa ldquoExample-drivenmanifold priors for image deconvolutionrdquo IEEE Transactions on ImageProcessing vol 20 no 11 pp 3086ndash3096 Nov 2011

[17] V M Patel G R Easley and D M Healy ldquoShearlet-based deconvo-lutionrdquo IEEE Transactions on Image Processing vol 18 no 12 pp2673ndash2685 Dec 2009

[18] J Kruse C Rother and U Schmidt ldquoLearning to push the limits ofefficient fft-based image deconvolutionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4586ndash4594

[19] U Schmidt C Rother S Nowozin J Jancsary and S Roth ldquoDiscrim-inative non-blind deblurringrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition 2013 pp 604ndash611

[20] U Schmidt and S Roth ldquoShrinkage fields for effective image restora-tionrdquo in Proceedings of the IEEE Conference on Computer Vision andPattern Recognition 2014 pp 2774ndash2781

[21] S Vasu V Reddy Maligireddy and A Rajagopalan ldquoNon-blind de-blurring Handling kernel uncertainty with cnnsrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2018pp 3272ndash3281

[22] C J Schuler M Hirsch S Harmeling and B Scholkopf ldquoLearning todeblurrdquo vol 38 no 7 IEEE 2015 pp 1439ndash1451

[23] A Chakrabarti ldquoDeep convolutional neural network for image decon-volutionrdquo 2014

[24] Y Tsai X Shen Z Lin K Sunkavalli and M Yang ldquoSky is not thelimit semantic-aware sky replacementrdquo ACM Trans Graph vol 35no 4 2016

[25] D Eigen D Krishnan and R Fergus ldquoRestoring an image takenthrough a window covered with dirt or rainrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2013 pp 633ndash640

[26] P Svoboda M Hradis L Marsık and P Zemcık ldquoCnn for license platemotion deblurringrdquo in 2016 IEEE International Conference on ImageProcessing (ICIP) IEEE 2016 pp 3832ndash3836

[27] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring text images via l0-regularized intensity and gradient priorrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2014 pp2901ndash2908

[28] S Baker and T Kanade ldquoLimits on super-resolution and how to breakthemrdquo IEEE Trans Pattern Anal Mach Intell vol 24 no 9 2002

[29] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring face images withexemplarsrdquo pp 47ndash62 2014

[30] N Joshi W Matusik E H Adelson and D J Kriegman ldquoPersonalphoto enhancement using example imagesrdquo ACM Trans Graph vol 29no 2 2010

[31] O Ronneberger PFischer and T Broxrdquo ldquoU-net Convolutionalnetworks for biomedical image segmentationrdquordquo in Medical ImageComputing and Computer-Assisted Intervention (MICCAI)rdquo serrdquoLNCSrdquo vol rdquo9351rdquo rdquo2015rdquo pp rdquo234ndash241rdquo [Online] Availablerdquohttplmbinformatikuni-freiburgdePublications2015RFB15ardquo

[32] G Huang Z Liu L Van Der Maaten and K Q Weinberger ldquoDenselyconnected convolutional networksrdquo in Proceedings of the IEEE confer-ence on computer vision and pattern recognition 2017 pp 4700ndash4708

[33] D Ulyanov A Vedaldi and V Lempitsky ldquoInstance normalization Themissing ingredient for fast stylizationrdquo arXiv preprint arXiv1607080222016

[34] Z Wang and S Ji ldquoSmoothed dilated convolutions for improved densepredictionrdquo in Proceedings of the 24th ACM SIGKDD InternationalConference on Knowledge Discovery amp Data Mining ACM 2018 pp2486ndash2495

[35] J Johnson A Alahi and L Fei-Fei ldquoPerceptual losses for real-timestyle transfer and super-resolutionrdquo ECCV 2016

[36] H Zhang and K Dana ldquoMulti-style generative network for real-timetransferrdquo arXiv preprint arXiv170306953 2017

[37] C Ledig L Theis F Huszar J Caballero A Cunningham A AcostaA Aitken A Tejani J Totz Z Wang et al ldquoPhoto-realistic singleimage super-resolution using a generative adversarial networkrdquo pp4681ndash4690 2017

[38] K Simonyan and A Zisserman ldquoVery deep convolutional networks forlarge-scale image recognitionrdquo arXiv14091556 2014

[39] Z Wang A C Bovik H R Sheikh and E P Simoncelli ldquoImagequality assessment from error visibility to structural similarityrdquo IEEETransactions on Image Processing vol 13 no 4 pp 600ndash612 April2004

[40] B M Smith L Zhang J Brandt Z Lin and J Yang ldquoExemplar-basedface parsingrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition 2013 pp 3484ndash3491

[41] G Boracchi and A Foi ldquoModeling the performance of image restorationfrom motion blurrdquo IEEE Transactions on Image Processing vol 21no 8 pp 3502ndash3517 2012

[42] Q Shan J Jia and A Agarwala ldquoHigh-quality motion deblurring froma single imagerdquo Acm transactions on graphics (tog) vol 27 no 3p 73 2008

[43] S Cho and S Lee ldquoFast motion deblurringrdquo ACM Transactions ongraphics (TOG) vol 28 no 5 p 145 2009

[44] L Zhong S Cho D Metaxas S Paris and J Wang ldquoHandling noise insingle image deblurring using directional filtersrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2013pp 612ndash619

[45] O M Parkhi A Vedaldi and A Zisserman ldquoDeep face recognitionrdquoin British Machine Vision Conference 2015

[46] B Amos B Ludwiczuk and M Satyanarayanan ldquoOpenface A general-purpose face recognition library with mobile applicationsrdquo CMU-CS-16-118 CMU School of Computer Science Tech Rep 2016

Page 9: Deblurring Face Images using Uncertainty Guided …1 Deblurring Face Images using Uncertainty Guided Multi-Stream Semantic Networks Rajeev Yasarla, Student Member, IEEE, Federico Perazzi,

9

(a) (b) (c) (d) (e) (f) (g)

Fig 11 Ablation study (a) Blurry image (b) BN (c) BN + semantic maps (d) BN + semantic maps + NRL (e) UMSNwithout Lc (f) UMSN (g) ground Truth

The ablation study demonstrates that each of the componentsimprove the quality of the deblurred image In particularthe introduction of the multi-stream architecture improved theperformance by 05dB When UMSN is trained using Ltotal

the performance improved by an additional 025dB Finallythe combination of all the components of UMSN produce thebest results with 165dB gain compared to BN

Deblurring MethodHelen CelebA

PSNR SSIM PSNR SSIMbaseline (BN) 2512 082 2438 081+ semantic maps 2555 084 2472 082+ nested residuals 2608 085 2526 084UMSN without Lc 2661 087 2567 087UMSN 2693 0897 2590 090

TABLE VI PSNR and SSIM results of the ablation study

Sample reconstructions corresponding to the ablation studyusing the Helen and CelebA datasets are shown in Fig 11As can be seen from this figure BN and BN+semantic mapsproduce reconstructions that are still blurry and they fail toreconstruct the facial parts like eyes nose and mouth well Onthe other hand BN+semantic maps+ NRL is able to improvethe reconstruction quality but visually it is still not comparableto the state-of-the-art method As shown in Fig 12 NRL isable to estimate the class-specific residual feature maps whichare further used for estimating the final deblurred image FromFig 12 we can see what different residual feature maps arelearning for different classes In particular the use of F-Netshelps UMSN to produce qualitatively good results where allthe parts of a face are clearly visible Furthermore trainingUMSN using Ltotal results in sharper images

We also compare our method qualitatively with postprocessing-based method [10] on the PubFig dataset As canbe seen from the results shown in Fig 13 even after applyingsome post processing [10] tends to produce results that aresmooth In comparison UMSN is able to produce sharper andhigh-quality images without any post-processing

(a) (b) (c) (d)

Fig 12 (a) Blurry image (b) UMSN (c) Intermediate residualfeature map learned for the eyes and the hair (d) Intermediateresidual feature map learned for the face skin in NRL

(a) (b) (c) (d)

Fig 13 Sample results from the PubFig dataset (a) Blurryimage (b) Ground Truth (c) Song et al[10](IJCVrsquo19) (d)UMSN

V CONCLUSION

We proposed a new method called Uncertainty guidedMulti-stream Semantic Network (UMSN) to address thesingle image blind deblurring of face image problem thatentails the use of facial sementic information In our approachwe introduced a different way to use the ouput featuresfrom the sub-networks which are trained for the individualsemantic class-wise deblurring Additionally we introduced

10

novel techniques such as nested residual learning and class-based confidence loss to improve the performance of UMSNIn comparison to the state-of-the-art single image blind de-blurring methods the proposed approach is able to achievesignificant improvements when evaluated on three popular facedatasets

APPENDIX AUMSN NETWORK SEQUENCE

The BaseNetwork is similar to the first stage semanticnetwork as shown in Fig 5 with the following sequence oflayersResBlock(364)-Avgpool-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-Upsample-ResBlock(6416)-Conv 3times 3(163)F-Net-i is a a sequence of five ResBlocks with denseconnectionsResBlock(316)-Avgpool-ResBlock(1616)-ResBlock(1616)-ResBlock(1616)-ResBlock(168)where ResBlock(mn) indicates m input channels and noutput channels for ResBlock

F-Net-irsquos are joined to form the first stage of UMSNfurther these outputs re concatenated to output of first layer ofBase Network(BN) for constructing the UMSN network ThusUMSN is constructed by combining F-Net-irsquos and BN whereF-Net-irsquos acts as first stage of UMSN and BN acts as secondstage of UMSN

APPENDIX BCONFIDENCE SCORES

In this section we show some example test outputs ofUMSN(trained with Ltotal) at different time instances oftraining and their corresponding confidence values to showhow confidence measure is helping the network Confidencemeasure helps the UMSN in deblurring different parts of theface For example as shown in the Fig 14 and Table VIIthe output of UMSN after 50000 iterations are blurry at eyesnose and mouth which was reflected in low confidence scorefor C3 when compared to other classes Eventually UMSNlearnt to reconstruct eyes nose and mouth which was reflectedin high confidence score for C3 and as shown in the Figure14 the output deblurred images are sharp with eyes nose andmouth reconstructed perfectly

Blurry Image Iteration C1 C2 C3 C4

Image13000 0573 0576 0610 058450000 0676 0902 0789 0847100000 0858 0917 0989 0962

Image23000 0519 0528 0493 061150000 0728 0883 0817 0872100000 0896 0919 0966 0944

TABLE VII Confidence values for the Image1 and Images2first row and second row images in Fig 14 respectively

(a) (b) (c) (d)

Fig 14 Sample deblurring results of UMSN at different timeinstances of training (a) Blurry image (b) trained for 3000iterations (c) 50000 iterations (d) 01 million iterations

ACKNOWLEDGMENT

This research is based upon work supported by the Officeof the Director of National Intelligence (ODNI) IntelligenceAdvanced Research Projects Activity (IARPA) via IARPARampD Contract No 2014-14071600012 The views and conclu-sions contained herein are those of the authors and should notbe interpreted as necessarily representing the official policiesor endorsements either expressed or implied of the ODNIIARPA or the US Government The US Government is au-thorized to reproduce and distribute reprints for Governmentalpurposes notwithstanding any copyright annotation thereon

REFERENCES

[1] R Fergus B Singh A Hertzmann S T Roweis and W T FreemanldquoRemoving camera shake from a single photographrdquo ACM transactionson graphics (TOG) vol 25 no 3 pp 787ndash794 2006

[2] C J Schuler H Christopher Burger S Harmeling and B ScholkopfldquoA machine learning approach for non-blind image deconvolutionrdquo inProceedings of the IEEE Conference on Computer Vision and PatternRecognition 2013 pp 1067ndash1074

[3] W Ren X Cao J Pan X Guo W Zuo and M-H Yang ldquoImagedeblurring via enhanced low-rank priorrdquo IEEE Transactions on ImageProcessing vol 25 no 7 pp 3426ndash3437 2016

[4] L Sun S Cho J Wang and J Hays ldquoEdge-based blur kernel estimationusing patch priorsrdquo in IEEE International Conference on ComputationalPhotography (ICCP) IEEE 2013 pp 1ndash8

[5] T M Nimisha A Kumar Singh and A N Rajagopalan ldquoBlur-invariant deep learning for blind-deblurringrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4752ndash4760

[6] L Xu J S Ren C Liu and J Jia ldquoDeep convolutional neural networkfor image deconvolutionrdquo in Advances in neural information processingsystems 2014 pp 1790ndash1798

[7] S Nah T Hyun Kim and K Mu Lee ldquoDeep multi-scale convolutionalneural network for dynamic scene deblurringrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2017pp 3883ndash3891

[8] S Zhang X Shen Z Lin R Mech J P Costeira and J M MouraldquoLearning to understand image blurrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2018 pp6586ndash6595

[9] Z Shen W-S Lai T Xu J Kautz and M-H Yang ldquoDeep semanticface deblurringrdquo Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 8260ndash8269 2018

[10] Y Song J Zhang L Gong S He L Bao J Pan Q Yang and M-HYang ldquoJoint face hallucination and deblurring via structure generationand detail enhancementrdquo International Journal of Computer Vision vol127 no 6-7 pp 785ndash800 2019

[11] V Le J Brandt Z Lin L Bourdev and T S Huang ldquoInteractivefacial feature localizationrdquo in European conference on computer visionSpringer 2012 pp 679ndash692

11

[12] Z Liu P Luo X Wang and X Tang ldquoDeep learning face attributesin the wildrdquo in Proceedings of the IEEE international conference oncomputer vision 2015 pp 3730ndash3738

[13] L Xu S Zheng and J Jia ldquoUnnatural l0 sparse representation fornatural image deblurringrdquo Proceedings of the IEEE conference oncomputer vision and pattern recognition pp 1107ndash1114 2013

[14] D Krishnan T Tay and R Fergus ldquoBlind deconvolution using anormalized sparsity measurerdquo CVPR 2011 pp 233ndash240 2011

[15] J Pan D Sun H Pfister and M-H Yang ldquoBlind image deblurringusing dark channel priorrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition 2016 pp 1628ndash1636

[16] J Ni P Turaga V M Patel and R Chellappa ldquoExample-drivenmanifold priors for image deconvolutionrdquo IEEE Transactions on ImageProcessing vol 20 no 11 pp 3086ndash3096 Nov 2011

[17] V M Patel G R Easley and D M Healy ldquoShearlet-based deconvo-lutionrdquo IEEE Transactions on Image Processing vol 18 no 12 pp2673ndash2685 Dec 2009

[18] J Kruse C Rother and U Schmidt ldquoLearning to push the limits ofefficient fft-based image deconvolutionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4586ndash4594

[19] U Schmidt C Rother S Nowozin J Jancsary and S Roth ldquoDiscrim-inative non-blind deblurringrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition 2013 pp 604ndash611

[20] U Schmidt and S Roth ldquoShrinkage fields for effective image restora-tionrdquo in Proceedings of the IEEE Conference on Computer Vision andPattern Recognition 2014 pp 2774ndash2781

[21] S Vasu V Reddy Maligireddy and A Rajagopalan ldquoNon-blind de-blurring Handling kernel uncertainty with cnnsrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2018pp 3272ndash3281

[22] C J Schuler M Hirsch S Harmeling and B Scholkopf ldquoLearning todeblurrdquo vol 38 no 7 IEEE 2015 pp 1439ndash1451

[23] A Chakrabarti ldquoDeep convolutional neural network for image decon-volutionrdquo 2014

[24] Y Tsai X Shen Z Lin K Sunkavalli and M Yang ldquoSky is not thelimit semantic-aware sky replacementrdquo ACM Trans Graph vol 35no 4 2016

[25] D Eigen D Krishnan and R Fergus ldquoRestoring an image takenthrough a window covered with dirt or rainrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2013 pp 633ndash640

[26] P Svoboda M Hradis L Marsık and P Zemcık ldquoCnn for license platemotion deblurringrdquo in 2016 IEEE International Conference on ImageProcessing (ICIP) IEEE 2016 pp 3832ndash3836

[27] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring text images via l0-regularized intensity and gradient priorrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2014 pp2901ndash2908

[28] S Baker and T Kanade ldquoLimits on super-resolution and how to breakthemrdquo IEEE Trans Pattern Anal Mach Intell vol 24 no 9 2002

[29] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring face images withexemplarsrdquo pp 47ndash62 2014

[30] N Joshi W Matusik E H Adelson and D J Kriegman ldquoPersonalphoto enhancement using example imagesrdquo ACM Trans Graph vol 29no 2 2010

[31] O Ronneberger PFischer and T Broxrdquo ldquoU-net Convolutionalnetworks for biomedical image segmentationrdquordquo in Medical ImageComputing and Computer-Assisted Intervention (MICCAI)rdquo serrdquoLNCSrdquo vol rdquo9351rdquo rdquo2015rdquo pp rdquo234ndash241rdquo [Online] Availablerdquohttplmbinformatikuni-freiburgdePublications2015RFB15ardquo

[32] G Huang Z Liu L Van Der Maaten and K Q Weinberger ldquoDenselyconnected convolutional networksrdquo in Proceedings of the IEEE confer-ence on computer vision and pattern recognition 2017 pp 4700ndash4708

[33] D Ulyanov A Vedaldi and V Lempitsky ldquoInstance normalization Themissing ingredient for fast stylizationrdquo arXiv preprint arXiv1607080222016

[34] Z Wang and S Ji ldquoSmoothed dilated convolutions for improved densepredictionrdquo in Proceedings of the 24th ACM SIGKDD InternationalConference on Knowledge Discovery amp Data Mining ACM 2018 pp2486ndash2495

[35] J Johnson A Alahi and L Fei-Fei ldquoPerceptual losses for real-timestyle transfer and super-resolutionrdquo ECCV 2016

[36] H Zhang and K Dana ldquoMulti-style generative network for real-timetransferrdquo arXiv preprint arXiv170306953 2017

[37] C Ledig L Theis F Huszar J Caballero A Cunningham A AcostaA Aitken A Tejani J Totz Z Wang et al ldquoPhoto-realistic singleimage super-resolution using a generative adversarial networkrdquo pp4681ndash4690 2017

[38] K Simonyan and A Zisserman ldquoVery deep convolutional networks forlarge-scale image recognitionrdquo arXiv14091556 2014

[39] Z Wang A C Bovik H R Sheikh and E P Simoncelli ldquoImagequality assessment from error visibility to structural similarityrdquo IEEETransactions on Image Processing vol 13 no 4 pp 600ndash612 April2004

[40] B M Smith L Zhang J Brandt Z Lin and J Yang ldquoExemplar-basedface parsingrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition 2013 pp 3484ndash3491

[41] G Boracchi and A Foi ldquoModeling the performance of image restorationfrom motion blurrdquo IEEE Transactions on Image Processing vol 21no 8 pp 3502ndash3517 2012

[42] Q Shan J Jia and A Agarwala ldquoHigh-quality motion deblurring froma single imagerdquo Acm transactions on graphics (tog) vol 27 no 3p 73 2008

[43] S Cho and S Lee ldquoFast motion deblurringrdquo ACM Transactions ongraphics (TOG) vol 28 no 5 p 145 2009

[44] L Zhong S Cho D Metaxas S Paris and J Wang ldquoHandling noise insingle image deblurring using directional filtersrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2013pp 612ndash619

[45] O M Parkhi A Vedaldi and A Zisserman ldquoDeep face recognitionrdquoin British Machine Vision Conference 2015

[46] B Amos B Ludwiczuk and M Satyanarayanan ldquoOpenface A general-purpose face recognition library with mobile applicationsrdquo CMU-CS-16-118 CMU School of Computer Science Tech Rep 2016

Page 10: Deblurring Face Images using Uncertainty Guided …1 Deblurring Face Images using Uncertainty Guided Multi-Stream Semantic Networks Rajeev Yasarla, Student Member, IEEE, Federico Perazzi,

10

novel techniques such as nested residual learning and class-based confidence loss to improve the performance of UMSNIn comparison to the state-of-the-art single image blind de-blurring methods the proposed approach is able to achievesignificant improvements when evaluated on three popular facedatasets

APPENDIX AUMSN NETWORK SEQUENCE

The BaseNetwork is similar to the first stage semanticnetwork as shown in Fig 5 with the following sequence oflayersResBlock(364)-Avgpool-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-ResBlock(6464)-Upsample-ResBlock(6416)-Conv 3times 3(163)F-Net-i is a a sequence of five ResBlocks with denseconnectionsResBlock(316)-Avgpool-ResBlock(1616)-ResBlock(1616)-ResBlock(1616)-ResBlock(168)where ResBlock(mn) indicates m input channels and noutput channels for ResBlock

F-Net-irsquos are joined to form the first stage of UMSNfurther these outputs re concatenated to output of first layer ofBase Network(BN) for constructing the UMSN network ThusUMSN is constructed by combining F-Net-irsquos and BN whereF-Net-irsquos acts as first stage of UMSN and BN acts as secondstage of UMSN

APPENDIX BCONFIDENCE SCORES

In this section we show some example test outputs ofUMSN(trained with Ltotal) at different time instances oftraining and their corresponding confidence values to showhow confidence measure is helping the network Confidencemeasure helps the UMSN in deblurring different parts of theface For example as shown in the Fig 14 and Table VIIthe output of UMSN after 50000 iterations are blurry at eyesnose and mouth which was reflected in low confidence scorefor C3 when compared to other classes Eventually UMSNlearnt to reconstruct eyes nose and mouth which was reflectedin high confidence score for C3 and as shown in the Figure14 the output deblurred images are sharp with eyes nose andmouth reconstructed perfectly

Blurry Image Iteration C1 C2 C3 C4

Image13000 0573 0576 0610 058450000 0676 0902 0789 0847100000 0858 0917 0989 0962

Image23000 0519 0528 0493 061150000 0728 0883 0817 0872100000 0896 0919 0966 0944

TABLE VII Confidence values for the Image1 and Images2first row and second row images in Fig 14 respectively

(a) (b) (c) (d)

Fig 14 Sample deblurring results of UMSN at different timeinstances of training (a) Blurry image (b) trained for 3000iterations (c) 50000 iterations (d) 01 million iterations

ACKNOWLEDGMENT

This research is based upon work supported by the Officeof the Director of National Intelligence (ODNI) IntelligenceAdvanced Research Projects Activity (IARPA) via IARPARampD Contract No 2014-14071600012 The views and conclu-sions contained herein are those of the authors and should notbe interpreted as necessarily representing the official policiesor endorsements either expressed or implied of the ODNIIARPA or the US Government The US Government is au-thorized to reproduce and distribute reprints for Governmentalpurposes notwithstanding any copyright annotation thereon

REFERENCES

[1] R Fergus B Singh A Hertzmann S T Roweis and W T FreemanldquoRemoving camera shake from a single photographrdquo ACM transactionson graphics (TOG) vol 25 no 3 pp 787ndash794 2006

[2] C J Schuler H Christopher Burger S Harmeling and B ScholkopfldquoA machine learning approach for non-blind image deconvolutionrdquo inProceedings of the IEEE Conference on Computer Vision and PatternRecognition 2013 pp 1067ndash1074

[3] W Ren X Cao J Pan X Guo W Zuo and M-H Yang ldquoImagedeblurring via enhanced low-rank priorrdquo IEEE Transactions on ImageProcessing vol 25 no 7 pp 3426ndash3437 2016

[4] L Sun S Cho J Wang and J Hays ldquoEdge-based blur kernel estimationusing patch priorsrdquo in IEEE International Conference on ComputationalPhotography (ICCP) IEEE 2013 pp 1ndash8

[5] T M Nimisha A Kumar Singh and A N Rajagopalan ldquoBlur-invariant deep learning for blind-deblurringrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4752ndash4760

[6] L Xu J S Ren C Liu and J Jia ldquoDeep convolutional neural networkfor image deconvolutionrdquo in Advances in neural information processingsystems 2014 pp 1790ndash1798

[7] S Nah T Hyun Kim and K Mu Lee ldquoDeep multi-scale convolutionalneural network for dynamic scene deblurringrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2017pp 3883ndash3891

[8] S Zhang X Shen Z Lin R Mech J P Costeira and J M MouraldquoLearning to understand image blurrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2018 pp6586ndash6595

[9] Z Shen W-S Lai T Xu J Kautz and M-H Yang ldquoDeep semanticface deblurringrdquo Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition pp 8260ndash8269 2018

[10] Y Song J Zhang L Gong S He L Bao J Pan Q Yang and M-HYang ldquoJoint face hallucination and deblurring via structure generationand detail enhancementrdquo International Journal of Computer Vision vol127 no 6-7 pp 785ndash800 2019

[11] V Le J Brandt Z Lin L Bourdev and T S Huang ldquoInteractivefacial feature localizationrdquo in European conference on computer visionSpringer 2012 pp 679ndash692

11

[12] Z Liu P Luo X Wang and X Tang ldquoDeep learning face attributesin the wildrdquo in Proceedings of the IEEE international conference oncomputer vision 2015 pp 3730ndash3738

[13] L Xu S Zheng and J Jia ldquoUnnatural l0 sparse representation fornatural image deblurringrdquo Proceedings of the IEEE conference oncomputer vision and pattern recognition pp 1107ndash1114 2013

[14] D Krishnan T Tay and R Fergus ldquoBlind deconvolution using anormalized sparsity measurerdquo CVPR 2011 pp 233ndash240 2011

[15] J Pan D Sun H Pfister and M-H Yang ldquoBlind image deblurringusing dark channel priorrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition 2016 pp 1628ndash1636

[16] J Ni P Turaga V M Patel and R Chellappa ldquoExample-drivenmanifold priors for image deconvolutionrdquo IEEE Transactions on ImageProcessing vol 20 no 11 pp 3086ndash3096 Nov 2011

[17] V M Patel G R Easley and D M Healy ldquoShearlet-based deconvo-lutionrdquo IEEE Transactions on Image Processing vol 18 no 12 pp2673ndash2685 Dec 2009

[18] J Kruse C Rother and U Schmidt ldquoLearning to push the limits ofefficient fft-based image deconvolutionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4586ndash4594

[19] U Schmidt C Rother S Nowozin J Jancsary and S Roth ldquoDiscrim-inative non-blind deblurringrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition 2013 pp 604ndash611

[20] U Schmidt and S Roth ldquoShrinkage fields for effective image restora-tionrdquo in Proceedings of the IEEE Conference on Computer Vision andPattern Recognition 2014 pp 2774ndash2781

[21] S Vasu V Reddy Maligireddy and A Rajagopalan ldquoNon-blind de-blurring Handling kernel uncertainty with cnnsrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2018pp 3272ndash3281

[22] C J Schuler M Hirsch S Harmeling and B Scholkopf ldquoLearning todeblurrdquo vol 38 no 7 IEEE 2015 pp 1439ndash1451

[23] A Chakrabarti ldquoDeep convolutional neural network for image decon-volutionrdquo 2014

[24] Y Tsai X Shen Z Lin K Sunkavalli and M Yang ldquoSky is not thelimit semantic-aware sky replacementrdquo ACM Trans Graph vol 35no 4 2016

[25] D Eigen D Krishnan and R Fergus ldquoRestoring an image takenthrough a window covered with dirt or rainrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2013 pp 633ndash640

[26] P Svoboda M Hradis L Marsık and P Zemcık ldquoCnn for license platemotion deblurringrdquo in 2016 IEEE International Conference on ImageProcessing (ICIP) IEEE 2016 pp 3832ndash3836

[27] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring text images via l0-regularized intensity and gradient priorrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2014 pp2901ndash2908

[28] S Baker and T Kanade ldquoLimits on super-resolution and how to breakthemrdquo IEEE Trans Pattern Anal Mach Intell vol 24 no 9 2002

[29] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring face images withexemplarsrdquo pp 47ndash62 2014

[30] N Joshi W Matusik E H Adelson and D J Kriegman ldquoPersonalphoto enhancement using example imagesrdquo ACM Trans Graph vol 29no 2 2010

[31] O Ronneberger PFischer and T Broxrdquo ldquoU-net Convolutionalnetworks for biomedical image segmentationrdquordquo in Medical ImageComputing and Computer-Assisted Intervention (MICCAI)rdquo serrdquoLNCSrdquo vol rdquo9351rdquo rdquo2015rdquo pp rdquo234ndash241rdquo [Online] Availablerdquohttplmbinformatikuni-freiburgdePublications2015RFB15ardquo

[32] G Huang Z Liu L Van Der Maaten and K Q Weinberger ldquoDenselyconnected convolutional networksrdquo in Proceedings of the IEEE confer-ence on computer vision and pattern recognition 2017 pp 4700ndash4708

[33] D Ulyanov A Vedaldi and V Lempitsky ldquoInstance normalization Themissing ingredient for fast stylizationrdquo arXiv preprint arXiv1607080222016

[34] Z Wang and S Ji ldquoSmoothed dilated convolutions for improved densepredictionrdquo in Proceedings of the 24th ACM SIGKDD InternationalConference on Knowledge Discovery amp Data Mining ACM 2018 pp2486ndash2495

[35] J Johnson A Alahi and L Fei-Fei ldquoPerceptual losses for real-timestyle transfer and super-resolutionrdquo ECCV 2016

[36] H Zhang and K Dana ldquoMulti-style generative network for real-timetransferrdquo arXiv preprint arXiv170306953 2017

[37] C Ledig L Theis F Huszar J Caballero A Cunningham A AcostaA Aitken A Tejani J Totz Z Wang et al ldquoPhoto-realistic singleimage super-resolution using a generative adversarial networkrdquo pp4681ndash4690 2017

[38] K Simonyan and A Zisserman ldquoVery deep convolutional networks forlarge-scale image recognitionrdquo arXiv14091556 2014

[39] Z Wang A C Bovik H R Sheikh and E P Simoncelli ldquoImagequality assessment from error visibility to structural similarityrdquo IEEETransactions on Image Processing vol 13 no 4 pp 600ndash612 April2004

[40] B M Smith L Zhang J Brandt Z Lin and J Yang ldquoExemplar-basedface parsingrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition 2013 pp 3484ndash3491

[41] G Boracchi and A Foi ldquoModeling the performance of image restorationfrom motion blurrdquo IEEE Transactions on Image Processing vol 21no 8 pp 3502ndash3517 2012

[42] Q Shan J Jia and A Agarwala ldquoHigh-quality motion deblurring froma single imagerdquo Acm transactions on graphics (tog) vol 27 no 3p 73 2008

[43] S Cho and S Lee ldquoFast motion deblurringrdquo ACM Transactions ongraphics (TOG) vol 28 no 5 p 145 2009

[44] L Zhong S Cho D Metaxas S Paris and J Wang ldquoHandling noise insingle image deblurring using directional filtersrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2013pp 612ndash619

[45] O M Parkhi A Vedaldi and A Zisserman ldquoDeep face recognitionrdquoin British Machine Vision Conference 2015

[46] B Amos B Ludwiczuk and M Satyanarayanan ldquoOpenface A general-purpose face recognition library with mobile applicationsrdquo CMU-CS-16-118 CMU School of Computer Science Tech Rep 2016

Page 11: Deblurring Face Images using Uncertainty Guided …1 Deblurring Face Images using Uncertainty Guided Multi-Stream Semantic Networks Rajeev Yasarla, Student Member, IEEE, Federico Perazzi,

11

[12] Z Liu P Luo X Wang and X Tang ldquoDeep learning face attributesin the wildrdquo in Proceedings of the IEEE international conference oncomputer vision 2015 pp 3730ndash3738

[13] L Xu S Zheng and J Jia ldquoUnnatural l0 sparse representation fornatural image deblurringrdquo Proceedings of the IEEE conference oncomputer vision and pattern recognition pp 1107ndash1114 2013

[14] D Krishnan T Tay and R Fergus ldquoBlind deconvolution using anormalized sparsity measurerdquo CVPR 2011 pp 233ndash240 2011

[15] J Pan D Sun H Pfister and M-H Yang ldquoBlind image deblurringusing dark channel priorrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition 2016 pp 1628ndash1636

[16] J Ni P Turaga V M Patel and R Chellappa ldquoExample-drivenmanifold priors for image deconvolutionrdquo IEEE Transactions on ImageProcessing vol 20 no 11 pp 3086ndash3096 Nov 2011

[17] V M Patel G R Easley and D M Healy ldquoShearlet-based deconvo-lutionrdquo IEEE Transactions on Image Processing vol 18 no 12 pp2673ndash2685 Dec 2009

[18] J Kruse C Rother and U Schmidt ldquoLearning to push the limits ofefficient fft-based image deconvolutionrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2017 pp 4586ndash4594

[19] U Schmidt C Rother S Nowozin J Jancsary and S Roth ldquoDiscrim-inative non-blind deblurringrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition 2013 pp 604ndash611

[20] U Schmidt and S Roth ldquoShrinkage fields for effective image restora-tionrdquo in Proceedings of the IEEE Conference on Computer Vision andPattern Recognition 2014 pp 2774ndash2781

[21] S Vasu V Reddy Maligireddy and A Rajagopalan ldquoNon-blind de-blurring Handling kernel uncertainty with cnnsrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2018pp 3272ndash3281

[22] C J Schuler M Hirsch S Harmeling and B Scholkopf ldquoLearning todeblurrdquo vol 38 no 7 IEEE 2015 pp 1439ndash1451

[23] A Chakrabarti ldquoDeep convolutional neural network for image decon-volutionrdquo 2014

[24] Y Tsai X Shen Z Lin K Sunkavalli and M Yang ldquoSky is not thelimit semantic-aware sky replacementrdquo ACM Trans Graph vol 35no 4 2016

[25] D Eigen D Krishnan and R Fergus ldquoRestoring an image takenthrough a window covered with dirt or rainrdquo in Proceedings of the IEEEInternational Conference on Computer Vision 2013 pp 633ndash640

[26] P Svoboda M Hradis L Marsık and P Zemcık ldquoCnn for license platemotion deblurringrdquo in 2016 IEEE International Conference on ImageProcessing (ICIP) IEEE 2016 pp 3832ndash3836

[27] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring text images via l0-regularized intensity and gradient priorrdquo in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition 2014 pp2901ndash2908

[28] S Baker and T Kanade ldquoLimits on super-resolution and how to breakthemrdquo IEEE Trans Pattern Anal Mach Intell vol 24 no 9 2002

[29] J Pan Z Hu Z Su and M-H Yang ldquoDeblurring face images withexemplarsrdquo pp 47ndash62 2014

[30] N Joshi W Matusik E H Adelson and D J Kriegman ldquoPersonalphoto enhancement using example imagesrdquo ACM Trans Graph vol 29no 2 2010

[31] O Ronneberger PFischer and T Broxrdquo ldquoU-net Convolutionalnetworks for biomedical image segmentationrdquordquo in Medical ImageComputing and Computer-Assisted Intervention (MICCAI)rdquo serrdquoLNCSrdquo vol rdquo9351rdquo rdquo2015rdquo pp rdquo234ndash241rdquo [Online] Availablerdquohttplmbinformatikuni-freiburgdePublications2015RFB15ardquo

[32] G Huang Z Liu L Van Der Maaten and K Q Weinberger ldquoDenselyconnected convolutional networksrdquo in Proceedings of the IEEE confer-ence on computer vision and pattern recognition 2017 pp 4700ndash4708

[33] D Ulyanov A Vedaldi and V Lempitsky ldquoInstance normalization Themissing ingredient for fast stylizationrdquo arXiv preprint arXiv1607080222016

[34] Z Wang and S Ji ldquoSmoothed dilated convolutions for improved densepredictionrdquo in Proceedings of the 24th ACM SIGKDD InternationalConference on Knowledge Discovery amp Data Mining ACM 2018 pp2486ndash2495

[35] J Johnson A Alahi and L Fei-Fei ldquoPerceptual losses for real-timestyle transfer and super-resolutionrdquo ECCV 2016

[36] H Zhang and K Dana ldquoMulti-style generative network for real-timetransferrdquo arXiv preprint arXiv170306953 2017

[37] C Ledig L Theis F Huszar J Caballero A Cunningham A AcostaA Aitken A Tejani J Totz Z Wang et al ldquoPhoto-realistic singleimage super-resolution using a generative adversarial networkrdquo pp4681ndash4690 2017

[38] K Simonyan and A Zisserman ldquoVery deep convolutional networks forlarge-scale image recognitionrdquo arXiv14091556 2014

[39] Z Wang A C Bovik H R Sheikh and E P Simoncelli ldquoImagequality assessment from error visibility to structural similarityrdquo IEEETransactions on Image Processing vol 13 no 4 pp 600ndash612 April2004

[40] B M Smith L Zhang J Brandt Z Lin and J Yang ldquoExemplar-basedface parsingrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition 2013 pp 3484ndash3491

[41] G Boracchi and A Foi ldquoModeling the performance of image restorationfrom motion blurrdquo IEEE Transactions on Image Processing vol 21no 8 pp 3502ndash3517 2012

[42] Q Shan J Jia and A Agarwala ldquoHigh-quality motion deblurring froma single imagerdquo Acm transactions on graphics (tog) vol 27 no 3p 73 2008

[43] S Cho and S Lee ldquoFast motion deblurringrdquo ACM Transactions ongraphics (TOG) vol 28 no 5 p 145 2009

[44] L Zhong S Cho D Metaxas S Paris and J Wang ldquoHandling noise insingle image deblurring using directional filtersrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition 2013pp 612ndash619

[45] O M Parkhi A Vedaldi and A Zisserman ldquoDeep face recognitionrdquoin British Machine Vision Conference 2015

[46] B Amos B Ludwiczuk and M Satyanarayanan ldquoOpenface A general-purpose face recognition library with mobile applicationsrdquo CMU-CS-16-118 CMU School of Computer Science Tech Rep 2016