
Multi-frame Super Resolution Based on Block Motion Vector Processing and Kernel Constrained Convex Set Projection

Miao Liu* and Yuzhong Shen

Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA 23529 USA

ABSTRACT

Even though substantial progress has been made in super resolution research, many issues regarding robust sub-pixel estimation and fast implementation of feature-preserving restoration remain open. To obtain more reliable sub-pixel information, we propose to correct misaligned sub-pixels by motion vector (MV) processing based on hierarchical block partition and weighted vector median filtering (WVMF). Two indices, the relative displaced frame difference and the motion vector similarity degree, are computed and compared with trained thresholds to classify the motion blocks into reliable and unreliable groups. The unreliable blocks are then divided into four sub-blocks whose motion vectors are processed by WVMF based on the reliability information of their neighboring blocks. To preserve local features such as edge direction, edge strength, and spread region, anisotropic kernels are learned from local gradient fields to represent edge information. Finally, a kernel constrained projection is established for restoring high resolution frames. The experimental results show that the proposed algorithm preserves important features in the images and outperforms the traditional POCS method.

Keywords: super resolution image processing, weighted vector median filtering, motion block classification, motion vector processing, kernel learning, projection onto convex sets (POCS)

1. INTRODUCTION

Super resolution image reconstruction is a classical signal estimation problem with wide applications in remote sensing, video frame freezing, security surveillance, and medical diagnostics. In the last two decades, many methods have been proposed to estimate high resolution (HR) images from low resolution (LR) sequences. Restoration-oriented methods represent the problem in the spatial domain or a transform domain and incorporate various prior constraints to approximate the degradation process. Methods such as maximum likelihood (ML), maximum a posteriori (MAP), and POCS can then be used to solve the inverse problem. Example-based methods make use of parametric models to predict the high resolution images [1, 2].

Even though substantial progress has been made in super resolution research, many open issues still exist [1]. One important issue is the robustness of the estimated subpixel information. The displaced frame difference (DFD) is widely used as a criterion to reject motion errors [3]. In [4], a validation map was utilized to indicate the reliability of motion estimation in different regions. However, those methods discard not only the pixels corresponding to outliers (due to occlusion/cover phenomena), but also pixels that are merely misaligned and still bear useful information. The second issue is how to restore aligned subpixel information under degradations such as blur, down-sampling, and compression.

To correct the misaligned subpixels, we propose a motion vector (MV) processing method based on iterative block partition and weighted vector median filtering. To address the second issue, edge-preserving methods such as the one developed in [5] have been proposed, since images are anisotropic. Recently, kernel regression based on singular value decomposition was proposed [6] and achieved remarkable results. Since kernels can simultaneously describe local features such as edge direction and strength, along with a scaling factor, we propose to estimate the kernel information and describe it as a set of weights, based on which a modified iterative projection formula is established for restoring high resolution frames by projection onto convex sets (POCS).

The rest of the paper is organized as follows. Section 2 reviews the image degradation process and its mathematical model. In Section 3, based on the analysis of motion estimation errors, we present a subpixel motion vector processing method based on hierarchical block partition and weighted vector median filtering. Section 4 introduces a kernel-based feature estimation method and employs the learned kernel as a local constraint from which a novel POCS framework is established for SR restoration. Experimental results using standard sequences are presented in Section 5. Section 6 concludes the paper and discusses several perspectives for further improvement.

Visual Communications and Image Processing 2009, edited by Majid Rabbani, Robert L. Stevenson, Proc. of SPIE-IS&T Electronic Imaging, SPIE Vol. 7257, 72571J · © 2009 SPIE-IS&T · CCC code: 0277-786X/08/$18 · doi: 10.1117/12.805742

2. PROBLEM REPRESENTATION

Videos can lose spatial resolution due to the limitations of the optical imaging system, down-sampling, and compression. Denote the degraded low-resolution frames of size $N_v \times N_h$ by $\{z_k\}$, $k = 1, \ldots, M$, and the original high resolution images of size $r_v N_v \times r_h N_h$ by $\{f_k\}$, where $r_v$ and $r_h$ represent the vertical and horizontal down-sampling factors, respectively. The relationship between the frames in the original high resolution sequence can be represented by

$$f_k = M_{k,j} f_j + n^{M}_{k,j}, \tag{1}$$

where $M_{k,j}$ is a motion matrix of size $r_v r_h N_v N_h \times r_v r_h N_v N_h$ and $n^{M}_{k,j}$ represents the error caused by the limitations of motion estimation methods and by the cover/uncover regions due to the relative motion between object and background. Errors of large magnitude, the so-called outliers, cause significant distortions or artifacts in the SR restoration results. Thus either a detection and removal procedure must be applied prior to the SR algorithm, or an outlier-robust SR algorithm must be used [7]. In many cases, however, because of the limitations of motion estimation methods, directly applying a detection and removal procedure discards some misaligned information that is not an outlier, resulting in unnecessary information loss.

Considering the degradation caused by blurring and down-sampling in addition to the motion relation between two frames, the relationship between the captured image $z_k$ and the original HR images can be expressed by

$$z_k = D B f_k + n_k = D B M_{k,j} f_j + D B n^{M}_{k,j} + n_k, \tag{2}$$

where $B$ is the blurring matrix of size $r_v r_h N_v N_h \times r_v r_h N_v N_h$, $D$ is the down-sampling matrix of size $N_v N_h \times r_v r_h N_v N_h$, and $n_k$ is the additive noise. To estimate the desired high resolution image $f_k$, one needs to solve the inverse problem of (1) and (2) by estimating $B$, $M$, and the measurement noise. The general process of such an HR restoration task is illustrated in Figure 1. In this paper, we only consider the motion estimation error, which is more significant in video processing.
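The forward model in (2) can be simulated for a single frame to produce test data. The following is a minimal sketch, not the implementation used in the paper: the names `gaussian_kernel3`, `blur` and `degrade` are hypothetical, the blur width `sigma`, the edge-replicating border handling, and plain decimation as the down-sampling operator are all assumptions, and the motion term is omitted.

```python
import numpy as np

def gaussian_kernel3(sigma=1.0):
    """3x3 Gaussian kernel normalized to sum to one (sigma is an assumed value)."""
    ax = np.array([-1.0, 0.0, 1.0])
    g = np.exp(-ax ** 2 / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def blur(f, kernel):
    """Apply the blur operator B as a 2-D correlation with edge replication."""
    r = kernel.shape[0] // 2
    padded = np.pad(f.astype(np.float64), r, mode="edge")
    out = np.zeros(f.shape, dtype=np.float64)
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + f.shape[0], dx:dx + f.shape[1]]
    return out

def degrade(f, r_v=2, r_h=2, noise_std=0.0, rng=None):
    """Simulate z = D B f + n for one frame; D is modeled as plain decimation (assumption)."""
    z = blur(f, gaussian_kernel3())[::r_v, ::r_h]
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        z = z + rng.normal(0.0, noise_std, z.shape)
    return z
```

With `r_v = r_h = 2`, a 288×352 frame maps to a 144×176 frame, which matches the frame sizes reported in Section 5.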

Figure 1. The general process of SR restoration

3. SUBPIXEL MOTION VECTOR PROCESSING BASED ON BLOCK PARTITION AND WEIGHTED VECTOR MEDIAN FILTERING

In [8], we showed that the distribution of the motion estimation residual has heavy tails, and applied an adaptive threshold to avoid the artifacts caused by using unreliable information. However, this method not only removed the unreliable motion information, but also discarded useful information that was merely misaligned. To tackle this problem, instead of using more involved motion estimation methods, we suggest using the available reliable MVs to ameliorate the unreliable motion blocks by hierarchical block splitting and weighted vector median filtering.

3.1 Motion Block Classification

To estimate the motion matrix between two frames, we can use block matching, optical flow estimation, or other parametric techniques. In general, it is difficult to accurately capture all the motion information in video frames with one particular motion estimation method unless the motion pattern is fixed. Thus, when processing several objects with different motion patterns, motion estimation (registration) error is unavoidable. On the other hand, employing several methods increases the computational burden. To strike a balance, we concentrate on block matching, since block motion vectors are directly available in the decoder. To obtain more reliable subpixel information, we need to discern the reliability of the motion blocks and re-estimate the motion vectors of the unreliable ones. To carry out the first task, we calculate the ratio of the sum of the block residue to the sum of the block values as

$$E(k) = \frac{\sum_{(m,n) \in \mathrm{Block}_k} \left| f_i(m, n) - f_j(m + v_y,\, n + v_x) \right|}{\sum_{(m,n) \in \mathrm{Block}_k} f_i(m, n) + \lambda}, \tag{3}$$

where $\vec{v} = [v_y, v_x]$ is the estimated motion vector of the $k$th block and $\lambda$ is a regularization factor that avoids division by zero. If $E(k)$ is larger than a threshold $T_E$, the corresponding block is labeled as unreliable; otherwise it is reliable. This method is similar to the work in [10], but their goal was frame rate up-conversion, which is different from SR. Additionally, in texture-like regions, blocks that are correctly aligned might still be mistaken for unreliable based on (3), since their motion estimation error can be large due to the irregularity and subtlety of the texture. To resolve this dilemma, we assume that as long as the motion field is consistent in the neighborhood, the motion vectors are reliable. We therefore define a motion vector consistency index $c_A$, which equals the number of blocks in the surrounding neighborhood whose motion vectors are similar to that of the center block. For the $k$th block, it is calculated as

$$c_A(k) = \sum_{j \in N(k)} \chi(s_{kj}), \tag{4}$$

where

$$s_{kj} = \frac{\vec{v}_k \cdot \vec{v}_j}{\|\vec{v}_k\| \, \|\vec{v}_j\|}$$

denotes the angular similarity degree between the motion vector of the $k$th block and the motion vector of its neighboring block $j$, $N(k)$ is the collection of neighboring blocks surrounding the $k$th block, and

$$\chi(s_{kj}) = \begin{cases} 1, & \text{if } s_{kj} > T_S \\ 0, & \text{otherwise} \end{cases}$$

is the similarity indicator function. Based on these two indices, the motion blocks are classified into reliable and unreliable groups by the following criterion:

For k = 1 : N
    If E(k) < T_E then
        the motion estimation is reliable
    Else If c_A(k) > T_C then
        the motion estimation is reliable
    Else
        the motion estimation is unreliable
    End
End

$T_S$, $T_E$ and $T_C$ are thresholds determined experimentally. Their values in this paper are set to 0.9, 0.05 and 5, respectively.
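The two indices and the classification rule can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, the block residue is taken as a sum of absolute differences, a zero motion vector is treated as dissimilar to everything (via a small `eps`), and a block with a high consistency count is assumed to be the reliable case.

```python
import numpy as np

def relative_dfd(cur_block, ref_block, lam=1e-6):
    """Eq. (3): sum of absolute block residue over sum of block values.

    ref_block is the motion-compensated block taken from the reference frame.
    """
    cur = cur_block.astype(np.float64)
    ref = ref_block.astype(np.float64)
    return np.abs(cur - ref).sum() / (cur.sum() + lam)

def angular_similarity(v_k, v_j, eps=1e-12):
    """s_kj: cosine of the angle between two motion vectors."""
    v_k = np.asarray(v_k, dtype=np.float64)
    v_j = np.asarray(v_j, dtype=np.float64)
    return float(v_k @ v_j / (np.linalg.norm(v_k) * np.linalg.norm(v_j) + eps))

def consistency_index(v_k, neighbor_mvs, t_s=0.9):
    """Eq. (4): number of neighboring MVs whose similarity to v_k exceeds T_S."""
    return sum(angular_similarity(v_k, v_j) > t_s for v_j in neighbor_mvs)

def is_reliable(e_k, c_a, t_e=0.05, t_c=5):
    """A block is reliable if its relative DFD is small or its neighborhood is consistent."""
    return e_k < t_e or c_a > t_c
```

The default thresholds are the values quoted in the text (0.9, 0.05 and 5); in practice they would be retuned per sequence.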


3.2 Motion Vector Re-estimation by Weighted Vector Median Filtering

After reliability classification, the unreliable regions are split into sub-blocks for MV re-estimation. To reduce the errors that might be introduced by performing MV estimation on small blocks, we do not re-run motion estimation on the sub-blocks. Instead, we replace each unreliable MV using the MVs of its reliable neighboring blocks through a weighted vector median filter constrained by reliability and similarity, which is defined as

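$$\sum_{i=1}^{N} w_i \left\| \vec{v}_{WVM} - \vec{v}_i \right\|_p \le \sum_{i=1}^{N} w_i \left\| \vec{v}_j - \vec{v}_i \right\|_p, \quad j = 1, 2, \ldots, N, \tag{5}$$

where

$$w_i = \begin{cases} 0, & \text{if block } i \text{ is unreliable} \\ 1, & \text{if block } i \text{ is reliable and } d_{i,j} > T_M \end{cases}$$

and $d_{i,j}$ denotes the angular difference between $\vec{v}_i$ and $\vec{v}_j$, defined as

$$d_{i,j} = 1 - \frac{\vec{v}_i \cdot \vec{v}_j}{\|\vec{v}_i\| \, \|\vec{v}_j\|},$$

and $N$ is the number of motion vectors surrounding the block whose MV will be processed. In our work, $N$ is set to 8 and $T_M$ is 0.4.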
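The selection rule in (5) amounts to picking, among the neighboring MVs, the one with the smallest weighted sum of distances to all the others. A minimal sketch follows, assuming the weights have already been computed from the reliability labels; the function name `weighted_vector_median` is hypothetical, and restricting the candidates to MVs with nonzero weight is an added assumption so that an unreliable MV can never be selected.

```python
import numpy as np

def weighted_vector_median(mvs, weights, p=2):
    """Return the neighboring MV that minimizes the weighted sum of L_p distances (Eq. 5)."""
    mvs = np.asarray(mvs, dtype=np.float64)          # shape (N, 2)
    weights = np.asarray(weights, dtype=np.float64)  # shape (N,)
    if not np.any(weights > 0):
        raise ValueError("no reliable neighboring motion vector available")
    # dist[j, i] = || v_j - v_i ||_p
    dist = np.linalg.norm(mvs[:, None, :] - mvs[None, :, :], ord=p, axis=2)
    cost = dist @ weights                  # cost[j] = sum_i w_i * dist[j, i]
    cost[weights == 0] = np.inf            # assumption: only weighted MVs are candidates
    return mvs[int(np.argmin(cost))]
```

For example, with neighbors `[[0, 0], [1, 0], [2, 0], [100, 100]]` and the last one marked unreliable (weight 0), the filter returns `[1, 0]`, the middle vector of the reliable cluster.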

Figure 2. The block diagram of hierarchical block partition and MV filtering methods

Histogram panels of Figure 3 (horizontal axis from -200 to 200, vertical axis from 0 to 120), with panel titles in order of appearance:
DFD distribution in step 2, mean = 0.48724, std = 8.9237
DFD distribution in step 2, mean = 0.48652, std = 8.9168
DFD distribution in step 3, mean = 0.48231, std = 8.892

Figure 3. Block partition and motion vector filtering. Black labels indicate unreliable blocks; red arrows represent motion vectors. (a), (d) Results of the original motion estimation and reliability classification. (b), (e) First step of motion block partition and MV filtering. (c), (f) Second step of motion block partition and MV filtering.


After the new MVs are estimated for the unreliable sub-blocks, the difference between each sub-block in the observation frame and its new corresponding sub-block in the reference frame is calculated. If it is smaller than the original difference, we re-label the sub-block as reliable and update its motion information. Otherwise, the sub-block is still considered unreliable and is left for further processing. The process of block reliability classification, WVMF, and splitting is repeated until the size of the sub-blocks reaches 2×2, which is the smallest processing unit. Finally, the regions that are still labeled unreliable are not used for restoration, since we cannot be sure whether they are useful. The motion block classification and MV processing by weighted vector median filtering are summarized in Figure 2.

To illustrate this process and its effectiveness, Figure 3 visualizes the intermediate results of the proposed method. After motion vector processing, some MVs of previously misaligned sub-blocks are corrected (the black blocks denote the blocks whose motion information is unreliable), and the distribution of the difference between the two frames becomes more concentrated, as shown by the decreasing standard deviation. Note that some large values remain in the histogram because they pertain to outliers caused by occlusion and cover; they can be easily eliminated in the restoration process by the outlier rejection method.
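The split-and-accept step described above can be sketched as follows. This is only an illustration under simplifying assumptions: motion vectors are integer-valued (the paper works at subpixel precision), all displaced blocks lie inside the frame, the block difference is a sum of absolute differences, and the helper names `split_into_four`, `block_sad` and `refine_sub_block` are hypothetical.

```python
import numpy as np

def split_into_four(top, left, size):
    """Split a square block into its four quadrants, returned as (top, left, size)."""
    half = size // 2
    return [(top, left, half), (top, left + half, half),
            (top + half, left, half), (top + half, left + half, half)]

def block_sad(cur, ref, top, left, size, mv):
    """Sum of absolute differences between a block and its motion-compensated match."""
    dy, dx = mv
    a = cur[top:top + size, left:left + size].astype(np.float64)
    b = ref[top + dy:top + dy + size, left + dx:left + dx + size].astype(np.float64)
    return np.abs(a - b).sum()

def refine_sub_block(cur, ref, top, left, size, old_mv, new_mv):
    """Keep the filtered MV only if it lowers the block difference.

    Returns (mv, reliable): the MV to store and the updated reliability label.
    """
    if block_sad(cur, ref, top, left, size, new_mv) < block_sad(cur, ref, top, left, size, old_mv):
        return new_mv, True
    return old_mv, False
```

A driver loop would call `split_into_four` on every block still labeled unreliable, obtain `new_mv` from the weighted vector median of the neighbors, and stop splitting once `size` reaches 2.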

4. SR RESTORATION BASED ON KERNEL CONSTRAINED POCS

After the subpixel information is classified and aligned, SR restoration can be performed by any image restoration algorithm [1, 2]. Here we concentrate on POCS, an effective method that provides a feasible solution underpinned by set theory [9]. However, directly using this method can cause ringing artifacts along object edges. To overcome this problem, A. J. Patti et al. [5] proposed an edge-constrained projection based on the fact that edges are anisotropic: the local covariance is used to estimate edge direction and strength, and the resulting deformed point spread function (PSF) is imposed to reconstruct the HR images. Recently, H. Takeda et al. [6] proposed a nonparametric method that represents local information in terms of kernels. Kernels are essentially covariances of the local image intensity and can represent more comprehensive information such as edge direction, strength, and scaling factor. We therefore suggest applying the kernel information as a set of weights to constrain the HR restoration in the POCS framework.

The general idea of POCS is to define a group of closed, convex sets, one for each pixel within the low-resolution image sequence:

$$C_D(m_1, m_2, k) = \left\{ f(n_1, n_2, k) : \left| z(m_1, m_2, k) - \sum_{(n_1, n_2)} f(n_1, n_2, k)\, h \right| \le \delta_k \right\}. \tag{6}$$

The variable $\delta_k$ is an a priori bound reflecting the statistical confidence with which the original HR image $f$ is a member of the set $C_D(m_1, m_2, k)$, which acts as the data consistency constraint. Starting from an arbitrary initialization $f_0$, for example a bilinear interpolation, an HR image is iteratively estimated through successive projections of the previous estimate onto the consistency sets as well as onto the amplitude constraint set that restricts the gray level of the estimate to [0, 255]. The operator $h$ denotes the general optical point spread function accounting for the sensor-scene relative motion or optical blur. Here we assume the PSF to be a Gaussian blur, represented as

$$h(x, y) = A \exp\left\{ -\frac{x^2 + y^2}{2\sigma^2} \right\}, \quad (x, y) \in w, \tag{7}$$

where $A$ is a constant and $\sigma$ is the spread of the PSF. Considering that general 2D images are anisotropic, the local information contains direction, intensity, and diffusion area, which should be preserved in the SR restoration process. In other words, the restoration procedure should preserve image features. To this end, we need to modify the restoration formulae derived from the data consistency constraint (6). This can be carried out by incorporating kernels into the PSF (7). The kernels can be assembled by the methods in [6] as $K = \gamma R S R^T$, in which $S$ is the elongation matrix corresponding to the intensity of the gradient, $R$ is the rotation matrix pertaining to the gradient direction, and $\gamma$ is the scaling factor representing the diffusion area. These three components can be estimated from the covariance of the local gradients by performing singular value decomposition (SVD) on the local gradient matrix $G(x, y) = [f_y, f_x]$ as

$$G(x, y) = U \begin{bmatrix} \lambda_1 & \\ & \lambda_2 \end{bmatrix} V^H, \tag{8}$$

where $U$ contains two eigenvectors representing the dominant direction of the local gradients. The corresponding two eigenvalues $[\lambda_1, \lambda_2]$ represent the strength of the gradient along and perpendicular to the dominant direction, respectively.
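One possible way to assemble $K = \gamma R S R^T$ from the SVD in (8) is sketched below. It is not the authors' implementation. Several choices are assumptions: the gradients of a patch are stacked as an n×2 matrix, so the 2×2 matrix of right singular vectors is used as the rotation $R$; the regularized elongation and scaling formulas (with the hypothetical parameter `reg`) are borrowed in spirit from steering kernel regression [6] and are not specified in the text; displacements are ordered as (y, x) to match $G = [f_y, f_x]$.

```python
import numpy as np

def steering_kernel(patch, reg=1.0):
    """Estimate a 2x2 kernel matrix K = gamma * R @ S @ R.T from a local patch."""
    fy, fx = np.gradient(patch.astype(np.float64))
    G = np.column_stack([fy.ravel(), fx.ravel()])        # n x 2 gradient matrix [f_y, f_x]
    _, sing, vt = np.linalg.svd(G, full_matrices=False)  # decomposition of Eq. (8)
    lam1, lam2 = sing                                    # lam1 >= lam2 >= 0
    R = vt.T                                             # columns: dominant and orthogonal directions
    elong = (lam1 + reg) / (lam2 + reg)                  # assumed regularized elongation
    S = np.diag([elong, 1.0 / elong])
    gamma = np.sqrt((lam1 * lam2 + reg) / G.shape[0])    # assumed scaling factor
    return gamma * R @ S @ R.T
```

For a patch containing a vertical step edge, the resulting matrix has a much larger entry for the x direction than for the y direction, so a Gaussian built from it decays quickly across the edge and slowly along it.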

After assembling the kernel $K$ from the direction and strength parameters obtained from the SVD, we can build an elongated, rotated and scaled PSF for SR restoration as

$$h'(x, y) = A \exp\left\{ -\frac{[x - x_0,\; y - y_0]\, K\, [x - x_0,\; y - y_0]^T}{2\sigma^2} \right\}, \quad (x, y) \in w, \tag{9}$$

where $[x_0, y_0]$ is the center of the local window $w$, whose size should be adjusted within a certain range. From (6), (7) and (9), we can derive the final iterative projection formula as

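$$P\left[f(x_h + i, y_h + j)\right] = f(x_h + i, y_h + j) + \begin{cases}
\dfrac{\left(r(x_l, y_l) - \delta_k(x_l, y_l)\right) h'(x_h + i, y_h + j)}{\sum_{i=-r}^{r} \sum_{j=-r}^{r} h'^2(x_h + i, y_h + j)}, & r(x_l, y_l) > \delta_k(x_l, y_l) \\[2ex]
0, & \left| r(x_l, y_l) \right| \le \delta_k(x_l, y_l) \\[2ex]
\dfrac{\left(r(x_l, y_l) + \delta_k(x_l, y_l)\right) h'(x_h + i, y_h + j)}{\sum_{i=-r}^{r} \sum_{j=-r}^{r} h'^2(x_h + i, y_h + j)}, & r(x_l, y_l) < -\delta_k(x_l, y_l)
\end{cases} \tag{10}$$

where the residual is

$$r(x_l, y_l) = z(x_l, y_l) - \sum_{i=-r}^{r} \sum_{j=-r}^{r} h'(x_h + i, y_h + j)\, f(x_h + i, y_h + j),$$

$(x_l, y_l)$ and $(x_h, y_h)$ denote corresponding coordinates in the LR and HR grids, and the summation limit $r$ is the half-width of the local window.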
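A single projection of the form (10) can be sketched as below for one LR pixel. The sketch makes several assumptions that are not stated in the text: the constant $A$ is chosen so that the sampled PSF sums to one, the window lies entirely inside the HR estimate, the mapping from the LR pixel to its HR center `(cy, cx)` (including motion compensation) is computed elsewhere, and displacements are ordered as (y, x). The names `anisotropic_psf` and `project_pixel` are hypothetical.

```python
import numpy as np

def anisotropic_psf(K, radius=2, sigma=1.0):
    """Sample the kernel-shaped PSF of Eq. (9) on a (2r+1) x (2r+1) window."""
    offs = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(offs, offs, indexing="ij")
    d = np.stack([dy, dx], axis=-1).astype(np.float64)  # displacements in (y, x) order
    quad = np.einsum("...i,ij,...j->...", d, K, d)      # d^T K d at every window position
    h = np.exp(-quad / (2.0 * sigma ** 2))
    return h / h.sum()                                  # A chosen so the PSF sums to one

def project_pixel(f, z_val, cy, cx, h, delta):
    """Project the HR estimate f (float array, modified in place) onto the
    data-consistency set of one LR observation z_val, following Eq. (10).
    Returns the residual measured before the projection."""
    r = h.shape[0] // 2
    win = f[cy - r:cy + r + 1, cx - r:cx + r + 1]       # view into f
    resid = z_val - np.sum(h * win)
    if resid > delta:
        win += (resid - delta) * h / np.sum(h ** 2)
    elif resid < -delta:
        win += (resid + delta) * h / np.sum(h ** 2)
    np.clip(win, 0.0, 255.0, out=win)                   # amplitude constraint [0, 255]
    return resid
```

When no clipping occurs, the residual after the update equals exactly $\pm\delta$, i.e. the estimate lands on the boundary of the constraint set, which is the defining property of the projection.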

5. EXPERIMENTAL RESULTS

The performance of the proposed method is tested by the following two sets of experiments. Five frames of each of the MPEG test sequences "Flower and Garden" (frames 145-149) and "Bus" (frames 1-5) are used in our experiments. Their original resolution is 288×352. After degrading them by a 3×3 Gaussian blur and down-sampling, we obtain five LR images of resolution 144×176 per sequence. We use frame 147 of "Flower and Garden" and frame 3 of "Bus" as anchor frames. Subpixel motion estimation is carried out by two-level hierarchical block matching followed by motion vector processing. Kernels are estimated from the gradient of the anchor frame. Finally, the five LR frames are fused in the proposed kernel constrained POCS framework.

We apply both visual quality assessment and objective quality indices to evaluate the restoration results. The objective quality indices include the projection error (PE), the peak signal-to-noise ratio (PSNR), and the structural similarity index measure (SSIM) [11]. These three indices are respectively defined as

$$PE = z(x_l, y_l) - \sum_{i=-r}^{r} \sum_{j=-r}^{r} h'(x_h + i, y_h + j)\, f(x_h + i, y_h + j), \tag{12}$$

$$PSNR = 10 \log_{10} \left\{ \frac{\left[\max(f)\right]^2}{\frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left[ f(i, j) - \hat{f}(i, j) \right]^2} \right\}, \tag{13}$$

$$SSIM = \frac{\left(2 \mu_f \mu_{\hat{f}} + c_1\right)\left(2\, \mathrm{cov}_{f\hat{f}} + c_2\right)}{\left(\mu_f^2 + \mu_{\hat{f}}^2 + c_1\right)\left(\sigma_f^2 + \sigma_{\hat{f}}^2 + c_2\right)}. \tag{14}$$

PE measures the projection error. PSNR is the traditional way of evaluating signal quality and is based on the mean square error (MSE) between the reconstructed HR image $\hat{f}$ and the original HR image $f$. The third index, SSIM, is a relatively new yet widely adopted visual quality assessment index that measures the structural similarity between two images. It is based on the hypothesis that the human visual system (HVS) is highly adapted for extracting structural information. $\mu_f$ and $\mu_{\hat{f}}$ are local means, $\sigma_f^2$ and $\sigma_{\hat{f}}^2$ are local variances, and $\mathrm{cov}_{f\hat{f}}$ denotes the covariance of $f$ and $\hat{f}$.
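The PSNR of (13) and the SSIM of (14) can be computed as in the following sketch. The function names are hypothetical. The stabilizing constants $c_1 = (0.01 \cdot 255)^2$ and $c_2 = (0.03 \cdot 255)^2$ are the defaults suggested in [11] for 8-bit images and are an assumption here, since the text does not give their values. The SSIM function evaluates (14) on one window; a mean SSIM would average it over local windows of the image.

```python
import numpy as np

def psnr(f, f_hat):
    """Eq. (13): peak signal-to-noise ratio in dB, with the peak taken as max(f)."""
    f = np.asarray(f, dtype=np.float64)
    f_hat = np.asarray(f_hat, dtype=np.float64)
    mse = np.mean((f - f_hat) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(f.max() ** 2 / mse)

def ssim_window(f, f_hat, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Eq. (14) evaluated over a single window (the whole array passed in)."""
    f = np.asarray(f, dtype=np.float64)
    f_hat = np.asarray(f_hat, dtype=np.float64)
    mu_f, mu_g = f.mean(), f_hat.mean()
    var_f, var_g = f.var(), f_hat.var()
    cov = np.mean((f - mu_f) * (f_hat - mu_g))
    num = (2.0 * mu_f * mu_g + c1) * (2.0 * cov + c2)
    den = (mu_f ** 2 + mu_g ** 2 + c1) * (var_f + var_g + c2)
    return num / den
```

Identical inputs give an SSIM of 1 and an infinite PSNR; a constant brightness offset lowers only the luminance term of the SSIM.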

Figure 4. Visual results of frame 147 of the "Flower and Garden" sequence. (a) Degraded image. (b) Bilinear interpolation. (c) POCS without motion filtering. (d) POCS with motion filtering but without kernel learning. (e) POCS with kernel constraint but without motion vector filtering. (f) POCS with motion filtering and kernel constraint.

Figure 5. Objective parameters for assessing the restoration results of frame 147 of the "Flower and Garden" sequence, plotted against iterative time (0 to 50): (a) PE curve of various methods (average projection error), (b) PSNR curve of various methods (dB), (c) mean SSIM curve of various methods. Curves: POCS, MEPOCS, KLPOCS, KLMEPOCS. MEPOCS: POCS with motion fields processing. KLMEPOCS: POCS with motion fields processing and modified data consistency constraint.


The visual results of the "Flower and Garden" sequence are shown in Figure 4. Figure 4(a) shows the degraded image and Figure 4(b) the result of bilinear interpolation. The interpolated image is smooth and blurred, since no extra information was incorporated. Figure 4(c) shows the result of POCS after 10 iterations. The reconstructed image is much clearer than Figure 4(b), but there are some artifacts around the contour of the trunk. Figure 4(d) shows the result of POCS with WVM filtering of the motion fields. There are fewer artifacts in Figure 4(d) than in Figure 4(c). The result of kernel constrained POCS without WVM filtering is shown in Figure 4(e). The edges become sharper, but some artifacts are still visible. The result of POCS with WVM filtering and kernel learning is shown in Figure 4(f). It produces the best visual result, with sharp edges and clearer details, especially on the trunk and roof.

Figure 6. Visual results of frame 3 of the "Bus" sequence. (a) Degraded image. (b) Bilinear interpolation. (c) POCS without motion filtering. (d) POCS with motion filtering but without kernel learning. (e) POCS with kernel constraint but without motion vector filtering. (f) POCS with motion filtering and kernel constraint.

Figure 7. Objective parameters for assessing the restoration results of frame 3 of the "Bus" sequence, plotted against iterative time (1 to 10): (a) PE curve of various methods (average projection error), (b) PSNR curve of various methods (dB), (c) mean SSIM curve of various methods. Curves: POCS, MEPOCS, KLPOCS, KLMEPOCS. MEPOCS: POCS with motion fields processing. KLMEPOCS: POCS with motion fields processing and modified data consistency constraint.

Additionally, objective parameters of the restored images after each iteration are calculated and plotted in Figure 5. For the "Flower and Garden" sequence, even though the PSNR of the proposed method is not the highest, it is still much larger than that of bilinear interpolation (PSNR 18.85 dB, MSSIM 0.64). Moreover, the SSIM of the proposed method is the largest, which means that its restored results are the best in terms of structural similarity.

The visual results of the "Bus" sequence are shown in Figure 6. This sequence contains a moving object (the bus) in the background, which can cause large motion estimation errors and artifacts with simple estimation methods. For example, one obvious artifact is the ghosting on the left and right sides of the door handle of the front car in Figure 6(c) and (e). After WVMF, some mismatched blocks are re-aligned, eliminating the ghost artifacts as shown in Figure 6(d) and (f). Moreover, after applying kernel constrained POCS, we obtain clearer detailed structure and sharper edges, such as the roof of the bus and the structures surrounding the house head. Various performance metrics versus iterative time are depicted in Figure 7. The proposed method has the best performance in the first 10 iterations among the methods tested in this paper.

6. CONCLUSIONS

In this paper, we propose a novel multi-frame super resolution image restoration method based on block motion vector processing and kernel constrained projection onto convex sets. By employing the relative displaced frame difference and the motion vector similarity degree, motion blocks are classified into reliable and unreliable groups. Following reliability classification, the unreliable blocks are divided into four sub-blocks whose motion vectors are processed by WVMF, the weights of which are determined by the reliability and similarity of the MVs of neighboring blocks. Finally, the realigned subpixels are fused to create HR images by POCS constrained by kernels learned from the local gradients. The experimental results demonstrate that the proposed method can improve the visual quality and enhance the detailed structural information of degraded image sequences.

In future work, we need more effective methods for kernel estimation. Even though pixel-wise kernel estimation is accurate, it is computationally expensive. To reduce the computational complexity, we can consider utilizing prior information about images, such as sparsity and remote-distance structural similarity, to infer the kernels from partial information. Furthermore, we will develop a generalized procedure for processing the MVs estimated by hybrid motion estimation methods, so that the current method can adapt to sequences that have more complex motion models.

REFERENCES

[1] A. K. Katsaggelos, R. Molina, and J. Mateos, “Super Resolution of Images and Video,” Morgan & Claypool, 1-133 (2007).
[2] S. C. Park, M. K. Park, and M. G. Kang, “Super-resolution image reconstruction: a technical overview,” IEEE Signal Processing Magazine, 20(3), 21-36 (2003).
[3] R. R. Schultz, L. Meng, and R. L. Stevenson, “Subpixel Motion Estimation for Super-Resolution Image Sequence Enhancement,” J. Visual Commun. Image Represent., 9(1), 38-50 (1998).
[4] P. E. Eren, M. I. Sezan, and A. M. Tekalp, “Robust, object-based high-resolution image reconstruction from low-resolution video,” IEEE Trans. on Image Processing, 6(3), 1446-1451 (1997).
[5] A. J. Patti and Y. Altunbasak, “Artifact reduction for set theoretic super resolution image reconstruction with edge adaptive constraints and higher order interpolants,” IEEE Trans. on Image Processing, 10(10), 179-186 (2001).
[6] H. Takeda, S. Farsiu, and P. Milanfar, “Kernel Regression for Image Processing and Reconstruction,” IEEE Trans. on Image Processing, 16(2), 349-366 (2007).
[7] M. V. W. Zibetti and J. Mayer, “A Robust and Computationally Efficient Simultaneous Super-Resolution Scheme for Image Sequences,” IEEE Trans. on Circuits and Systems for Video Technology, 17(10), 1288-1300 (2007).
[8] M. Liu, H. Cao, X. Li, and S. Yi, “Super resolution reconstruction based on motion estimation error and edge adaptive constraints,” Defense and Security Symposium, Proc. SPIE, 6246, 62460B (2006).
[9] P. L. Combettes, “The foundations of set theoretic estimation,” Proc. IEEE, 81, 182-208 (1993).
[10] A. Huang and T. Q. Nguyen, “A Multistage Motion Vector Processing Method for Motion-Compensated Frame Interpolation,” IEEE Trans. on Image Processing, 17(5), 694-708 (2008).
[11] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. on Image Processing, 13(4), 600-612 (2004).
