A REAL-TIME IMPLEMENTATION OF FPGA HARDWARE FOR MPEG ARTIFACT REDUCTION FOR VIDEO APPLICATIONS

Charayaphan Charoensak

Digital Video Processing, Architecture and Standard Design, Philips Consumer Electronics, 620A Lorong 1, Toa Payoh, Singapore 319762

emails: Charayaphan.Charoensak@Philips, [email protected]

Farook Sattar

School of Electrical and Electronic Engineering, Nanyang Technological University

Nanyang Avenue, Singapore 639798 email: [email protected]

ABSTRACT

This paper presents an efficient hardware architecture for the implementation of real-time MPEG artifact reduction. The MPEG artifact reduction, or deblocking, implemented here is based on a modified Bilateral low-pass filter. The Bilateral filter is a non-iterative filter that preserves edge information; when applied to images compressed with BDCT-based compressors, it improves visual quality without over-smoothing the image [1],[2]. We propose a modified Bilateral filter (BF) that is sensitive to the activity across the orthogonal block boundaries and is thus suitable for deblocking applications. The proposed architecture demonstrates a good compromise between filtering performance and FPGA resource requirements. The architecture was prototyped in hardware using an FPGA (Field Programmable Gate Array), and the FPGA design and simulation were carried out using a system-level design tool.

1. INTRODUCTION

Today, digital media are widely used for the storage and transmission of video information. Many efficient, standardized video compression formats exist for various applications, such as H.261, H.263, and MPEG-1/2/4. These compression formats are based on the block-based discrete cosine transform (BDCT). The BDCT is commonly used because of its near-optimum energy compaction and fast algorithms for hardware implementation. Most compression standards use an 8x8 block discrete cosine transform (DCT). At high compression ratios, the BDCT method suffers from artifacts including blocking, ringing, and mosquito noise.

Several deblocking algorithms [3],[4] have been developed and reported in the literature. In video applications such as television sets, the digital data stream passes through many processing stages such as scaling, luminance transient improvement (LTI), and motion blur reduction. The blocking artifacts may therefore no longer be 8x8 blocks in size, and the block boundaries cannot be located accurately. A practical post-processing deblocking algorithm should therefore not rely on a fixed block size and location.

Methods for reducing blocking artifacts may be grouped into three categories according to their means of reconstruction. The first category uses low-pass filtering [5]. The second category involves statistical estimation [6]. The last category involves set-theoretic reconstruction [7], which defines constraint sets from the observed data and tries to reconstruct the original image by projection onto convex sets (POCS). The last two categories require iteration, which is not practical for real-time processing because of the memory required to store video data.

There is an increasing demand for high-definition (HD) picture quality in consumer television, including full-HD television sets, HDTV, and Blu-ray Disc. High-resolution display technologies make the MPEG artifacts more visible, and increasing the bit rate of the data stream in order to improve picture quality is typically not possible. Post-processing is the most feasible solution because it does not require any modification to the existing compression standards. The high computation power of hardware circuits such as FPGAs allows real-time processing at a reasonable cost.

This paper presents our work on the hardware architecture of an FPGA-based circuit for MPEG artifact reduction suitable for video applications. The algorithm is based on a modified Bilateral filter (BF). The BF offers edge-preserving smoothing of the image and requires no iteration. The modified Bilateral filter discussed here offers hardware simplification while remaining sensitive to the activity in the orthogonal directions around the block boundaries. The measurement of this activity is also used to adapt the BF parameters to different levels of artifacts. The result is improved visual quality without over-smoothing the image details and sharp edges. After MATLAB simulations, the final verification of the design was carried out using a system-level tool called System Generator from Xilinx [8].

Page 2: A REAL-TIME IMPLEMENTATION OF FPGA HARDWARE FOR MPEG ARTIFACT REDUCTION FOR VIDEO APPLICATIONS

2. BILATERAL FILTER AS AN IMAGE ESTIMATOR

The Bilateral filter was first introduced by Smith and Brady under the name "SUSAN" [9] and was later referred to as the "Bilateral filter" [10]. The filter replaces each pixel by a weighted average of its neighbors. The weight assigned to each neighbor decreases with both the distance in the image plane and the distance on the intensity axis. It is therefore a form of weighted adaptive moving-average filter:

$$\hat{x}_l = \frac{\sum_{k \in \eta_l} y_k \, w_1(|y_l - y_k|)\, w_2(\|l - k\|)}{\sum_{k \in \eta_l} w_1(|y_l - y_k|)\, w_2(\|l - k\|)} \qquad (1)$$

Here, $y_l$ and $\hat{x}_l$ are the filter input and output values respectively, $l$ and $k$ are 2-D coordinates of image pixel locations, $\eta_l$ is a neighborhood around $l$, $\|\cdot\|$ denotes the Euclidean distance, and $w_1(\cdot)$ and $w_2(\cdot)$ are weight functions. $w_1(\cdot)$ is a function of the absolute difference of brightness values and $w_2(\cdot)$ is a function of the Euclidean distance. The weight functions are usually chosen as a Gaussian for $w_2(\cdot)$ and an exponential for $w_1(\cdot)$.
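For reference, the following is a minimal software sketch of equation (1) in Python/NumPy, with a Gaussian spatial weight $w_2(\cdot)$ and an exponential range weight $w_1(\cdot)$ as described above. It is a direct, unoptimized evaluation for illustration only; the range parameter `sigma_r` is an assumption, and it is not the FPGA implementation.

```python
import numpy as np

def bilateral_filter(y, radius=3, sigma_s=np.sqrt(6.0), sigma_r=10.0):
    """Direct evaluation of equation (1) on a grayscale image.

    w2 is a Gaussian of the spatial (Euclidean) distance, w1 an exponential
    of the absolute brightness difference. radius=3 gives the 7x7 window and
    sigma_s**2 = 6 the spatial variance used later in the paper; sigma_r is
    an assumed value for illustration."""
    h, w = y.shape
    pad = np.pad(y.astype(np.float64), radius, mode="edge")
    out = np.empty((h, w), dtype=np.float64)

    # Precompute the spatial weight w2 over the (2*radius+1)^2 neighborhood.
    d = np.arange(-radius, radius + 1)
    dx, dy = np.meshgrid(d, d)
    w2 = np.exp(-(dx**2 + dy**2) / (2.0 * sigma_s**2))

    for r in range(h):
        for c in range(w):
            win = pad[r:r + 2*radius + 1, c:c + 2*radius + 1]
            w1 = np.exp(-np.abs(win - y[r, c]) / sigma_r)   # range weight
            wgt = w1 * w2
            out[r, c] = np.sum(wgt * win) / np.sum(wgt)
    return out
```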

The estimate $\hat{x}$ of the original signal is computed from a distorted signal $y = x + n$, where $n$ is uncorrelated noise. The least mean-square (LMS) estimate is given by the conditional expectation

$$\hat{x} = E\{x \mid y\} \qquad (2)$$

and the linear solution of this problem is the Wiener filter. Similarly, a locally adaptive Wiener filter is expressed as

$$\hat{x}_l = r_{x_l y}\, R_{yy}^{-1}\, y \qquad (3)$$

Here, the pixels in $y$ located around position $l$ are denoted $\zeta$. The correlation $\rho_{xy}$ is defined such that a high value indicates that an observation belongs to the same structure, and a low value that it does not. This correlation within a structure is expressed as $\rho_{xy} = \sigma_x^2 / \sigma^2$, where $\sigma^2$ denotes the noise variance and $\sigma_x^2$ the signal variance. Typically, $\rho_{xy}$ is close to 1, and we may assume

$$r_{x_l y_k} = \begin{cases} \rho_{xy} & \text{if } y_k \in \zeta \\ 0 & \text{otherwise.} \end{cases} \qquad (4)$$

Since the observations are corrupted by noise, we may introduce the probability $P(\zeta \mid y, k)$ that an observed value $y$ at location $k$ belongs to $\zeta$. If we assume constant variance and uncorrelated observations, a formulation similar to the Bilateral filter can be derived: $R_{yy}$ as well as its inverse are diagonal with constant entries, and the Wiener filter may then be implemented as the conditional average

$$\hat{x}_l = \frac{1}{K} \sum_{k \in \eta_l} y_k\, P(\zeta \mid y, k) \qquad (5)$$

where $K = \sum_{k \in \eta_l} P(\zeta \mid y, k)$ is the normalization factor and $\rho_{xy}$ is constant. Equation (5) may be expressed in the form of the Bilateral filter

$$\hat{x}_l = \frac{1}{K} \sum_{k \in \eta_l} y_k\, \frac{p(y_k \mid \zeta, k)}{p(y_k)}\, P(\zeta \mid k) \qquad (6)$$

where

$$w_1(|y_l - y_k|) \;\leftrightarrow\; \frac{p(y_k \mid \zeta, k)}{p(y_k)} \qquad (7)$$

$$w_2(\|l - k\|) \;\leftrightarrow\; P(\zeta \mid k) \qquad (8)$$

give the correspondence between the image estimator and the Bilateral filter. Thus, the Bilateral filter may be used as an efficient image estimator.

3. PROPOSED HARDWARE ARCHITECTURE FOR MPEG ARTIFACT REDUCTION

We showed in the last section that the Bilateral filter can serve as an efficient image estimator and, in our application, as an MPEG noise reducer. In this section, we propose an architecture for effective MPEG artifact reduction with low resource requirements. The design goals are:

1. It should effectively reduce the MPEG artifacts, blocking and mosquito noise, with minimal reduction in picture sharpness.

2. The deblocking algorithm should not depend on a fixed size and location of the block boundaries, because in practice the block boundaries cannot be assumed fixed.

3. It should adapt to different levels of noise.

4. It should be simple to realize in hardware and should not require a frame buffer, so that it is practical for real-time video post-processing on low-cost hardware.

The Bilateral filter is a good choice for this application because it is non-iterative, robust, and relatively simple. The proposed modified Bilateral filter takes into account the activity across block boundaries and is therefore more effective at reducing MPEG blockiness. The measurement of the activity across block boundaries can also be used to adjust the BF parameters according to the actual level of MPEG artifacts in the image.



As shown in equation (1), the filtered output of the Bilateral filter is a function of two weights: $w_1(\cdot)$ is a function of the absolute difference of the brightness values at the two locations, and $w_2(\cdot)$ is a function of the Euclidean distance between the two locations. In our application, the typical Gaussian function is used for the weight $w_2(\cdot)$.

To simplify the hardware, the weight function $w_1(\cdot)$ is defined as a step function. The transition of the step function depends on the sum of absolute differences of pixels along the horizontal and vertical directions across the centre pixel, scaled by a measurement of the average activity along the block boundaries over the whole image. Fig. 1 shows the pixel coordinate system, the horizontal and vertical block boundaries, and the pixels defined for the measurement of activity around the boundaries. Note that, for illustration purposes, the block size is shown as 8x8; in a typical application the block size and location are not known. Here, $h_1$ to $h_8$ are pixels for the measurement of activity along the horizontal boundaries and $v_1$ to $v_8$ are used for the vertical boundaries, while $H_1$ to $H_4$ refer to pixels from the adjacent block below (and $H_5$ to $H_8$ to pixels from the adjacent block above). The measurement of activity along the lower horizontal boundary is computed from pixels $h_5$ to $h_8$ and $H_1$ to $H_4$:

$$\Delta h_2 = \left| (h_5 - H_1) + (h_6 - H_2) + (h_7 - H_3) + (h_8 - H_4) \right| \qquad (9)$$

When the DC coefficient of the DCT in the upper block differs significantly from that of the lower block, the four difference pairs $h_5 - H_1$, $h_6 - H_2$, $h_7 - H_3$, and $h_8 - H_4$ exhibit four offsets of the same sign and of significant magnitude. The measurement $\Delta h_2$ will therefore be a large positive number, indicating activity on the lower horizontal boundary. Similarly, we define the activity of the upper horizontal boundary:

$$\Delta h_1 = \left| (h_1 - H_5) + (h_2 - H_6) + (h_3 - H_7) + (h_4 - H_8) \right| \qquad (10)$$

Similar definitions are used for the measurements of activity on the left and right vertical boundaries, $\Delta v_1$ and $\Delta v_2$. If the sum of the four activity measurements, $\Delta h_1 + \Delta h_2 + \Delta v_1 + \Delta v_2$, is higher than a threshold $\lambda$, the function $w_1(\cdot)$ is zero; otherwise it is 1:

$$w_1(\cdot) = \begin{cases} 0, & \Delta h_1 + \Delta h_2 + \Delta v_1 + \Delta v_2 > \lambda \\ 1, & \text{otherwise.} \end{cases}$$

The threshold $\lambda$ is a function of the average measurement of the boundary activity across the whole image. Replacing the exponential function with a step function for the weight $w_1(\cdot)$ removes the need for a multiplier and a look-up table for the exponential function. This results in a simplified circuit as well as a shorter critical path, and thus a higher operating frequency.

Fig. 1 Block boundaries and pixels used for the measurement of activity
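For illustration, the sketch below shows in Python the form of the boundary-activity measurement of equation (9) and the resulting step-function weight $w_1(\cdot)$. The mapping of $h_5$..$h_8$ and $H_1$..$H_4$ onto absolute pixel coordinates is an assumption made for the example (the paper defines them relative to Fig. 1), and the threshold test follows the description above: activity above $\lambda$ suppresses the weight.

```python
import numpy as np

def delta_h(y, row, col):
    # Equation (9)-style activity across a horizontal boundary lying between
    # image rows `row` and `row + 1`. The four pixel pairs starting at `col`
    # stand in for h5..h8 (above) and H1..H4 (below); this mapping to absolute
    # coordinates is an assumption for illustration.
    top = y[row,     col:col + 4].astype(np.float64)   # h5 .. h8
    bot = y[row + 1, col:col + 4].astype(np.float64)   # H1 .. H4
    return abs(float(np.sum(top - bot)))

def w1_step(total_activity, lam):
    # Step-function range weight: suppress the neighborhood contribution when
    # the summed activity (dh1 + dh2 + dv1 + dv2) exceeds the threshold lambda,
    # which is derived from the average boundary activity of the whole image.
    return 0.0 if total_activity > lam else 1.0
```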

4. HARDWARE PROTOTYPE DESIGN FOR MPEG ARTIFACT REDUCTION

This section describes the FPGA design of the prototype circuit that implements the modified Bilateral filter for MPEG deblocking. An integrated system-level environment called System Generator from Xilinx [8] is used. Using System Generator, the FPGA design and simulation are carried out with Simulink and Xilinx blocks. The FPGA functional simulation is done in the MATLAB Simulink environment. After successful simulation, synthesizable VHDL code is automatically generated from the models for the final FPGA implementation.

The modified Bilateral filter described in the previous section was implemented. For the spatial weight $w_2(\cdot)$, a Gaussian function with variance $\sigma^2 = 6$ and a 7x7 convolution kernel were used. These values proved to be a good trade-off between hardware complexity and deblocking performance. The top-level FPGA design for MPEG deblocking is shown in Fig. 2. Note that the design contains two subsystems, labeled "Virtex2 7 Line Buffer" and "Filter". The more detailed design of the subsystem "Filter", which implements the modified Bilateral filter, is shown in Fig. 3. Notice the sub-systems "sumval", "sumweight", and "7-tap BF filter". The sub-systems "sumval" and "sumweight" perform the summations

$$\sum_{k \in \eta_l} y_k\, w_1(|y_l - y_k|)\, w_2(\|l - k\|) \qquad (11)$$

and

$$\sum_{k \in \eta_l} w_1(|y_l - y_k|)\, w_2(\|l - k\|) \qquad (12)$$

of equation (1), respectively. The Xilinx CORDIC divider block shown in Fig. 2 performs the division of the two summations. The sub-system "7-tap BF filter" is a MAC-based implementation of the 7x7 modified BF filter explained in the previous section.
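As a behavioral reference (not the RTL), the fragment below mirrors in Python the split into the "sumval" and "sumweight" accumulations of equations (11) and (12), followed by the division that the CORDIC divider block performs in the FPGA. The handling of an all-zero weight window is an assumption made for the software model.

```python
import numpy as np

def filter_window(win, w1, w2):
    # `win` is the 7x7 pixel window around the current pixel, `w1` the
    # step-function range weights and `w2` the Gaussian spatial weights
    # (both 7x7 arrays). Software model of one output sample of the filter.
    w = w1 * w2
    sumval = float(np.sum(win * w))        # equation (11): numerator
    sumweight = float(np.sum(w))           # equation (12): denominator
    if sumweight == 0.0:                   # degenerate case: pass centre pixel
        return float(win[win.shape[0] // 2, win.shape[1] // 2])
    return sumval / sumweight              # division (CORDIC divider in HW)
```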


Fig. 2 Top-level design of the BF for MPEG deblocking using System Generator

Fig. 3 Detailed circuit for the modified 7x7 BF and for the weight summation for scaling

The FPGA reads the gray-scale image data sequentially from the MATLAB workspace variable "grayScaleSignal" and writes the filtered image data sequentially into the workspace variable "filteredImage". After the simulation is completed, a MATLAB program plots the input image and the filtered output image for comparison. The result of the FPGA simulation is shown in Fig. 4. Fig. 4a is the input image with blocking and ringing artifacts clearly visible, Fig. 4b is the simulation output image, and Fig. 4c shows the absolute difference of the two images. It is observed in Fig. 4b that the blocking and ringing artifacts are much reduced with minimal loss of sharpness and detail. The performance of the algorithm is measured by comparing the Power Signal-to-Noise Ratio (PSNR) of the image before and after the BF filter:

$$\mathrm{PSNR\ (dB)} = 10 \log_{10}\!\left[ \sum_{i \in R} y^2(i) \Big/ \sum_{i \in R} e^2(i) \right] \qquad (13)$$

where $\sum_{i \in R} y^2(i)$ is the total image energy ($R$ spans the image width and height dimensions) and $\sum_{i \in R} e^2(i)$ is the error energy due to MPEG noise, calculated from

$$e(i) = x(i) - y(i) \qquad (14)$$
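A Python equivalent of this measurement, under the assumption that `x` is the reference image and `y` the image under test, might look as follows.

```python
import numpy as np

def psnr_db(x, y):
    # Equation (13): 10*log10 of total image energy over total error energy,
    # with the error e(i) = x(i) - y(i) as in equation (14).
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    e = x - y                              # equation (14)
    return 10.0 * np.log10(np.sum(y ** 2) / np.sum(e ** 2))

# Usage (hypothetical variable names): psnr_db(original, compressed) gives the
# figure before filtering, psnr_db(original, filtered) the figure after.
```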

Table 1 compares the measured PSNRs before and after applying the modified BF filter. The BF filter improves the PSNR by 8.3 dB. The improvement varies with picture content, and more experiments are needed.

Table 1. Comparison of measured PSNR with and without the modified Bilateral filter

  Before MPEG noise reduction using BF: 25.5 dB
  After MPEG noise reduction using BF:  33.8 dB

5. FPGA SYNTHESIS RESULTS

After successful simulation, the VHDL code was generated from the design and synthesized using Xilinx ISE 8.1i, targeting the Xilinx Spartan-3 family with the optimization setting for maximum clock speed. Table 2 details the resource requirements of the design. Note that, in practice, additional circuitry is needed for the input/output interface and synchronization. Note also that system-level design using System Generator may not offer optimal gate requirements and clock speed.

Table 3 shows the reported maximum path delay and the highest FPGA clock frequency. Because a 7-tap MAC-based implementation is used for the filter, the actual maximum achievable pixel rate is 81.2/7 = 11.6 million pixels/second. This is slightly slower than the 13.5 MHz typically required for un-scaled standard-definition (SD) television. More work is needed on hardware optimization, and a poly-phase filter may be used. Additional circuitry for color space transformation is also needed.

Table 2. Resource utilization of the FPGA design for MPEG deblocking based on the modified Bilateral filter

  Number of slices for logic:      1,550
  Number of slices for flip-flops:   950
  Number of 4-input LUTs:          6,002

Table 3. Maximum combinational path delay and operating frequency of the FPGA design

  Maximum path delay from/to any node: 13.8 ns
  Maximum operating frequency:         81.2 MHz

6. CONCLUSION

In this paper, we present an FPGA implementation of a modified Bilateral filter for MPEG deblocking. The modified Bilateral filter architecture takes into account the activity along the horizontal and vertical directions, which represents the blockiness around the block boundaries. The architecture is simplified by using a step function for the weight $w_1(\cdot)$ instead of a decaying exponential function, which reduces the hardware resources needed.


Fig. 4 FPGA simulation results. (a) Input image with visible blocking and mosquito noise, (b) output of the BF filter, (c) absolute difference of (a) and (b)

The hardware implementation of the algorithm was realized on an FPGA. The FPGA design was carried out using a relatively new system-level tool called System Generator from Xilinx. FPGA functional simulations were carried out to verify and measure the deblocking performance of the proposed architecture. After successful simulation, the VHDL code for the design was generated and synthesized, and the estimated FPGA resource requirements are reported. The estimated maximum operating speed of the FPGA design suggests that it can operate close to the real-time pixel rate of standard-definition television. Additional work is needed on testing with more images, optimization and improvement of the design, and a real-time demonstration of the system.

The Bilateral filter is found to be very effective for removing MPEG artifacts because it preserves edge information, requires no iteration, is stable, and is relatively simple to realize in hardware. The filter therefore offers much potential for real-time applications including JPEG and MPEG deblocking, impulse noise removal, and high-quality image up-sampling. Color images may be handled by first performing a color space transformation, from RGB to YUV or HSV for example, applying the deblocking to the luminance channel, and then inverting the transformation to generate the RGB output. The development of a locally adaptive Bilateral filter for improved performance is another good subject for future work.
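A minimal sketch of that color pipeline, assuming full-range BT.601 RGB/YCbCr conversion and a caller-supplied luminance deblocking function, is shown below; the prototype in this paper itself processes only gray-scale data.

```python
import numpy as np

def deblock_rgb(rgb, deblock_luma):
    # Convert to YCbCr (full-range BT.601 coefficients assumed), deblock only
    # the luminance channel with the supplied function, then convert back.
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0

    y = deblock_luma(y)                    # e.g. the modified Bilateral filter

    r2 = y + 1.402    * (cr - 128.0)
    g2 = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b2 = y + 1.772    * (cb - 128.0)
    return np.clip(np.stack([r2, g2, b2], axis=-1), 0, 255).astype(np.uint8)
```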

7. REFERENCES

[1] P. Skočir, B. Marušič, and J. Tasič, "A three-dimensional extension of the SUSAN filter for wavelet video coding artifact removal," in Proc. Electrotechnical Conference (MELECON 2002), pp. 395–398, 2002.

[2] M. Elad, "On the origin of the bilateral filter and ways to improve it," IEEE Trans. Image Process., pp. 1141–1151, 2002.

[3] P. List et al., "Adaptive deblocking filter," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 614–619, July 2003.

[4] S. D. Kim, J. Yi, H. M. Kim, and J. B. Ra, “A deblocking filter with two separate modes in block-based video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, pp. 156–160, Feb. 1999.

[5] Y. F. Hsu and Y. C. Chen, “A new adaptive separable median filter for removing blocking effects,” IEEE Trans. Consum. Electron., vol. 2, no. 3, pp. 91–95, Mar. 1993.

[6] J. Luo, C. W. Chen, K. J. Parker, and T. S. Huang, “Artifact reduction in low bit rate DCT-based image compression,” IEEE Trans. Image Process., vol. 5, no. 9, pp. 1363–1368, Sep. 1996.

[7] P. L. Combettes, "The foundations of set theoretic estimation," Proc. IEEE, vol. 81, no. 2, pp. 182–208, Feb. 1993.

[8] Xilinx Inc., System Generator v8.1 for the MathWorks Simulink: Quick Start Guide, 2006.

[9] S. M. Smith and J. M. Brady, "SUSAN – a new approach to low level image processing," International Journal of Computer Vision, vol. 23, pp. 45–78, 1997.

[10] C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images," in Proc. IEEE Int. Conf. Computer Vision, pp. 839–846, 1998.