Multimedia Tools and Applications, An International Journal
ISSN 1380-7501, Volume 67, Number 3
Multimed Tools Appl (2013) 67:529-547, DOI 10.1007/s11042-012-1048-6

Parallel structure-aware halftoning

Huisi Wu, Tien-Tsin Wong & Pheng-Ann Heng




Published online: 8 March 2012
© Springer Science+Business Media, LLC 2012

Abstract Structure-aware halftoning is one of the state-of-the-art algorithms for generating structure-preserving bitonal images. However, its slow optimization process prohibits real-time application, owing to the high computational cost of similarity measurement and iterative refinement. Unfortunately, structure-aware halftoning cannot be straightforwardly parallelized because of its data dependencies. In this paper, we propose a parallel algorithm to accelerate the optimization of structure-aware halftoning. Our main idea is to exploit the spatial independence during the evaluation of the objective function and the temporal independence among the iterations. Specifically, we introduce a parallel Poisson-disk algorithm for the selection of pixel swaps, which guarantees independence between parallel processes. A graphics processing unit (GPU) implementation of the technique leads to a significant speedup without sacrificing quality. Our experiments demonstrate the effectiveness of the proposed parallel algorithm in generating structure-preserving bitonal images in much less time, especially for large images.

Keywords Digital halftoning · GPU · SSIM · Parallel Poisson-disk sampling

1 Introduction

Halftoning is a commonly used technique in digital printing and display systems. It is a process that generates a bitonal image with a look similar to that of its input grayscale image. The desired properties of a halftone image include tone consistency, the blue-noise property and structure consistency. Most existing methods, such as error diffusion [21], handle the tone and blue-noise properties properly, but they usually lose details and produce plain or blurry regions (Fig. 1b). Several methods that rely on edge enhancement techniques [11, 15] deal better with texture preservation but are still not sufficient to satisfy the human sensitivity to structures (Fig. 1c). Structure-aware halftoning [22] is the state-of-the-art technique for generating structure-preserving bitonal images (Fig. 1d). It formulates digital halftoning as the minimization of an objective function that accounts for both local tone similarity and structural similarity to the original image.

H. Wu (B)
College of Computer Science and Software Engineering, Shenzhen University, 364 Administration Building, Shenzhen, People's Republic of China
e-mail: [email protected]

T.-T. Wong · P.-A. Heng
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin N.T., Hong Kong

Fig. 1 Digital halftoning with different methods: (a) original image, (b) Ostromoukhov method, (c) edge enhancement, (d) structure-aware. Note that the structure-aware method faithfully preserves the texture details as well as the local tone. All images have the same resolution of 300 × 300


Although structural detail can be maintained in halftone images by optimizing the objective function, the quality of the result depends heavily on the number of optimization iterations, and the optimization is very slow. In general, millions of iterations are required to obtain the final halftone image. This slow convergence prohibits practical use. Unfortunately, the algorithm cannot be directly converted to a parallel implementation because of the data dependencies among the iterations. This imposes an upper limit on the achievable computation speed and prevents the algorithm from taking advantage of recent advances in parallel computing architectures such as GPUs and CPU clusters.

In this paper, we propose a parallel algorithm to accelerate structure-aware halftoning on GPUs. The basic idea is to exploit the spatial independence during the evaluation of the objective function and the temporal independence among the iterations. We localize the objective function and divide it into a number of independent sub-objective functions, and we introduce a parallel Poisson-disk [30] scheme for the selection of pixel swaps. As a result, multiple pairs can be swapped without interfering with each other, and the sub-objective functions can be updated independently during the optimization. In addition, the calculation of the sub-objective functions, which includes the structural similarity (SSIM) and the tone similarity, is formulated as several localized image filtering operations and is computed in parallel. A GPU implementation of the technique leads to a significant speedup without sacrificing quality. We demonstrate the effectiveness of the proposed parallel algorithm in generating high-quality structure-preserving bitonal images in much less time.

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 describes the proposed parallel structure-aware halftoning method. Experimental results and a performance evaluation are presented in Section 4. Finally, we draw conclusions in Section 5.

2 Related work

Digital halftoning remains an active area of research [28]. Previous work on halftoning can be classified into three categories [14]: point processes, neighborhood processes, and iterative processes. Point processes [1, 12, 17, 19] perform a pixelwise comparison with a threshold to determine the halftoned value of a pixel; neighborhood-based methods [3, 7, 8, 16, 18, 21, 24] compare the sum of the current pixel and weighted neighborhood errors with a threshold. In general, point processes and neighborhood-based methods have low computational complexity, but they may produce undesirable artifacts or lose textural detail. To obtain better results, iterative or search-based methods [10, 13, 20, 22, 23] minimize an objective function and search for an optimized halftone result. They provide more flexibility and can easily be tailored to various objectives, so as to produce significantly better halftone results. However, iterative methods are usually computationally intensive: they require millions of passes to converge to the final halftone image. The bottleneck of the computation lies in the evaluation of the objective function as well as in the slow convergence of the iteration. Unfortunately, the spatial correlation in evaluating the objective function and the temporal dependency among the iterations usually force the optimization to be done sequentially, and thus it is difficult to speed up the process by direct parallelization.

General-purpose computation on graphics processing units (GPGPU) is an emerging research topic in various areas. It refers to exploiting the computational power of the GPU for purposes other than graphics rendering. The rapid improvement in GPU performance and its data-parallel nature, coupled with improvements in its programmability, have made the GPU a competitive platform for computationally intensive tasks in a wide variety of application domains. One of the most common applications is fast image sampling or processing, such as parallel Poisson disk sampling [2, 6, 9, 30], parallel filtering [25] and parallel edge detection [4]. However, there are still many applications for which GPUs are not well suited. Thus, methods to integrate GPGPU power into broader practical applications are still being intensively investigated.

3 Algorithm

3.1 Structure-aware halftoning

Before we continue, we briefly review structure-aware halftoning. To preserve the characteristic look of the textured regions in a halftone image, Pang et al. [22] proposed a structure-aware halftoning technique. Given a grayscale image $I$, the corresponding halftone image $I_h$ is obtained by minimizing the following objective function:

$$\mathrm{Objective}(I, I_h) = w_g\, G(I, I_h) + w_t \left(1 - \mathrm{MSSIM}(I, I_h)\right) \tag{1}$$

where $G(I, I_h)$ measures the tone similarity between $I$ and $I_h$, and $\mathrm{MSSIM}(I, I_h)$ measures the structure similarity between $I$ and $I_h$. The weighting factors $w_g$ and $w_t$ satisfy $w_g + w_t = 1$.

To preserve the overall tone similarity, $G(I, I_h)$ is simply formulated as the mean squared error (MSE) between the Gaussian-blurred grayscale input $g(I)$ and the Gaussian-blurred halftone image $g(I_h)$, written as

$$G(I, I_h) = \frac{1}{M} \sum^{M} \left(g(I) - g(I_h)\right)^2 \tag{2}$$

where the valid range of $G(I, I_h)$ is [0, 1].

On the other hand, $\mathrm{MSSIM}(I, I_h)$ evaluates the overall structure similarity (SSIM) [29] by taking the average SSIM over all pixels:

$$\mathrm{MSSIM}(I, I_h) = \frac{1}{M} \sum^{M} \mathrm{SSIM}(x, y) \tag{3}$$

where the valid range of MSSIM is [0, 1], with higher values indicating higher similarity. For each corresponding pair of pixels from $I$ and $I_h$, $\mathrm{SSIM}(x, y)$ measures the local structure similarity in their local neighborhoods $x$ and $y$, where $x$ and $y$ are two nonnegative aligned image signals, each with $N$ elements.

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + k_1)(2\sigma_{xy} + k_2)}{(\mu_x^2 + \mu_y^2 + k_1)(\sigma_x^2 + \sigma_y^2 + k_2)} \tag{4}$$

where $\mu$ is the Gaussian-weighted mean intensity; $\sigma$ is the standard deviation; $\sigma_{xy}$ is the covariance of $x$ and $y$; and $k_1$ and $k_2$ are small positive constants that avoid singularity.
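As a rough illustration of Eq. (4), the following Python sketch evaluates the SSIM of two small aligned patches. It is a simplification under stated assumptions: uniform weights replace the Gaussian weights described above, and the default values of `k1` and `k2` are placeholders chosen for the example, not values taken from the paper.

```python
def local_ssim(x, y, k1=1e-4, k2=9e-4):
    """SSIM of two aligned patches, given as flat lists of values in [0, 1].

    Assumptions: uniform (not Gaussian) weights; k1 and k2 are placeholder
    constants chosen for illustration only.
    """
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n          # sigma_x squared
    var_y = sum((b - mu_y) ** 2 for b in y) / n          # sigma_y squared
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + k1) * (2 * cov_xy + k2)
    denominator = (mu_x ** 2 + mu_y ** 2 + k1) * (var_x + var_y + k2)
    return numerator / denominator


gray_patch = [0.2, 0.4, 0.6, 0.8]
halftone_patch = [0.0, 0.0, 1.0, 1.0]
identical = local_ssim(gray_patch, gray_patch)      # patches match exactly
score = local_ssim(gray_patch, halftone_patch)      # same trend, different contrast
```

Identical patches give a score of 1, while a bitonal patch that follows the same trend with a different contrast scores lower.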

To minimize the objective function in (1), Pang et al. [22] used a simulated annealing strategy. The optimization starts with any bitonal image whose global grayness (the ratio of black to white pixels) is equivalent to that of the original grayscale image. Such an initialization is performed by randomly distributing black and white pixels such that the overall grayness is maintained. During each iteration, a pair consisting of one black and one white pixel is randomly picked and swapped. If the swap decreases the objective value, it is accepted; otherwise, it is canceled. Since no extra black or white pixel is introduced, the overall grayness is maintained during the optimization.
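The sequential procedure just described can be sketched in Python as follows. This is a minimal illustration, not the implementation of [22]: it works on a 1-D signal for brevity, uses plain greedy acceptance as described in the text rather than a full annealing schedule, and `toy_objective` is a hypothetical tone-only stand-in for Eq. (1).

```python
import random


def tone_preserve_init(gray, rng):
    """Random bitonal signal whose number of white pixels matches the mean grayness."""
    whites = round(sum(gray))                  # gray values lie in [0, 1]; 1 = white
    pixels = [1] * whites + [0] * (len(gray) - whites)
    rng.shuffle(pixels)
    return pixels


def toy_objective(gray, half):
    """Hypothetical stand-in for Eq. (1): MSE between 3-tap box-blurred signals."""
    def blur(s):
        n = len(s)
        return [(s[max(i - 1, 0)] + s[i] + s[min(i + 1, n - 1)]) / 3 for i in range(n)]
    return sum((a - b) ** 2 for a, b in zip(blur(gray), blur(half))) / len(gray)


def greedy_swap(gray, objective, iterations, seed=0):
    """Swap random black/white pairs; keep a swap only if the objective decreases."""
    rng = random.Random(seed)
    half = tone_preserve_init(gray, rng)
    energy = objective(gray, half)
    for _ in range(iterations):
        i, j = rng.randrange(len(half)), rng.randrange(len(half))
        if half[i] == half[j]:
            continue                           # need one black and one white pixel
        half[i], half[j] = half[j], half[i]
        new_energy = objective(gray, half)
        if new_energy < energy:
            energy = new_energy                # accept the swap
        else:
            half[i], half[j] = half[j], half[i]   # cancel the swap
    return half, energy


gray = [0.5] * 16
start_energy = toy_objective(gray, tone_preserve_init(gray, random.Random(0)))
half, energy = greedy_swap(gray, toy_objective, iterations=500)
```

Because every step only exchanges two existing pixels, the number of white pixels never changes, and the energy can only stay the same or decrease.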

The convergence of the above optimization is very time-consuming. On the one hand, the number of possible combinations of pixel values in the halftone image grows exponentially with the image size. For example, there are $2^{u \times v}$ possible combinations of pixel values for an image with a resolution of $u \times v$. Because each iteration swaps only one random pair of black and white pixels, the convergence process needs millions of iterations. On the other hand, during each iteration, the evaluation of the objective function, including $\mathrm{MSSIM}(I, I_h)$ and $G(I, I_h)$, requires a large number of summation and filtering operations over the whole input image.
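To give a feeling for the size of this search space, the short Python sketch below counts the decimal digits of the number of bitonal images of a given resolution. The 300 × 300 resolution is borrowed from Fig. 1 purely as an example, and the helper name is hypothetical.

```python
import math


def log10_configurations(u, v):
    """log10 of 2**(u*v), the number of bitonal images with u x v pixels."""
    return u * v * math.log10(2)


# Number of decimal digits of 2**(300*300), computed via logarithms
# to avoid building the huge integer itself.
digits = math.floor(log10_configurations(300, 300)) + 1   # 27093
```

Even for this small image, the number of candidate halftones has more than twenty-seven thousand digits, which is why exhaustive search is out of the question.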

3.2 Parallelism

Note that the above structure-aware halftoning cannot be straightforwardly parallelized because of its data dependencies. There are two of them: a spatial dependency during the evaluation of the objective function and a temporal dependency among the swaps. The evaluation of $\mathrm{MSSIM}(I, I_h)$ and $G(I, I_h)$ involves all the pixel values of the whole image, which induces the spatial dependency that prohibits spatial parallelism. On the other hand, the serial random swaps of pairs of black and white pixels make the optimization strictly sequential in the time domain, as the output of the current swap is the input of the next swap; this induces the temporal dependency that prohibits temporal parallelism. Besides, the step-by-step calculation of the objective function is also temporally dependent. For example, the calculations of $\sigma_x$, $\sigma_y$ and $\sigma_{xy}$ depend on the results of $\mu_x$ and $\mu_y$. To tackle both the spatial and the temporal dependency, we propose a parallel structure-aware halftoning algorithm.

To minimize the data dependency, our basic idea is to exploit the spatial and temporal parallelism of the optimization in structure-aware halftoning. We reformulate the objective function using a localization strategy. As a result, the evaluation of the objective function no longer involves all pixels of the input image, but only the neighboring patch around each pixel being swapped. In this way, we break the spatial dependency using spatial localization. On the other hand, we introduce a parallel Poisson-disk algorithm [30] for the selection of multiple pixel swaps to break the temporal dependency among iterations; it guarantees the independence that parallel processes require.

3.2.1 Spatial localization

The evaluation of the objective function, which mainly consists of $\mathrm{MSSIM}(I, I_h)$ and $G(I, I_h)$, can be localized according to the following key observation. For each corresponding pair of pixels from $I$ and $I_h$, the SSIM measures only the local structure similarity in their neighborhoods. Therefore, if we change a pixel value in $I_h$ (e.g., from 0 to 1), the SSIM index changes only in a patch surrounding the updated pixel. As shown in Fig. 2, when two points $p_1$ and $p_2$ are swapped, the SSIM values change only within the green patches $B_1$ and $B_2$, where the size of a green patch is determined by the block size used for calculating the SSIM. Similarly, if we change a pixel value in $I_h$, the tone similarity $G(I, I_h)$ also changes only within the patch surrounding the pixel being swapped.

Fig. 2 Range of influence for a random swap. The $\mathrm{SSIM}(I, I_h)$ and $G(I, I_h)$ values change only within the green patches $B_1$ and $B_2$ when two points $p_1$ and $p_2$ are swapped. Thus, instead of using $\mathrm{MSSIM}(I, I_h)$ and $G(I, I_h)$, we can use the local summation of the SSIM and the tone similarity within the two green patches $B_1$ and $B_2$ to determine whether to swap a pair of pixels $p_1$ and $p_2$. The size of the patches is determined by the larger of the SSIM and tone-similarity kernel sizes

Based on the above observation, we conclude that the rejection or acceptance of a random swap can be determined by a calculation that involves only a local neighboring patch. Suppose the window size used to calculate the SSIM is $m \times m$ and the Gaussian kernel size used to calculate the tone similarity is $n \times n$; then the window size of the neighboring patch to be updated due to the swap is $\max(m,n) \times \max(m,n)$. Suppose the two pixels to be swapped are $p_1$ and $p_2$, and denote the two neighboring patches corresponding to $p_1$ and $p_2$ as $B_1$ and $B_2$, as shown in Fig. 2. Then we reformulate the objective function for swapping $p_1$ and $p_2$ as the following equation.

$$\mathrm{Objective}(I, I_h)_{p_1,p_2} = w_g \sum_{B_1,B_2} \left(g(I) - g(I_h)\right)^2 + w_t \left(1 - \sum_{B_1,B_2} \mathrm{SSIM}(I, I_h)\right) \tag{5}$$

Even though the derivation from the original objective function to the above formulation is simple, it matters: the localized objective function removes the spatial data dependency and makes spatial parallelism possible.
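The localization argument can be checked numerically. The Python sketch below covers the tone term only and rests on several assumptions made for brevity: an $n \times n$ box filter stands in for the Gaussian kernel, pixels outside the image count as zero, and the two patches are combined as a set union so that overlapping pixels are not counted twice. All function names are hypothetical. It compares the change of the squared tone error over the whole image with the change over the two patches around the swapped pixels.

```python
def box_blur_at(img, r, c, n):
    """n x n box-filtered value at (r, c); pixels outside the image count as 0."""
    h = n // 2
    total = 0.0
    for rr in range(r - h, r + h + 1):
        for cc in range(c - h, c + h + 1):
            if 0 <= rr < len(img) and 0 <= cc < len(img[0]):
                total += img[rr][cc]
    return total / (n * n)


def patch(p, n, rows, cols):
    """Coordinates of the n x n patch centred on p, clipped to the image."""
    h = n // 2
    return {(r, c)
            for r in range(max(p[0] - h, 0), min(p[0] + h, rows - 1) + 1)
            for c in range(max(p[1] - h, 0), min(p[1] + h, cols - 1) + 1)}


def tone_error(gray, half, pixels, n):
    """Sum of squared differences of the blurred images over the given pixels."""
    return sum((box_blur_at(gray, r, c, n) - box_blur_at(half, r, c, n)) ** 2
               for r, c in pixels)


rows, cols, n = 8, 8, 3
gray = [[0.5] * cols for _ in range(rows)]
half = [[(r + c) % 2 for c in range(cols)] for r in range(rows)]   # checkerboard
p1, p2 = (1, 1), (5, 6)                                            # black and white pixel
everything = {(r, c) for r in range(rows) for c in range(cols)}
local = patch(p1, n, rows, cols) | patch(p2, n, rows, cols)        # B1 union B2

before_all = tone_error(gray, half, everything, n)
before_local = tone_error(gray, half, local, n)
half[1][1], half[5][6] = half[5][6], half[1][1]                    # swap p1 and p2
global_delta = tone_error(gray, half, everything, n) - before_all
local_delta = tone_error(gray, half, local, n) - before_local
```

Both deltas agree up to rounding, so the accept-or-reject decision for a swap needs only the pixels inside the two patches.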

3.2.2 Parallel random swapping

Since the random swap of a pair of pixels $p_1$ and $p_2$ involves only the two corresponding patches $B_1$ and $B_2$, we can select multiple random pairs of pixels in the input image and swap them in parallel to accelerate the optimization. Because we accept or reject a swap based on the local summation of $\mathrm{SSIM}(I, I_h)$ and $G(I, I_h)$ within its two neighboring patches, the neighboring patches of different swaps must not interfere with each other. For example, if we want to simultaneously swap two pairs of points $(p_1, p_2)$ and $(p_3, p_4)$, as shown in Fig. 3, then the green patches $B_1$ and $B_2$ must not overlap with the orange patches $B_3$ and $B_4$. However, there is no such requirement for the two neighboring patches of the same swap; the two green patches $B_1$ and $B_2$, for example, may overlap with each other. Therefore, the criterion for selecting multiple pairs is to maintain a sufficient distance from one pair to another to avoid spatial conflicts.

Suppose a number of pairs $\{P_1, P_2, P_3, \cdots, P_n\}$ are selected to be swapped in parallel (e.g., Fig. 4), where $p_1$ and $p_2$ form the pair $P_i$, and $p_3$ and $p_4$ form the pair $P_j$. We then define $d(P_i, P_j)$ as the inter-pair distance between $P_i$ and $P_j$, which is the minimal distance from a pixel in $P_i$ to a pixel in $P_j$, written as

$$d(P_i, P_j) = \min\left(\lVert p_i - p_j \rVert\right), \quad p_i \in P_i,\; p_j \in P_j \tag{6}$$

Fig. 3 Criterion for parallel swapping. Since we accept or reject a swap based on the local summation of $\mathrm{SSIM}(I, I_h)$ and $G(I, I_h)$ within the two neighboring patches, the neighboring patches of different swaps cannot interfere with each other (e.g., the green patches $B_1$ and $B_2$ cannot overlap with the orange patches $B_3$ and $B_4$). However, there is no such requirement for the two neighboring patches of the same swap. For example, the two green patches $B_1$ and $B_2$ can overlap with each other

Fig. 4 Nonlocal parallel swapping. All the points to be swapped are Poisson disk samples generated with a minimal distance $r$. In this way, we can guarantee that both the inter-pair and the inner-pair distances are not less than $r$, as for pair $(p_1, p_2)$ and pair $(p_3, p_4)$

During the parallel swapping, we add a minimum-distance constraint on $d(P_i, P_j)$ to avoid spatial conflict between $P_i$ and $P_j$. Suppose the window size used to calculate the SSIM is $m \times m$ and the Gaussian kernel size used to calculate the tone similarity is $n \times n$; then we require $d(P_i, P_j) > \sqrt{2}\max(m,n)$, so that we can avoid spatial conflict.
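The inter-pair distance of Eq. (6) and the resulting conflict test are simple enough to state directly in code. In this Python sketch the function names and the sample coordinates are illustrative assumptions; only the distance definition and the $\sqrt{2}\max(m,n)$ threshold come from the text.

```python
import math


def inter_pair_distance(pair_i, pair_j):
    """Eq. (6): smallest Euclidean distance between a pixel of pair_i and one of pair_j."""
    return min(math.dist(p, q) for p in pair_i for q in pair_j)


def conflict_free(pairs, m, n):
    """True if every two distinct pairs are farther apart than sqrt(2) * max(m, n)."""
    limit = math.sqrt(2) * max(m, n)
    return all(inter_pair_distance(a, b) > limit
               for k, a in enumerate(pairs) for b in pairs[k + 1:])


pair_a = ((0, 0), (0, 3))
pair_b = ((0, 10), (5, 5))
d = inter_pair_distance(pair_a, pair_b)            # closest points: (0, 3) and (5, 5)
ok_small = conflict_free([pair_a, pair_b], 3, 3)    # threshold is about 4.24
ok_large = conflict_free([pair_a, pair_b], 11, 11)  # threshold is about 15.56
```

With 3 × 3 windows the two pairs can be swapped simultaneously, whereas with the 11 × 11 windows they are too close to each other.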

To accelerate the optimization process, our goal is to select as many pairs as possible for parallel swapping. We formulate the selection of multiple random pairs of pixels from the input image as Poisson disk sampling [5, 6, 9], which not only places the samples randomly but also keeps them at least a minimal distance $r$ apart from one another. In our implementation, we employ the parallel Poisson disk sampling algorithm proposed by Wei et al. [30], which is one of the state-of-the-art techniques implemented on the GPU.

A. Nonlocal Parallel Swapping

For the sake of simplicity, we perform the Poisson disk sampling over the input image using $r = \sqrt{2}\max(m,n)$. Given an image with a resolution of $u \times v$, we generate $\frac{u}{r} \times \frac{v}{r}$ (rounded to an integer) Poisson disk samples with a minimal sampling distance $r$. We then couple the samples with one another by random combination. In this way, we can guarantee $d(P_i, P_j) > \sqrt{2}\max(m,n)$, as shown in Fig. 4.
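A CPU-side sketch of this selection step is given below. It is explicitly not the parallel GPU sampler of [30]: simple sequential dart throwing is swapped in to obtain points that are at least $r$ apart, and the image size, attempt count and seeds are arbitrary assumptions for the example.

```python
import math
import random


def dart_throwing(u, v, r, attempts=2000, seed=0):
    """Sequential dart throwing: keep a random pixel only if it is at least r
    away from every pixel kept so far (a stand-in for Poisson disk sampling)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(attempts):
        p = (rng.randrange(u), rng.randrange(v))
        if all(math.dist(p, q) >= r for q in samples):
            samples.append(p)
    return samples


def couple_randomly(samples, seed=0):
    """Shuffle the samples and pair them up; an odd leftover sample is dropped."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return [(shuffled[k], shuffled[k + 1]) for k in range(0, len(shuffled) - 1, 2)]


m = n = 11                                  # window sizes used in the experiments
r = math.sqrt(2) * max(m, n)
samples = dart_throwing(256, 256, r)
pairs = couple_randomly(samples)
```

Since every two samples are at least $r$ apart, any coupling of them keeps both the inter-pair and the inner-pair distances at or above $r$, which is the nonlocal behaviour discussed next.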

However, such an effective strategy for selecting multiple pairs introduces an obvious bias during the optimization. For each selected pair $P_i(p_1, p_2)$, the inner-pair distance $\lVert p_1 - p_2 \rVert$ is always larger than $\sqrt{2}\max(m,n)$, which is not necessary. We call this scheme nonlocal parallel swapping, as both the inter-pair and the inner-pair distances of all selected pairs are not less than $r = \sqrt{2}\max(m,n)$. To compensate for this sampling bias in the parallel coupling, we propose local parallel swapping.

B. Local Parallel Swapping

Fig. 5 Local parallel swapping. (a) All the points at the centers of the pink circles are Poisson disk samples generated with a minimal distance $3r$. The other point in each pink circle is generated with a random displacement in the range from 0 to $r$. (b) The inter-pair distance is not less than $r$ and the inner-pair distances are not larger than $r$, as for pair $(p_1, p_2)$ and pair $(p_3, p_4)$

For local parallel swapping, we still require the inter-pair distance $d(P_i, P_j)$ to be not less than $r = \sqrt{2}\max(m,n)$ to avoid spatial conflict, but the inner-pair distance of each selected pair must not be larger than $r$, to complement nonlocal parallel swapping. As shown in Fig. 5a, given an image with a resolution of $u \times v$, we generate $\frac{u}{3r} \times \frac{v}{3r}$ (rounded to an integer) Poisson disk samples with a minimal distance $3r$. Then we couple each sample with a new point, which is generated by moving the selected sample by a random displacement in the range from 0 to $r$ (Fig. 5b). In this way, we can guarantee that the inter-pair distance is not less than $r$ and the inner-pair distance is not more than $r$: two samples are at least $3r$ apart and each offset point lies within $r$ of its sample, so even two offset points from different pairs are at least $3r - 2r = r$ apart.
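The two distance guarantees of local parallel swapping can be verified with a small Python sketch. It makes some simplifying assumptions: a regular grid with spacing $3r$ stands in for the Poisson disk samples, coordinates are continuous (rounding to pixel positions and clipping to the image are omitted), and the function name is hypothetical.

```python
import math
import random


def local_pairs(samples, r, seed=0):
    """Couple each sample with a point displaced by a random distance in [0, r]
    in a random direction."""
    rng = random.Random(seed)
    pairs = []
    for x, y in samples:
        dist = rng.uniform(0, r)
        angle = rng.uniform(0, 2 * math.pi)
        offset_point = (x + dist * math.cos(angle), y + dist * math.sin(angle))
        pairs.append(((x, y), offset_point))
    return pairs


r = math.sqrt(2) * 11
# Assumption: a regular grid with spacing 3r replaces the Poisson disk samples.
samples = [(i * 3 * r, j * 3 * r) for i in range(4) for j in range(4)]
pairs = local_pairs(samples, r)

inner = [math.dist(p, q) for p, q in pairs]
inter = [min(math.dist(p, q) for p in a for q in b)
         for k, a in enumerate(pairs) for b in pairs[k + 1:]]
```

Every inner-pair distance stays at or below $r$, and every inter-pair distance stays at or above $r$, in line with the $3r - 2r = r$ bound.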

In our parallel halftoning optimization, we alternate between the two schemes, performing local parallel swapping in one iteration and nonlocal parallel swapping in the next. The local and nonlocal parallel swaps complement each other in terms of inner-pair distance, which removes the sampling bias during the optimization.

3.3 GPU-SSIM and GPU-TONE

After the spatial and temporal dependencies are broken, the calculations of the objective functions during the parallel random swapping are independent of one another. Therefore, we can evaluate all the objective functions in parallel. To calculate the objective function between $I$ and $I_h$, GPU-TONE and GPU-SSIM are implemented on the GPU to calculate the tone similarity and the SSIM in parallel. It is quite straightforward to implement GPU-TONE using a GPU Gaussian filter, which can easily be carried out by a fragment shader. The calculations of $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$ and $\sigma_{xy}$ in the SSIM are local summations of weighted neighborhood pixels and can also be considered filtering operations. As the calculations of $\sigma_x$, $\sigma_y$ and $\sigma_{xy}$ depend on the results of $\mu_x$ and $\mu_y$, we use a pipeline method to calculate $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$ and $\sigma_{xy}$ simultaneously.

Algorithm 1 Pseudo-code of parallel structure-aware halftoning

(1) Initialization:
    Partition I and Ih into uniform blocks
    parallel foreach corresponding blocks I^b and I_h^b
        Initialize I_h^b by TonePreserveInit(I^b)
    end
    count = 1

(2) Do While (t < limit)    // Render loop
        // Select pairs
        if (count % 2 == 1)
            // for nonlocal parallel random swapping
            Parallel PoissonDiskSampling(r)
            Randomly couple the samples with one another
        else
            // for local parallel random swapping
            Parallel PoissonDiskSampling(3r)
            Couple each sample with its random offset point
        end
        // optimization
        parallel foreach pair of points p1 and p2
            Eold = Objective(I, Ih)p1,p2
            Ih = Swap(p1, p2)
            Enew = Objective(I, Ih)p1,p2
            ΔE = Enew − Eold
            if (ΔE > 0)    // reject the swap if the energy increases
                Ih = UndoSwap(p1, p2)
            end
        end
        count++
    end

3.4 Parallel optimization

Our parallel optimization algorithm is summarized in Algorithm 1. The function TonePreserveInit initializes the halftone image by randomly distributing black and white pixels; the only criterion is to maintain the overall grayness. During the iterations, local parallel swapping and nonlocal parallel swapping are executed alternately. For the odd-numbered iterations, the function PoissonDiskSampling generates samples with a minimal distance $r$. We randomly couple the samples with one another and perform the nonlocal parallel random swapping. For the even-numbered iterations, the function PoissonDiskSampling generates samples with a minimal distance $3r$. We couple each sample with its random offset point and perform the local parallel swapping. Each swap is accepted or rejected according to whether the energy decreases.
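The control flow of the alternating scheme can be emulated on the CPU as in the following Python sketch. It is only an illustration under loud assumptions: the inner loop over pairs runs sequentially (it stands in for the parallel evaluation, which is legitimate only because the pairs are assumed conflict-free), `toy_local_objective` is a hypothetical substitute for Eq. (5), and the pair selectors are fixed lists rather than Poisson disk samplers.

```python
def swap(half, p1, p2):
    """Exchange the values of pixels p1 and p2 in place."""
    (r1, c1), (r2, c2) = p1, p2
    half[r1][c1], half[r2][c2] = half[r2][c2], half[r1][c1]


def patch_mean(img, p, size):
    """Mean value of the size x size patch centred on p, clipped to the image."""
    h = size // 2
    vals = [img[r][c]
            for r in range(max(p[0] - h, 0), min(p[0] + h, len(img) - 1) + 1)
            for c in range(max(p[1] - h, 0), min(p[1] + h, len(img[0]) - 1) + 1)]
    return sum(vals) / len(vals)


def toy_local_objective(gray, half, p1, p2, size=3):
    """Hypothetical stand-in for Eq. (5): squared patch-mean mismatch at p1 and p2."""
    return sum((patch_mean(gray, p, size) - patch_mean(half, p, size)) ** 2
               for p in (p1, p2))


def optimize(gray, half, select_nonlocal, select_local, iterations):
    for count in range(1, iterations + 1):
        # Odd iterations use nonlocal pairs, even iterations use local pairs.
        pairs = select_nonlocal(count) if count % 2 == 1 else select_local(count)
        for p1, p2 in pairs:               # independent pairs: parallel on a GPU
            e_old = toy_local_objective(gray, half, p1, p2)
            swap(half, p1, p2)
            e_new = toy_local_objective(gray, half, p1, p2)
            if e_new - e_old > 0:          # reject the swap if the energy increases
                swap(half, p1, p2)
    return half


gray = [[0.5] * 8 for _ in range(8)]
half = [[1 if c < 4 else 0 for c in range(8)] for r in range(8)]
whites_before = sum(map(sum, half))
result = optimize(gray, half,
                  select_nonlocal=lambda count: [((1, 1), (6, 6))],
                  select_local=lambda count: [((1, 1), (1, 2)), ((6, 6), (6, 5))],
                  iterations=4)
whites_after = sum(map(sum, result))
```

As in the sequential method, swaps only exchange existing pixels, so the overall grayness is preserved regardless of which swaps are accepted.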

Given the above parallelism, it is quite straightforward to implement the parallel optimization on a GPU using fragment shaders. A major practical issue is memory storage. The original grayscale image is stored in a 2D texture. For Poisson disk sample storage, we construct two frame buffer objects (FBOs) and ping-pong between them during the generation of samples [30]. For the evaluation of the objective function before and after parallel random swapping, we construct two FBOs, respectively, that can be pipelined for the calculation of $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$ and $\sigma_{xy}$. Since we need to undo several swaps after the evaluation of the objective function, we mask out all the accepted pairs of samples in the FBO and perform the undo swapping in parallel. For halftone image storage, we construct two FBOs and ping-pong between them in each iteration. During the parallel Poisson disk sampling and the local parallel random swapping, we also generate random numbers on the GPU. Since current GPUs do not provide such routines, we have to implement our own. In our current implementation, we use the hash-based method presented by Tzeng et al. [26].

4 Results and analysis

To evaluate the performance of our method, we test it on examples with resolutions ranging from 128 × 128 to 2048 × 2048. In our experiments, we follow the parameter settings of the original structure-aware method [22], since the relationship between structure similarity and tone similarity does not change in our parallel formulation. Specifically, we set both the window size of the SSIM and the kernel size of the tone similarity to 11 × 11. For the weighting factors, we again set $w_g = w_t = 0.5$ to balance the preservation of texture details and of tone. A more detailed description of the relationship between structure detail preservation and the weighting factors $w_g$ and $w_t$ can be found in [22]. For the implementation, we adopt OpenGL and GLSL for shader development. All of the following evaluations are conducted on a PC with an Intel(R) Core(TM) i7 X980 CPU at 3.33 GHz, 12 GB of memory, and a GeForce GTX 295.

4.1 Quality

To evaluate the quality of our method, we run it on diverse examples and compare it with different methods. Similar to Pang et al. [22], we measure the quality of halftoning methods based on three criteria: tone consistency, structural preservation and the blue-noise property. Figures 6, 7 and 8 compare our method with the Ostromoukhov method [21], edge enhancement [15], the contrast-aware variant [16] and the original structure-aware method [22]. Compared to the Ostromoukhov method and edge enhancement, the structure-aware method generally preserves more structural details with respect to the human visual system (HVS). As shown in Figs. 6–8e–g, the generated halftone images preserve visually sensitive texture details as well as the local tone, without introducing annoying patterns. In contrast, edge enhancement may over-emphasize the edges and degrade the resemblance to the original grayscale image. Since the edges are detected with a threshold, the edge enhancement method may fail to preserve weak edges and blurry regions, as in the halftone images shown in Figs. 6–8c. By enhancing contrast, the contrast-aware method can produce halftone images whose visual quality approaches that of the original structure-aware method, but it still cannot maintain some structural details, as shown in Figs. 6–8d. Thanks to the parallel implementation, our method outperforms all competitors within 5–10 s, as in the halftone images shown in Figs. 6–8f, g.

Fig. 6 Peacock: (a) original image, (b) Ostromoukhov method (0.82 s), (c) edge enhancement (2.9 s), (d) contrast-aware variant (15.2 s), (e) original structure-aware (10 h), (f) parallel structure-aware (5 s), (g) parallel structure-aware (10 s). The resolution of all images is 980 × 1280. The pure software implementation (e) requires 10 h to achieve results comparable to those of the parallel structure-aware halftoning achieved in 10 s (f–g)

Fig. 7 Tiger: (a) original image, (b) Ostromoukhov method (0.77 s), (c) edge enhancement (2.86 s), (d) contrast-aware variant (14.9 s), (e) original structure-aware (10 h), (f) parallel structure-aware (5 s), (g) parallel structure-aware (10 s). The resolution of all images is 920 × 1360. The pure software implementation (e) requires 10 h to achieve results comparable to those of the parallel structure-aware halftoning achieved in 10 s (f–g)

Fig. 8 Pineapple: (a) original image, (b) Ostromoukhov method (0.8 s), (c) edge enhancement (2.88 s), (d) contrast-aware variant (15.1 s), (e) original structure-aware (10 h), (f) parallel structure-aware (5 s), (g) parallel structure-aware (10 s). The resolution of all images is 900 × 1400. The pure software implementation (e) requires 10 h to achieve results comparable to those of the parallel structure-aware halftoning achieved in 10 s (f–g)

In general, our method outperforms the original structure-aware method in generating structure-preserving halftone images in significantly less time, especially for large images. As shown in Figs. 6–8e–g, the pure software implementation (Figs. 6–8e) requires 10 h to achieve results comparable to those that the parallel structure-aware halftoning achieves within 5 s (Figs. 6–8f).

Table 1 PSNR and MSSIM comparison for “peacock”

| | Ostromoukhov method | Edge enhancement | Contrast-aware variant | Structure-aware, original (10 h) | Structure-aware, parallel (5 s) | Structure-aware, parallel (10 s) |
|---|---|---|---|---|---|---|
| PSNR | 19.60 | 21.08 | 22.76 | 23.29 | 23.78 | 24.53 |
| MSSIM | 0.62 | 0.76 | 0.81 | 0.85 | 0.89 | 0.92 |

Table 2 PSNR and MSSIM comparison for “tiger”

| | Ostromoukhov method | Edge enhancement | Contrast-aware variant | Structure-aware, original (10 h) | Structure-aware, parallel (5 s) | Structure-aware, parallel (10 s) |
|---|---|---|---|---|---|---|
| PSNR | 20.20 | 22.45 | 23.08 | 23.76 | 24.08 | 24.72 |
| MSSIM | 0.56 | 0.71 | 0.77 | 0.81 | 0.83 | 0.89 |

For a quantitative comparison, we evaluate the preservation of image intensity and structure similarity using PSNR and MSSIM, respectively. Specifically, the PSNR and MSSIM comparisons for “peacock”, “tiger” and “pineapple” are shown in Tables 1, 2 and 3. According to these statistics, our method outperforms all competitors in preserving both tone similarity and structure similarity on all three images.
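To make the two metrics concrete, the following is a minimal sketch of how PSNR and a mean SSIM could be computed for a grayscale image and its halftone. It is not the authors' evaluation code. It assumes pixel values in [0, 1] (so the peak is 1.0), a uniform 8 × 8 sliding window and population variances in place of the Gaussian-weighted window of the SSIM paper [29], and no low-pass filtering before the comparison; this section does not state which of these choices were used for Tables 1–3. The function names `psnr`, `ssim_window` and `mssim` are hypothetical.

```python
import math


def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio (dB) between two equally sized 2-D lists."""
    height, width = len(reference), len(reference[0])
    squared_error = sum(
        (reference[y][x] - test[y][x]) ** 2
        for y in range(height)
        for x in range(width)
    )
    mse = squared_error / (height * width)
    if mse == 0.0:
        return math.inf  # identical images have no error
    return 10.0 * math.log10(peak * peak / mse)


def ssim_window(a, b, peak=1.0):
    """SSIM index of two flat lists of pixel values (one local window each)."""
    n = len(a)
    c1 = (0.01 * peak) ** 2  # stabilising constants, K1 = 0.01 and K2 = 0.03
    c2 = (0.03 * peak) ** 2
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((v - mean_a) ** 2 for v in a) / n
    var_b = sum((v - mean_b) ** 2 for v in b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / n
    numerator = (2 * mean_a * mean_b + c1) * (2 * cov + c2)
    denominator = (mean_a ** 2 + mean_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def mssim(reference, test, window=8, peak=1.0):
    """Mean SSIM over every window x window patch (uniform weights, assumed)."""
    height, width = len(reference), len(reference[0])
    scores = []
    for top in range(height - window + 1):
        for left in range(width - window + 1):
            rows = range(top, top + window)
            cols = range(left, left + window)
            patch_ref = [reference[y][x] for y in rows for x in cols]
            patch_test = [test[y][x] for y in rows for x in cols]
            scores.append(ssim_window(patch_ref, patch_test, peak))
    return sum(scores) / len(scores)


# Toy data: a mid-gray image and a checkerboard "halftone" of the same tone.
gray = [[0.5] * 16 for _ in range(16)]
checker = [[float((x + y) % 2) for x in range(16)] for y in range(16)]

print(f"PSNR  = {psnr(gray, checker):.2f} dB")
print(f"MSSIM = {mssim(gray, checker):.4f}")
```

On this toy pair the average tone matches exactly, yet both scores are low because a raw bitonal image differs from the gray level at every pixel. This is one reason halftone evaluations often compare low-pass filtered versions of the images instead.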

In addition, we measure the blue-noise property by computing the Fourier spectrum and the radially averaged power spectra of the halftoning results, a measure widely used to assess the quality of halftoning methods [27]. We compare our method with the Ostromoukhov method, which is well known for maintaining the blue-noise property. As shown in Fig. 9, given a constant-grayness image, we produce halftone images using the Ostromoukhov method, the original structure-aware method and our method, respectively. The visual results are shown in the upper row of Fig. 9, and the corresponding radially averaged power spectra are shown underneath. All of the results have low energy at low frequencies, showing a similar blue-noise profile.
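As an illustration of the measurement described above, the sketch below computes a radially averaged power spectrum with the Python standard library only. It is a simplified stand-in, not the procedure behind Fig. 9. The naive DFT, the integer-radius rings, the division by the pixel count and the helper names `dft_1d`, `power_spectrum` and `radial_average` are all assumptions made for this example. A checkerboard is used as a toy input because all of its energy sits at the highest frequency, so the profile should be empty at low radii, which is the qualitative signature expected of a blue-noise halftone as well.

```python
import cmath
import math


def dft_1d(values):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def power_spectrum(image):
    """Squared 2-D DFT magnitude of a square image, mean removed first."""
    n = len(image)
    mean = sum(sum(row) for row in image) / (n * n)
    # Transform each row, then each column of the row-transformed data.
    row_dft = [dft_1d([v - mean for v in row]) for row in image]
    col_dft = [dft_1d([row_dft[y][kx] for y in range(n)]) for kx in range(n)]
    # col_dft[kx][ky] holds the coefficient at frequency (kx, ky).
    return [[abs(col_dft[kx][ky]) ** 2 / (n * n) for kx in range(n)]
            for ky in range(n)]


def radial_average(power):
    """Average the power over rings of integer radius around frequency 0."""
    n = len(power)
    max_radius = int(round(math.hypot(n // 2, n // 2)))
    sums = [0.0] * (max_radius + 1)
    counts = [0] * (max_radius + 1)
    for ky in range(n):
        for kx in range(n):
            # Map DFT indices above n/2 to negative frequencies.
            fy = ky if ky <= n // 2 else ky - n
            fx = kx if kx <= n // 2 else kx - n
            radius = int(round(math.hypot(fx, fy)))
            sums[radius] += power[ky][kx]
            counts[radius] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Toy input: a 16 x 16 checkerboard, whose energy lies at the highest frequency.
checker = [[float((x + y) % 2) for x in range(16)] for y in range(16)]
profile = radial_average(power_spectrum(checker))
for radius, value in enumerate(profile):
    print(f"radius {radius:2d}: {value:.3f}")
```

For real halftones the naive transform would be replaced by an FFT, and the profile is commonly normalised by the variance of the pattern; neither detail is specified in this section.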

4.2 Time statistics

We further collect time statistics to compare our method with the original structure-aware method. Since the convergence of the halftoning process depends on the number of random swaps, the performance can be evaluated by the computational time per swap, shown in the last column of Tables 4 and 5. Besides the total time for each pass and the computational time per swap, we also break down the computational time for a clearer comparison: the breakdowns for the software-based method [22] and for ours are shown in Tables 4 and 5, respectively. As we initialize the halftone image using the same strategy in both methods, the initialization time is excluded from the tables. The total computation time of each optimization pass is shown in the column “Total”. “Others” refers to the time for the swap operation as well as data transfer. The “Sampling” time in our method is negligible because the number of samples is small (e.g.,

Table 3 PSNR and MSSIM comparison for “pineapple”

| | Ostromoukhov method | Edge enhancement | Contrast-aware variant | Structure-aware, original (10 h) | Structure-aware, parallel (5 s) | Structure-aware, parallel (10 s) |
|---|---|---|---|---|---|---|
| PSNR | 21.62 | 20.95 | 22.10 | 22.89 | 24.39 | 25.53 |
| MSSIM | 0.54 | 0.55 | 0.72 | 0.83 | 0.88 | 0.94 |


[Figure panels: (a) Ostromoukhov method; (b) Original structure-aware; (c) Parallel structure-aware. Top row: visual result. Bottom row: radially averaged power spectra]

Fig. 9 A spectral analysis of halftoning a constant-grayness image (grayness = 0.3). (a), (b) and (c) show the analysis of the Ostromoukhov method, the original structure-aware method and our method, respectively. The corresponding radially averaged power spectra are shown underneath

Table 4 Time statistics for original structure-aware halftoning (in seconds)

| Image size | SSIM | Tone | Others | Total | # swaps | Per swap |
|---|---|---|---|---|---|---|
| 128² | 0.021 | 0.002 | 0.001 | 0.024 | 1 | 0.024 |
| 256² | 0.037 | 0.011 | 0.001 | 0.049 | 1 | 0.049 |
| 512² | 0.313 | 0.081 | 0.001 | 0.395 | 1 | 0.395 |
| 1024² | 1.601 | 0.432 | 0.001 | 2.034 | 1 | 2.034 |
| 1280² | 2.613 | 0.682 | 0.001 | 3.296 | 1 | 3.296 |
| 2048² | 7.131 | 1.903 | 0.001 | 9.035 | 1 | 9.035 |

Table 5 Time statistics for parallel structure-aware halftoning (in seconds)

| Image size | SSIM | Tone | Sampling | Others | Total | # swaps | Per swap |
|---|---|---|---|---|---|---|---|
| 128² | 6.21 × 10⁻⁴ | 2.12 × 10⁻⁴ | 2.81 × 10⁻⁸ | 1.16 × 10⁻⁶ | 8.33 × 10⁻⁴ | 135 | 6.17 × 10⁻⁶ |
| 256² | 1.41 × 10⁻³ | 4.1 × 10⁻⁴ | 2.81 × 10⁻⁸ | 1.18 × 10⁻⁶ | 1.82 × 10⁻³ | 541 | 3.36 × 10⁻⁶ |
| 512² | 6.82 × 10⁻³ | 2.29 × 10⁻³ | 2.81 × 10⁻⁸ | 1.17 × 10⁻⁶ | 9.11 × 10⁻³ | 2,166 | 4.21 × 10⁻⁶ |
| 1024² | 0.024 | 9.0 × 10⁻³ | 2.81 × 10⁻⁸ | 1.17 × 10⁻⁶ | 0.033 | 8,665 | 3.81 × 10⁻⁶ |
| 1280² | 0.0375 | 0.0125 | 2.81 × 10⁻⁸ | 1.17 × 10⁻⁶ | 0.05 | 13,540 | 3.69 × 10⁻⁶ |
| 2048² | 0.081 | 0.029 | 2.81 × 10⁻⁸ | 1.19 × 10⁻⁶ | 0.11 | 34,664 | 3.17 × 10⁻⁶ |


Fig. 10 Running time comparison. Software versus GPU SSIM

[Plot “SSIM Execution Time”: horizontal axis, image size (million pixels), 0 to 4.5; vertical axis, time (seconds), 0 to 8; curves: Software SSIM and GPU SSIM]

only 34,664 samples for a 2048 × 2048 image). Due to the parallel processing nature of the GPU and the efficiency of texture access, “SSIM”, “Tone” and “Others” are much faster than in the software-based method. Figure 10 compares the “SSIM” timing of the original software-based method with ours. Moreover, our method can swap multiple pairs in parallel in one pass. The number of parallel swaps grows with the image size in proportion to the increase in the time cost of “SSIM” and “Tone”, which keeps the per-swap computational time of our method at a constant order of magnitude (10⁻⁶ s). The speedup of our method is evident from Tables 4 and 5, especially for high-resolution images (up to about 300,000 times for a 2048 × 2048 image).
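The per-swap figures can be re-derived from the “Total” and “# swaps” columns of Tables 4 and 5. The short sketch below does this arithmetic; the dictionary layout and the helper name `per_swap` are illustrative choices, while the numbers are copied from the two tables.

```python
# Image side -> (total seconds per pass, swaps per pass), from Tables 4 and 5.
software = {
    128: (0.024, 1), 256: (0.049, 1), 512: (0.395, 1),
    1024: (2.034, 1), 1280: (3.296, 1), 2048: (9.035, 1),
}
parallel = {
    128: (8.33e-4, 135), 256: (1.82e-3, 541), 512: (9.11e-3, 2166),
    1024: (0.033, 8665), 1280: (0.05, 13540), 2048: (0.11, 34664),
}


def per_swap(total_seconds, swaps):
    """Time attributed to one swap when a pass performs `swaps` swaps."""
    return total_seconds / swaps


for side in sorted(software):
    cpu = per_swap(*software[side])
    gpu = per_swap(*parallel[side])
    print(f"{side} x {side}: software {cpu:.3e} s/swap, "
          f"parallel {gpu:.3e} s/swap, ratio {cpu / gpu:,.0f}")
```

The parallel per-swap time stays between roughly 3 × 10⁻⁶ s and 6 × 10⁻⁶ s across all sizes, which matches the constant order of magnitude noted above. Computed this way from the tabulated values, the per-swap ratio at 2048 × 2048 is roughly 2.85 × 10⁶, and the per-pass ratio of the “Total” columns is roughly 82, so the speedup figure quoted in the text may rest on a different basis.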

5 Conclusion

In this paper, we present a parallel structure-aware halftoning technique that maintains image structure as well as tone similarity. Compared to standard edge enhancement and state-of-the-art error diffusion, our method better preserves the texture content to which the HVS is sensitive. Compared to the original structure-aware method, the spatio-temporal parallelism and GPU implementation of the technique lead to a significant speedup without sacrificing quality. Our experiments demonstrate the effectiveness of the proposed parallel algorithm in generating structure-preserving bitonal images in significantly less time, especially for large images. Thanks to the parallelism of the GPU, our tests demonstrate that high-quality halftone images can be generated within seconds, regardless of their resolution.

Acknowledgements We would like to thank all reviewers for their valuable suggestions to improve the paper. This work was supported in part by grants from Hong Kong RGC General Research Fund (Project No. CUHK 417411) and CUHK SHIAE Project Funding (Project No. SHIAE-MMT-P2-11).


References

1. Bayer BE (1973) An optimum method for two-level rendition of continuous tone pictures. In: Proceeding of the IEEE international conference on communications, vol 26. IEEE, New York, pp 2611–2615
2. Bowers J, Wang R, Wei LY, Maletz D (2010) Parallel Poisson disk sampling with spectrum analysis on surfaces. ACM Trans Graph (SIGGRAPH Asia 2010 issue) 29:166:1–166:10
3. Chang J, Alain B, Ostromoukhov V (2009) Structure-aware error diffusion. ACM Trans Graph (SIGGRAPH Asia 2009 issue) 28:162:1–162:8
4. Chen J, Paris S, Durand F (2007) Real-time edge-aware image processing with the bilateral grid. ACM Trans Graph 26(3):103:1–103:9
5. Cook RL (1986) Stochastic sampling in computer graphics. ACM Trans Graph 5(1):51–72
6. Ebeida MS, Davidson AA, Patney A, Knupp PM, Mitchell SA, Owens JD (2011) Efficient maximal Poisson-disk sampling. ACM Trans Graph (SIGGRAPH 2011 issue) 30:49:1–49:12
7. Floyd RW, Steinberg L (1974) An adaptive algorithm for spatial grey scale. In: SID international symposium digest of technical papers. Society for Information Display, Washington, DC, pp 36–37
8. Fung YH, Chan YH (2010) Green noise digital halftoning with multiscale error diffusion. IEEE Trans Image Process 19(7):1808–1823
9. Gamito MN, Maddock SC (2009) Accurate multidimensional Poisson-disk sampling. ACM Trans Graph 29:8:1–8:19
10. Guo JM (2007) A new model-based digital halftoning and data hiding designed with LMS optimization. IEEE Trans Multimedia 9(4):687–700
11. Hwang BW, Kang TH, Lee TS (2004) Improved edge enhanced error diffusion based on first-order gradient shaping filter. In: IEA/AIE’2004: proceedings of the 17th international conference on innovations in applied artificial intelligence. Springer, New York, pp 473–482
12. Sullivan JR, Ray LA, Miller R (1991) Design of minimum visual modulation halftone patterns. IEEE Trans Syst Sci Cybern 21(1):33–38
13. Kim JS, Lee HJ (2008) A subfield coding algorithm for the reduction of gray level errors due to line load in a plasma display panel. IEEE Trans Circuits Syst Video Technol 18(6):827–839
14. Kim SH, Allebach JP (2002) Impact of HVS models on model-based halftoning. IEEE Trans Image Process 11(3):258–269
15. Kwak NJ, Ryu SP, Ahn JH (2006) Edge-enhanced error diffusion halftoning using human visual properties. In: ICHIT ’06: proceedings of the 2006 international conference on hybrid information technology. IEEE Computer Society, Washington, pp 499–504
16. Li H, Mould D (2010) Contrast-aware halftoning. Comput Graph Forum 29(2):273–280
17. Li P, Allebach JP (2000) Look-up-table based halftoning algorithm. IEEE Trans Image Process 9(9):1593–1603
18. Li P, Allebach JP (2004) Tone-dependent error diffusion. IEEE Trans Image Process 13(2):201–215
19. Mese M, Vaidyanathan PP (2002) Tree-structured method for LUT inverse halftoning and for image halftoning. IEEE Trans Image Process 11(6):644–655
20. Monga V, Damera-Venkata N, Evans BL (2007) Design of tone-dependent color-error diffusion halftoning systems. IEEE Trans Image Process 16(1):198–211
21. Ostromoukhov V (2001) A simple and efficient error-diffusion algorithm. In: SIGGRAPH, pp 567–572
22. Pang WM, Qu Y, Wong TT, Cohen-Or D, Heng PA (2008) Structure-aware halftoning. ACM Trans Graph (SIGGRAPH 2008 issue) 27(3):89:1–89:8
23. Rodriguez JB, Arce GR, Lau DL (2008) Blue-noise multitone dithering. IEEE Trans Image Process 17(8):1368–1382
24. Schmaltz C, Gwosdek P, Bruhn A, Weickert J (2010) Electrostatic halftoning. Comput Graph Forum 29(8):2313–2327
25. Su Y, Xu Z, Jiang X (2008) GPGPU-based Gaussian filtering for surface metrological data processing. In: 12th international conference on information visualisation, pp 94–99
26. Tzeng S, Wei LY (2008) Parallel white noise generation on a GPU via cryptographic hash. In: Proceedings of the 2008 symposium on interactive 3D graphics, pp 79–87
27. Ulichney R (1987) Digital halftoning. The MIT Press, Cambridge, MA. 27 June 1987
28. Ulichney R (2000) A review of halftoning techniques. In: Proc. of SPIE, vol 3963, pp 378–391
29. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
30. Wei LY (2008) Parallel Poisson disk sampling. ACM Trans Graph 27(3):20:1–20:10


Huisi Wu received his B.Sc. and M.Sc. degrees in Computer Science from the Xi’an Jiaotong University (XJTU) in 2004 and 2007, respectively. He obtained his PhD degree in Computer Science from The Chinese University of Hong Kong (CUHK) in 2011. Currently, he is an assistant professor with the College of Computer Science and Software Engineering, Shenzhen University, China. His main research interest is computer graphics, including digital halftoning, symmetry analysis, and image summarization.

Tien-Tsin Wong received the B.Sc., M.Phil., and PhD degrees in computer science from the Chinese University of Hong Kong in 1992, 1994, and 1998, respectively. Currently, he is a Professor in the Department of Computer Science & Engineering, Chinese University of Hong Kong. His main research interest is computer graphics, including computational manga, image-based rendering, natural phenomena modeling, and multimedia data compression. He received IEEE Transactions on Multimedia Prize Paper Award 2005 and Young Researcher Award 2004.


Pheng-Ann Heng received the B.Sc. degree from the National University of Singapore in 1985, and the M.Sc. degree in computer science, the M.A. degree in applied mathematics, and the PhD degree in computer science, all from Indiana University, Bloomington, in 1987, 1988, and 1992, respectively. Currently, he is a Professor in the Department of Computer Science and Engineering, The Chinese University of Hong Kong (CUHK), Shatin. He has served as the Director of Virtual Reality, Visualization and Imaging Research Centre at CUHK since 1999 and as the Director of Centre for Human-Computer Interaction at Shenzhen Institute of Advanced Integration Technology, Chinese Academy of Science/CUHK since 2006. He has been appointed as a visiting professor at the Institute of Computing Technology, Chinese Academy of Sciences as well as a Cheung Kong Scholar Chair Professor by Ministry of Education and University of Electronic Science and Technology of China since 2007.
