
Design and Implementation of Efficient Architectures for Color Space Conversion

F. Bensaali and A. Amira
School of Computer Science, Queen’s University of Belfast,
University Road, BT7 1NN, Belfast, UK
[f.bensaali, a.amira]@qub.ac.uk

Abstract

Color space conversion is very important in many types of image processing applications, including video compression. This operation consumes up to 40% of the entire processing power of a highly optimised decoder. Therefore, techniques which efficiently implement this conversion are desired. This paper presents four different scalable architectures for the efficient implementation of two such color space converters using an FPGA-based system. Distributed arithmetic and systolic design techniques have been exploited to implement the proposed structures on the Celoxica RC1000-PP FPGA development board. The implementations exhibit better performance than existing implementations.

Keywords: Color Space Conversion, Systolic Architecture, Distributed Arithmetic, FPGA.

1 Introduction

Color is a visual sensation produced by light in the visible region of the spectrum incident on the retina. Since the human visual system has three types of color photoreceptor cone cells, three components are necessary and sufficient to describe a color [1].

A color space (also called a color model or color system) is a method by which we can specify, create and visualise color. There are many existing color spaces, and most of them represent each color as a point in a three-dimensional coordinate system. Each color space is optimized for a well-defined application area [2]. The most popular color models are RGB (used in computer graphics); YIQ, YUV and YCrCb (used in video systems); and CMYK (used in color printing). All of these color spaces can be derived from the RGB information supplied by devices such as cameras and scanners.

Processing an image in the RGB color space, with a set of RGB values for each pixel, is not the most efficient method. To speed up some processing steps, many broadcast, video and imaging standards use luminance and color difference video signals, such as YCrCb, making a mechanism for converting between formats necessary. Several cores for RGB to YCrCb conversion designed for FPGA implementation are available on the market, such as the cores proposed by Amphion Ltd [3], CAST Inc. [4] and ALMA Technologies [5].

As part of an ongoing research project at Queen’s University of Belfast to develop a hardware accelerator for image and signal processing algorithms based on matrix computations [6, 7, 8, 9], this paper proposes the use of an FPGA as a low-cost accelerator for RGB ↔ YCrCb Color Space Converters (CSCs) using Systolic Architecture (SA) and Distributed Arithmetic (DA) approaches. For the second approach, two architectures based on serial and parallel manipulation of pixels have been proposed.

The target hardware for the implementation and verification of the proposed architectures is the Celoxica RC1000-PP PCI-based FPGA development board, equipped with a Xilinx XCV2000E Virtex FPGA [10, 11]. The rest of the paper is organised as follows. A review of the conversion from R’G’B’ to Y’CrCb is given in Section 2. Sections 3 and 4 present the mathematical background and the descriptions of the proposed architectures based on the SA and DA techniques, respectively. The hardware implementations, with results and analysis, are then presented in Section 5. Finally, concluding remarks are given in Section 6.


ICGST-GVIP Journal, Volume 5, Issue1, December 2004


2 Color Space Conversion: A Review

As mentioned in the introduction, many color models have been proposed, each oriented towards supporting a specific task or solving a particular problem. Described below are the two color systems selected for our study, which are used in many image processing applications.

2.1 RGB Color Space

RGB color space is a simple and robust color definition. RGB uses three numerical components to represent a color. This color space can be thought of as a three-dimensional coordinate system whose axes correspond to the three components: R or Red, G or Green, and B or Blue. RGB is the color space that computer displays use. It corresponds most closely to the behavior of the human eye [1]. RGB is an additive color system: the three primary colors red, green and blue are added to form the desired color. In a true color image, the red, green and blue components of a pixel are each eight bits wide, which gives about sixteen million ($2^{24}$) possible colors in total. Each component has a range of 0 to 255, with all three 0s producing black and all three 255s producing white [1]. In the rest of this paper, the gamma-corrected RGB values are denoted R’G’B’.

2.2 Y’CrCb Color Space

Y’CrCb is a scaled and offset version of the YUV color space, where Y represents luminance (or brightness) and U and V are the two color-difference (chrominance) signals. In this color space, R’G’B’ is separated into a luminance part (Y’) and two chrominance parts (Cb and Cr). Y’ is defined to have a range of 16 to 235, while Cb and Cr have a range of 16 to 240 [1].

2.3 Converting From R’G’B’ to Y’CrCb

Decomposing an R’G’B’ color image into one luminance image and two chrominance images is the method used in most commercial applications, such as face detection [12, 13], as well as in the JPEG and MPEG imaging standards [14, 15, 16]. The suitability of the Y’CrCb color space for these kinds of applications is due to:

• The decorrelation of the Y’, Cr and Cb components, so that each component can be analysed separately.

• Human eyes are more sensitive to changes in brightness than to changes in color, so the Cr and Cb components can be compressed more heavily than the Y’ component to obtain a better compression ratio.

The calculation of Y’CrCb color components from R’G’B’ components consumes up to 40% of the processing power in a highly optimised decoder [14], so accelerating this operation is useful for accelerating the whole process. A color in the R’G’B’ color space is converted to the Y’CrCb color space using the following equation:

$$
\begin{aligned}
Y' &= 0.257R' + 0.504G' + 0.098B' + 16\\
Cr &= 0.439R' - 0.368G' - 0.071B' + 128\\
Cb &= -0.148R' - 0.291G' + 0.439B' + 128
\end{aligned}
\tag{1}
$$

The inverse conversion can be carried out using the following equation:

$$
\begin{aligned}
R' &= 1.164Y' + 1.596Cr - 222.912\\
G' &= 1.164Y' - 0.813Cr - 0.392Cb + 135.616\\
B' &= 1.164Y' + 2.017Cb - 276.8
\end{aligned}
\tag{2}
$$

Figure 1 shows the direct mapping of equations 1 and 2.

[Figure 1: nine constant multipliers (coefficients of equation 1 / equation 2), three adders that add the offsets 16 / -222.912, 128 / 135.616 and 128 / -276.8, and a rounding stage on each output; inputs R’/Y’, G’/Cb, B’/Cr; outputs Y’/R’, Cb/G’, Cr/B’.]

Figure 1: General Block Diagram for R’G’B’ ↔ Y’CrCb CSC
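A floating-point reference model of equations 1 and 2 is useful for checking any of the hardware architectures that follow. The sketch below assumes the add-0.5-and-truncate rounding described in Section 5. It also clamps results to the 8-bit range 0 to 255; that clamping step and the function names are illustrative assumptions, not taken from the paper.

```python
import math


def round_clamp(x: float) -> int:
    """Round by adding 0.5 and truncating, then clamp to 8 bits (clamping is assumed)."""
    return max(0, min(255, math.floor(x + 0.5)))


def rgb_to_ycrcb(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Equation (1): 8-bit R'G'B' -> (Y', Cr, Cb)."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    cr = 0.439 * r - 0.368 * g - 0.071 * b + 128
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    return round_clamp(y), round_clamp(cr), round_clamp(cb)


def ycrcb_to_rgb(y: int, cr: int, cb: int) -> tuple[int, int, int]:
    """Equation (2): (Y', Cr, Cb) -> 8-bit R'G'B'."""
    r = 1.164 * y + 1.596 * cr - 222.912
    g = 1.164 * y - 0.813 * cr - 0.392 * cb + 135.616
    b = 1.164 * y + 2.017 * cb - 276.8
    return round_clamp(r), round_clamp(g), round_clamp(b)


print(rgb_to_ycrcb(255, 255, 255))  # (235, 128, 128)
print(ycrcb_to_rgb(235, 128, 128))  # (255, 255, 255)
```

White maps to the top of the nominal Y’ range (235), and both chrominance components sit at their zero level (128), as expected for a grey.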

3 Proposed CSC based SA

An SA is a network of Processing Elements (PEs) that rhythmically compute and pass data through the system. The main features of systolic systems are modularity and regularity, which are important in FPGA implementations [7]. In this section, two architectures based on the bit-parallel SA approach for CSC implementation are described.

The CSC core implements the following mathematical formula to convert from one color space to another:


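$$
\begin{bmatrix} C_0\\ C_1\\ C_2 \end{bmatrix}
=
\begin{bmatrix}
A_{00} & A_{01} & A_{02} & A_{03}\\
A_{10} & A_{11} & A_{12} & A_{13}\\
A_{20} & A_{21} & A_{22} & A_{23}
\end{bmatrix}
\times
\begin{bmatrix} B_0\\ B_1\\ B_2\\ 1 \end{bmatrix}
\tag{3}
$$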

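where $C_i$ (0 ≤ i ≤ 2) and $B_i$ (0 ≤ i ≤ 3) represent the output and input color components respectively. The last input element, $B_3 = 1$, allows the offsets of equations 1 and 2 to be held in the fourth column of A.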
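Equation 3 is equations 1 and 2 written in homogeneous form. Appending a constant 1 to the input vector lets the offsets sit in the fourth column of A, so the same datapath serves both conversions by swapping the matrix. The sketch below evaluates equation 3 in floating point. It also checks that the inverse matrix undoes the forward one to within half a grey level for a few sample colors. The names `A_FWD`, `A_INV` and `csc` are illustrative, and the inverse input order (Y’, Cr, Cb) is an assumption chosen to match Table 5.

```python
# Coefficient matrices of equations (1) and (2) in the 3 x 4 form of equation (3).
# Forward rows produce (Y', Cr, Cb); inverse inputs are ordered (Y', Cr, Cb) and
# inverse rows produce (R', G', B').
A_FWD = [
    [0.257, 0.504, 0.098, 16.0],
    [0.439, -0.368, -0.071, 128.0],
    [-0.148, -0.291, 0.439, 128.0],
]
A_INV = [
    [1.164, 1.596, 0.0, -222.912],
    [1.164, -0.813, -0.392, 135.616],
    [1.164, 0.0, 2.017, -276.8],
]


def csc(a: list[list[float]], b: tuple[float, float, float]) -> list[float]:
    """Equation (3): C = A x [B0, B1, B2, 1]^T."""
    vec = (*b, 1.0)  # homogeneous coordinate B3 = 1 carries the offsets
    return [sum(a_ik * b_k for a_ik, b_k in zip(row, vec)) for row in a]


for rgb in [(0, 0, 0), (255, 255, 255), (255, 0, 0), (12, 200, 77)]:
    back = csc(A_INV, tuple(csc(A_FWD, rgb)))
    print(rgb, [round(v, 3) for v in back])
```

The small round-trip residue comes from the three-decimal coefficients: the product of the two matrices is close to, but not exactly, the identity.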

Equation 3 can be mapped onto the two proposed architectures as shown in Figures 2 and 3.

[Figure 2: a 3 × 4 array of PEs (PE00 to PE23) with coefficients A00 to A23, inputs B0 to B3 and outputs C0, C1, C2. Inset, PE structure: storage elements (SE) holding A_ij and B_i, a signed integer multiplier, a logical right shift by f (>> f), a signed integer adder from Cin to Cout, and a delay.]

Figure 2: Proposed systolic architecture (1)

[Figure 3: a linear array of four PEs (PE0 to PE3) fed with B0 to B3 and with the coefficient streams A00 to A23, producing C0, C1, C2.]

Figure 3: Proposed systolic architecture (2)

Since the coefficients of matrix A are real numbers, floating-point or fixed-point representations can be used to perform the multiplication. If the range of real values that must be represented is small, or can be scaled to make it smaller, fixed-point arithmetic is a cheap and fast way of supporting non-integer values. Fixed-point arithmetic is appropriate for our application because, as can be seen from equations 1 and 2, the range of the values is small.

The first architecture consists of twelve identical PEs (the number of PEs is equal to N × M, where N and M are the number of rows and columns of matrix A respectively). Each PE comprises a parallel fixed-point Multiply ACcumulator (MAC), a set of Storage Elements (SEs) where the coefficients $A_{ik}$ and $B_k$ are stored, and another storage element for pipelining the partial products. The MAC contains a parallel signed integer multiplier, a parallel signed integer adder and a right shifter, which shifts the multiplier output by the number of bits used for the fractional part of the color components. The matrix elements $A_{ik}$ are fed in parallel, while the vector elements $B_k$ are also fed in parallel but remain fixed in their corresponding PE cells during the entire computation. Because of the value ranges of the R’G’B’ and Y’CrCb components, the input elements are represented with 13 bits (8 bits for the integer part and 5 bits for the fractional part).

The second architecture consists of four identical PEs, each with the same structure as the PEs used in the first architecture. The two architectures differ in throughput and in the area each requires. With the first architecture, the entire computation can be carried out in M clock cycles and requires N × M PEs. With the second architecture, it can be carried out in 2 × (M − 1) clock cycles and requires M PEs.

Table 1 summarises the performance of the two proposed architectures.

In our case, the throughput rate is defined as the reciprocal of the time between successive output vectors. It can be seen from the table that architecture (1) delivers data at a higher throughput rate than architecture (2).

The two proposed architectures (1) and (2) can also be used for other applications requiring matrix-vector products, such as 3D affine transformations [8].

Table 1: Architectures Performances

| Architecture | Computation time | Area complexity | Throughput rate (vector/clock cycle) |
|---|---|---|---|
| Proposed (1) | M T | O(N × M) | 1 |
| Proposed (2) | 2(M − 1) T | O(M) | 1/N |

T denotes the clock period.

4 Proposed CSC Based DA

Since color space conversion can be expressed as a Matrix-Vector (MV) multiplication, two algorithms based on DA are presented in this section.

DA distributes arithmetic operations rather than grouping them as multipliers do. Conventional DA, called ROM-based DA, decomposes the variable input of the inner product to bit level in order to generate precomputed data. ROM-based DA uses a ROM table to store the precomputed data, which makes it regular and efficient in its use of silicon area in a VLSI implementation. The advantage of the ROM-based DA approach is its implementation efficiency: the basic operations required are a sequence of ROM look-ups, additions, subtractions and shift operations on the input data sequence [17]. Examples of the use of DA can be found in [17, 18, 19].

4.1 Proposed Architecture Based Serial Manipulation Approach

4.1.1 Mathematical Background

Consider the matrix-vector product given by the following equation:

$$C_i = \sum_{k=0}^{N-1} A_{ik} \times B_k \tag{4}$$

where the $\{A_{ik}\}$ are L-bit constants and the $\{B_k\}$ are written in unsigned binary representation as shown in equation 5:

$$B_k = \sum_{m=0}^{W-1} b_{k,m} \times 2^m \tag{5}$$

where $b_{k,m}$ is the m-th bit of $B_k$ (zero or one), and W is the word length, i.e. the resolution of each color component of a pixel.

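Substituting equation 5 into equation 4 gives:

$$
C_i = \sum_{k=0}^{N-1} A_{ik} \times \left( \sum_{m=0}^{W-1} b_{k,m} \times 2^m \right)
= \sum_{m=0}^{W-1} \left( \sum_{k=0}^{N-1} A_{ik} \times b_{k,m} \right) \times 2^m
\tag{6}
$$

Define:

$$Z_m = \sum_{k=0}^{N-1} A_{ik} \times b_{k,m} \tag{7}$$

Therefore, $C_i$ can be computed as:

$$C_i = \sum_{m=0}^{W-1} Z_m \times 2^m \tag{8}$$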

The idea is that, since the term $Z_m$ depends only on the $b_{k,m}$ values and has only $2^N$ possible values, it is possible to precompute and store those values in ROMs. An input set of N bits $(b_{0,m}, b_{1,m}, \ldots, b_{N-1,m})$ is used as an address to retrieve the corresponding $Z_m$ value. The ROM contents depend on the coefficients of the constant matrix A. These intermediate results are accumulated over W clock cycles to produce the $C_i$ values.
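Equations 7 and 8 translate almost directly into code: precompute one ROM entry per N-bit address, then read the ROM once per bit position and accumulate the result weighted by $2^m$. The sketch below is a floating-point model of that ROM-based DA evaluation for one row of A. The address bit ordering (bit k of the address holds $b_{k,m}$) and the function names are assumptions for illustration.

```python
def build_rom(a_row: list[float]) -> list[float]:
    """Precompute Z for all 2^N addresses (equation 7).
    Bit k of the address holds b_{k,m}; this bit ordering is an assumption."""
    n = len(a_row)
    return [sum(a for k, a in enumerate(a_row) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(rom: list[float], b: list[int], w: int = 8) -> float:
    """Equation (8): one ROM look-up per bit position m, weighted by 2^m."""
    acc = 0.0
    for m in range(w):  # in hardware: one clock cycle per bit position
        addr = 0
        for k, b_k in enumerate(b):
            addr |= ((b_k >> m) & 1) << k  # bit m of every input forms the address
        acc += rom[addr] * (1 << m)
    return acc


a_y = [0.257, 0.504, 0.098, 16.0]  # row of equation (1) that produces Y'
b = [255, 255, 255, 1]             # B3 = 1, as in equation (3)
print(da_inner_product(build_rom(a_y), b))  # approximately 235.045
```

No multiplier appears anywhere: the products are replaced by table look-ups, shifts (the $2^m$ weights) and additions, which is the point of the DA formulation.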

4.2 Case Study: Converting Between R’G’B’ and Y’CrCb

Since all the components are in the range 0 to 255, 8 bits are enough to represent them. In our application (N = 4 and W = 8), $C_i$ can be computed as:

$$C_i = \sum_{m=0}^{7} Z_m \times 2^m \tag{9}$$

where:

$$Z_m = \sum_{k=0}^{3} A_{ik} \times b_{k,m} \tag{10}$$

Three ROMs (one for each row of matrix A), each with $2^N = 2^4 = 16$ entries, are needed to store the $2^4$ possible precomputed partial-product values. Since the last element of the vector B is equal to 1:

$$
b_{3,m} =
\begin{cases}
1 & \text{for } m = 0\\
0 & \text{for } m \neq 0
\end{cases}
\tag{11}
$$

Equation 9 can therefore be rewritten as:

$$C_i = \sum_{m=0}^{7} Z^*_m \times 2^m + A_{i3} \tag{12}$$


where:

$$Z^*_m = \sum_{k=0}^{2} A_{ik} \times b_{k,m} \tag{13}$$

The size of each ROM is thereby reduced to $2^3 = 8$ entries. Table 2 gives the content of each ROM.

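Table 2: Content of ROM i (0 ≤ i ≤ 2)

| $b_{0,m}$ | $b_{1,m}$ | $b_{2,m}$ | Content of ROM i |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | $A_{i2}$ |
| 0 | 1 | 0 | $A_{i1}$ |
| 0 | 1 | 1 | $A_{i1} + A_{i2}$ |
| 1 | 0 | 0 | $A_{i0}$ |
| 1 | 0 | 1 | $A_{i0} + A_{i2}$ |
| 1 | 1 | 0 | $A_{i0} + A_{i1}$ |
| 1 | 1 | 1 | $A_{i0} + A_{i1} + A_{i2}$ |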
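The contents of Table 2 can be generated mechanically from one row of A. The sketch below does so and then evaluates equation 12 with the accumulator preloaded with $A_{i3}$, which replaces the fourth ROM input. The address is taken as $4b_{0,m} + 2b_{1,m} + b_{2,m}$ so that entries come out in the row order of Table 2; this ordering and the function names are illustrative choices.

```python
def build_reduced_rom(a_row: list[float]) -> list[float]:
    """8-entry ROM of Table 2 for one row of A.
    The address is 4*b0 + 2*b1 + b2, which reproduces the row order of the table."""
    a0, a1, a2 = a_row[:3]
    return [b0 * a0 + b1 * a1 + b2 * a2
            for b0 in (0, 1) for b1 in (0, 1) for b2 in (0, 1)]


def da_reduced(a_row: list[float], b0: int, b1: int, b2: int, w: int = 8) -> float:
    """Equation (12): accumulator preloaded with A_i3, then W look-ups of Z*_m."""
    rom = build_reduced_rom(a_row)
    acc = a_row[3]  # b_{3,m} is 1 only for m = 0, so A_i3 is added exactly once
    for m in range(w):
        addr = (((b0 >> m) & 1) << 2) | (((b1 >> m) & 1) << 1) | ((b2 >> m) & 1)
        acc += rom[addr] * (1 << m)
    return acc


a_y = [0.257, 0.504, 0.098, 16.0]  # Y' row of equation (1)
print([round(z, 3) for z in build_reduced_rom(a_y)])
# [0.0, 0.098, 0.504, 0.602, 0.257, 0.355, 0.761, 0.859]
```

The printed list reproduces the ROM1 column of Table 4 for the R’G’B’ to Y’CrCb conversion.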

4.2.1 Proposed Architecture

Since our objective is to implement a core which performs two different color conversions (R’G’B’ ↔ Y’CrCb), 6 ROMs are needed (3 for each conversion). Figures 4 and 5 show the proposed core's pins and its internal architecture, respectively.

[Figure 4: CSC core symbol with inputs B0, B1, B2 and S, and outputs C0[0:7], C1[0:7], C2[0:7].]

Figure 4: Symbol of the CSC Core

The pin descriptions are given in Table 3.

Table 3: Pins Description

| Name | Dir | Description |
|---|---|---|
| B0 | I | First input color space component |
| B1 | I | Second input color space component |
| B2 | I | Third input color space component |
| C0 | O | First output color space component |
| C1 | O | Second output color space component |
| C2 | O | Third output color space component |
| S | I | Color space conversion type selection |

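[Figure 5: the bit slice (b0,m, b1,m, b2,m) addresses two 3-ROM blocks, one for RGB to YCrCb and one for YCrCb to RGB, selected by S through clock enables (CE); each of the three PEs adds its ROM output, shifted by m (<< m), into an accumulator that produces C0, C1 or C2.]

Figure 5: Serial CSC based DA Architecture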

The proposed architecture consists of three identical Processing Elements (PEs) and two memory blocks. Each PE comprises a parallel ACCumulator (ACC) and a right shifter, and each memory block consists of three ROMs with $2^3$ entries each (see Figure 6). The ROM contents depend on the matrix A coefficients, which in turn depend on the conversion type.

[Figure 6: three ROMs (ROM1, ROM2, ROM3) addressed by b0,m, b1,m, b2,m, producing the partial products P0, P1, P2.]

Figure 6: Memory Block Structure

Our architecture is also scalable: it can perform n conversions by using 3 × n ROMs to store the conversion matrix coefficients (three per conversion) while keeping the same PEs. An N × M image can be converted by presenting the R’G’B’ components of a new pixel (Y’CrCb for the inverse conversion) at the inputs every 8 clock cycles.
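The serial core of Figures 4 to 6 can be modelled in a few lines. Two ROM blocks are built, one per conversion, and the select input picks a block. The three PEs then consume one bit slice per clock, so a pixel takes 8 iterations. The encoding of S (0 for R’G’B’ to Y’CrCb, 1 for the inverse), the floating-point ROM contents, the final clamp to 8 bits and all names are assumptions for illustration. The accumulator preload with $A_{i3} + 0.5$ follows the rounding scheme described in Section 5.

```python
import math

# Coefficient matrices (3 x 4) of equations (1) and (2); inverse inputs ordered (Y', Cr, Cb).
A_FWD = [[0.257, 0.504, 0.098, 16.0],
         [0.439, -0.368, -0.071, 128.0],
         [-0.148, -0.291, 0.439, 128.0]]
A_INV = [[1.164, 1.596, 0.0, -222.912],
         [1.164, -0.813, -0.392, 135.616],
         [1.164, 0.0, 2.017, -276.8]]


def rom_block(a: list[list[float]]) -> list[list[float]]:
    """Three 8-entry ROMs (one per row of A), addressed by 4*b0 + 2*b1 + b2."""
    return [[b0 * r[0] + b1 * r[1] + b2 * r[2]
             for b0 in (0, 1) for b1 in (0, 1) for b2 in (0, 1)] for r in a]


# Hypothetical encoding of the select pin S: 0 = R'G'B' -> Y'CrCb, 1 = inverse.
BLOCKS = {0: (rom_block(A_FWD), A_FWD), 1: (rom_block(A_INV), A_INV)}


def csc_serial(b0: int, b1: int, b2: int, s: int) -> tuple[int, ...]:
    """Bit-serial DA core: one bit slice per clock, 8 clocks per pixel."""
    roms, a = BLOCKS[s]
    acc = [a[i][3] + 0.5 for i in range(3)]  # preload A_i3 + 0.5 (round by truncation)
    for m in range(8):
        addr = (((b0 >> m) & 1) << 2) | (((b1 >> m) & 1) << 1) | ((b2 >> m) & 1)
        for i in range(3):  # the three PEs work in parallel on the same address
            acc[i] += roms[i][addr] * (1 << m)
    # truncate, then clamp to 8 bits (the clamp is an assumption)
    return tuple(max(0, min(255, math.floor(v))) for v in acc)


print(csc_serial(255, 255, 255, s=0))  # (235, 128, 128)
print(csc_serial(235, 128, 128, s=1))  # (255, 255, 255)
```

Switching conversions changes only which ROM block is read; the PEs and the control loop are shared. This is why further conversions cost only three extra ROMs each.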


4.3 Proposed Architecture Based Parallel Manipulation Approach

4.3.1 Mathematical Background

Consider an N × M image (Figure 7), where N is the image height and M is the image width.

Let each image pixel be represented by $b_{ijk}$ (0 ≤ i ≤ N − 1, 0 ≤ j ≤ M − 1, 0 ≤ k ≤ 2), where:

$$
\begin{aligned}
b_{ij0} &= R_{ij} && \text{the red component of the pixel in row } i \text{ and column } j\\
b_{ij1} &= G_{ij} && \text{the green component of the pixel in row } i \text{ and column } j\\
b_{ij2} &= B_{ij} && \text{the blue component of the pixel in row } i \text{ and column } j
\end{aligned}
\tag{14}
$$

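The image can be converted using the following mathematical formula:

$$
\begin{bmatrix}
\mathbf{c}_{00} & \cdots & \mathbf{c}_{0(M-1)}\\
\mathbf{c}_{10} & \cdots & \mathbf{c}_{1(M-1)}\\
\vdots & \ddots & \vdots\\
\mathbf{c}_{(N-1)0} & \cdots & \mathbf{c}_{(N-1)(M-1)}
\end{bmatrix}
=
\begin{bmatrix}
A_{00} & A_{01} & A_{02} & A_{03}\\
A_{10} & A_{11} & A_{12} & A_{13}\\
A_{20} & A_{21} & A_{22} & A_{23}
\end{bmatrix}
\otimes
\begin{bmatrix}
\mathbf{b}_{00} & \cdots & \mathbf{b}_{0(M-1)}\\
\mathbf{b}_{10} & \cdots & \mathbf{b}_{1(M-1)}\\
\vdots & \ddots & \vdots\\
\mathbf{b}_{(N-1)0} & \cdots & \mathbf{b}_{(N-1)(M-1)}
\end{bmatrix}
\tag{15}
$$

where $\mathbf{c}_{ij} = [c_{ij0}, c_{ij1}, c_{ij2}]^T$ and $\mathbf{b}_{ij} = [b_{ij0}, b_{ij1}, b_{ij2}, 1]^T$.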

The operation ⊗ is defined as follows: each vector $[c_{ij0}, c_{ij1}, c_{ij2}]^T$ is the result of the product

$$
\begin{bmatrix} c_{ij0}\\ c_{ij1}\\ c_{ij2} \end{bmatrix}
=
\begin{bmatrix}
A_{00} & A_{01} & A_{02} & A_{03}\\
A_{10} & A_{11} & A_{12} & A_{13}\\
A_{20} & A_{21} & A_{22} & A_{23}
\end{bmatrix}
\times
\begin{bmatrix} b_{ij0}\\ b_{ij1}\\ b_{ij2}\\ 1 \end{bmatrix}
$$

where the $c_{ijk}$ represent the output image color space components and A is the coefficient matrix of equation 1 or equation 2.

The $c_{ijk}$ elements (the output image color space components) can be computed using the following equation, with $b_{ij3} = 1$:

$$c_{ijk} = \sum_{m=0}^{3} A_{km} \times b_{ijm} \tag{16}$$

where the $\{A_{km}\}$ are L-bit constants and the $\{b_{ijm}\}$ are written in unsigned binary representation as shown in equation 17:

$$b_{ijm} = \sum_{l=0}^{W-1} b_{ijm,l} \times 2^l \quad (0 \le m \le 2) \tag{17}$$

Using the same development as in the previous section, equation 16 can be rewritten as:

$$c_{ijk} = \sum_{l=0}^{7} Z^*_l \times 2^l + A_{k3} \tag{18}$$

where:

$$Z^*_l = \sum_{m=0}^{2} A_{km} \times b_{ijm,l} \tag{19}$$

As in the first proposed architecture, the ROM contents depend on the matrix A coefficients, which in turn depend on the conversion type.
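The difference from the serial approach is that the W bit planes of equation 18 are handled side by side rather than one per clock. The sketch below is a functional model for a whole image given as a list of pixels in raster order. In hardware, each of the W per-bit terms would come from its own $PE_l$. The cycle count uses the latency W and the throughput of one pixel per clock stated for this architecture. Names, the raster-order list representation and the floating-point ROMs are illustrative assumptions.

```python
def convert_image_parallel(pixels: list[tuple[int, int, int]],
                           a: list[list[float]],
                           w: int = 8) -> tuple[list[tuple[float, ...]], int]:
    """Functional model of equations (18) and (19) for a whole image.

    pixels: (b0, b1, b2) tuples in raster order; a: 3 x 4 coefficient matrix.
    Returns the converted pixels and the clock-cycle count."""
    roms = [[b0 * r[0] + b1 * r[1] + b2 * r[2]
             for b0 in (0, 1) for b1 in (0, 1) for b2 in (0, 1)] for r in a]
    out = []
    for p0, p1, p2 in pixels:
        # One address per bit position l; in hardware each feeds its own PE_l.
        addrs = [(((p0 >> pos) & 1) << 2) | (((p1 >> pos) & 1) << 1) | ((p2 >> pos) & 1)
                 for pos in range(w)]
        out.append(tuple(a[k][3] + sum(roms[k][addrs[pos]] * (1 << pos) for pos in range(w))
                         for k in range(3)))
    cycles = w + len(pixels)  # latency W, then one pixel per clock (throughput 1)
    return out, cycles


A_FWD = [[0.257, 0.504, 0.098, 16.0],
         [0.439, -0.368, -0.071, 128.0],
         [-0.148, -0.291, 0.439, 128.0]]
image = [(0, 0, 0), (255, 255, 255), (10, 20, 30), (200, 100, 50)]  # a 2 x 2 image
converted, cycles = convert_image_parallel(image, A_FWD)
print(cycles)  # 12
```

Because the pipeline fills once and then emits one pixel per clock, the latency term becomes negligible for realistic image sizes.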

4.3.2 Proposed Architecture

Equation 18 can be mapped onto the proposed architecture as shown in Figure 8.

The architecture consists of 8 identical processing elements $PE_n$ (0 ≤ n ≤ 7). Each $PE_n$ comprises three parallel signed integer adders, three right shifters by n and one ROM block with the structure shown in Figure 6. The architecture has a latency of W and a throughput rate of 1. The entire image conversion can therefore be carried out in Latency + (N × M)/Throughput = 8 + (N × M) clock cycles, while using the standard algorithm (Figure 9), the


[Figure 11: a system-level model is simulated and partitioned into C code for the host processor (compiled with MS Visual C++) and Handel-C code for the FPGA. The Handel-C code is compiled in the Celoxica DK2 IDE, combined with external cores (schematic, VHDL, CoreGen) via EDIF, and placed and routed with the Xilinx layout tools or Xilinx JBits into full or partial FPGA bitstreams. The host program and FPGA configuration then run on the FPGA board for real-time prototyping.]

Figure 11: Handel-C design flow

banks. All are accessible by the FPGA and by any device on the PCI bus in parallel [10]. A schematic block diagram of the RC1000-PP board is shown in Figure 12.

[Figure 12: XCV2000E FPGA connected to four memory banks (Bank0 to Bank3) and to the PCI bus, with DMA, control and 8-bit status paths.]

Figure 12: RC1000-PP block diagram

5.1 CSC Based SA

Since the last element of the vector, $B_3$, is equal to 1, the number of PEs in the two architectures shown in Figures 2 and 3 can be reduced. Figures 13 and 14 show the modified architectures. With the first architecture, the entire computation can be carried out in (M − 1) clock cycles and requires N × (M − 1) PEs. With the second architecture, it can be carried out in 2 × (M − 1) − 1 clock cycles and requires (M − 1) PEs.

During the conversion between R’G’B’ and Y’CrCb, the outputs are rounded. Conventional rounding examines the fractional part and, if it is greater than or equal to 0.5, increases the result by one; this requires a condition check and an additional addition. A more efficient way to round a number is to add 0.5 to the result and truncate the fractional part. This technique has been applied in our implementation: the initial value of the parallel adder in each of the first three PEs is set to $(A_{i3} + 0.5)$, where 0 ≤ i ≤ 2. The parallel signed adders and multipliers have been implemented using Xilinx’s CoreGen utility, which provides many ready-made designs that can often save a programmer time. CoreGen blocks can be integrated into a Handel-C program using the interface declaration [22].

[Figure 13: nine PEs (PE00 to PE22) with inputs B0, B1, B2, coefficients A00 to A22, initial values A03 + 0.5, A13 + 0.5, A23 + 0.5, and outputs C0, C1, C2.]

Figure 13: Modified systolic architecture (1)

[Figure 14: three PEs (PE0 to PE2) fed with B0, B1, B2 and the coefficient streams A00 to A22, with initial values A03 + 0.5, A13 + 0.5, A23 + 0.5, producing C0, C1, C2.]

Figure 14: Modified systolic architecture (2)
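The add-0.5-and-truncate rounding used in the modified PEs can be checked in a few lines. The sketch compares it with textbook conditional rounding on every non-negative value representable with 5 fractional bits. It then shows the fixed-point form, where the 0.5 is folded into the accumulator's initial value so the final step is a plain right shift. The coefficients 8, 16 and 3 are 0.257, 0.504 and 0.098 rounded to 5 fractional bits, an assumed quantisation. Function names are illustrative.

```python
import math

FRAC_BITS = 5  # fractional bits, as in the 13-bit (8.5) format


def round_conditional(x: float) -> int:
    """Textbook rounding: test the fractional part, then conditionally add one."""
    i = math.floor(x)
    return i + 1 if x - i >= 0.5 else i


def round_add_half(x: float) -> int:
    """Add 0.5, then truncate: a single addition and no comparison."""
    return math.floor(x + 0.5)


def mac_folded_half(a_fx: list[int], b: list[int], a3_fx: int) -> int:
    """Fixed-point accumulation with 0.5 folded into the initial value A_i3.

    a_fx and a3_fx carry FRAC_BITS fractional bits; b holds 8-bit integers, so each
    product also has FRAC_BITS fractional bits and the final truncation is a shift."""
    acc = a3_fx + (1 << (FRAC_BITS - 1))  # A_i3 + 0.5
    for a, x in zip(a_fx, b):
        acc += a * x
    return acc >> FRAC_BITS


# Both rounding methods agree on every non-negative value with 5 fractional bits.
assert all(round_conditional(k / 32) == round_add_half(k / 32) for k in range(256 * 32))
print(mac_folded_half([8, 16, 3], [255, 255, 255], 16 * 32))  # 231
```

For negative intermediate values, an arithmetic right shift rounds toward minus infinity, so "truncate" here means floor rather than rounding toward zero.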

5.2 CSC Based DA

This section describes the hardware implementation of the CSCs based on DA principles. The ROMs have been implemented using the LUTs of the FPGA Configurable Logic Blocks (CLBs). These LUTs have useful capabilities for building very fast and efficient designs, such as their RAM and ROM modes [23]. Tables 4 and 5 give the contents of the ROMs used for the R’G’B’ to Y’CrCb and Y’CrCb to R’G’B’ conversions, respectively, for both architectures.

The second proposed architecture can be used for the inverse conversion (Y’CrCb to R’G’B’) by:

• Duplicating the ROMs, using the same implementation approach as for the first architecture (with a selector signal that allows the user to choose the appropriate converter); or

• Setting the contents of the ROMs in advance, depending on the desired conversion.

Table 4: The Content of the ROMs (R’G’B’ to Y’CrCb)

| $R_m$ / $R_{ij0,l}$ | $G_m$ / $G_{ij1,l}$ | $B_m$ / $B_{ij2,l}$ | ROM1 (Y’) | ROM2 (Cr) | ROM3 (Cb) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0.098 | -0.071 | 0.439 |
| 0 | 1 | 0 | 0.504 | -0.368 | -0.291 |
| 0 | 1 | 1 | 0.602 | -0.439 | 0.148 |
| 1 | 0 | 0 | 0.257 | 0.439 | -0.148 |
| 1 | 0 | 1 | 0.355 | 0.368 | 0.291 |
| 1 | 1 | 0 | 0.761 | 0.071 | -0.439 |
| 1 | 1 | 1 | 0.859 | 0 | 0 |

Table 5: The Content of the ROMs (Y’CrCb to R’G’B’)

| $Y_m$ / $Y_{ij0,l}$ | $Cr_m$ / $Cr_{ij1,l}$ | $Cb_m$ / $Cb_{ij2,l}$ | ROM1 (R’) | ROM2 (G’) | ROM3 (B’) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | -0.392 | 2.017 |
| 0 | 1 | 0 | 1.596 | -0.813 | 0 |
| 0 | 1 | 1 | 1.596 | -1.205 | 2.017 |
| 1 | 0 | 0 | 1.164 | 1.164 | 1.164 |
| 1 | 0 | 1 | 1.164 | 0.772 | 3.181 |
| 1 | 1 | 0 | 2.760 | 0.351 | 1.164 |
| 1 | 1 | 1 | 2.760 | -0.041 | 3.181 |

The precomputed partial products are stored in the ROMs using a 13-bit fixed-point representation (8 bits for the integer part and 5 bits for the fractional part), and 13-bit arithmetic is used inside the architecture. The inputs and outputs of the two architectures are represented using 8 bits. The outputs are rounded using the same technique as in the SA-based CSC implementation: the initial value of each accumulator $ACC_i$ is set in advance to $(A_{i3} + 0.5)$, where 0 ≤ i ≤ 2.

The MACs and parallel signed adders have been implemented using Xilinx’s CoreGen utility [22]. The shifters and the ROM initialisation have been implemented using VHDL. All design components have been connected together using Handel-C.

To make a fair and consistent comparison with existing FPGA-based color space converters, the XCV50E-8 FPGA device has been targeted. Table 6 gives the performance obtained for the proposed architectures in terms of area consumed and achievable speed.

The proposed DA architectures based on the serial and parallel manipulation approaches show significant improvements over the existing implementations [3, 4, 5], which perform the R’G’B’ to Y’CrCb conversion. The gains are in both area consumed and maximum running clock frequency. The advantage of the two SA architectures is that they can be used for any color space conversion based on equation 3.

Table 7 compares the software and hardware implementations, using the second proposed DA architecture, in terms of computation time and RMS error. The RMS error is due to the different data representations used in the two implementations and is defined as:

$$
\text{RMS Error} = \sqrt{\frac{1}{N \times M} \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} \left( I_{soft}(i,j) - I_{hard}(i,j) \right)^2}
$$
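Table 7 reports a separate RMS error for the Y, Cr and Cb components. This suggests the formula above is applied to each component plane on its own. A minimal sketch over nested lists, assuming no imaging library (the function name is illustrative):

```python
import math


def rms_error(soft: list[list[float]], hard: list[list[float]]) -> float:
    """RMS difference between two N x M component planes (e.g. the two Y' planes)."""
    n, m = len(soft), len(soft[0])
    total = sum((soft[i][j] - hard[i][j]) ** 2 for i in range(n) for j in range(m))
    return math.sqrt(total / (n * m))


print(rms_error([[0, 0], [0, 0]], [[1, 1], [1, 1]]))  # 1.0
```

An RMS error below 1 means that, on average, the hardware output differs from the software reference by less than one 8-bit grey level.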

Table 7 shows the test results for two images: Baboon (512 × 512) and Pepper (256 × 256). The FPGA implementation produces essentially the same converted image much faster, with only a small error due to the different data representations used in the two implementations.

Table 6: Performance comparison with existing CSC cores

| Design | Slices | Speed (MHz) |
|---|---|---|
| Proposed SA architecture (1) | 305 | 68 |
| Proposed SA architecture (2) | 1022 | 72 |
| Proposed DA architecture (1) | 70 | 128 |
| Proposed DA architecture (2) | 193 | 234 |
| CAST Inc. [4] | 222 | 112 |
| ALMA Technologies [5] | 222 | 105 |
| Amphion Ltd [3] | 204 | 90 |

Table 7: Software/hardware implementation comparison for RGB to YCrCb CSC

[The original, software-converted and hardware-converted images are not reproduced.]

| Original image | RMS error | Software computation time (ms) | Hardware computation time (ms) |
|---|---|---|---|
| Baboon (512 × 512) | Y 0.487, Cr 0.630, Cb 0.461 | 126 | 1.2 |
| Pepper (256 × 256) | Y 0.684, Cr 0.830, Cb 0.396 | 43 | 0.28 |

6 Conclusion

Processing an image in the RGB color space, with a set of RGB values for each pixel, is not the most efficient method. To speed up some processing steps, many broadcast, video and imaging standards use luminance and color difference video signals, such as YCrCb, making a mechanism for converting between formats necessary. This paper has reported novel scalable architectures, based on the DA and SA approaches, for RGB ↔ YCrCb conversions, which require enormous computing power. The implementation results show the effectiveness of the DA approach. The proposed DA architectures require less area and run at a higher maximum frequency than existing systems. The proposed systolic structures can perform other conversions based on matrix-vector multiplication, while the DA structures can be used for other conversions by modifying the contents of the ROMs.

References

[1] B. Payette, “Color Space Converter: R’G’B’ to Y’CrCb,” Xilinx Application Note XAPP637, V1.0, September 2002.

[2] R.C. Gonzalez and R.E. Woods, “Digital Image Processing,” Second Edition, Prentice Hall Inc., 2002.

[3] Datasheet (www.amphion.com), “Color Space Converters,” Amphion Semiconductor Ltd, DS6400 V1.1, April 2002.

[4] Application Note (www.cast-inc.com), “CSC Color Space Converter,” CAST Inc., April 2002.

[5] Datasheet (www.alma-tech.com), “High Performance Color Space Converter,” ALMA Technologies, May 2002.

[6] F. Bensaali and A. Amira, “Design and Efficient FPGA Implementation of an RGB to YCrCb Color Space Converter Using Distributed Arithmetic,” Proceedings of the International Conference on Field Programmable Logic (FPL), Lecture Notes in Computer Science, to be published by Springer Verlag, August 2004.

[7] A. Amira, “A Custom Coprocessor for Matrix Algorithm,” PhD thesis, Queen’s University of Belfast, 2001.

[8] F. Bensaali, A. Amira, I.S. Uzun and A. Ahmedsaid, “An FPGA Implementation of 3D Affine Transformations,” The 10th IEEE International Conference on Electronics, Circuits and Systems (ICECS’03), Sharjah, UAE, December 2003.

[9] F. Bensaali, A. Amira, I.S. Uzun and A. Ahmedsaid, “Efficient Implementation of Large Parallel Matrix Product for DOTs,” The International Conference on Computer, Communication and Control Technologies (CCCT’03), Florida, USA, July 2003.

[10] Datasheet (www.celoxica.com), “RC1000 Reconfigurable Hardware Development Platform,” Celoxica Ltd., 2001.

[11] URL: www.xilinx.com

[12] A. Albiol, L. Torres and E.J. Delp, “An Unsupervised Color Image Segmentation Algorithm for Face Detection Applications,” Proceedings of the International Conference on Image Processing, Vol. 2, pp. 681-684, October 2001.

[13] P. Kuchi, P. Gabbur, P.S. Bhat and S. David, “Human Face Detection and Tracking using Skin Color Modelling and Connected Component Operators,” The IETE Journal of Research, Special Issue on Visual Media Processing, May 2002.

[14] M. Bartkowiak, “Optimisations of Color Transformation for Real Time Video Decoding,” Digital Signal Processing for Multimedia Communications and Services, EURASIP ECMCS 2001, Budapest, September 2001.

[15] J.L. Mitchell and W.B. Pennebaker, “MPEG Video Compression Standard,” Chapman & Hall, 1996.

[16] J. Bracamonte, P. Standelmann, M. Ansorge and F. Pellandini, “A Multiplierless Implementation Scheme for the JPEG Image Coding Algorithm,” IEEE Nordic Signal Processing Symposium, Kolmarden, Sweden, June 13-15, 2000.

[17] A. Amira, “An FPGA Based Parameterisable System for Discrete Hartley Transforms Implementation,” Proceedings of the International Conference on Image Processing (ICIP), Barcelona, Spain, September 2003.

[18] H. Ohlsson and L. Wanhammar, “Maximally Fast Numerically Equivalent State-Space Recursive Digital Filters Using Distributed Arithmetic,” Proceedings of the IEEE Symposium in Nordic Signal Processing (NORSIG2000), Kolmarden, Sweden, pp. 295-298, June 2000.

[19] O. Gustafsson and L. Wanhammar, “Implementation of a Digital Beamformer in an FPGA Using Distributed Arithmetic,” Proceedings of the IEEE Symposium in Nordic Signal Processing (NORSIG2000), Kolmarden, Sweden, pp. 295-298, June 2000.

[20] Manual (www.celoxica.com), “Handel-C Language Reference Manual,” Celoxica Ltd., 2003.

[21] URL: www.celoxica.com

[22] Application Note, “Xilinx CoreGen and Handel-C,” AN 58 v1.0, 2001.

[23] M. Defossez, “Using the Virtex Look-Up Tables,” Xilinx Application Note (www.xilinx.com).
