An overview of quantization in JPEG 2000 · 2007-02-11 · JPEG 2000 employs a dead-zone uniform...

Signal Processing: Image Communication 17 (2002) 73–84

An overview of quantization in JPEG 2000

Michael W. Marcellina,*, Margaret A. Lepleyb, Ali Bilgina, Thomas J. Flohrc,Troy T. Chinend, James H. Kasnere

aDepartment of Electrical and Computer Engineering, University of Arizona, Tucson, AZ, USAbThe MITRE Corporation, Bedford, MA, USA

cScience Applications International Corporation, Tucson, AZ, USAdFuji Film Software Inc, Sunnyvale, CA, USAeAerospace Corporation, Chantilly, VA, USA

Abstract

Quantization is instrumental in enabling the rich feature set of JPEG 2000. Several quantization options are providedwithin JPEG 2000. Part I of the standard includes only uniform scalar dead-zone quantization, while Part II allowsboth generalized uniform scalar dead-zone quantization and trellis coded quantization (TCQ). In this paper, an

overview of these quantization methods is provided. Issues that arise when each of these methods are employed arediscussed as well. r 2002 Elsevier Science B.V. All rights reserved.

Keywords: JPEG 2000; Uniform quantization; Dead-zone quantization; Trellis coded quantization (TCQ)

1. Introduction

JPEG 2000 is the latest ISO/IEC image com-pression standard. It is distinct from previousstandards, in that it creates a framework where theimage compression system can act like an imageprocessing system rather than just a simple input–output storage filter. The decision on several keycompression parameters such as quality or resolu-tion can be delayed until after the creation of

the compressed codestream. JPEG 2000 enablesdecompression of many image products from asingle compressed file and offers several opportu-nities for compressed domain processing.

Quantization is one of the critical ingredients ofthe JPEG 2000 image compression system. Manyof the desirable elements of the JPEG 2000 featureset are enabled due to careful selection ofquantization methods. In this paper, we providean overview of these methods.

This paper is organized as follows. In Section 2,an introduction to quantization methods used inJPEG 2000 is presented. Section 3 describes howthese quantization methods are used within theframework of JPEG 2000, and Section 4 providesa brief summary.

*Corresponding author.

E-mail addresses: [email protected] (M.W. Mar-

cellin), [email protected] (M.A. Lepley), [email protected] (A.

Bilgin), [email protected] (T.J. Flohr), TChinen@-

FujifilmSoft.com (T.T. Chinen), [email protected]

(J.H. Kasner).

0923-5965/02/$ - see front matter r 2002 Elsevier Science B.V. All rights reserved.

PII: S 0 9 2 3 - 5 9 6 5 ( 0 1 ) 0 0 0 2 7 - 3

2. Quantization

Quantization is the element of lossy compres-sion systems responsible for reducing the precisionof data in order to make them more compressible.JPEG 2000 offers several different quantizationoptions. Only uniform scalar (fixed-size) dead-zone quantization is included in Part I of thestandard. Part II of the standard generalizes thisquantization method to allow more flexible dead-zone selection. Furthermore, trellis coded quanti-zation (TCQ) is offered in Part II as a value-addedtechnology.

2.1. Scalar quantization

The simplest form of quantization is scalarquantization. JPEG 2000 employs a dead-zoneuniform scalar quantizer to coefficients resultingfrom the wavelet transform of image samples.Fig. 1 illustrates such a quantizer with stepsize D:A scalar quantizer (SQ) can be described as afunction Q that maps each element in a subset ofthe real line to a particular value. For a givenwavelet coefficient x; the quantizer produces asigned integer q given by

q ¼ QðxÞ: ð1Þ

The quantization index q indicates the interval inwhich x lies. In Fig. 1, the endpoints of thequantization intervals are indicated by the verticallines.

Given q; the decoder produces an estimate of xas

#x ¼ Q�1ðqÞ: ð2Þ

In Fig. 1, the heavy ‘‘dots’’ represent theseestimates (or reconstruction values). For a givenstep size D; q is computed as

q ¼ QðxÞ ¼ signðxÞjxjD

� �: ð3Þ

Notice that the wavelet coefficients inside theinterval (�D; D) are quantized to zero for thequantizer in Fig. 1. Thus, the interval (�D; D) iscalled the ‘‘deadzone’’. The width of this interval is2D; while all other intervals are of width D:

The inverse quantizer is given by

#x ¼ Q�1ðqÞ ¼0; q ¼ 0;

signðqÞðjqj þ dÞD; qa 0;

(ð4Þ

where d is a user selectable parameter within therange 0pdo1 (typically d ¼ 1=2). d can be chosento achieve the best objective or subjective qualityat reconstruction.

The quantizer illustrated in Fig. 1 is the quanti-zer used in JPEG 2000 Part I. However, in Part IIof the standard, it is possible to generalize thisquantizer to allow a deadzone of variable width,while maintaining the fixed width D for all otherintervals. For a deadzone of width 2ð1� nzÞD; thequantization index is given by

q ¼0; jxjo� nzD;

signðnÞ jxjþnzDD

� �; jxjX� nzD:

(ð5Þ

Here, nz is a parameter that adjusts the dead-zonesize, and is transmitted as side information to thedecoder. Notice that nz ¼ 0 corresponds to thedefault quantizer given in Fig. 1, and values of nzin the interval (0, 1) result in a smaller deadzone,while values in (�1, 0) result in a larger deadzone.The generalized dead-zone uniform scalar quanti-zer is illustrated in Fig. 2. In this case, thereconstructed wavelet coefficients are given as

#x ¼0; q ¼ 0;

signðqÞðjqj � nzþ dÞD; qa 0:

(ð6Þ

In JPEG 2000, there is some indication that usingnzE0:25 (i.e., dead-zone size about 1:5D) canprovide a very slight decrease in MSE and generatemore visually pleasing low-level texture recon-struction.

2.1.1. Embedded scalar quantizationA very desirable feature of compression systems

is the ability to successively refine the recon-structed data as the bit-stream is decoded. In thissituation, the decoder reconstructs an approxima-tion of the reconstructed data after decoding

0 3∆∆ 2∆−∆−2∆−3∆. . . . . .

x

−4∆ 4∆ 5∆−5∆

Fig. 1. Dead-zone uniform scalar quantizer with stepsize D:

M.W. Marcellin et al. / Signal Processing: Image Communication 17 (2002) 73–8474

a portion of the compressed bit-stream. As moreof the compressed bit-stream is decoded, thereconstruction quality can be improved, until thefull quality reconstruction is achieved upondecoding the entire bit-stream.

Since this property is one of the key focuses ofJPEG 2000, the quantizers used in the standardhave been designed to enable embedded quantiza-tion. Let us consider the uniform dead-zonequantizer defined by Eqs. (3) and (4). Thisquantizer has embedded within it, all uniformdead-zone quantizers with step sizes 2pD forinteger pX0:

To see this, let K ¼ maxq

log2ðjqjÞ� �

: That is, K is

the number of bits required to represent allquantization indices. Then, we can represent q insign magnitude form as

q ¼ s; q0q1q2? qK�1; ð7Þ

where s is the sign, q0 is the most significant bit(MSB), and qK�1 is the least significant bit (LSB)of q: Now, let

qðpÞ ¼ s; q0q1q2? qK�1�p ð8Þ

denote the index obtained by dropping the p leastsignificant bits of q: It is then easy to see that

qðpÞ ¼ QpðxÞ; ð9Þ

where Qp is the uniform dead-zone quantizer with

step size 2pD: Fig. 3 illustrates this embedding forp ¼ 1 and 2.

Eq. (9) suggests that if the p LSBs of jqj are notavailable at the decoder, it is still possible to obtainan approximation to x; but at a lower level ofquality. In this case, the inverse quantization isperformed using the dequantizer Q�1

p such that

#x ¼Q�1p ðqðpÞÞ

¼0; qðpÞ ¼ 0;

signðqðpÞÞðjqðpÞj þ dÞ2pD; qðpÞ a 0:

(ð10Þ

Note that p ¼ 0 yields the full quality dequantiza-tion given by Eq. (4).

Similar embedding properties exist for thegeneralized dead-zone uniform scalar quantizerdefined by Eqs. (5) and (6). In this case, when pLSBs of the index jqj are not available at thedecoder, the reconstructed value is computed as

#x ¼0; qðpÞ ¼ 0;

signðqðpÞÞðjqðpÞj � nz2�p þ dÞ2pD; qðpÞ a 0:

(

ð11Þ

Note that uniform dead-zone SQ has one im-portant embedding property that is not shared bythe generalized dead-zone scalar quantizer whennza0: For QpðxÞ; as illustrated in Fig. 3, anytruncation of p bits is exactly equivalent tochoosing a larger step size 2pD: As a consequence,the dead-zone width is always exactly twice theeffective step size of 2pD: So full reconstruction ofa quantization with a large step size will produceidentical results to a partial reconstruction of aquantization with a smaller step size. This allowsflexibility in setting the step size magnitude.

When nza0 the deadzone after truncating pLSBs is 2ð1� nz2�pÞ times the effective step size of2pD: Thus, the dead-zone size does not remainconstant throughout the embedding. Benefitsgained by setting nza0 will have fullest impactwhen coefficients are fully decoded, so step sizechoice becomes quite important.

∆(1-nz)∆-(1-nz)-(4-nz) ∆ ∆-(2-nz)∆-(3-nz) 0

x

. . . ∆(2-nz) ∆(3-nz) ∆(4-nz). . .

Fig. 2. Generalized dead-zone uniform scalar quantizer with stepsize D and dead-zone size 2ð1� nzÞD:

0 2∆−2∆. . . . . .

x

−4∆ 4∆

0. . . . . .

x

4∆

p=1

p=2

0 3∆∆ 2∆−∆−2∆−3∆ . . .

x

−4∆−5∆ 4∆ 5∆

p=0

−4∆

. . .

Fig. 3. Embedded dead-zone uniform scalar quantizer.

M.W. Marcellin et al. / Signal Processing: Image Communication 17 (2002) 73–84 75

2.1.2. Dequantization with reversible transformThe inverse quantizer formulas given in Eqs. (4)

and (10) have a problem when followed by areversible transform. There is no guarantee thatthe estimate #x is an integer. To avoid this problem,JPEG 2000 modifies the inverse quantization defi-nition in this special case to

#x ¼Q�1p ðqðpÞÞ

¼0; qðpÞ ¼ 0;

signðqðpÞÞ ðjqðpÞj þ dÞ2pD� �

; qðpÞ a 0:

(ð12Þ

2.2. Trellis coded quantization

Trellis coded quantization (TCQ) is based onthe ideas of an expanded signal set and setpartioning from coded modulation, and has beenshown to be an efficient method with modestcomplexity for encoding of memoryless sources [5].As mentioned previously, TCQ is included inJPEG 2000 Part II.

A trellis is nothing more than a state transitiondiagram (that takes time into account) for a finitestate machine. Trellises are used to study se-quences of state transitions, or equivalently,sequences of states. A typical trellis with 8 statesis shown in Fig. 4. In the figure, each column ofheavy dots represent the eight possible states atany given point in time. These states are labeledfrom 0 to 7, from top to bottom. Each branch in

the trellis represents a transition from one state toanother, at the next point in time. For example, inthe figure, two possible transitions from state 1 areto either state 0 or state 4. Specifying a paththrough the trellis is equivalent to specifying asequence of states. Given an initial state at t ¼ 0;this path can be specified by a binary sequence,since there are only two possible transitions fromone state to another in the figure.

For the variant of TCQ included in JPEG 2000,a uniform scalar quantizer is partitioned into foursubsets called D0; D1; D2 and D3: This isillustrated in Fig. 5. The subsets Di are used tolabel the branches of a trellis. Fig. 6 shows asingle stage of the 8-state trellis used in Fig. 4with branch labeling. The union of the quantizersassociated with each state is called a unionquantizer. Notice that the two union quantizersused for the trellis in Fig. 6 are A0 ¼ D0

SD2 and

A1 ¼ D1

SD3: These quantizers are illustrated in

Fig. 7. In Fig. 7, the reconstruction values #xcorresponding to each union quantizer and thecorresponding union quantizer indices qðAiÞ areshown as well.

The structure of the trellis allows encoding to bedone using the Viterbi Algorithm [2]. For encodinga data sequence xARm; an N-state trellis ofm stages is employed. Note that such a trellishas mþ 1 columns of states. Let Si;l ;i ¼ 0; 1;y; m; l ¼ 0; 1;y; N � 1 denote state lof column i:

0

1

2

4

6

7

State

Time

t=0 t=1 t=2 t=3

. . .3

5

Fig. 4. A typical trellis diagram.


For each state Siþ1;l let Si;l0 and Si;l00 be thetwo states having branches ending in Siþ1;l : Also,let Dl0;l and Dl00;l be the subsets associated withthose branches, respectively. Let cl0;l and cl00 ;l be the

codewords in Dl0 ;l and Dl00 ;l that minimize rðxi; cÞ ¼ðxi � cÞ2; and let dl0 ;l ¼ ðxi � cl0 ;lÞ

2 and dl00 ;l ¼ ðxi �cl00;lÞ

2: Finally, let siþ1;l be the ‘‘survivor distor-tion’’ associated with the survivor path at stateSiþ1;l :

The ith step (i ¼ 0;y; m� 1) in the Viterbialgorithm then consists of setting siþ1;l ¼min fsi;l0 þ dl0 ;l ; si;l00 þ dl00 ;lg; preserving the branchthat achieves this minimum, while deleting theother branch from the trellis. If two valuescompared for minimum survivor distortion areequal, the ‘‘tie’’ can be resolved arbitrarily with noimpact on MSE.

When the end of the data is reached (i ¼ m� 1),the trellis is traced back from the final state havingthe lowest survivor distortion, and the correspond-ing set of TCQ indices are produced. For long datasequences (mclog2 N) the choice of initial statehas negligible impact on MSE. Thus, we arbitra-rily fix the initial state at 0.

A simple modification can be applied to theTCQ quantization indices obtained as describedabove, to allow more efficient entropy coding. Ifwe assume that the input sequence xi has asymmetric probability distribution, it can be seenin Fig. 7 that the quantization indices 1 from A0

and �1 from A1 have the same probability.Similarly, the probability of the quantization index�1 from A0 is equal to that of quantization index 1from A1: Thus, we negate the 71 indices in A1 tobring the probabilities of 71 indices in both unionquantizers into agreement. This allows for moreefficient entropy coding of quantization indices.Note that, it is also possible to negate all thequantization indices in A1 [3]. This will bring theprobabilities of all the quantization indices in A0

and A1 into agreement, and will improve theperformance of entropy coding. However, this isavoided since the negation of the indices of A1

(except 71) compromises the embedding propertyof TCQ, as discussed in the next section.

The dequantization of TCQ indices at thedecoder is straightforward. The sequence ofindices specifying which codeword was chosenfrom the appropriate union quantizer at each stageis sufficient to allow the decoder to reproduce thereconstructed values, given the initial state. Thedequantization utilizes the same trellis employed at

. . .

. . .

. . .

. . .

. . .

. . .

. . .

. . .

x̂

x̂

x̂

x̂

0−4∆ 7∆−8∆

∆ 5∆−2∆ 9∆−6∆−10∆2

0 4∆−3∆ 8∆−7∆1

−∆ 2∆ 6∆ 10∆−5∆−9∆

3∆0

3D

D

D

D

Fig. 5. Scalar quantizers used for TCQ.

0

1

2

3

4

5

6

7

State

D0

D0

D2D2

D0D1

D3D3

D1D0

D2D2

D1

D3D3

D1

Fig. 6. Branch labeling for TCQ.

−∆−3∆−5∆−7∆−9∆

. . . 0 ∆ 3∆ 5∆−2∆−4∆ 7∆ 9∆−6∆−8∆. . . −10∆

x

0

51 2 3 40

−5 −4 −3 −2 −1 51 2 3 4

1

0 2∆ 4∆ 6∆ 8∆ 10∆. . .

x

0A

A

x̂

(A )

x̂

q

0q(A )

1−5 −3 −2−4 −1

Fig. 7. Union quantizers for TCQ.


the encoder, and initializes the state index l ¼ 0;and the stage index i ¼ 0: If the union quantizerused for the current state Si;l is A1 and thequantization index is 71, the quantization index isnegated. The reconstructed value is computedusing the union quantizer of the current state.The next state is chosen using the branch labelingof the trellis as follows: if the quantization index ofthe union quantizer is from subset Dj ; the nextstate is selected by following the branch labeled Dj :

2.2.1. Embedded trellis coded quantizationAs in the case of SQ, the sign magnitude

representation of TCQ indices can be employedto achieve an embedding for TCQ. To see this, firstnotice that codewords from the two subsets(within a union quantizer) differ in the leastsignificant bit of their index qðAiÞ: In other words,the least significant bit of the quantization indexdetermines the subset to which the codewordbelongs.

For example, in A0 all the codeword indices forthe codewords in D2 have ‘1’s in their leastsignificant bit, while the indices for codewords inD0 have ‘0’s. Since the decoder determines the nextstate by identifying the subset each codewordbelongs to, the least significant bit thus determinesthe path through the trellis. Since the decoderneeds to be able to determine the path, it is notpossible to invert the TCQ indices precisely untilall of the least significant bits are available.However, it is possible to form an approximatereconstruction value in the absence of the p leastsignificant bits of the TCQ quantization index.Similar to the case of SQ, this value can becomputed as

#x ¼0; qðpÞ ¼ 0;

signðqðpÞÞðjqðpÞj þ dÞ2pþ1D; qðpÞ a 0:

(ð13Þ

Note that this operation is equivalent to perform-ing inverse scalar quantization with twice the TCQstep size.

As mentioned in the previous section, negationof the indices of A1 will improve the performanceof entropy coding. However, since the paththrough the trellis cannot be determined until theleast significant bitplane becomes available at the

decoder, the decoder cannot identify the code-words belonging to A1: Thus, this negation willjeopardize the embedding property, and isavoided. Note, however, that the negation of71’s in A1 will not jeopardize embedding. The71’s are transmitted in the least significantbitplane and at this point the decoder candetermine the path through the trellis. This allowsidentification of 71’s that belong to A1: Thereader is referred to [1] for further discussion onembedded coding of TCQ indices.

3. Quantization in JPEG 2000

In JPEG 2000, each resolution of waveletcoefficients is partitioned into precincts. Precinctsare one of the ingredients to low memoryimplementations. They also provide a method ofspatial random access. Compressed data from aprecinct are grouped together to form a packet.

Another one of the geometric structures used inJPEG 2000 are codeblocks. Codeblocks are for-med by partitioning subbands. Since the precinctsize (resolution dependent) and codeblock size(resolution independent) are both powers of 2,the two partitions are forced to ‘‘line up’’. Thus, itis reasonable to view the codeblocks as partitionsof the precincts (rather than of the subbands).

In JPEG 2000, wavelet coefficients of eachcodeblock are scanned in a particular order forthe purpose of quantization and entropy coding.This order was chosen to facilitate efficientpipelining and/or parallel processing. Fig. 8 illus-trates the scan pattern used for a given codeblock.Each codeblock is quantized and entropy codedindependently.

4 co

effic

ient

s

Fig. 8. Scan order for quantization sequence of a codeblock.


Entropy coding is performed using context-dependent, binary, arithmetic coding of bitplanes(For details of entropy coding methods used inJPEG 2000, refer to [6]).

Consider a quantized codeblock to be an arrayof integers in sign-magnitude representation, thenconsider a sequence of binary arrays with one bitfrom each coefficient. The first such array containsthe most significant bit (MSB) of all the magni-tudes. The second array contains the next MSB ofall the magnitudes, continuing in this fashion untilthe final array which consists of the LSBs of all themagnitudes. These binary arrays are referred to asbitplanes.

The number of all-zero ‘‘leading’’ bitplanes forthe codeblock is signaled as side information andbitplanes are encoded starting from the firstbitplane having at least a single 1. Note that thisway of coding complements embedded quantiza-tion and facilitates successive refinement of wave-let coefficients. If K�p of the most significantbitplanes for a codeblock are available at thedecoder, the quantization index qðpÞ can be formedfor each wavelet coefficient of the codeblock. Theapproximate reconstruction values are then givenby Eq. (10).

The encoder may take advantage of thisembedding facility and choose not to encode p ofthe LSBs for a particular codeblock. When thisoccurs, the effective quantization step size of thiscodeblock, 2pD; is due to a combination of the stepsize D and encoder truncation. How a desiredeffective quantization D0 is generatedFeither all inthe quantization step size (D0 ¼ D) or some portionoccurring via bitstream truncation (D0 ¼ 2pDÞFistypically an encoder choice.

The ability to transfer much of the quantizationinto the bitstream truncation process is a powerfultool within JPEG 2000. Since JPEG 2000 code-stream construction allows truncation to becontrolled on a sub-bitplane level at each codeblock, coefficients in different areas of the imagecan have different effective quantization levels.

3.1. Determining amount of embedding

All the embedded inverse quantization formulasrely on a knowledge of p; the number of LSBs

unavailable at the decoder. In JPEG 2000 eachsubband b has a maximum number of magnitudebitplanes that can be decoded, Mb: By countingthe number of magnitude bits actually decoded foreach coefficient, Nbðu; vÞ; the decoder determines

p ¼Mb �Nbðu; uÞ; ð14Þ

where Mb itself is composed of the predictednumber of quantized bits, eb; and some number ofguard bits, G; to accomodate unpredicted over-flow, such that

Mb ¼ Gþ eb � 1: ð15Þ

Both G and eb are chosen by the encoder andsignaled in the codestream. A typical value for G is2. The values eb set by the encoder are a functionof both the quantization step size and thepredicted bitdepth of the wavelet coefficients,

Rb ¼ RIþlog ðgainbÞ; ð16Þ

where RI is the original image bitdepth and gainb

is the gain expected from the wavelet transform insubband b. The exact relationship between eb; thequantization step size, and Rb will becomeapparent in the next section.

3.2. Selection and signaling of quantization stepsizes

Image quality and compression rate are con-trolled by the amount of quantization applied toeach coefficient. JPEG 2000 is a decoder standardso the quantization step size selection is quiteflexible. However, there are a few restrictionsimposed by the standard.

3.2.1. Reversible waveletsWhen reversible wavelets are utilized in JPEG

2000, uniform dead-zone scalar quantization witha step size of D ¼ 1 must be used. Reversiblewavelets produce integer wavelet coefficients.Thus, a step size of D ¼ 1 results in no quantiza-tion. If all the bitplanes are encoded, losslessreconstruction is possible once all bitplanes arereceived by the decoder.

The constant step size D ¼ 1 is not signaled tothe decoder. Instead, the value eb ¼ Rb is signaled


in a 5-bit unsigned integer field. If Rb > 31; eb ¼ 31and lossless reconstruction is impossible.

Restricting the step size to 1 does not preventthe encoder from embedding the bitstream toallow intermediate (or even final) lossy results. Theeffective quantization step size caused by thisembedding can be any power of two.

3.2.2. Irreversible waveletsWhen irreversible wavelets are utilized in JPEG

2000, the step size selection is restricted only by thesignaling syntax itself.

Every subband b has a single quantization stepsize, Db; that is represented as a two-part quantity,(eb; mb) such that

Db ¼ 1þmb

211

2Rb�eb ; ð17Þ

where mb is an 11-bit unsigned integer and eb is a 5-bit unsigned integer.

This syntax has several consequences.* Only one quantization step size per subband.

Therefore, Db must be smaller than (or equal to)the effective quantization desired in differentareas in the subband. If in doubt, choose asomewhat small quantization step size and thentruncate later as required. This also means asingle scale factor (1þ mb=2

11) must be usedacross the entire subband.

* The accuracy of the step size is restricted to 12significant binary bits. This can impact conver-gence of integrative step size refinement techni-ques and transcoding of step sizes definedoutside JPEG 2000.

* Upper bound: Dbo2Rbþ1: A step size that isover twice the predicted subband magnitudewill quantize almost all subband coefficients tozero. The same effect can be achieved using areduced step size (set eb ¼ 0) and applyingencoder truncation.

* Lower bound: 2Rb�31pDb: This bound is morerestrictive than the upper bound, since it limitshigh accuracy coding. For 8-bit image data 21fractional bits of the HH coefficients can beencoded. This is certainly enough for manycompression needs. However, when the imagebitdepth becomes large, this bound has moreimpact. For example, the HH subband of 30-bit

image data must be truncated above theoriginal decimal point. When computationsare performed in 32-bit registers this does notimpact overall performance, but the lowerbound must be applied prior to using the stepsize for any intermediate computations.

The exponent/mantissa pairs (eb; mb) are eithersignaled in the codestream for every subband(expounded quantization), or else signaled only forthe lowpass subband (eLL; mLL) and derived for allother subbands (derived quantization).

Derived quantization can only be used when thestep size exponent/mantissa pairs obey

ðeb; mbÞ ¼ ðeLL � nLL þ nb;mLLÞ; ð18Þ

where nb is the number of levels of decompositionrequired to reach subband b; and the subscript LLindicates the lowpass subband. With the Mallat(or dyadic) decomposition used in JPEG 2000Part I, this means that the step size decreases by afactor of 2 at each decomposition level. This gives arough approximation of the step sizes which wouldbe chosen based on high rate theory using the L2

synthesis gains of the wavelet subbands.Average performance often improves if the

actual L2 synthesis gains are used [7] to set therelative step sizes. Specifically, if Wb is the L2

synthesis gain (or ‘‘energy weight’’) of band b, then

Db ¼

ffiffiffiffiffiffiffiW0

Wb

rD0: ð19Þ

When this methodology of step size selection isemployed, expounded quantization is usuallyrequired.

3.3. Lagrangian rate allocation

As discussed above, when expounded quantiza-tion is employed, quantization step sizes can beselected separately for each subband. Since theselection of the quantization step sizes is anencoder issue, a particular selection method isnot specified within the standard. One method thatcan be used to determine these step sizes isdescribed in [3]. In this procedure, every subbandof wavelet coefficients is assigned a generalized-Gaussian density (GGD) model. The GGDs are afamily of symmetric, unimodal probability density


functions (pdf). The zero-mean pdf is given by

pðxÞ ¼a

2sGð1=aÞ

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiGð3=aÞGð1=aÞ

s

exp �

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiGð3=aÞGð1=aÞ

sjxjs

� � !a( ): ð20Þ

The a parameter determines the shape of thefunction, where a ¼ 1:0 corresponds to the Lapla-cian density, and a ¼ 2:0 corresponds to theGaussian density. In the procedure, five allowablechoices for a are 0.5, 0.75, 1.0, 1.5, 2.0. These valueshave been chosen to cover the extent of the observedpdfs of wavelet coefficients in typical imagery. Theparameter a for a given subband is estimated fromits sample kurtosis. Given ai and diðRiÞ; i ¼1;y; B; where B is the number of subbands anddiðRiÞ is the MSE obtained by quantizing at rate Ri

a unit variance GGD source having parameter aithe rate allocator attempts to minimize the overallMSE constrained by an average rate constraint. Fordetails of this Lagrangian rate allocation strategy,the interested reader is referred to [3].

3.4. Considerations for TCQ

For TCQ, the Db signaled in the codestream isactually 2D; i.e. twice the step size used in the TCQdescription. This allows decoders that have noknowledge of TCQ to decode embedded datacorrectly using the standard inverse scalar quantizer.

It is important to reiterate that the LSBs of theTCQ indices determine the path through the trellisand are required for precise dequantization. Ifonly a portion of the LSBs are available at thedecoder (as would be the case if only one or two ofthe three LSB coding passes have been received bythe decoder), full TCQ dequantization cannot beutilized. For this reason, the following policy isfollowed in the current verification model (VM):1

If any of the three coding passes for the LSB of acodeblock is included within a packet, the encoderis forced to include all three passes. This allows full

TCQ dequantization to be utilized at the decoderafter this particular packet is received. It should benoted, however, that this is an encoder issue, andthus, is not specified by the standard.

TCQ provides visually superior results com-pared to SQ, when images have fine grain texture.Such an example is illustrated in Figs. 9–11. InFig. 9, the original image is displayed. Figs. 10 and11 displays the same image encoded at 0.3 bits/pixel using SQ and TCQ, respectively. By compar-ing Figs. 10 and 11, it can be seen that texture isbetter preserved using TCQ.

3.5. Multi-component and tiled images

In JPEG 2000, an image is a collection of two-dimensional rectangular arrays of samples. Eachof these arrays is called an image component. Tilesare non-overlapping rectangular regions of theimage and the array of samples from onecomponent that fall within a tile is called a tile-component. Within JPEG 2000 the (eb; mb) pairsmay be separately chosen for each tile-component,but cannot change within a tile-component. Thisallows step size choices to vary in different imagecomponents (for example intensity versus chroma-ticity in a color image) and in different tiles.

A signaling priority is used for efficient storageof this quantization information. A single defaultset of image quantization pairs (ebI; mbI) is signaledin the main header and used whenever no otherquantization signaling overrides it. Default com-ponent quantization pairs may be signaled tooverride the image default on a particular compo-nent. A tile-component using non-default quanti-zation must signal the new quantization as part ofits header. The new quantization only effects thisparticular tile-component.

3.6. Selection of d

The inverse scalar quantizer formulas incorpo-rate a user selectable parameter d which indicateswhere in the quantization interval the reconstruc-tion value #x is placed. A typical assignment of d ¼1=2 will place the reconstruction at the center ofthe quantization interval. Other considerationssuch as transform coefficient distribution and

1The verification model for JPEG 2000 consists of software

implementation of an encoding system, a decoding system, the

bit-stream definition and syntax, input and output definitions of

each system, operational procedures and documentations.


visual appearance may influence the location ofthe reconstruction value. Since there is no oneoptimal reconstruction positioning for all images,the JPEG 2000 standard allows the decoder freechoice of d:

Tuning of d involves a large number of factorsincluding: image type, subband coefficient distribu-tion, number of missing LSBs p; quantization index,neighboring coefficient values, and encoder embed-ding rules. Over a range of image types, the defaultd ¼ 1=2 assignment tends to provide the mostreliable reconstruction performance. For furtherinformation the interested reader is referred to [4].

3.7. Unusual quantization effects

When there is no change in the image bitdepthbetween encoding and decoding, JPEG 2000generates an image with data of the samemagnitude as the original. However, there aresituations requiring a change in the originalimage bitdepth. For example, maximum display

depth restrictions or display of binary data withhigh contrast. Sometimes this is a decoder choice,but in other circumstances the encoder orfileserver may force the change within thecompressed codestream. This section exploresthe effects of such changes.

Let RIS be the bitdepth of the source image andRID be the image bitdepth signaled to the decoder.Reversibly and irreversibly transformed data reactvery differently when RISaRID:

3.7.1. Irreversible transformWhen the transform is irreversible the decoder

quantization step size is a function of Rb; whichin turn is computed from the reported imagebitdepth RID: If RID > RIS the decoder willsee a larger step size than the encoder and ifRIDoRIS a smaller one. This difference translatesinto either a gain or loss in magnitude ofthe reconstructed image values. So altering thesignaled image bitdepth causes the most significantbits of the input data to become the most

Fig. 9. Original image.


significant bits in the decoded output data withoutany extra manipulation.

For instance, using this manipulation 12-bitimagery can be scaled down to 8 bits and reducedbitdepth imagery can be stretched to 8 bits.

3.7.2. Reversible transformWhen the transform is reversible, the step size is

always 1 and the signaled image bitdepth has noimpact on the magnitude of the inverse trans-formed image data. Instead the magnitude of theinverse transformed image data is proportional tothe signaled values eb: This is exactly the reverse ofthe situation with irreversible transforms.

If eb is set as anticipated ðeb ¼ RISþlog2 ðgainbÞÞthe inverse transform will indeed act reversibly, butthe reconstructed image bitdepth will be RIS: Inother words, the decoder will create the originalimage bitdepth regardless of the signaled imagebitdepth. If the image bitdepth is forced upwards,say from 1 to 8, the result will be an 8-bit image

with significant values only at the very lowestbitplanes (values of 0, 1 and possibly 2). If insteadthe image bitdepth is forced down (say from 12 to8), the output of the inverse transform will still havethe higher number of bits (12), and the applicationwill need to take extra steps unspecified by thestandard to decide which bits should be output.

If the encoder attempts to modify eb to forcescaling of the decompressed image data, anotherproblem will occur. Any time ebaRIS the rever-sible property of the transform is lost sincerounding/truncation no longer occurs at thecorrect bitplane. For example, lowering allthe values of eb by a fixed value k; will cause theoutput magnitude to be reduced by k bits, but willalso cause errors in the inverse transform thatwould not be present if the decoder applicationindependently reduced the bitdepth after thetransform. Likewise, increasing eb by a fixed valuek will stretch the data by 2k; but will also introduceerrors in the lower level bitplane values. For

Fig. 10. SQ quantized image at 0.3 bits/pixel.


example 1-bit data might be stretched to 8-bit viaeb and RID signaling, but the output image will notbe strictly binary even if all data are decoded.

4. Summary

Since JPEG 2000 is intended to serve a diverse setof applications, several quantization options areprovided within the standard. Part I of the standardincludes uniform scalar dead-zone quantization.Part II extends the quantization options to includegeneralized uniform scalar dead-zone quantizationand trellis coded quantization. All of the quantiza-tion methods support embedded coding and areconducive to the rich feature set of JPEG 2000.

In Section 2 of this paper, the quantizationmethods used in the standard are reviewed. InSection 3, several issues pertaining to the use ofthese methods within the standard are discussed.These issues range from selection and signaling ofthe quantization parameters to dealing with the

unusual quantization effects when the imagebitdepth changes between encoding and decoding.

References

[1] A. Bilgin, P.J. Sementilli, M.W. Marcellin, Progressive

image coding using trellis coded quantization, IEEE Trans.

Image Process. 8 (11) (1999) 1638–1643.

[2] J.G.D. Forney, The Viterbi algorithm, Proc. IEEE (Invited

Paper) 61 (1973) 268–278.

[3] J.H. Kasner, M.W. Marcellin, B.R. Hunt, Universal trellis

coded quantization, IEEE Trans. Image Process. 8 (12)

(1999) 1677–1687.

[4] M.A. Lepley, Tuning JPEG 2000 decompression performance

via rounding: theory & practice, in: Proceedings of SPIE, Appli-

cations of Digital Image Processing XXIII, Vol. 4115, 2000.

[5] M.W. Marcellin, T.R. Fischer, Trellis coded quantization

of memoryless and Gauss–Markov sources, IEEE Trans.

Commun. 38 (1990) 82–93.

[6] D. Taubman, E. Ordentlich, M. Weinberger, G. Seroussi,

Embedded block coding in JPEG 2000, Signal Processing:

Image Communication 17 (1) (2002) 49–72.

[7] J.W. Woods, T. Naveen, A filter based bit allocation

scheme for subband compression of HDTV, IEEE Trans.

Image Process. 1 (1992) 436–440.

Fig. 11. TCQ quantized image at 0.3 bits/pixel.


An overview of quantization in JPEG 2000 · 2007-02-11 · JPEG 2000 employs a dead-zone uniform...

Documents

Transcript of An overview of quantization in JPEG 2000 · 2007-02-11 · JPEG 2000 employs a dead-zone uniform...