A Mathematical Model for Shape Coding With B-splines Kats 2000 10.1.1.2.6450

*Corresponding author.E-mail addresses: [email protected] (F.W. Meier),

[email protected] (G.M. Schuster), [email protected] (A.K.Katsaggelos)

qA preliminary version of this paper was "rst published inProceedings of the 2nd Erlangen Symposium Advances in DigitalImage Communication, Erlangen, Germany, April 1997, pp.75}84.

Signal Processing: Image Communication 15 (2000) 685}701

A mathematical model for shape coding with B-splinesq

Fabian W. Meier!, Guido M. Schuster", Aggelos K. Katsaggelos#,*!Silicon Graphics, Mountain View, CA 94043, USA

"U.S. Robotics, Advanced Technologies Research Center, Mount Prospect, IL 60056, USA#Northwestern University, Evanston IL 60208-3118, USA

Abstract

A major problem in object-oriented video coding is the e$cient encoding of the shape information of arbitrarilyshaped objects. E$cient shape coding schemes are also needed in encoding the shape information of video object (VO) inthe upcoming MPEG-4 standard. Furthermore, there are many applications where only the shape needs to be encoded,such as CAD, 3D modeling and signature encoding. In this paper, we present an e$cient method for the lossy encoding ofobject shapes which are given as 8-connect chain codes using a mathematical model. We approximate a boundary bya second-order B-spline curve and consider the problem of "nding the curve with the lowest bit-rate for a givendistortion. The presented scheme is optimal, e$cient and o!ers complete control over the trade-o! between bit-rate anddistortion. It is an extension of our previous research where we used polygons to approximate a boundary. The mainreason for using curves rather than polygons is that curves have a more natural appearance than polygons and can givebetter coding e$cencies. We present results of the proposed scheme using objects boundaries in di!erent shapes and sizesas well as an MPEG-4 test sequence. ( 2000 Elsevier Science B.V. All rights reserved.

1. Introduction

1.1. Motivation for shape coding

This research is partly motivated by the import-ance of shape coding within the MPEG-4 standard,but also by object oriented video coding [8,16],where the encoding of shape information of objects

is an important problem. Furthermore, shape cod-ing algorithms can also be used for other applica-tions, such as CAD, object recognition and imagedescription in video databases. In this paper werefer to the shape information of a single object asa boundary (sometimes also referred as contours),which can be represented with a chain code. Usu-ally a boundary is de"ned in pixel resolution.In that case the boundary consists of a numberadjacent pixels which we call boundary points(Fig. 4 shows a simple boundary). A shape repre-sented in a binary alpha plane has the same valuesfor all the pixels that are inside the shape. A bound-ary is the contour around the shape in the alphaplane. The new multimedia standard MPEG-4[1,24], makes it possible to encode several layers ofoverlapping video information at the same time,

0923-5965/00/$ - see front matter ( 2000 Elsevier Science B.V. All rights reserved.PII: S 0 9 2 3 - 5 9 6 5 ( 9 9 ) 0 0 0 4 5 - 4

Fig. 1. The scope of this paper is an e$cient shape coding scheme.

where each layer contains one or more arbitrarilyshaped objects. In MPEG-4 terminology theselayers are called video objects (VO). A video objectplane (VOP) is a snapshot of a VO at a givenmoment of time. For example, one VO containsa moving person as a foreground object } the areaoutside the person's shape would be transparent} and the second VO contains background frames(see Fig. 1). A VOP is described by texture informa-tion (the actual pixel values within the object shape)and the shape information. The texture informationis encoded with a conventional block based method(as intra or inter blocks), the di!erence to a conven-tional block based code is that only blocks that arepartly or fully covered by the object shape will beencoded. The shape information is described by analpha-plane and can be either in a gray-scale for-mat or a binary format. In the binary alpha plane,all pixels within the object shape have the value1 assigned, whereas all remaining pixels have thevalue 0 and are supposed to be completely trans-parent. In this paper we are only concerned byshape information in binary format. Even though

the encoding of the shape information and theencoding of the texture information can be viewedas two separate steps, these two processes are re-lated: for example in MPEG-4 a macroblock thatlays on a shape boundary needs to be padded beforeencoding the texture. In [11] we proposed an ex-tension which includes texture information for theboundary encoding.

1.2. Shape coding approaches

The performance of a shape coding scheme canbe measured either with the absolute rate in bits orwith a relative measure e"R/N

Bin bits per bound-

ary point (bbp), where R is the rate to encode theboundary and N

Bthe number of boundary points.

For lossy encoding, using a coding e$ciencymeasure is only meaningful if a certain distortiondescribes the degree of the lossy boundary approxi-mation. In the following paragraphs we discusssome shape coding approaches.

Chain coding. Freeman [6] originally proposedthe use of chain coding for boundary quantization

686 F.W. Meier et al. / Signal Processing: Image Communication 15 (2000) 685}701

and lossless boundary encoding, which has attrac-ted considerable attention over the last 30 years[12,17,19]. The most common chain code is the8-connect chain code which is based on a rectangu-lar grid superimposed on a planar curve. Sincethe planar curve is assumed to be continuous,the increments between grid points are limited tothe 8 grid neighbors, and hence an increment canbe represented by 3 bbp. There have been manyextensions to this basic scheme such as generalizedchain codes [19], where the coding e$ciency hasbeen improved with the use of links of di!erentlength and di!erent angular resolution. There hasalso been interest in the theoretical performance ofchain codes. In [12] the performance of di!erentquantization schemes is compared, whereas in [17]the rate distortion characteristics of certain chaincodes are studied. A lossy simpli"ed chain code waspresented in [7] and used in [14] for shape codingwith bit-rates of 1.47 bbp.

Polygons. In [20}22] we found vertices in anoptimal way to approximate boundaries with poly-gons given a certain distortion. The current paperis based on this work. A polygonal shape code wasintroduced in [18] with an improved vertex encod-ing scheme. Bit-rates of 1.3 bbp for a maximal dis-tortion of 1.0 pixels and 1.0 bbp for a maximaldistortion of 2.0 pixels were achieved. An improvedversion of this method (INTRA and INTER cod-ing) was evaluated in for the MPEG-4 shape codingpart [9]. In an iterative scheme [10] vertices areforward predicted from the last frame, where newvertices are added and existing ones are dropped inorder to represent the object shape with the re-quired accuracy. The vertices locations are thenencoded with a DPCM Hu!man coding scheme.

Splines. Average bit-rates of 0.88 bbp wereachieved in [13] with third order B-spline curvesthat approximated boundaries with a maximumdistortion of 1.0 pixels. Because of the iterative ap-proach the results are not unique nor optimal anddepend on the initial curve. In [25] regions ina frame that contain motion were "rst segmented,and then lossy encoded with either macro-blocksfor a coarse representation or with Catmull}Rommsplines for a more accurate object shape representa-tion. In this paper, we approximate boundarieswith quadratic B-splines with a method that is based

on a shortest path algorithm for a weighted di-rected acyclic graph (DAG) [22]. Note that this isachieved in an optimal fashion, with a completecontrol over the tradeo! between distortion andbit-rate, resulting in an e$cient encoding scheme.The motivation for using B-splines are better cod-ing e$ciencies for objects in natural images. Suchobjects or shapes tend to have few straight lines andnarrow corners. Furthermore, straight lines can berepresented with second-order B-splines. Ref. [23]reviews and compares our research on di!erentapproaches of optimal shape coding using poly-gons and B-splines.

The shape coding method chosen for MPEG-4,called contex-based arithmetic encoding [2], uses anapproach that encodes the black and white bitmape$ciently rather than a shape itself. The advantageof this approach is the robustness of the algorithmfor arbritrary images and possible easy hardwareimplementation. The other major advantage of thisapproach is high coding e$ciency, especially forlossless and quasi-lossless shape coding (inter-coded frames). In [9] the di!erent methods thatwere being considered for MPEG-4 shape codingare reviewed.

This paper is organized as follows. The B-splinecurve is brie#y reviewed in Section 2. In Section 3we de"ne the problem and introduce the requirednotation. The de"nition of the set of admissiblecontrol points for the spline approximation is dis-cussed in Section 4. In Section 5 we consider theproblem of "nding the B-spline curve which re-quires the smallest bit-rate for a given distortion.We then consider the dual problem in Section 6,that of "nding the B-spline curve with the smallestdistortion for a given bit-rate. In Section 7 weintroduce a control point encoding scheme. Finally,in Section 8 we present simulation results of theproposed algorithm.

2. Review of B-spline curves

A B-spline is a speci"c curve type from the familyof parametric curves [5]. A parametric curve con-sists of one or more curve segments. Each curvesegment is de"ned by (n#1) control points wheren de"nes the degree of the curve. The control points

F.W. Meier et al. / Signal Processing: Image Communication 15 (2000) 685}701 687

Fig. 2. A second-degree B-spline curve with eight curve seg-ments Q

uand a double control point at the beginning of the

curve.

are located around the curve segment and togetherwith a constant base matrix M the control pointssolely de"ne the shape of the curve. A two-dimen-sional curve segment Q

uwith control points

(pu~1

, pu, p

u`1) is de"ned as follows:

Qu(p

u~1, p

u, p

u`1, t)"[x(t), y(t)]

for 0)t)1 and 0 otherwise. (1)

The points at the beginning and the end of a curvesegment are called knots and can be found bysetting t"0 and t"1. The following is the de"ni-tion for a second degree curve segment, with u asindex for the di!erent curve segments, and p

u,xand

pu,y

, respectively, as the horizontal and vertical co-ordinates of control point P

u,

Qu(p

u~1, p

u, p

u`1, t )"¹ )M )P

"[t2 t 1] ) Cm

1,1m

1,2m

1,3m

2,1m

2,2m

2,3m

3,1m

3,2m

3,3D

) Cpu~1,x

pu~1,y

pu,x

pu,y

pu`1,x

pu`1,y

D. (2)

Both, the base matrix M, with speci"c constantparameters for each speci"c type of parametriccurve, and the control point matrix P, with (n#1)control points, de"ne the shape of Q

uin a two-

dimensional plane. Every point of the curve seg-ment can be calculated with Eq. (2) by letting t varyfrom 0 to 1. Every curve segment can be calculatedindependently in order to calculate the entire curveQ, consisting of N

Pcurve segments, which is of the

following form:

Q(t@)"NP

+u/1

Qu(p

u~1, p

u, p

u`1, t@!u#1),

with 0)t@)NP#1. (3)

Among common parametric curves are the Beziercurve and the B-spline curve. For our boundaryapproximation algorithm we chose a second-order(quadratic) basis uniform non-rational B-spline

curve [5] with the following base matrix:

M"C0.5 !1.0 0.5

!1.0 1.0 0.0

0.5 0.5 0.0D. (4)

Fig. 2 shows such a second-order B-spline curve.The reason for choosing the B-spline with thelowest possible order (n"2) was to keep the com-plexity of the curve, and the proposed algorithm,minimal. Note that a "rst degree B-spline is a poly-gon. The shape coding method presented in thispaper is independent of the matrix M and degree n,that is, parametric curves of higher order can beused.

Double control points. The beginning and theend of the boundary approximation have to betreated as special cases, if the "rst curve segmentshould start exactly from the "rst boundary pointand the last curve segment should end exactly atthe last boundary point. When we use a doublecontrol point such as (p

u~1"p

u) the curve segment

Qu

will begin exactly from the double control point(see Fig. 2). We apply this property to the beginningand to the end of the curve, so that p

0"p

1and

pNP

"pNP`1

. These two special cases can easily beincorporated into the boundary approximationalgorithm.


3. Problem formulation

The goal of the proposed approach is to approx-imate a given boundary by a second order B-splinecurve: we have to "nd a set of control points whichde"ne a B-spline with the lowest possible rate givena certain distortion.

3.1. Notation

The following notation will be used. LetB"Mb

0,2, b

NB~1N denote the connected bound-

ary which is an ordered set, where bi

is the ithpoint of B and N

Bis the total number of points in

B. Note that in the case of a closed boundary,b0"b

NB~1. Let P"Mp

0,2, p

NP`1N denote the set

of control points of the B-spline curve, which isalso an ordered set, with N

Pthe total number

of curve segments. Every second-order B-splinecurve segment is de"ned by three control pointspu~1

, pu

and pu`1

, henceforth denoted byQ

u(p

u~1, p

u, p

u`1) without the use of t as in Eq. (1),

or simply by Qu. Note that every curve segment

shares control points with its neighboring curvesegments. Since P is an ordered set, the orderingrule and the set of control points uniquely de"nethe curve.

In general, the B-spline curve could be permittedto place its control points anywhere on the imageplane. We restrict the possible locations of controlpoints to a set A, which we will specify in detail inSection 4.

We assume that the locations of the controlpoints of the curve are encoded using a predictivescheme. This is an e$cient method for naturalboundaries since the locations of the control pointsare highly correlated. We denote the requiredbit-rate for the encoding of control point p

u`1,

given control point pu~1

and pu, by r(p

u~1, p

u, p

u`1).

Hence the total bit-rate R(p0,2, p

NP`1) for the

entire B-spline is equal to

NP

+u/0

r(pu~1

, pu, p

u`1), (5)

where r(p~1

, p0, p

1) is set equal to the number of

bits needed to encode the absolute position of the"rst control point p

0. However, since this number

depends on the size of the image plane, we neglectthe rate for encoding the absolute positionr(p

~1, p

0, p

1) when we calculate the bit-rate e. Note

that the rate r(pu~1

, pu, p

u`1) depends on a speci"c

control point encoding scheme, which will be dis-cussed in Section 7.

3.2. Distortion measure

Besides the rate, we also need the curve segmentdistortion for our proposed curve approximationscheme, which we de"ne as d(p

u~1, p

u, p

u`1).

One popular distortion measure for curve appro-ximation is the maximum absolute distance,which has also been employed in [8,12,17,18,21].Because of its perceptual relevance we also use themaximum absolute distance in this paper. Otherdistortion measures are evaluating the di!erencearea between the original and the approximatedboundary shape [3], as well as building a sum of allthe segment maximum distortion measurements[20].

So far we have only discussed the segment distor-tion measures, i.e., the measures which judge theapproximation of one part of the boundary bya given curve segment. In general, we are interestedin a curve distortion measure which can be used todetermine the approximation quality of an entirecurve. As mentioned above, we are using the max-imum absolute distance distortion measure. Thedistortion measure for the whole curve usingd(p

u~1, p

u, p

u`1) can be expressed as follows:

D(p0,2, p

NP`1)" max

u|*1,2,NP +

d(pu~1

, pu, p

u`1). (6)

Whereas the maximum absolute distance betweena boundary and an edge of a polygon approxima-tion can be calculated exactly [21], there is nounique method to measure the maximum absolutedistance between a boundary and a curve approxi-mation. However, the curve segment distortion canbe de"ned as the distance of every boundary pointto the closest point of its approximated representa-tion. If we imagine a distortion-band with the width2 )D

.!9along the boundary B, a B-spline approxi-

mation must always be inside the band in order to


Fig. 3. The distortion band of width 2 )D.!9

along the objectboundary B. The band is constructed by moving a mask withradius r"D

.!9along the boundary of the object. The resolution

of the distortion band is 1/3 pixel.

satisfy the distortion requirement:

d(pu~1

, pu, p

u`1)

"G0: all points of Q

u(p

u~1, p

u, p

u`1)

are inside the distortion bandof width 2 )D

.!9,

R: any point of Qu(p

u~1, p

u, p

u`1)

is outside the distortion bandof width 2 )D

.!9. (7)

This distortion measure takes a curve segmentQ

u(p

u~1, p

u, p

u`1) as input and checks if the curve

segment is fully inside the distortion band withwidth 2 )D

.!9.

Implementation of the distortion measure. The fol-lowing method illustrates an e$cient implementa-tion of Eq. (7). A distortion band can be drawnby assigning all pixels to the band that are withina certain distance D

.!9from every boundary pixel.

A given curve segment is quantized to pixel resolu-tion, and then every curve pixel is tested whether itis inside or outside the distortion band. The disad-vantage of this procedure is that values of D

.!9can

only be multiples of one pixel. It can be overcome ifwe introduce a distortion band with sub-pixel res-olution. For our experiments we chose a distortionband with a sub-pixel resolution of 1

3pixel (see

Fig. 3). Clearly, the distortion band does not need

to be symmetric. Without loss of generality, asymmetric band is primarily considered in thefollowing.

3.3. Optimization problems

In the remainder of the paper we introduce a fastand e$cient algorithm which solves the followingconstrained optimization problem:

minp0 ,2,pNP`1

R(p0,2, p

NP`1),

subject to D(p0,2, p

NP`1))D

.!9, (8)

where D.!9

is the maximum distortion permitted.Note that there is an inherent tradeo! between therate and the distortion in the sense that a smalldistortion requires a high rate, whereas a small rateresults in a high distortion. Using the optimal solu-tion of Eq. (8) iteratively we will also write a solu-tion to the dual problem, that of "nding a B-splinecurve approximation with the lowest possible dis-tortion given a maximum rate R

.!9, that is,

minp0 ,2,pNP`1

D(p0,2, p

NP`1),

subject to R(p0,2, p

NP`1))R

.!9. (9)

4. Admissible control point set

From a theoretical point of view, the set of ad-missible control points for a B-spline boundaryapproximation should contain all pixels in the im-age plane. In order to keep the algorithm e$cient,we restrict the control points to a set of relevantlocations. We call this the set of admissible controlpoints A and de"ne it as a band along the boundaryB, where the band is determined by =

.!9(see

Fig. 4). =.!9

is measured from the center of theboundary pixel to the center of the admissible con-trol point pixel. A must be an ordered set to employthe presented boundary approximation algorithm.We therefore propose to order set A by assigningall points of A to their nearest boundary point andthen imposing the order of the boundary onto theset A. The assigning algorithm [20] is explained in


Fig. 4. The admissible control point set A is a band determinedby =

.!9. Note that b

0/b

24"a

0,0/a

24,0have no additional

points assigned.

Table 1Vector v with components v

x, v

yand length= as a function of

index j

j vx

vy

= j vx

vy

=

0 0 0 0.00 5 1 1 1.411 1 0 1.00 6 !1 1 1.412 0 1 1.00 7 !1 !1 1.413 !1 0 1.00 8 1 !1 1.414 0 !1 1.00 9 2 0 2.00

10 0 2 2.0011 !2 0 2.0012 0 !2 2.00

the next section. Every admissible control pointai,ib

has two indices: i and ib. Index i has the same

number as the index of its closest boundary pointbi. The second index i

benumerates all admissible

control points that have the same index i. Indexib

starts always at 0 and every point ai,0

is byde"nition equal to b

i. Therefore, the simplest case is

when A is equal B with =.!9

"0. Clearly, forvalues of=

.!9larger than zero, a varying number

of admissible control points are associated withevery boundary point (compare the sets A and Bin Fig. 4). A good value for =

.!9is 1.0 pixel,

which results in an admissible control point band ofthickness equal to three pixels along boundary B.We further de"ne that the "rst and the last bound-ary points have no additional admissible controlpoints assigned, so that the B-spline will beginexactly at b

0"a

0,0and will end at exactly

bNB~1

"aNB~1,0

.Assigning algorithm. The assigning algorithm of

admissible control points consists of two loops,where the outer loop draws spirals around theboundary points. The radius = for drawing thespirals (see Table 1) are values with discrete in-crements of range 02=.!9

. The following algo-rithm assigns pixels from the image plane to set

A as a function of =.!9

:

1. Set initial counter variable j equal to zero.2. Pick a vector in Table 1 with the current vari-

able j as index.3. Add vector v with to the boundary point b

i.

Assign the new point (bi#v) to A if this point is

not yet an element of A.4. Repeat line 3 for all boundary points.5. Increase j by 1. If =)=

.!9go back to line 2,

else exit the algorithm.

= denotes the length of vector v:="Jv2x#v2

y.

5. Finding a curve with the lowest rate

In this section we "rst show how to describea mathematical model of the given boundary usinga graph. In a second step we will "nd the B-splinecurve with the lowest bit-rate using a shortest pathalgorithm.

The B-spline curve that will be found by theproposed algorithm has the two following require-ments:

f The distortion of the curve is smaller than orequal to the maximum distortion D

.!9as stated

in Eq. (8).f The control points must be selected from a speci-"ed admissible control point set A.

The key observation for deriving an e$cientsearch is the fact that, given two control pointspu~1

and pu

of a B-spline curve and the minimum


1The following exception is necessary to allow double-controlpoints at the beginning and the end of the curve: i"k ifMi"0,N

B!1N.

rate which is required to code the curve up to andincluding these two control points, RH

u(p

u~1, p

u), the

selection of the next control point pu`1

is indepen-dent of the selection of the previous control pointsp0,2, p

u~2. This is true since the smallest total rate

RH can be expressed recursively as a function of thecontrol point rates r(p

u~1, p

u, p

u`1), that is,

RHu`1

(pu, p

u`1)

"minpu~1

MRHu(p

u~1, p

u)#r(p

u~1, p

u, p

u`1)N. (10)

Recall that the distortion of the curve segmentQ

udepends also on the three control points p

u~1,

pu

and pu`1

. The segment distortion can becombined with the segment rate by de"ninga weight function w as follows:

w(pu~1

, pu, p

u`1)

"r(pu~1

, pu, p

u`1)#d(p

u~1, p

u, p

u`1). (11)

Note that w is equal to the rate for all the curvesegments which satisfy the distortion constraintof Eq. (7), but in"nite for those which do not.Hence by replacing r in Eq. (10) by w, the rateRH

u`1(p

u, p

u`1) is in"nite if a curve segment is used

which violates the distortion constraint and equalto the required rate otherwise. The recursion ofEq. (10) needs to be initialized by setting RH

0(p

~1, p

0)

equal to zero. Clearly, RHNP`1

(pNP

, pNP`1

)"min

p0 ,2,pNP`1MR(p

0,2, p

NP`1)N, is the rate for the

entire curve.Because of the recursive relationship in Eq. (10),

the problem in Eq. (8) can be formulated as a shor-test path problem in a weighted directed graph.A vector E starts at control point p

u"a

i,iband

ends at control point pu`1

"ak,kb

with the condi-tion that both admissible control points cannot beassigned to the same boundary point (E"

(ak,kb

!ai,ib

)3A2 ; ∀iOk).1 A path of order K fromcontrol point p

0to control point p

Kis an ordered

set Mp0,2, p

KN. The total length or total weight of

the path is rede"ned as follows:

K~1+u/1

w(pu~1

, pu, p

u`1). (12)

Again, note that the de"nition of the weight func-tion w leads to a length of in"nity for every paththat includes a curve segment with an approxima-tion error larger than D

.!9. Therefore, a shortest

path algorithm will not select a path with one ormore distorted curve segments.

The classical algorithm for solving such a single-source shortest-path problem, where all the weightsare non-negative, is Dijkstra's algorithm [4]. Thisis a signi"cant reduction compared to the timecomplexity of the exhaustive search. We can furthersimplify the algorithm by observing that it is veryunlikely for the optimal path to select a controlpoint p

u`1"a

k,kbwhere the last selected control

point was pu"a

i,iband i'k. This restriction re-

sults in that the selected curve approximation hasto follow the original boundary without rapid di-rection changes, and more important, the resultinggraph is now a weighted directed acyclic graph(DAG). Hence the vector set is rede"ned in thefollowing way: (E"(a

k,kb!a

i,ib)3A2; ∀i(k).1

A DAG-shortest-path algorithm [4] "nds a single-source shortest path and is faster than Dijkstra'salgorithm. As can be seen from the DAG algorithm,the time complexity is cubic in the number of ad-missible control points, which very low for theresulting optimal selection.

Let RH(aj,jb

, ai,ib

) represent the minimum total rateto reach the control point p

u"a

i,ibfrom the source

control point p0"a

0,0via a B-spline curve ap-

proximation. pu"a

i,iband p

u~1"a

j,jbare the two

last control points of that curve. Clearly,RH(a

NB~1,0, a

NB~1,0) is the solution to problem (8),

since it represents the minimum total rate to reachthe last control point, which is by de"nition equalto the last boundary point.

The sliding window. With the formulation of theboundary approximation problem the solution ofthe DAG-shortest-path algorithm results in a triv-ial solution which is illustrated in Fig. 5. This isa correct solution under the current conditions, butnot a useful boundary approximation. We needa way to force the algorithm along the boundary


Fig. 5. The sliding window restricts the selection of controlpoint p

u`1to all the admissible control points within the sliding

window. The introduction of a sliding window prevents trivialsolutions.

in order to "nd a curve. With the introduction ofa sliding window we not only avoid trivial solutionsbut also are able to control the speed of the algo-rithm. The sliding window restricts the selection ofcontrol point p

u`1to all the admissible control

points within the sliding window. The length of thesliding window l

48in pixel is measured along the

boundary, starting at control point pu. l

48can be

any value below NB/2 for closed boundaries and

below NB

for open boundaries. Changing l48

givesus also the possibility to in#uence the smoothnessof the B-spline curve where larger values of l

48re-

sult in smoother curves.Example. Fig. 6 shows an example how a DAG

is derived from a boundary (NB"7) and how the

shortest-path solution leads to a lossy-shape ap-proximation. The admissible control point set A isequal to B; the only valid control point locationsare all boundary pixels. The reason of A being verysmall is to keep the complexity of this example aslow as possible. The DAG in Fig. 6(B) consists ofstates and vectors, in graph theory commonlycalled vertices and edges. Several states are asso-ciated with a single admissible control point a

i.

Every state is uniquely described by two admissiblecontrol points (a

j, a

i), where a

jrefers to the state's

connecting previous state, with (aj3A; ∀j)i)1.

The total number of states that are associated with

each admissible control point is l48"N

B!i, but

in our example we restrict l48

to 3. The slidingwindow starts at p

u"a

i`1and includes all points

of A up to ai`l48

. The length or weight of a vector isassigned with Eq. (11) in the unit bits. Note thatfunction Eq. (11) has three input variables; the "rsttwo variables represent the two admissible controlpoints associated by the state where the vectorbegins and the third variable is the admissible con-trol point associated with the state where the vectorends. Every vector represents now a B-spline curvesegment, so that any path in the DAG from state(a

0, a

0) to state (a

6, a

6) is a possible curve approxi-

mation of boundary B. Let RH(ai ,aj )

be the best totalrate of the path from the "rst state (a

0, a

0) to state

(ai, a

j); and let Ptr(a

i, a

j) be a back pointer that is

used to remember that path. RH(ai ,aj )

is the sum of allthe weights of that path.

The task of a shortest-path algorithm is to "nda path from state (a

0, a

0) to state (a

6, a

6) with the

lowest total rate, which is clearly RH(a6,a6)

. Becausewe interpret the length of a vector as the number ofbits necessary to encode that vector, the shortestpath is the path with the lowest total bit-rate. Oncea shortest path is found, the control points forthe B-spline approximation are all the admissiblecontrol points assigned to the states of this path(Fig. 6(D)).

Pseudocode for the example. The following algo-rithm outlines the DAG-shortest-path algorithmused to "nd the minimum rate of the previousexample (Fig. 7). The input variable to the algo-rithm is the admissible control point set A, with theparameters maximum distortion D

.!9and length of

the sliding window lSW

. The output of the algorithmconsists of an optimal path and its rate RH. Thenthe proposed algorithm works as follows:

DAG shortest-path algorithm:(1) RH(a

j, a

i)"R; i, j3M0,2, N

B!1N

(2) RH(a0, a

0)"0;

(3)(4) for i"0,2,N

B!1; Loop 1

(5) M(6) for j"i!l

48,2, i!1; Loop 2

(7) M(8) for k"i#1,2, i#l

48; Loop 3

(9) M


Fig. 6. The boundary in (A) can be modeled by a graph (B). Once a set of admissible control points A is de"ned (A), a DAG can bede"ned (B). The shortest path algorithm "nds a set of control points of the shortest path (C) from state (a

0, a

0) to state (a

6, a

6). The

control point set de"nes the B-spline curve approximation (D) of the original boundary.

(10) pu~1

"aj; p

u"a

i; p

u`1"a

k;

(11) distortion of Qu: d

u"d(p

u~1, p

u, p

u`1);

(12) rate of Eu: r

u"r(p

u~1, p

u, p

u`1);

(13) weight: wu"d

u#r

u;

(14) if RH(pu~1

, pu)#w

u(RH(p

u, p

u`1);

(15) M(16) store the new lowest rate:(17) RH(p

u, p

u`1)"RH(p

u~1, p

u)#w

u;


Fig. 7. (A) The DAG-shortest-path algorithm. (B) Sliding window at its current location.

(18) assign back pointer to previous(19) state value:(20) Ptr (p

u, p

u`1)"(p

u~1, p

u);

(21) N(22) NNN

The optimal path with the lowest rate RH ends atstate (a

NB~1, a

NB~1). Now the complete optimal

path MpH0,2, pH

NP`1N can be found by back tracking

the pointers Ptr( ) ) in the following recursivefashion:

(pHu~1

,pHu)"Ptr(pH

u, pH

u`1), u"N

P,2, 1, (13)

with Ptr(pHNP

, pHNP`1

)"Ptr(aNB~1

, aNB~1

) as initialvalue. The formal proof of the correctness of the


DAG-shortest-path algorithm, on which the abovescheme is based, can be found in [4].

Detailed explanation of the pseudocode. We willreason more intuitively how this approach works.In lines (1) and (2) the initial control conditions forthe graph are de"ned. Loop 1 with the countervariable i selects all points in A in sequence. Loop3 with the counter variable k selects all points fromai`1

up to the end of the boundary. Hence theloops 1 and 3 select each possible vector E in theDAG exactly once. The number of repetitions ofthe loops 2 and 3 can be restricted with the lengthof the sliding window l

48in order to speed the

algorithm up. A DAG-shortest-path algorithm fora polygon [21] needs only two loops because anedge segment is de"ned by two vertices. Lines(11) to (13) are used to calculate the weightw(p

u~1, p

u, p

u`1) of every curve segment. The most

important part of this algorithm is the comparisonon line (14). Here we test if the new bit-rate,RH(p

u~1, p

u)#w(p

u~1, p

u, p

u`1), to reach the ad-

missible control point pu`1} given that the last two

control points were pu~1

and pu} is smaller than

the smallest bit-rate used so far to reach the pu`1

. Ifthe bit-rate is indeed smaller, then it is assigned asthe new smallest bit-rate to reach the control pointpu`1

on line (17). We also assign the back pointer ofstate (p

u, p

u`1) to state (p

u~1, p

u).

This algorithm leads to the optimal solutionbecause, as stated earlier, if the rate RH(p

u~1, p

u) of

two control points pu~1

and pu

is given, then theselection of any future control point p

u`1is inde-

pendent of the selection of the past control points.

6. Finding a curve with a given bit-rate

We now consider the minimum distortion casewhich is stated in Eq. (9). In the minimum distor-tion case we have a given rate budget R

"6$'%5, and

we would like to "nd a curve with the lowestpossible distortion where the corresponding ratemust be equal to or smaller than R

"6$'%5. We pro-

pose an iterative solution to this problem which isbased on the fact that we can solve the dual prob-lem stated in Eq. (8) optimally. Consider D

.!9in

Eq. (8) to be a variable. We derived in the previoussection an algorithm which "nds the curve approxi-

mation which results in the minimum rate for anyD

.!9. We denote this optimal rate by RH(D

.!9).

Since RH(D.!9

) is a non-increasing function (fora proof see [21]), we can use bisection to "nd theoptimal DH

.!9such that RH(DH

.!9)"R

.!9.

7. Control point encoding scheme

So far, any control point encoding scheme whichsatis"es the assumption that the control points areencoded di!erentially, i.e., the rate to encode pointpu`1

depends only on the previous two points,pu

and pu~1

, could have been used. In this sectionwe present a speci"c control point encoding schemeto encode the vector E

ubetween the control points

pu

and pu`1

. The encoding scheme can be con-sidered as a combination of a modi"ed 8-connectchain code and a run-length encoding scheme [21].The chain code and the run-length encoding can becombined by representing the vector betweentwo control points by an angle a and a run b, whichform the symbol (a,b). For each of the possiblesymbols (a, b) we encode the angle and the runindependently. In this paper, we employ a entropycode for encoding the runs b, which is displayed inTable 2. Note that the longest possible run is 15 sothat any vector with length larger than 15 cannotbe encoded. In consequence, setting the length ofthe sliding window l

48to values above 15 does not

result in lower bit-rates with this run-length encod-ing scheme.

Clearly it takes 3 bits to encode eight equallyprobable directions with the 8-connected chaincode. The eight angles have 453 increments,therefore there are only eight available directions toencode the angle of the vector. This scheme to-gether with Hu!man coding for the run length ofthe vector was used in [21]. In natural boundaries,the arrival direction of a vector is highly correlatedwith the departure direction of the following vector.This implies that the arrival direction should beused to predict the departure direction (seeFig. 8(A)). We propose to use only 2 bits for a for thefour most probable directions. These four relativedirections of a are M!903,!453,#453,#903N,where 03 is the direction of the previous vector (seeFig. 8(B)). The rate function r(p

u~1, p

u, p

u`1) must


Table 2Run length encoding table. The bit-rate of control point vectorE as a function of its run length b

Run Rate Run Rate Run Rate

1 2 bit 6 4 bit 11 5 bit2 3 bit 7 4 bit 12 5 bit3 3 bit 8 5 bit 13 5 bit4 4 bit 9 5 bit 14 5 bit5 4 bit 10 5 bit 15 5 bit

Fig. 8. (A) The control point vector Eiis encoded with its run

length b and the relative angle a. (B) Possible direction for a,encoded with 2 bits.

Table 3Alternative run length encoding table

Run Rate Run Rate Run Rate

1 3 bit 6 R 11 R

2 2 bit 7 R 12 3 bit3 R 8 3 bit 13 R

4 2 bit 9 R 14 3 bit5 R 10 R )15 R

consider the case when a vector cannot be encoded.If this happens, the rate is considered in"nite. Thiscan be expressed as follows:

r(pu~1

, pu, p

u`1)"G

rate(a)#rate(b):Eu

is codeable

R:Eu

is not codeable

rate(a)"2 bit, rate(b)"2 to 5 bit.

(14)

With this scheme we clearly restrict the set of code-able vectors. Experiments showed that using thisencoding scheme with four relative angles results inboundary approximations with lower rates thanwhen we used a scheme with 8 absolute angles.

Special cases:f Since we use relative angles to encode the vectors

E it is necessary to encode the "rst vectorE0

with an absolute angle; therefore we need3 bits for the angle of the "rst vector.

f As mentioned before, the "rst two control pointsp0

and p1

are identical, so are the last twocontrol points, p

NPand p

NP`1. These double con-

trol points make sure the curve starts exactly atthe "rst boundary point and ends exactly at thelast boundary point. Consequently r(p

0, p

1, p

2)

(rate of the "rst curve segment) and r(pNP~1

,pNP

, pNP`1

) (rate of the last curve segment) areassigned to zero.

f In case of a closed boundary, the "rst two con-trol points are identical to the last two controlpoints. The vector from control point p

NP~1to

control point pNP

is therefore redundant and

does not need to be encoded; rate r(pNP~2

,pNP~1

, pNP

) can be set to zero.

The proposed shape coding algorithm is not re-stricted to the just described control point encodingmethod. Other control point schemes mightachieve lower rates for some shapes, the perfor-mance of di!erent encoding schemes are shape de-pendent. The control point encoding scheme can beviewed as a replacable module to the proposedshape coding algorithm. For example an alterna-tive run-length encoding scheme (Table 3) could beused.

8. Simulations

8.1. Simulation conditions

To demonstrate the proposed shape coding schemewe encoded three di!erent objects boundaries with


Fig. 10. Results of the simulation: shape encoding bit-rate e inbits per boundary point (bbp) versus the maximal distortionD

.!9.

Fig. 9. The three object shapes used for the experiments with 70,158 and 257 boundary points.

Fig. 11. Object shape approximations of shape 1.

70 (Shape 1), 158 (Shape 2) and 257 (Shape 3)boundary points, the objects are shown in Fig. 9.For the encoding simulations we varied the max-imum distortion D

.!9from 0.4 to 4.0 pixels. We

chose the following constant parameters for allsimulations:

f =.!9

"1.0 (width of the admissible controlpoint band A).

f l48"15 (length of the sliding window)

f The rate to encode the absolute position of the"rst control point is neglected since it is notexactly known.

f Use Table 2 for run length encoding.

The following parameters were used for 100frames of the MPEG-4 test sequence kids:

f =.!9

"2.0 (width of the admissible controlpoint band A).

f l48"14 (length of the sliding window)

f Include the rate to encode the absolute positionof the "rst control point.

f Use Table 3 for run length encoding.

8.2. Results

Fig. 10 shows the performance of the shape cod-ing algorithm in function of the distortion. Averageencoding rates in our experiences in the range of0.70}0.84 bbp were achieved with distortion valuesof D

.!9"1.0 and e"0.57}0.64 with D

.!9"2.0.

The bit-rates e for all three encoded boundaries gointo a saturation for distortion values of 2.0 andlarger. Fig. 11 illustrates how better bit-rates areachieved on cost of the shape distortion. Note thata B-spline curve approximation with D

.!9+0.5


Fig. 13. The average bit-rate per frame (average of 100 frames)versus the distortion D

/of the Kids sequence (352]240 pixels).

Fig. 12. Distortion band of width 2 )D.!9

"2 ) 1.0 of Shape1 results in an encoding e$ciency of e"0.84 (rate"59 bits).The resolution of the distortion band is 1/3 pixel, whereas theresolution of the location of the control points is 1 pixel.

Fig. 14. Frame 14 of the QCIF kids sequence (176]144 pixels), the shape information is encoded with D.!9

"1.0 resulting in a rate of487 bits and a D

/"0.071. Frame on the left: original shape (black lines), frame on the right: encoded shape information.

has a distortion band with width of about 1.0 pixel,which is a lossless representation of the boundary.In this case an encoding scheme that incorporateseight encodable angles rather than four will achievebetter encoding rates.

Fig. 12 shows the B-spline curve approximationfor Shape 1 given D

.!9"1.0. The curve is, accord-

ing to the algorithms conditions, always inside thedistortion band, the distortion band consists ofgray squares of size 1

3by 1

3pixel.

The results of encoding the MPEG-4 test se-quence kids (352 by 240 pixels) are illustrated witha rate-distortion chart in Fig. 13. Instead of thedistortion measure D

.!9in the x-axis the distortion

D/

is used in this "gure. D/

represents the numberof incorrectly coded pixels divided by the numberof pixels inside the shape (white pixels). The bit-rateis averaged over 100 frames of the kids sequence.Fig. 14 illustrates frame 14 of the QCIF kids se-quence resolution (176]144 pixels) that wasencoded with distortion D

.!9"1.0, Fig. 15 shows

frame 150 of the "sh sequence.

9. Discussion

We presented an e$cient method for thelossy encoding of object shapes which are given


Fig. 15. Frame 150 of the "sh sequence (176]144 pixels), encoded with D.!9

"1.0 resulting in a rate of 304 bits and a D/"0.024.

Frame on the left: original shape (black lines), frame on the right: encoded shape information.

as 8-connect chain codes. The object shape isapproximated by a second-order B-spline curvewhich leads to the smallest bit-rate for a givendistortion. The B-spline curve is found by a shor-test-path algorithm since the problem can bedescribed as a single-source directed acyclicgraph (DAG). We introduced an encoding schemethat describes a control point vector using 2 bitsfor the angle information and 2 to 5 bits forthe length information. Bit-rates of 0.7 bbp and0.57 bbp for a maximum distortion of 1.0 and2.0 pixels, respectively, were achieved experi-mentally.

A more sophisticated encoding scheme for thecontrol point vectors might result in lower bit-rates. Encoding of the angle information with 2 bitsseems to be very e$cient, but the run-length encod-ing scheme could be potentially improved. As wasalso mentioned earlier higher order curves as wellas distortion bands of variable width can be incorp-orated into the proposed algorithm in a straightfor-ward way.

Taking advantage of the temporal similarities ofadjacent frames could reduce the overall bit-rate ina shape coding scheme. The INTER coding methodused for shape coding with vertices [9] uses motionvectors and reduces the INTRA bit-rate by about50%. Such an approach could be used with theproposed B-Spline coding scheme, but the INTERcoding steps would not be rate-distortion optimal.The extension of our current model from optimalINTRA coding to optimal INTER coding is subjectof our current research [15].

Acknowledgements

The authors would like to thank Gerry Mel-nikov, Northwestern University, for his help in thesimulations section.

References

[1] M.R. Banham, J.C. Brailean, An overview of the MPEG-4standard: Enabling digital multimedia compression, in:International Conference on Advanced Science andTechnology, Schaumburg, IL, 1997, pp. 2}19.

[2] N. Brady, F. Bossen, N. Murphy, Contex-based arithmeticencoding of 2d shape sequences, in: Proceedings of theInternational Conference on Image Processing, SantaBarbara, CA, 1997.

[3] Common test conditions, core experiments on MPEG-4video shape coding, Tech. Rep., Video Subgroup/AdhocGroup on Shape Coding, MPEG-4, 1996.

[4] T. Cormen, C. Leiserson, R. Rivest, Introduction toAlgorithms, McGraw-Hill, New York, 1991.

[5] Foley, vanDam, Feiner, Hughes, Computer Graphics:Principles and Practice, Addison-Wesley, Reading, MA,1990, pp. 478}516.

[6] H. Freeman, On the encoding of arbitrary geometric con-"gurations, IRE Trans. Electron. Comput. EC-10 (June1961) 260}268.

[7] C. Gu, M. Kunt, Contour simpli"cation and motioncompensated coding, Signal Processing: Image Commun-ication 7 (November 1995) 279}296.

[8] M. HoK tter, Object-oriented analysis}synthesis codingbased on moving two-dimensional objects, Signal Process-ing: Image Communication 2 (December 1990) 409}428.

[9] A.K. Katsaggelos, L.P. Kondi, F.W. Meier, J. Ostermann,G.M. Schuster, MPEG-4 and rate-distortion-based shape-coding techniques, Proceedings of IEEE (Special Issue onMultimedia Signal Processing) 86 (June 1988) 1126}1154.


[10] P. Kau!, B. Makai, S. Rauthenberg, U. Goelz, J.L. Lameil-lieure, T. Sikora, Functional coding of video usinga shape-adaptive DCT algorithm and an object-basedmotion prediction tool box, IEEE Trans. Circuits SystemsVideo Technol. 7 (February 1997) 181}195.

[11] L.P. Kondi, F.W. Meier, G.M. Schuster, A.K. Katsaggelos,Joint optimal object shape estimation and encoding, in:Proceedings SPIE Conference on Visual Communicationsand Image Processing, SPIE, January 1998, pp. 784}795.

[12] J. Koplowitz, On the performance of chain codes forquantization of line drawings, IEEE Trans. Pattern Anal.Mach. Intell. PAMI-3 (March 1981) 180}185.

[13] R. Lagendijk, J. Biemond, Low bit-rate coding for mobilemultimedia communications, in: Proceedings of the Euro-pean Signal Processing Conference, 1996, pp. 435}438.

[14] M.-C. Lee, W.-G. Chen, C.-L.B. Lin, C. Gu, T. Markoc, S.I.Zabinsky, R. Szeliski, A layered video object coding systemusing sprite and a$ne motion model, IEEE Trans. CircuitsSystems Video Technol. 7 (February 1997) 130}145.

[15] G. Melnikov, G.M. Schuster, A.K. Katsaggelos, Exploitingtemporal correlation in shape coding, in: Asilomar Confer-ence on Signals, Systems, and Computers, Paci"c Grove,CA, November 1998, pp. 1601}1605, invited paper.

[16] H.G. Musmann, M. HoK tter, J. Ostermann, Object-orientedanalysis}synthesis coding of moving images, Signal Pro-cessing: Image Communication 1 (October 1989) 117}138.

[17] D. Neuho!, K. Castor, A rate and distortion analysis ofchain codes for line drawings, IEEE Trans. Inform. TheoryIT-31 (January 1985) 53}68.

[18] K.J. O'Connell, Object-adaptive vertex-based shape cod-ing method, IEEE Trans. Circuits Systems Video Technol.7 (February 1997) 251}255.

[19] J. Saghri, H. Freeman, Analysis of the precision of general-ized chain codes for the representation of planar curves,IEEE Trans. Pattern Anal. Mach. Intell. PAMI-3 (Septem-ber 1981) 533}539.

[20] G.M. Schuster, A.K. Katsaggelos, An e$cient boundaryencoding scheme which is optimal in the rate distortionsense, in: Proceedings of the International Conference onImage Processing, Vol. II, Lausanne, Switzerland, Septem-ber 1996, pp. 77}80.

[21] G.M. Schuster, A.K. Katsaggelos, Rate-Distortion BasedVideo Compression, Optimal Video Frame Compressionand Object Boundary Encoding, Kluwer Academic Press,Dordrecht, 1997.

[22] G.M. Schuster, A.K. Katsaggelos, An optimal polygonalboundary encoding scheme in the rate distortion sense,IEEE Trans. Image Process. (January 1998) 13}26.

[23] G.M. Schuster, G. Melnikov, A.K. Katsaggelos, Opera-tionally optimal vertex-based shape coding, IEEE SignalProcess. Mag. 15 (November 1998) 91}108.

[24] T. Sikora, The MPEG-4 video standard veri"cationmodel, IEEE Trans. Circuits Systems Video Technol. 7(February 1997) 19}31.

[25] R. Talluri, K. Oehler, T. Bannon, J.D. Courtrey, A. Das,J. Liao, A robust, scalable, object-based video compressiontechnique for very low bit-rate coding, IEEE Trans. Cir-cuits Systems Video Technol. 7 (February 1997) 221}233.


A Mathematical Model for Shape Coding With B-splines Kats 2000 10.1.1.2.6450

Documents

Transcript of A Mathematical Model for Shape Coding With B-splines Kats 2000 10.1.1.2.6450