Medial Visual Fragments as an Intermediate Image Representation for Segmentation...

Medial Visual Fragments as an Intermediate Image Representation for Segmentation and Perceptual Grouping

I. M. Anonymous M. Y. Coauthor

My Department Coauthor Department My Institute Coauthor Institute

City, STATE zipcode City, STATE zipcode

Abstract

We present a novel representation of images based on a decomposition into atomic patches which we call medial visual fragments. The medial axis/shock graph of a contour map partitions the image domain into non-overlapping regions, which together with the image information define the visual fragments. The main advantage of such a representation is that both contour and regional information are explicitly available so that, in the presence of partial evidence and ambiguity in maps indicating edges and regional homogeneity, both aspects can be simultaneously used for perceptual grouping of fragments into a coherent whole. Grouping of visual fragments is represented as a set of canonical transformations of visual fragments, the gap and loop transforms. The advantage of this representation in comparison to perceptual grouping using only contour continuity or region grouping is demonstrated on synthetic and realistic examples.

1 Introduction

The drive to produce complete object boundaries directly from local image features cannot succeed in the presence of occlusion and other visual variations unless suitable stable intermediate representations are formed in the process. These representations must deal with partial evidence and ambiguity, whether region-based or edge-based. On the one hand, contours can have a diffused profile such that only impractically large edge operators can detect their presence; contours of low contrast but good geometric continuity are salient but can occasionally fall below operator threshold, leading to gaps, etc.; see Figure 2. On the other hand, distinct regions are often merged when they are apparently similar in intensity or other attributes, while gradually changing image areas are often broken into distinct regions, Figure 2. There is a natural tradeoff between the number of false positives and missed contours/regions, such that there is considerable ambiguity in the resulting low-level description of the image. In this paper we argue that the inherent representation of images beyond this level of description must include both region-based and edge-based attributes, as purely region-based or purely edge-based methods fundamentally limit the capability of the segmentation process to resolve local ambiguities. We discuss each case in turn.

Figure 1: This synthetically generated image illustrates several issues that plague region-based and contour-based representations. [A] Diffused edges, [B] Low contrast edges, [C] Textured regions, [D] Contours broken up by gaps, and [E] Internal contours.

The goal of region-based segmentation is to group pixels into coherent regions. The basic intermediate representation underlying this type of segmentation is a set of closed, connected, and non-overlapping regions, which we will call region fragments, such that each pixel belongs to a region fragment. Among all the partitionings of an image domain into regions, that which optimizes some measure of intra-region coherence (intensity, color, texture, etc.) and penalizes inter-region difference is selected. Region-based algorithms differ by whether they are local or global, greedy or optimal, etc., ranging from traditional region-growing to the modern graph-theoretic segmentation using normalized cuts [14] and segmentation by weighted aggregation (SWA) [13].

The goal of contour-based segmentation is to group pixels into "coherent" closed contours which delineate the image into groups of objects. This typically involves a progression from local edge detection to linked contour fragments, and finally a closure of these contour fragments. The ambiguity in grouping distinct edge elements into contour fragments is typically handled in two stages: first by defining an affinity between pairs of edge elements (curvilinear continuity), and second by selecting among the possible groupings the one maximizing an overall measure; see [15] for a review. The contour fragments are then closed in a final step, e.g., by searching for cycles in a sparse graph representation [3, 6, 12].

Figure 2: [Left] Contours produced by a topology-based edge detector from [11] at different thresholds and aggressiveness. [Right] Regions produced by the SWA algorithm at scales 7, 8 and 9.

A fundamental drawback in using coherent regions as an intermediate representation is that the outer perimeter of each region serves two functionally distinct roles: portions of the perimeter are contours in the sense that they separate two distinct objects, while the remaining portions of the perimeter are simply delimiters of homogeneous patches. These delimiting contours are a result of the segmentation process and the competition among region fragments, not an indicator of the intrinsic properties of the image and the underlying objects. This subtle but rather significant distinction can be illustrated by examining a region-based segmentation of the synthetic image of Figure 1 in Figure 3. Observe how the mutual boundaries of some region pairs are simply where the coherence between the two cannot be reconciled; these boundaries are spurious in the sense that they cannot possibly be indicative of apparent or internal contours of an object. While one can minimize these spurious contours, this is at the expense of losing some real contours: changes in parameters controlling the coherence measure intended to merge across such boundaries (as is usually done to deal with over-segmentation) also remove some crucial boundaries. This tradeoff between over-segmented and over-grouped segmentation is a fundamental aspect of the region-based approach and partially derives from the dual functional roles assigned to the perimeter of each closed region. A representation that allows for a distinction between the two types of region perimeters would also allow for perceptual grouping based on both geometric continuity of the boundary and similarity of the regions they bound.

Figure 3: Fundamental problem with a region-based representation: the perimeter of each closed region serves two functional roles, one to denote a true contour and the other as a delimiter of the area of coherence. Especially notice how the near uniform background has been fragmented into region fragments whose shared contours (red), as delimiters of coherence, are an artifact of the segmentation algorithm and do not reflect an intrinsic image attribute. Only the region boundaries on the background have been highlighted for clarity. As one can easily see, the interior of the object is equally plagued by this problem.

A fundamental drawback in using closed contours is a similar one: some portions of a closed contour separate two distinct regions while other portions act as a smooth continuation for the purpose of closure and connectivity, see Figure 4. In an analogy to the region-based representation, there is a trade-off in setting parameters controlling the linking process: some parameter settings link edges conservatively, leading to a reliable but over-fragmented set of contours embedded in numerous edges, while with other settings edges are aggressively linked to produce long smooth contours, which can lead to erroneous linking. In either setting, when producing closed contours, the contours serve dual purposes, one as separators of distinct regions and the other as connectors for the sake of producing coherent (long, smooth, etc.) contours.

Ideally, region fragments should differentiate between those portions of their perimeter that indicate image contours and those that delimit homogeneous patches. Similarly, contour fragments should be differentiated based on whether they separate distinct regions or whether they are simply connectors. This distinction has been implicit in approaches which assign roles to both regions and contours. For example, in [10], a PDE for anisotropic diffusion in regions bounded by an edge functional is coupled with a PDE for defining an edge functional flanked by smooth regions. Our proposal here makes these dual roles explicit in a common representation.

Figure 4: One of the results from Figure 2 is used to illustrate a fundamental problem with a contour-based representation: some contours reflect significant image contours (blue) while others are an artifact of the linking process (red). Clearly, these two different types of contours serve different roles in producing a set of contour fragments, and they must be used differently when grouping contour fragments to form coherent wholes. The blue contours are produced by a conservative linking process while the red contours are produced by an aggressive linking process. A contour-based representation does not distinguish between these two types of contours.

A key advantage in a common representation of region fragments and contour fragments is the increased ability to deal with partial and ambiguous information. As an example of an image area that depicts partial contour and region evidence, consider region D in Figure 1. Only certain portions of the boundary can be clearly delineated by an edge process; see Figure 5 for a realistic example. In contrast, variations of intensity not related to any geometric structure can produce spurious edge responses (region C in Figure 1). It would require a major leap of faith to form closed regions from this edge-based local evidence alone. Similarly, a local regional homogeneity measurement indicates the existence of distinct elongated regional fragments in region D of Figure 1, but grouping into a coherent whole is beyond the capabilities of a purely region-based process. The simultaneous spatial arrangement of highly salient contour fragments supported by highly salient regional homogeneity is not represented by either the contour-based or the region-based fragments alone. This deficiency motivates our proposal for a novel type of image representation: the shock graph of a set of contour fragments represents their spatial arrangement and divides the space into regions indicated by pairs of contour fragments, Figure 6.

The contribution of this paper is in presenting a novel representation for images that is based on transforming the image coordinate system to a collection of coordinate systems, each defining a visual fragment. In Section 2, we formally define a fragment-based coordinate system so that each point of an image belongs to a fragment and is described in its coordinate system. This maps the image into a non-overlapping collage of image fragments. We then show in Section 3 that both edge-based and region-based visual grouping processes can be represented as operations on the medial visual fragments, with the clear advantage that the combined grouping process is more selective in the presence of ambiguity.

Figure 5: The contours on the vase are well defined. However, an edge process only produces fragmented contours. The gaps are large enough to render contour grouping impractical to bring out all the perceived regions. These fragmented contours can only be faithfully linked if the regional information between them is also used via a region continuation operation.

2 Representing Images via Medial Visual Fragments

The journey from pixels to objects necessarily involves a progressive transformation of extrinsic image coordinates to match the intrinsic object coordinates. As a portion of the object is segregated from the background it must be represented as an object fragment with its own coordinate system. Such an atomic object fragment consists of boundary fragments bounding coherent region fragments. For example, the parallel strips in region D, at the bottom of Figure 1, lead to a series of broken contour fragments bounding regions that are roughly homogeneous in intensity. The view that a medial axis segment is really just a joint representation of a pair of contours suggests that the medial segment and its influence zone (defined by the burnt region in a grassfire analogy) constitute a fragment of an image. Informally, we define a visual fragment as the portion of the image in the influence zone of a shock segment arising from a pair of image contours. Formally, the shock graph of a contour map partitions the image into fragments with a well-defined transformation from each image point to a fragment and vice versa. These shock fragments are the atomic fragments which are then grouped to form visual fragments.

Figure 6: (a) The shock fragment is the "influence zone" of each shock segment. Each point P̄ in this region has a closest contour point P+ which in turn maps to a shock point P. Observe how part of the shock fragment perimeter is a real contour while the remaining portion is a delimiter of the region only. (b) A synthetic example showing a multitude of open contour fragments paired by shocks. (c) Shock fragments. (d) When the contour fragments are grouped, the shock fragments organize into visual fragments. The convention used throughout the paper is that contours are shown in blue, shocks are in red, and visual fragments are filled with a random color.

Figure 7: This figure illustrates the coordinate system imposed by each shock fragment. Observe that the atomic shock fragments are immune to various visual transformations such as occlusion.

Definition 1 (Shock Fragment): In the grassfire analogy of Blum, the burnt region corresponding to each shock segment is a shock fragment, Figure 6(a). In other words, the shock fragment is the union of all pairs of rays (PP+, PP−) arising from all shock points P along the shock segment.

Recall that the shock graph is a refinement of the medial axis resulting from a sense of shock flow. Each shock point is described by geometry (tangent and curvature) as well as dynamics (velocity and acceleration). Figure 7 shows the shock fragments of a closed curve. Observe that the coordinate system for each shock fragment is an intrinsic object-based coordinate system. The proposition below shows that shock fragments, when applied to non-closed curves, partition the image.
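The idea that the medial locus of a contour map divides the image into influence zones can be illustrated with a coarse discrete sketch. The code below is not the shock computation used in this paper: it is a brute-force stand-in that assumes contours are given as lists of pixel coordinates, labels every pixel with its nearest contour, and marks pixels where the label changes as an approximation of the medial locus between different contours. The function names are hypothetical, and the sketch ignores the medial branches that arise within a single contour.

```python
from math import hypot


def nearest_contour_labels(width, height, contours):
    """Label each pixel with the index of its nearest contour (brute force)."""
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Distance from the pixel to a contour is the distance to its closest sample.
            best = min(
                range(len(contours)),
                key=lambda i: min(hypot(x - cx, y - cy) for cx, cy in contours[i]),
            )
            row.append(best)
        labels.append(row)
    return labels


def medial_pixels(labels):
    """Pixels whose right or lower neighbor belongs to a different contour."""
    height, width = len(labels), len(labels[0])
    medial = set()
    for y in range(height):
        for x in range(width):
            for dx, dy in ((1, 0), (0, 1)):
                nx, ny = x + dx, y + dy
                if nx < width and ny < height and labels[ny][nx] != labels[y][x]:
                    medial.add((x, y))
    return medial


# Two vertical contour fragments at x = 1 and x = 7 in a 9 x 5 image.
left = [(1, y) for y in range(5)]
right = [(7, y) for y in range(5)]
labels = nearest_contour_labels(9, 5, [left, right])
medial = medial_pixels(labels)
```

In this toy configuration the label changes midway between the two fragments, so the approximate medial locus is the vertical line of pixels at x = 4, and the pixels on either side of it form the two halves of the influence zone shared by the contour pair.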

Proposition 1 An image with an associated contour map (a set of curve segments) is partitioned into a set of shock fragments, i.e., for every point $\bar{P} = (x, y)$ in the image, there exists a shock segment $k$ described by a curve $\gamma_k$ parameterized by arclength $s \in [0, L]$, with a local coordinate system of axis tangent/normal $(\vec{T}(s), \vec{N}(s))$ and velocity $v(s)$, such that for some $t \in [0, r(s)]$,

$$(x, y) = \gamma_k(s) + t\left(-\frac{1}{v}\,\vec{T} \pm \frac{\sqrt{v^2 - 1}}{|v|}\,\vec{N}\right).$$

The proof requires developing some background from [4], so we do not present it here. Figure 6(c) illustrates the shock fragments for a contour map sketched in Figure 6(b). Observe that a shock fragment represents an atomic fragment: when a pair of longer contours with some structure is considered, the area between the two contours, which is described by several shock fragments, can be grouped, Figure 6(d), leading to the notion of a (medial) visual fragment. See also Figure 8 for additional examples.
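The parameterization in Proposition 1 can be evaluated directly. The following is a minimal sketch, assuming the tangent and normal are supplied as orthonormal 2D vectors and that the velocity satisfies |v| ≥ 1 so that the square root is real; the function name and the `side` argument (which selects the + or − ray) are hypothetical.

```python
from math import sqrt


def fragment_point(gamma, tangent, normal, v, t, side=+1):
    """Image point at distance t from the shock point gamma along one of its two rays.

    gamma, tangent, normal are 2D tuples; tangent and normal are assumed orthonormal.
    side = +1 or -1 picks the sign in front of the normal component.
    """
    if abs(v) < 1.0:
        raise ValueError("the ray direction is real only for |v| >= 1")
    a = -1.0 / v                            # coefficient of the tangent
    b = side * sqrt(v * v - 1.0) / abs(v)   # coefficient of the normal
    dx = a * tangent[0] + b * normal[0]
    dy = a * tangent[1] + b * normal[1]
    return (gamma[0] + t * dx, gamma[1] + t * dy)


# Shock point at the origin, axis along x, velocity 2, one unit along each ray.
p_plus = fragment_point((0.0, 0.0), (1.0, 0.0), (0.0, 1.0), v=2.0, t=1.0, side=+1)
p_minus = fragment_point((0.0, 0.0), (1.0, 0.0), (0.0, 1.0), v=2.0, t=1.0, side=-1)
```

Because the squared coefficients sum to 1/v² + (v² − 1)/v² = 1, the ray direction has unit length, so t measures Euclidean distance from the shock point and the two rays are mirror images of each other about the tangent.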

Definition 2 (Medial Visual Fragments): A visual fragment corresponding to a pair of contours is the union of all shock fragments that arise from both contours.
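Definition 2 amounts to a union over shock fragments that share the same generating contour pair. A minimal sketch of that bookkeeping follows, assuming each shock fragment is stored with the ids of its generating contours and a set of pixel coordinates; the `ShockFragment` class and its fields are illustrative assumptions, not a data structure specified in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShockFragment:
    contours: frozenset  # ids of the contours that generate the shock segment
    pixels: frozenset    # pixel coordinates inside the influence zone


def visual_fragment(fragments, contour_a, contour_b):
    """Union of the pixels of all shock fragments generated by the given contour pair."""
    pair = frozenset((contour_a, contour_b))
    pixels = set()
    for frag in fragments:
        if frag.contours == pair:
            pixels |= frag.pixels
    return pixels


fragments = [
    ShockFragment(frozenset({1, 2}), frozenset({(0, 0), (1, 0)})),
    ShockFragment(frozenset({1, 2}), frozenset({(2, 0)})),
    ShockFragment(frozenset({2, 3}), frozenset({(5, 5)})),
]
between_1_and_2 = visual_fragment(fragments, 1, 2)
```

Passing the same id twice yields a one-element pair, which covers the case of a fragment enclosed by a single contour.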

Figure 8: Visual fragments formed by various arrangements of contour fragments are illustrated: (a) between a pair of open contours, (b) enclosed by a single open contour, (c) enclosed by a single closed contour, and (d) enclosed by a pair of closed contours.


3 Reasoning with Visual Fragments

In this section we show that visual fragments offer distinct advantages over region fragments and contour fragments. Recall that while region coherence is the main driving force in forming region fragments, good continuation and good form are the main driving forces in forming contour fragments. The presence of ambiguity in low-level feature maps, mainly due to numerous visual transformations, requires that both cues be used to disambiguate the low-level evidence into a coherent whole. Figure 9 illustrates the drawbacks of using only good continuation in a contour map. The completion of the contour behind the occluder in Figure 9(a) and the gap in Figure 9(b) are consistent with the underlying object while the completion of the identical contours in Figure 9(c) is not; the situation in Figure 9(d) requires additional information. Clearly, completion solely based on the contour map can be misleading.

Figure 9: (a),(b) The use of contours as an intermediate level representation allows for grouping of edges across occlusions or gaps based on good continuation. However, while this is effective in these examples, when significant ambiguity is present, using good continuation of contours is not sufficient. Rather, in matching object fragments, both boundaries must be successfully completed depicting good "silhouette" continuity, as in (a,b,e,f), but not in (i,j). In addition they must satisfy "surface cue continuity", or good continuation of the interior object properties (g,h,i). Shape continuity consists of both silhouette continuity and surface cue continuity, and can be captured by shock segments representing each "visual fragment" (j,k,l), and is used to disambiguate grouping of visual fragments in (g-i).

What is lacking is the notion that not a contour but an object fragment needs to be continued and matched on the other side of the occluder. This would imply a pair of continuations. In Figure 9(a-b) one of the contours is interrupted while the second one is intact; in Figure 9(e-f) both contours need to be simultaneously completed. This form of joint contour continuity, however, does not prevent a crossover in the completion contour, Figure 9(g), thus motivating the notion of skeletal continuity. We propose that skeletal continuity captures object fragment continuity better than individual contour continuity.

A second aspect of object fragment continuity that is not captured by contour continuity is the good continuation of surface cues. Consider Figures 9(i-j), where the object fragment on the left can be geometrically continued equally well to either of the object fragments on the right. Clearly, good continuation of the region properties is the dominant factor in deciding the grouping of fragments in this case. This is precisely what region-based segmentation would do if a mechanism for crossing the occluder was somehow included.

Figure 10: Gradient descent perceptual grouping. From [7].

Visual fragments are capable of representing both good skeletal continuity and surface continuity, as both the pair of contours and the region between them are represented, Figure 9(j-l). This is more specifically pursued below, where we consider several canonical situations and describe the grouping process as a transformation of the underlying visual fragments. This is an extension of the approach presented in [7], which only considered transformations of the shock graph affecting only the contour map. In this approach, the completion of a gap is cast as a well-defined transformation of the underlying shock graph (gap transform), while the removal of a spurious edge element is another transform (loop transform). A gradient descent approach selects that transformation of the shock graph that optimizes a move towards good form. Figure 10 illustrates this process by showing several samples along the transformation sequence. Ideally, all transformation sequences, or some viable subset, must be searched to select the optimal sequence, but this issue is not the focus of this paper. Here, we show that transformations of the visual fragments can integrate both contour continuity and regional coherence, and thus serve as a substrate for more powerful grouping.

Figure 11: [Top] Schematic illustration: gaps form in the contour as a result of edge detection, see Region B in Figure 1. The gap transform considers both the contour continuity and the region continuity. [Middle] Grayscale example from region B in Figure 1, and its visual fragments. The salience of this transform as a grouping derives from both contour continuity and regional continuity. [Bottom] Transformed visual fragments together with average intensity pasted on each fragment.
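The descent over transformations described above can be sketched as a greedy loop. This is only an illustration under stated assumptions: the cost function, the candidate generator, and the state representation are placeholders supplied by the caller, and the details of the procedure in [7] are not reproduced here. At each step the loop applies the single transform that lowers the cost the most and stops when no transform lowers it.

```python
def greedy_grouping(state, candidates, apply_transform, cost, max_steps=100):
    """Apply the best cost-lowering transform repeatedly; return the visited states."""
    history = [state]
    for _ in range(max_steps):
        best_state, best_cost = None, cost(state)
        for transform in candidates(state):
            new_state = apply_transform(state, transform)
            new_cost = cost(new_state)
            if new_cost < best_cost:
                best_state, best_cost = new_state, new_cost
        if best_state is None:
            break  # no transform improves the current state
        state = best_state
        history.append(state)
    return history


# Toy problem: a state is the set of transforms applied so far, and each
# transform has a hypothetical gain in "good form" (negative means harmful).
gains = {"gap1": 3.0, "loop1": 1.0, "gap2": -2.0}
history = greedy_grouping(
    frozenset(),
    candidates=lambda s: [t for t in gains if t not in s],
    apply_transform=lambda s, t: s | {t},
    cost=lambda s: -sum(gains[t] for t in s),
)
```

In the toy problem the loop applies the most beneficial transform first, then the next, and stops before the harmful one. A greedy choice of this kind explores only one transformation sequence, which is why a search over sequences would be needed to find the optimal one.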

Gap Transform: Consider a contour which is broken into two contour fragments C1 and C2 as in Figure 11. The notion of a visual fragment allows for the inclusion of both contour continuity and surface continuation in the completion process: (i) the completion of the gap between C1 and C2 must satisfy good contour continuity, e.g., as defined via Elastica [2] or the Euler Spiral [8]; (ii) the completion of the gap between C1 and C2 requires that the region fragments B, C, D, E and F be merged on the one side, and A, G, H, I and J be merged on the other side of the contour. A measure of regional continuity can be based on region-based segmentation methods such as segmentation by weighted aggregation (SWA) [13]. Considering the inherent ambiguity in perceptual grouping, the addition of regional continuity to contour continuity should provide a powerful constraint for disambiguation of possible continuations. Other realistic examples are shown in Figure 16.

Figure 12: Loops form around internal contours in three different ways: (a) around an open contour, (b) around a closed contour, and (c) around a boundary between two fragments. [Bottom] Shows the resulting grouped fragment after a loop transform. A loop transform moves them into another layer; they are not removed. Any texture would likewise be moved into another layer and attached to the shape fragment as a surface property.

Loop Transform: The flip side of completing across a missing contour is the removal of a "spurious" contour. Spurious contours arise from texture elements, internal contours, and noise, among other factors. They are only spurious in the sense that they are not likely to be a part of the boundary of the object fragment. A rather frequent example arises from a slight surface protrusion which leads to a single, non-closed ridge contour in the image [9]. The removal of such a contour in fact separates it from the existing contour map and moves it to another "layer". This layer is a map attached to each visual fragment that forms after removing the spurious contour. In this way, regular structural texture can be detected and represented in this layer.

The visual fragment representation allows for an integration of both contour continuity and regional continuity in the process of determining the salience of a spurious edge. The cues for spuriousness of an edge are (i) poor continuation with neighboring edge elements and (ii) good surface continuity transversal to the spurious contour. Both measures are computable from the visual fragments which arise from this contour, Figures 12 and 13. The transformation pertaining to the removal of this contour is to propagate waves from the shock loop representing it so as to complete the shocks corresponding to the contour map without the spurious element. The intensity in these regions is filled in using the recent exemplar-based filling-in process [1]. The loop transform is not restricted solely to the removal of spurious contours. When certain object fragments have formed, they can be moved to another layer if they have been occluding another object. For example, in Figure 14, once the occluding object is segregated it can be moved to another layer, and the region fragment in the area behind it can be completed by a loop transform of the visual fragments, leaving behind gaps in the process. This is based on the observation that perceptually the occluding edge "belongs" to the occluder [5].

Figure 13: Examples of loop transforms applied to Figure 1 at various locations: (a) around an open contour, but signaling that the contour is significant; (b) around a closed contour, where the regions around the contour can be grouped by pushing the closed region onto another "layer"; and (c) around a boundary between two fragments of similar texture produced by a region-based algorithm. This loop suggests that the regions can be merged by removing the common boundary.

Figure 14: Two types of occlusion transforms: [Left] one that has support from the complementary contour and [Right] another that needs to be jointly completed via "skeletal continuity". The removal of the occluding object is equivalent to a loop transform. The loop is shown in yellow. Once the occluding object is removed, the situation reduces to a gap scenario. The gap transform closes the gap and fills in the texture from the participating visual fragments.

Figure 15: [Top Row] An occluded torus image and its edges. [Middle Row] [Left] Visual fragments produced from the edge map. [Right] Visual fragments produced after the occluder is removed onto another "layer". The remaining transforms are gap transforms to link the individual visual fragments into a coherent whole. [Bottom Row] [Left] Since there is more evidence to link up the outer edges of the torus, this will happen first. The grouping of the contour fragments of the outer edge will produce a grouping of the visual fragments into larger ones as illustrated. [Right] After the outer edges are grouped, there is more evidence for the inner contour fragments to link up, thus producing a pair of closed contours. This defines the fragment into the shape of a torus, as illustrated.
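The way the gap and loop transforms combine a contour cue with a regional cue can be made concrete with a toy scoring sketch. Everything below is an assumption made for illustration: the turning angle between end tangents stands in for an Elastica or Euler Spiral completion energy, the difference of mean intensities stands in for a region-based coherence measure such as SWA, and the product rule and the `scale` parameter are hypothetical choices rather than measures defined in this paper.

```python
from math import atan2, exp, pi


def contour_continuity(t1, t2):
    """1.0 for collinear end tangents, decreasing to 0.0 for a full reversal."""
    cross = t1[0] * t2[1] - t1[1] * t2[0]
    dot = t1[0] * t2[0] + t1[1] * t2[1]
    return 1.0 - abs(atan2(cross, dot)) / pi


def region_similarity(mean_a, mean_b, scale=25.0):
    """1.0 for equal mean intensities, decaying as the means diverge."""
    return exp(-abs(mean_a - mean_b) / scale)


def gap_salience(t1, t2, side_means):
    """High when the contour continues well AND regions agree on both sides.

    side_means holds one (mean, mean) pair per side of the contour: the mean
    intensities of the region fragments that closing the gap would merge.
    """
    regional = min(region_similarity(a, b) for a, b in side_means)
    return contour_continuity(t1, t2) * regional


def loop_spuriousness(continuity_with_neighbors, mean_left, mean_right):
    """High when an edge continues poorly AND the surface is continuous across it."""
    return (1.0 - continuity_with_neighbors) * region_similarity(mean_left, mean_right)


good_gap = gap_salience((1.0, 0.0), (1.0, 0.0), [(100.0, 100.0), (50.0, 50.0)])
bad_gap = gap_salience((1.0, 0.0), (1.0, 0.0), [(100.0, 100.0), (50.0, 200.0)])
```

In the usage lines, both gaps have perfectly aligned end tangents, but the second would merge regions of very different intensity on one side, so its salience collapses. This is the sense in which the regional cue can veto a completion that contour continuity alone would accept.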


Figure 16: Some examples of the visual fragment transforms applied to the fruit basket image [Top]. [Left] Gap transform example 1. [Middle] Gap transform example 2. [Right] Loop transform example.

4 Conclusion

The perceptual organization of an image from pixels to objects uses many intermediate representations. At a first stage, a local grouping of pixels is described by low-level feature maps consisting of edges and regions. At a second stage, these form visual fragments, which are initially numerous atomic shock fragments but, as grouping proceeds, come to resemble object fragments and describe object parts and their 3D structure. Finally, whole objects are segregated by reasoning about their parts. This paper has proposed an intermediate representation of the image which spans the gap from low-level image descriptors to high-level object parts. The proposed visual fragments encode both edge-based and region-based properties, thus enabling a grouping process to simultaneously take advantage of both cues, which can potentially resolve grouping ambiguities.

References

[1] A. A. Efros and T. K. Leung. Texture synthesis by non-parametric sampling. In IEEE International Conference on Computer Vision, pages 1033–1038, Corfu, Greece, September 1999.

[2] E. Sharon, A. Brandt, and R. Basri. Completion energies and scale. PAMI, 22(10):1117–1131, 2000.

[3] J. Elder and S. Zucker. Local scale control for edge detection and blur estimation. In ECCV96, pages II:57–69, 1996.

[4]

[5] B. Gillam. New evidence for "closure" in perception. Perception and Psychophysics, 17(5):521–524, 1975.

[6] D. W. Jacobs. Robust and efficient detection of salient convex groups. IEEE Trans. Pattern Analysis and Machine Intelligence, 18(1):23–37, 1996.

[7]

[8]

[9] J. J. Koenderink and A. J. van Doorn. The shape of smooth objects and the way contours end. Perception, 11:129–137, 1982.

[10] M. Proesmans, E. Pauwels, and L. V. Gool. Coupled geometry-driven diffusion equations for low-level vision. In Geometry-Driven Diffusion in Computer Vision. Kluwer, 1994.

[11] C. Rothwell, J. Mundy, W. Hoffman, and V.-D. Nguyen. Driving vision by topology. In IEEE Intl. Symposium on Computer Vision, pages 395–400, 1995.

[12] S. Mahamud, L. R. Williams, K. Thornber, and K. Xu. Segmentation of multiple salient closed contours from real images. IEEE Trans. on Pattern Analysis and Machine Intelligence, 25(4):433–444, 2003.

[13] E. Sharon, A. Brandt, and R. Basri. Segmentation and boundary detection using multiscale intensity measurements. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pages 469–476, 2001.

[14] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on PAMI, 22(8):888–905, 2000.

[15] L. Williams and K. Thornber. A comparison of measures for detecting natural shapes in cluttered backgrounds. IJCV, 34(2-3):81–96, November 1999.
