Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object...

59
Object-Based and Image-Based Object Representations HANAN SAMET University of Maryland An overview is presented of object-based and image-based representations of objects by their interiors. The representations are distinguished by the manner in which they can be used to answer two fundamental queries in database applications: (1) Feature query: given an object, determine its constituent cells (i.e., their locations in space). (2) Location query: given a cell (i.e., a location in space), determine the identity of the object (or objects) of which it is a member as well as the remaining constituent cells of the object (or objects). Regardless of the representation that is used, the generation of responses to the feature and location queries is facilitated by building an index (i.e., the result of a sort) either on the objects or on their locations in space, and implementing it using an access structure that correlates the objects with the locations. Assuming the presence of an access structure, implicit (i.e., image-based) representations are described that are good for finding the objects associated with a particular location or cell (i.e., the location query), while requiring that all cells be examined when determining the locations associated with a particular object (i.e., the feature query). In contrast, explicit (i.e., object-based) representations are good for the feature query, while requiring that all objects be examined when trying to respond to the location query. The goal is to be able to answer both types of queries with one representation and without possibly having to examine every cell. Representations are presented that achieve this goal by imposing containment hierarchies on either space (i.e., the cells in the space in which the objects are found), or objects. In the former case, space is aggregated into successively larger-sized chunks (i.e., blocks), while in the latter, objects are aggregated into successively larger groups (in terms of the number of objects that they contain). The former is applicable to image-based interior-based representations of which the space pyramid is an example. The latter is applicable to object-based interior-based representations of which the R-tree is an example. The actual mechanics of many of these representations are demonstrated in the VASCO JAVA applets found at http://www.cs.umd.edu/~hjs/quadtree/index.html. Categories and Subject Descriptors: E.1 [Data Structures]: Trees; E.2 [Data Storage Representations]: Object representation; E.5 [Files]: Sorting/searching; H.2.4 [Database Management]: Systems—Query processing, multimedia databases; H.2.8 [Database Management]: Database Applications—Image databases, spatial databases and GIS; I.2.10 [Artificial Intelligence]: Vision and Scene Understanding—Representations, data structures, and transforms; I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling—Object hierarchies;I.3.6 The support of the National Science Foundation under Grants EIA-99-00268, IIS-00-86162, and EIA-00- 91474, and Microsoft Research is gratefully acknowledged. Author’s address: Computer Science Department, Center for Automation Research, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742; email: [email protected]; Web: www.cs. umd.edu/ hjs. Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee. c 2004 ACM 0360-0300/04/0600-0159 $5.00 ACM Computing Surveys, Vol. 36, No. 2, June 2004, pp. 159–217.

Transcript of Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object...

Page 1: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations

HANAN SAMET

University of Maryland

An overview is presented of object-based and image-based representations of objects bytheir interiors. The representations are distinguished by the manner in which they canbe used to answer two fundamental queries in database applications: (1) Feature query:given an object, determine its constituent cells (i.e., their locations in space). (2)Location query: given a cell (i.e., a location in space), determine the identity of theobject (or objects) of which it is a member as well as the remaining constituent cells ofthe object (or objects). Regardless of the representation that is used, the generation ofresponses to the feature and location queries is facilitated by building an index (i.e., theresult of a sort) either on the objects or on their locations in space, and implementing itusing an access structure that correlates the objects with the locations. Assuming thepresence of an access structure, implicit (i.e., image-based) representations aredescribed that are good for finding the objects associated with a particular location orcell (i.e., the location query), while requiring that all cells be examined whendetermining the locations associated with a particular object (i.e., the feature query). Incontrast, explicit (i.e., object-based) representations are good for the feature query,while requiring that all objects be examined when trying to respond to the locationquery. The goal is to be able to answer both types of queries with one representationand without possibly having to examine every cell. Representations are presented thatachieve this goal by imposing containment hierarchies on either space (i.e., the cells inthe space in which the objects are found), or objects. In the former case, space isaggregated into successively larger-sized chunks (i.e., blocks), while in the latter, objectsare aggregated into successively larger groups (in terms of the number of objects thatthey contain). The former is applicable to image-based interior-based representations ofwhich the space pyramid is an example. The latter is applicable to object-basedinterior-based representations of which the R-tree is an example. The actual mechanicsof many of these representations are demonstrated in the VASCO JAVA applets foundat http://www.cs.umd.edu/~hjs/quadtree/index.html.

Categories and Subject Descriptors: E.1 [Data Structures]: Trees; E.2 [Data StorageRepresentations]: Object representation; E.5 [Files]: Sorting/searching; H.2.4[Database Management]: Systems—Query processing, multimedia databases; H.2.8[Database Management]: Database Applications—Image databases, spatialdatabases and GIS; I.2.10 [Artificial Intelligence]: Vision and SceneUnderstanding—Representations, data structures, and transforms; I.3.5 [ComputerGraphics]: Computational Geometry and Object Modeling—Object hierarchies; I.3.6

The support of the National Science Foundation under Grants EIA-99-00268, IIS-00-86162, and EIA-00-91474, and Microsoft Research is gratefully acknowledged.Author’s address: Computer Science Department, Center for Automation Research, Institute for AdvancedComputer Studies, University of Maryland, College Park, MD 20742; email: [email protected]; Web: www.cs.umd.edu/∼hjs.Permission to make digital/hard copy of part or all of this work for personal or classroom use is grantedwithout fee provided that the copies are not made or distributed for profit or commercial advantage, thecopyright notice, the title of the publication, and its date appear, and notice is given that copying is bypermission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requiresprior specific permission and/or a fee.c©2004 ACM 0360-0300/04/0600-0159 $5.00

ACM Computing Surveys, Vol. 36, No. 2, June 2004, pp. 159–217.

Page 2: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

160 H. Samet

[Computer Graphics]: Methodology and Techniques—Graphics data structures anddata types; I.4.10 [Image processing and Computer Vision]:imageRepresentation—Hierarchical, multidimensional

General Terms: Algorithms, Theory

Additional Key Words and Phrases: Access methods, feature query, geographic informationsystems (GIS), image space, location query, object space, octrees, pyramids, quadtrees,R-trees, space-filling curves, spatial databases

1. INTRODUCTION

The representation of spatial objects andtheir environment is an important issue inbuilding and maintaining databases (e.g.,Samet [1995]) to support applicationsin computer graphics, game program-ming, computer vision, image processing,robotics, pattern recognition, computa-tional geometry, and geographic informa-tion systems (GIS). In this survey, ourgoal is to introduce practitioners and re-searchers in these areas to these repre-sentations. Bearing these goals in mind,whenever there are several ways of ex-plaining a concept we use the terminologyand notation that is common to these fieldsrather than that which is more commonlyused in spatial databases (e.g., Rigauxet al. [2001] and Shekhar and Chawla[2003]). Thus the survey is not necessar-ily aimed at database researchers (as isfor example [Di Pasquale et al. 2003]) al-though we hope that they will also findit useful. Please note also that we arefocussing on the representation of spa-tial objects which means that the objectshave extent (e.g., Arman and Aggarwal[1993], Samet [1990a, 1990b], Suetenset al. [1992]) rather than being merelypoints. The representation of multidimen-sional points has been much studied inthe database literature (e.g., Bohm et al.[2001], Gaede [1995], and Samet [1990b]).

We assume that the objects areconnected1 although their environmentneed not be. The objects and their en-vironment are usually decomposed intocollections of more primitive elements

1Intuitively, this means that a d -dimensional objectcannot be decomposed into disjoint subobjects so thatthe subobjects are not adjacent in a (d − 1)-dimen-sional sense.

(termed cells) each of which has a locationin space, a size, and a shape. These ele-ments can either be subobjects of varyingshape (e.g., a table consists of a flat topin the form of a rectangle and four legs inthe form of rods whose lengths dominatetheir cross-sectional areas), or can havea uniform shape. The former yields anobject-based decomposition while thelatter yields an image-based or cell-baseddecomposition. Another way of charac-terizing these two decompositions is thatthe former decomposes the objects whilethe latter decomposes the environment inwhich the objects lie. This distinction iscommonly used to characterize algorithmsin computer graphics (e.g., Foley et al.[1990]).

Each of the decompositions has its ad-vantages and disadvantages. They dependprimarily on the nature of the queries thatare posed to the database. The most gen-eral queries ask where, what, who, why,and how. The ones that are relevant to ourapplication are where and what. They arestated more formally as follows:

(1) Feature query: Given an object, deter-mine its constituent cells (i.e., their lo-cations in space).

(2) Location query: Given a cell (i.e., a lo-cation in space), determine the identityof the object (or objects) of which it is amember as well as the remaining con-stituent cells of the object (or objects).

Not surprisingly, the queries can be clas-sified using the same terminology that weused in the characterization of the decom-position. In particular, we can either try tofind the cells (i.e., their locations in space)occupied by an object or find the objectsthat overlap a cell (i.e., a location in space).If objects are associated with cells so that

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 3: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 161

a cell contains the identity of the relevantobject (or objects), then the feature queryis analogous to retrieval by contents whilethe location query is analogous to retrievalby location.

The feature and location queries arethe basis of two more general classes ofqueries. In particular, the feature queryis a member of a broader class of queriesdescribed collectively as being feature-based (also object-based), while the loca-tion query is a member of a broader classof queries described collectively as beinglocation-based (also image-based or cell-based). The location query is also com-monly referred to as a point query indatabases (e.g., Knuth [1998]), the pointlocation problem in computational ge-ometry (e.g., de Berg et al. [1997] andPreparata and Shamos [1985]), and a“pick” operation in computer graphics(e.g., Foley et al. [1990]) which is actuallyused to find the nearest object to a given lo-cation. The class of location-based queriesinclude the numerous variants of the win-dow query which retrieves the objects thatcover an arbitrary region (often rectangu-lar). All of these queries are used in sev-eral applications including geographic in-formation systems (e.g., Aref and Samet[1990] and Samet and Aref [1995]) andspatial data mining (e.g., Wang et al.[1997]). It is important to note that theonly reason that we discuss these par-ticular queries is that they are the mo-tivation for the different representationsthat we present. Of course, there are manyother possible queries such as a more gen-eralized formulation of neighbor finding,Boolean set operations, spatial joins, etc.However, their discussion is beyond thescope of this survey.

The most common representation ofthe objects and their environment is asa collection of cells of uniform size andshape (termed pixels and voxels in twoand three dimensions, respectively) allof whose boundaries (with dimensional-ity one less than that of the cells) are ofunit size. Such a representation is usedin many applications in image process-ing, computer graphics, remote sensing,and geographic information systems (GIS)

Fig. 1. Example collection ofthree objects and the cells thatthey occupy.

where it is known as a raster representa-tion. In fact, it forms the basis of the mapalgebra system of Tomlin [1990] which hasbeen implemented in a number of GIS sys-tems (e.g., ARC/INFO). Since the cells areuniform, there exists a way of referring totheir locations in space relative to a fixedreference point (e.g., the origin of the coor-dinate system). An example of a location ofa cell in space is a set of coordinate valuesthat enable us to find it in the d -dimen-sional space of the environment in whichit lies. Note that the concept of the loca-tion of a cell in space is quite different fromthat of the address of a cell, which is thephysical location (e.g., in memory, on disk,etc.), if any, where some of the informationassociated with the cell is stored. This dis-tinction between the location in space of acell and the address of a cell is importantand we shall make use of it often.

In most applications, the boundaries(i.e., edges and faces in two and threedimensions, respectively) of the cells areparallel to the coordinate axes. In our dis-cussion, we assume that the cells com-prising a particular object are contigu-ous (i.e., adjacent), and that a differentunique value is associated with each dis-tinct object, thereby enabling us to distin-guish between the objects. Depending onthe underlying representation, this valuemay be stored with the cells. For example,Figure 1 contains three two-dimensionalobjects A, B, and C and their correspond-ing cells. Note that in this example thereexist two hyperplanes that will separate

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 4: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

162 H. Samet

the objects; however, this is not crucial toour discussion. We also observe that, al-though not the case in this example, ob-jects are allowed to overlap which meansthat a cell may be associated with morethan one object. Here we assume, withoutloss of generality, that the volume of theoverlap must be an integer multiple of thevolume of a cell (i.e., pixels, voxels, etc.).

The shape of an object o can be rep-resented either by the interiors of thecells comprising o, or by the subset ofthe boundaries of those cells comprisingo that are adjacent to the boundary of o.In particular, interior-based methods rep-resent an object o by using the locations inspace of the cells that comprise o, whileboundary-based methods represent o byusing the locations in space of the cellsthat are adjacent to the boundary of o.In general, interior-based representationsmake it very easy to calculate properties ofan object such as its mass, and, dependingon the nature of the aggregation process,to determine the value associated with anypoint (i.e., location) in the space coveredby a cell in the object. On the other hand,boundary-based representations make iteasy to obtain the boundary of an object.The first boundary representation was thechain code [Freeman 1974], which is a dig-itized version of a vector representation.Other more commonly used boundary rep-resentations are those known collectivelyas the boundary model (also referred to asBRep) among which are included a num-ber of variants of the winged-edge datastructure (e.g., Baumgart [1975], Guibasand Stolfi [1985], Joy et al. [2003], andSamet [1990b]), as well as representationssuch as the BSP tree (denoting BinarySpace Partitioning) [Fuchs et al. 1980],which is mentioned briefly in Section 3. Inthis article, the focus is on interior-basedrepresentations and thus further discus-sion of these representations is beyond ourscope.

The simplest representation representsthe objects by use of collections of unit-sizecells. The collection can be represented ei-ther explicitly or implicitly. The represen-tation is explicit if the identities of all thecontiguous cells that form the object are

hardwired into the representation (char-acterized as being object-based), while therepresentation is implicit if we can onlydetermine the cells that make up the ob-ject by examining the contiguous cells of agiven cell and determining if they are asso-ciated with the same object (characterizedas being image-based).

These representations can be mademore compact by aggregating similarelements (i.e., unit-size cells). Theseelements are usually identically valuedcontiguous cells, or even objects which,ideally, are in proximity. The result is thatthe cells that make up the object collectionno longer need to be of unit size and theirsizes can vary. The varying-sized cells aretermed blocks.

Regardless of the representation that isused, the generation of responses to thefeature and location queries is facilitatedby building an index (i.e., the result of asort) either on the objects or on their lo-cations in space, and implementing it us-ing an access structure that correlates theobjects with the locations. Assuming thepresence of an access structure, the im-plicit (i.e., image-based) representationsdescribed in Sections 2 and 3 are goodfor finding the objects associated with aparticular location or cell (i.e., the loca-tion query), while requiring that all cellsbe examined when determining the loca-tions associated with a particular object(i.e., the feature query). In contrast, theexplicit (i.e., object-based) representationsdescribed in Sections 2 and 3 are good forthe feature query, while requiring that allobjects be examined when trying to re-spond to the location query. Our goal isto be able to answer both types of querieswith one representation and without pos-sibly having to examine every cell. This isthe main focus of this article.

We achieve our goal by imposing con-tainment hierarchies on the representa-tions. The hierarchies differ depending onwhether the hierarchy is of space (i.e., thecells in the space in which the objects arefound), or of objects. In the former case, weaggregate space into successively larger-sized chunks (i.e., blocks), while in the lat-ter, we aggregate objects into successively

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 5: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 163

larger groups (in terms of the number ofobjects that they contain). The former isapplicable to implicit (i.e., image-based)interior-based representations, while thelatter is applicable to explicit (i.e., object-based) interior-based representations.

The basic idea is that in image-basedrepresentations, we propagate objects upthe hierarchy, with the occupied space be-ing implicit to the representation. Thus,we retain the property that associatedwith each cell is an identifier indicatingthe object of which it is a part. In fact, itis this information that is propagated upthe hierarchy so that each element in thehierarchy contains the union of the objectsthat appear in the elements immediatelybelow it.

On the other hand, in the object-basedrepresentations, we propagate the spaceoccupied by the objects up the hierarchy,with the identities of the objects beingimplicit to the representation. Thus, weretain the property that associated witheach object is a set of locations in spacecorresponding to the cells that make upthe object. Actually, since this informa-tion may be rather voluminous, it is of-ten the case that an approximation of thespace occupied by the object is propagatedup the hierarchy rather than the collec-tion of individual cells that are spannedby the object. The approximation is usu-ally the minimum bounding box for theobject that is customarily stored with theexplicit representation. Therefore, associ-ated with each element in the hierarchy isa bounding box corresponding to the unionof the bounding boxes associated withthe elements immediately below it. Thebounding box is quite general in the sensethat it can be used to approximate all typesof data rather than just objects with axis-parallel boundaries as in this article.

The use of the bounding box approxima-tion has the drawback that the boundingboxes at a given level in the hierarchy arenot necessarily disjoint, which means thatresponding to the location query may re-quire all of the bounding boxes to be visitedas the space spanned by an object may beincluded in several bounding boxes; how-ever, the object is only associated with one

of the bounding boxes. This can be over-come by decomposing the bounding boxesso that disjointness holds. The drawbackof this solution is that an object may beassociated with more than one boundingbox, which may result in the object beingreported as satisfying a particular querymore than once. For example, suppose thatwe want to retrieve all the objects thatoverlap a particular region (i.e., a windowquery) rather than a point as is done in thelocation query.

It is very important to note that thepresence of the hierarchy does not meanthat the alternative query (i.e., the fea-ture query in the case of a space hierar-chy and the location query in the case ofan object hierarchy) can be answered im-mediately. Instead, obtaining the answerusually requires that the hierarchy be de-scended. The effect is that the order of theexecution time needed to obtain the an-swer is reduced from linear to logarithmic.Of course, this is not always the case. Forexample, the fact that we are using bound-ing boxes for the space spanned by the ob-jects rather than the exact space occupiedby them means that we do not always havea complete answer when reaching the bot-tom of the hierarchy. In particular, at thispoint, we may have to resort to a more ex-pensive point-in-polygon test [Foley et al.1990].

It is worth repeating that the only rea-son for imposing the hierarchy is to facili-tate responding to the alternative query(i.e., the feature query in the case of aspace hierarchy on the implicit represen-tation, and the location query in the case ofan object hierarchy on the explicit repre-sentation). Thus the base representationof the hierarchy is still usually used toanswer the original query, because often,when using the hierarchy, the inherentlylogarithmic overhead incurred by the needto descend the hierarchy may be too ex-pensive (e.g., when using the implicit rep-resentation with an array access struc-ture to respond to the location query).Of course, other considerations such asspace requirements may cause us to mod-ify the base representation of the hierar-chy, with the result that it will take longer

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 6: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

164 H. Samet

to respond to the original query (e.g., theuse of a tree-like access structure with animplicit representation). Nevertheless, asa general rule, in the case of the space hier-archy, we use the implicit representation(which is the basis of this hierarchy) to an-swer the location query, while in the caseof the object hierarchy, we use the explicitrepresentation (which is the basis of thishierarchy) to answer the feature query.

The rest of this article is organized asfollows: Section 2 examines both image-based and object-based representationsthat consist of collections of unit-size cells,while Section 3 reviews how these rep-resentations are made more compact byaggregating similar elements into blocks.Sections 4 and 5 describe how to modifythe image-based and object-based repre-sentations, respectively, to be hierarchical.Section 6 briefly discusses some disjointobject-based representations, while con-cluding remarks are drawn in Section 7.Note that all of the representations thatwe discuss can be used in a dynamicenvironment although some more eas-ily in the sense that updates are lesscostly to process as less of the rep-resentation needs to be rebuilt in thecase of an update. The actual mechan-ics of many of these representations aredemonstrated in the VASCO JAVA appletsfound at http://www.cs.umd.edu/~hjs/quadtree/index.html [Brabec et al. 2003].

In order to simplify matters, the repre-sentations described in Sections 2 and 3assume that the objects can be decom-posed into cells whose boundaries are par-allel to the coordinate axes. Moreover, it isassumed that each unit-size cell or block iscontained entirely in one or more objects—that is, a cell or block cannot be partiallycontained in two objects. This means thateither each cell in a block belongs to thesame object or objects, or all of the cells inthe block do not belong to any of the ob-jects. Of course, as we pointed out before,more complex objects are possible (e.g., ar-bitrary polyhedra as well as collectionsof objects whose boundaries do not coin-cide with the boundaries of the underly-ing blocks) and also a cell or a block couldbe allowed to overlap several objects with-

out being completely contained in them.In this case, the hierarchical object-basedrepresentations which are presented inSection 5 are the most appropriate and thediscussion therein is applicable.

2. UNIT-SIZE CELLS

Interior-based representations aggregateidentically-valued cells by recording theirlocations in space. When the aggregationis explicit, the identities of the contigu-ous cells that form the object are hard-wired into the representation. An exampleof an explicit aggregation is one that asso-ciates a set with each object o that con-tains the location in space of each cell thatcomprises o. In this case, no identifyinginformation (e.g., the object identifier cor-responding to o) is stored in the cells. Thusthere is no need to allocate storage forthe cells (i.e., no addresses are associatedwith them). One possible implementationof this set is a list. For example, considerFigure 1 and assume that the origin (0,0)is at the upper-left corner. Assume furtherthat this is also the location of the pixelthat abuts this corner. Therefore, the ex-plicit representation of object B is the set oflocations {(5,1) (6,0) (6,1) (7,0) (7,1)}. It shouldbe clear that using the explicit represen-tation, given an object o, it is easy to de-termine the cells (i.e., locations in space)that comprise it (the feature query).

Of course, even when using an explicitrepresentation, we must still be able to ac-cess object o from a possibly large collec-tion of objects, which may require an ad-ditional data structure such as an indexon the objects (e.g., a table of object-valuepairs where value indicates the entry inthe explicit representation correspondingto object). This index does not make useof the spatial coverage of the objects andthus may be implemented using conven-tional searching techniques such as hash-ing [Knuth 1998]. In this case, we willneed O(N ) additional space for the in-dex, where N is the number of differentobjects. We do not discuss such indexeshere.

The fact that no identifying informa-tion as to the nature of the object is

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 7: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 165

stored in the cell means that the explicitrepresentation is not suited for answer-ing the inverse query of determining theobject associated with a particular cellat location l in space (i.e., the locationquery). Using the explicit representation,the location query can be answered onlyby checking for the presence of locationl in space in the various sets associatedwith the different objects. This will betime-consuming, as it may require thatwe examine all cells in each set. In otherwords, the explicit representation is pri-marily suited to retrieval on the basis ofknowledge of the objects rather than of thelocations of the cells in space and this isthe rationale for characterizing it as beingobject-based.

Note that since the explicit representa-tion consists of sets, there is no partic-ular order for the cells within each setalthough an ordering could be imposedbased on spatial proximity of the loca-tions of the cells in space, etc. For exam-ple, the list representation of a set alreadypresupposes the existence of an ordering.Such an ordering could be used to obtaina small, but not insignificant, decrease inthe time (in an expected sense) needed toanswer the location query. In particular,now whenever cell c is not associated withobject o, we will be able to cease search-ing the list associated with o after havinginspected half of the cells associated witho instead of all of them, which is the casewhen no ordering exists.

An important shortcoming of the use ofthe explicit representation, which has aneffect somewhat related to the absence ofan ordering, is the inability to distinguishbetween occupied and unoccupied cells. Inparticular, in order to detect that a cell cis not occupied by any object we must ex-amine the sets associated with each object,which is quite time-consuming. Of course,we could avoid this problem by forming anadditional set which contains all of the un-occupied cells, and examine this set firstwhenever processing the location query.The drawback of such a solution is thatit slows down all instances of the locationquery that involve cells that are occupiedby objects.

We can avoid examining every cell ineach object set, thereby speeding up thelocation query in certain cases, by stor-ing a simple approximation of the objectwith each object set o. This approximationshould be of a nature that makes it easy tocheck if it is impossible for a location l inspace to be in o. One such approximationis a minimum bounding box whose sidesare parallel to the coordinate axes of thespace in which the object is embedded. Forexample, for object B in Figure 1 such a boxis anchored at the lower-left corner of cell(5,1) and the upper-right corner of cell (7,0).The existence of a box b for object o meansthat if b does not contain l , then o does notcontain l either, and we can proceed withchecking the other objects. This bound-ing box is usually a part of the explicitrepresentation.

There are several ways of increasing thequality of the approximation. For exam-ple, the minimum bounding box may berotated by an arbitrary angle so that thesides are still orthogonal while no longerhaving to be parallel to the coordinateaxes and known as an OBB-tree denotingoriented bounding box (e.g., Brinkhoffet al. [1994], Gottschalk et al. [1996], andReddy and Rubin [1978]). These repre-sentations adaptations of the strip tree[Ballard 1981] and the Douglas–Peuckergeneralization algorithm [Douglas 1990;Douglas and Peucker 1973; Saalfeld 1987]for curves in two dimensions. The numberof sides as well as the number of theirpossible orientations may be expandedso that it is arbitrary (e.g., a convexhull [Brinkhoff et al. 1994]), or boundedalthough greater than the dimensionalityof the underlying space (e.g., the P-tree[Jagadish 1990b] and k-DOP [Klosowskiet al. 1998] where the number of pos-sible orientations is bounded, and theminimum bounding polybox [Brodskyet al. 1995], which attempts to find theoptimal orientations). The most generalsolution is the convex hull, which is oftenapproximated by a minimum bound-ing polygon of a fixed number of sideshaving either an arbitrary orientation(e.g., the minimum bounding n-corner[Dori and Ben-Bassat 1983; Schiwietz

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 8: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

166 H. Samet

1993; Schiwietz and Kriegel 1993]) or afixed orientation usually parallel to thecoordinate axes (e.g., [Esperanca andSamet 1997]). Restricting the sides (i.e.,faces in dimension higher than 2) of thepolyhedron to be parallel to the coordinateaxes (termed an axis-parallel polygon)enables simpler point-in-object tests (e.g.,the vertex representation [Esperanca andSamet 1994; Esperanca 1995; Esperancaand Samet 1997]). The minimum bound-ing box may also be replaced by a circle,sphere (e.g., Hubbard [1995, 1996],Omohundro [1989], van Oosterom [1990],van Oosterom and Claassen [1990], andWhite and Jain [1996b]), ellipse, in-tersection of the minimum boundingbox with the minimum boundingsphere (e.g., Katayama and Satoh[1997]), as well as other shapes (e.g.,Brinkhoff et al. [1994]). Interestingly,many of these solutions arose in ap-plications in collision detection (e.g.,Gottschalk et al. [1996] and Klosowskiet al. [1998]).

In the rest of this article, we restrictour discussion to minimum boundingboxes that are rectangles with sides par-allel to the coordinate axes, although, ofcourse, the techniques we describe areapplicable to other more general bound-ing objects. Nevertheless, in the interestof brevity, we often use the term bound-ing box even though the terms minimumbounding box or minimum bounding ob-ject would be more appropriate.

The location query can be answeredmore directly if we allocate an address ain storage for each cell c where an identi-fier is stored that indicates the identity ofthe object (or objects) of which c is a mem-ber. Recall that such a representation issaid to be implicit as in order to deter-mine the rest of the cells that comprisethe object associated with c (and thus com-plete the response to the location query),we must examine the identifiers storedin the addresses associated with the con-tiguous cells and then aggregate the cellswhose associated identifiers are the same.However, in order to be able to use the im-plicit representation, we must have a wayof finding the address a corresponding to

c, taking into account that there is possi-bly a very large number of cells, and thenretrieving a.

Finding the right address requires anadditional data structure, termed an ac-cess structure, such as an index on thelocations in space. An example of suchan index is a table of cell-address pairswhere address indicates the physical loca-tion where the information about the ob-ject associated with the location in spacecorresponding to cell is stored. The tableis indexed by the location in space corre-sponding to cell. The index is really an or-dering and hence its range is usually theintegers (i.e., one-dimensional). When thedata is multidimensional (i.e., cells in d -dimensional space where d > 0), it maynot be convenient to use the location inspace corresponding to the cell as an indexsince its range spans data in several di-mensions. Instead, we employ techniquessuch as laying out the addresses corre-sponding to the locations in space of thecells in some particular order and thenmaking use of an access structure in theform of a mapping function to enable thequick association of addresses with the lo-cations in space corresponding to the cells.Retrieving the address is more complex inthe sense that it can be a simple memoryaccess or it may involve an access to sec-ondary or tertiary storage if virtual mem-ory is being used. In most of our discus-sion, we assume that all data is in mainmemory, although, as we will see, sev-eral representations do not rely on thisassumption.

Such an access structure enables us toobtain the contiguous cells (as we knowtheir locations in space) without having toexamine all of the cells. Therefore, we willknow the identities of the cells that com-prise an object thereby enabling us to com-plete the response to the location querywith an implicit representation. In otherwords, the implicit representation lends it-self to retrieval on the basis of knowledgeonly of the cells rather than of the objects,and this is the rationale for characteriz-ing it as being image-based. In the rest ofthis section, we discuss several such accessstructures.

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 9: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 167

Fig. 2. The result of applying several different space-ordering methods toan 8×8 collection of cells whose first element is in the upper-left corner: (a)row order, (b) row-prime order, (c) Morton order, (d) Peano–Hilbert order, (e)Cantor-diagonal order, (f) spiral order, (g) Gray order, (h) double Gray order,and (i) U order.

The existence of an access structure alsoenables us to answer the feature querywith the implicit representation, althoughthis is quite inefficient. In particular, givenan object o, we must exhaustively exam-ine every cell (i.e., location l in space) andcheck if the address where the informa-tion about the object associated with l isstored contains o as its value. This will betime-consuming, as it may require that weexamine all the cells.

There are many ways of laying out theaddresses corresponding to the locationsin space of the cells each having its ownmapping function. Some of the most im-portant ones for a two-dimensional spaceare illustrated in Figure 2 for an 8×8 por-

tion of the space and are described brieflybelow. To repeat, in essence, what we aredoing is providing a mapping from thed -dimensional space containing the loca-tions of the cells to the one-dimensionalspace of the range of index values (i.e.,integers) which are used to access a ta-ble whose entries contain the addresseswhere information about the contents ofthe cells is stored. The result is an order-ing of the space, and the curves shown inFigure 2 are termed space-filling curves(e.g., Sagan [1994]). Choosing among thespace-filling curves illustrated in Figure 2is not easy as each one has its advantagesand disadvantages. Below, we review afew of their desirable properties, and show

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 10: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

168 H. Samet

how some of the two-dimensional order-ings satisfy them.

—The curve should pass through each lo-cation in space once and only once.

—The mapping from the higher-dimen-sional space to the integers shouldbe relatively simple and likewise forthe inverse mapping. This is the casefor all but the Peano-Hilbert order(Figure 2(d)). For the Morton order(Figure 2(c)), the mapping is obtainedby interleaving the binary representa-tions of the coordinate values of thelocation of the cell. The number as-sociated with each cell is known asits Morton number. The Gray order(Figure 2(g)) is obtained by applying aGray code [Gray 1953] to the result ofbit interleaving, while the double Grayorder (Figure 2(h)) is obtained by apply-ing a Gray code to the result of bit in-terleaving the Gray code of the binaryrepresentation of the coordinate values.The U order (Figure 2(i)) is obtained in asimilar manner to the Z order except foran intermediate application of d −1 ‘ex-clusive OR’ (⊕) operations on the binaryrepresentation of selected combinationsof the coordinate values prior to the ap-plication of bit interleaving [Liu andSchrack 1998; Schrack and Liu 1995].Thus the difference in cost between theZ order and the U order in d dimensionsis just the performance of additionald−1 ‘exclusive OR’ operations. This is incontrast with the Peano–Hilbert orderwhere the mapping and inverse map-ping processes are considerably morecomplex.

—The ordering should be stable. Thismeans that the relative ordering ofthe individual locations is preservedwhen the resolution is doubled (e.g.,when the size of the two-dimensionalspace in which the cells are embed-ded grows from 8 × 8 to 16 × 16) orhalved assuming that the origin staysthe same. The Morton, U, Gray, and dou-ble Gray orders are stable, while the row(Figure 2(a)), row-prime (Figure 2(b)),Cantor-diagonal (Figure 2(e)), and spi-ral (Figure 2(f)) orders are not stable.

Fig. 3. Peano–Hilbert curves of resolution (a) 1,(b) 2, and (c) 3.

The Peano–Hilbert order is also not sta-ble as can be seen by its definition.In particular, in two dimensions, thePeano–Hilbert order of resolution i + 1(i.e., a 2i × 2i image) is constructed bytaking the Peano–Hilbert curve of reso-lution i and rotating the NW, NE, SE, andSW quadrants by 90 degrees clockwise,0 degrees, 0 degrees, and 90 degreescounterclockwise, respectively. For ex-ample, Figures 3(a), 3(b), and 3(c) givethe Peano–Hilbert curves of resolutions1, 2, and 3, respectively.

—Two locations that are adjacent (i.e., inthe sense of a (d − 1)-dimensional adja-cency also known as 4-adjacent) in spaceare neighbors along the curve and viceversa. In two dimensions, this meansthat the locations share an edge or aside. This is impossible to satisfy forall locations at all space sizes. How-ever, for the row-prime, Peano–Hilbert,and spiral orders, every element is a 4-adjacent neighbor of the previous ele-ment in the sequence while this is notthe case for the other orders. This meansthat the row-prime, Peano–Hilbert, andspiral orders have a slightly higher de-gree of locality than the other orders.

—The process of retrieving the neighborsof a location in space should be simple.

—The order should be admissible. Thismeans that at each position in the or-dering, at least one 4-adjacent neigh-bor in each of the lateral directions (i.e.,horizontal and vertical) must have al-ready been encountered. This is usefulin several algorithms (e.g., connectedcomponent labeling [Dillencourt et al.

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 11: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 169

1992]2). The row and Morton ordersare admissible while the Peano–Hilbert,U, Gray, and double Gray orders arenot admissible. The row-prime, Cantor-diagonal, and spiral orders are admis-sible only if we permit the direction ofthe 4-adjacent neighbors to vary fromposition to position along the curve. Forexample, for the row-prime order, at po-sitions on odd rows, the previously en-countered 4-adjacent neighbors are thewestern and northern neighbors, whileat positions on even rows, it is the east-ern and northern neighbors.

The row order (Figure 2(a)) is of spe-cial interest to us because its mappingfunction is the one most frequently usedby the multidimensional array, which isthe most common access structure. TheMorton order [Morton 1966] has a longhistory having been first mentioned byPeano [1890], and has been used by manyresearchers (e.g., Abel and Smith [1983],Gargantini [1982], Orenstein and Merrett[1984], and White [1982]). It is also knownas a Z order [Orenstein and Merrett 1984]and as an N order [White 1982]. ThePeano–Hilbert order was first mentionedsoon afterwards by Hilbert [1891], and hasalso been used by a number of researchers(e.g., Faloutsos and Roseman [1989] andJagadish [1990a]).

Although conceptually very simple, theU order introduced by Schrack andLiu [Liu and Schrack 1998; Schrack andLiu 1995] is relatively recent. It is a vari-ant of the Morton order, while also resem-bling the Peano–Hilbert order. The prim-itive shape is a ‘U’ which is the same asthat of the Peano–Hilbert order. However,unlike the Peano–Hilbert order, and likethe Morton order, the ordering is applied

2A region or object four-connected component, is amaximal four-connected set of locations belonging tothe same object, where a set S of locations is said tobe four-connected if for any locations p, q in S thereexists a sequence of locations p = p0, p1, . . . , pn =q in S, such that pi+1 is 4-adjacent (8-adjacent) topi , 0 ≤ i < n. The process of assigning the samelabel to all 4-adjacent locations that belong to thesame object is called connected component labeling(e.g., Park and Rosenfeld [1971] and Rosenfeld andPfaltz [1966]).

recursively with no rotation thereby en-abling it to be stable. The U order has aslight advantage over the Morton orderin that more of the locations that are ad-jacent (i.e., in the sense of a (d − 1)-di-mensional adjacency) along the curve arealso neighbors in space. This is directly re-flected in the lower average distance be-tween two successive positions in the or-der (i.e., for a 216 × 216 image, it is 1.4387for the U order while it is 2.0000, 1.6724,and 1.5000 for the double Gray, Morton,and Gray orders, respectively, and 1 for therow-prime, Peano–Hilbert, and spiral or-ders [Samet 2005]). However, the price ofthis is that like the Peano–Hilbert order,the U order is also not admissible. Nev-ertheless, like the Morton order, the pro-cess of retrieving the neighbors of a loca-tion in space is simple when the space isordered according to the U order. Asanoet al. [1997] describe an order which hasthe same properties as the Peano–Hilbertorder except that for any square region,there are at most three breaks in the con-tinuity of the curve in contrast to fourfor the Peano–Hilbert order (i.e., at mostthree out of four 4-adjacent subblocks of asquare block are not neighbors along thecurve in contrast with a possibility that allfour 4-adjacent subblocks are not neigh-bors in the Peano–Hilbert order). Thisproperty is useful for retrieval of squarelike regions in the case of a range querywhen the data is stored on disk in this or-der as each break in the continuity can re-sult in a disk seek operation.

The multidimensional array (having adimension equal to the dimensionality ofthe space in which the objects and theenvironment are embedded) is an accessstructure which, given a cell c at a locationl in space, enables us to calculate the ad-dress a containing the identifier of the ob-ject associated with c. The array is only aconceptual multidimensional structure (itis not a multidimensional physical entityin memory) in the sense that it is a map-ping of the locations in space of the cellsinto sequential addresses in memory. Theactual addresses are obtained by the ar-ray access function (see e.g., Knuth [1997]as well as the above discussion on space

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 12: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

170 H. Samet

orderings) which is based on the extentsof the various dimensions (i.e., coordinateaxes). The array access function is usuallythe mapping function for the row order(Figure 2(a)). Thus, the array enables usto implement the implicit representationwith no additional storage except for whatis needed for the array’s descriptor. The de-scriptor contains the bounds and extentsof each of the dimensions which are usedto define the mapping function (i.e., theydetermine the values of its coefficients) sothat the appropriate address can be calcu-lated given the cell’s location in space.

The array is called a random accessstructure because the address associatedwith a location in space can be retrieved inconstant time independent of the numberof elements in the array and does not re-quire any search. Note that we could storethe object identifier o in the array elementitself instead of allocating a separate ad-dress a for o thereby saving some space.

The array is an implicit representationbecause we have not explicitly aggregatedall the contiguous cells that comprise aparticular object. They can be obtainedgiven a particular cell c at a location lin space belonging to object o by recur-sively accessing the array elements cor-responding to the locations in space thatare adjacent to l and checking if theyare associated with object o. This processis known as depth-first connected compo-nent labeling.

Interestingly, depth-first connectedcomponent labeling could also be usedto answer the feature query efficientlywith an implicit representation if we adda data structure such as an index on theobjects (e.g., a table of object-location pairswhere location is one of the locations inspace that comprise object). Thus given anobject o we use the index to find a locationin space that is part of o, and then proceedwith the depth-first connected componentlabeling as before. This index does notmake use of the spatial coverage of theobjects and thus it can be implementedusing conventional searching techniquessuch as hashing [Knuth 1998]. In thiscase, we will need O(N ) additional spacefor the index, where N is the number of

different objects. We do not discuss suchindexes here.

Of course, we could also answer the loca-tion query with an explicit representationby adding an index which associates ob-jects with locations in space (i.e., havingthe form location-objects). However, thiswould require O(S) additional space forthe index, where S is the number of cells.The O(S) bound assumes that only oneobject is associated with each cell. If wetake into account that a cell could be as-sociated with more than one object, thenthe additional storage needed is O(NS), ifwe assume N objects. Since the numberof cells S is usually much greater than thenumber of objects N , the addition of an in-dex to the explicit representation is not aspractical as extending the implicit repre-sentation with an index of the form object-location as described above. Thus, it wouldappear that the implicit representation ismore useful from the point of view of flex-ibility when taking storage requirementsin to account.

The implicit representation can be im-plemented with access structures otherthan the array. This is an important con-sideration when many of the cells are notin any of the objects (i.e., they are empty).The problem is that using the array accessstructure is wasteful of storage, as the ar-ray requires an element for each cell re-gardless of whether the cell is associatedwith any of the objects. In this case, wechoose to keep track of only the nonemptycells.

We have two ways to proceed. The first isto use one of several multidimensional ac-cess structures such as a point quadtree,k-d tree, MX quadtree, etc. as describedin Samet [1990b]. The second is to makeuse of one of the orderings of space shownin Figure 2 to obtain a mapping from thenonempty contiguous cells to the integers.The result of the mapping serves as theindex in one of the familiar tree-like ac-cess structures (e.g., binary search tree,range tree, B+-tree, etc.) to store the ad-dress which indicates the physical locationwhere the information about the object as-sociated with the location in space corre-sponding to the nonempty cell is stored.

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 13: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 171

3. BLOCKS

An alternative class of representationsof the objects and their environment re-moves the stipulation that cells makingup the object collection be of a unit sizeand permits their sizes to vary. The result-ing cells are termed blocks and are usu-ally rectangular with sides parallel to thecoordinate axes (this is assumed in ourdiscussion unless explicitly stated other-wise). The volume (e.g., area in two dimen-sions) of the blocks need not be an integermultiple of that of the unit-size cells, al-though this is often the case. Observe thatwhen the volumes of the blocks are inte-ger multiples of that of the unit-size cells,then we have two levels of aggregation inthe sense that an object consists of an ag-gregation of blocks which are themselvesaggregations of cells. We assume that allthe cells in a block belong to the same ob-ject or objects. In other words, the situa-tion that some of the cells in the block be-long to object o1 while the others belong toobject o2 (and not to o1) is not allowed.

The collection of blocks is usually a re-sult of a space decomposition process witha set of rules that guide it. There are manypossible decompositions. When the decom-position is recursive, we have the situationthat the decomposition occurs in stagesand often, although not always, the resultsof the stages form a containment hierar-chy. This means that a block b obtained instage i is decomposed into a set of blocks bjthat span the same space. Blocks bj are, inturn, decomposed in stage i + 1 using thesame decomposition rule. Some decompo-sition rules restrict the possible sizes andshapes of the blocks as well as their place-ment in space. Some examples include:

—congruent blocks at each stage—similar blocks at all stages—all but one side of a block are unit-sized—all sides of a block are of equal size—all sides of each block are powers of two—etc.

Other decomposition rules dispense withthe requirement that the blocks be rectan-gular, while still others do not require that

they be orthogonal. In addition, the blocksmay be disjoint or be allowed to overlap.Clearly, the choice is large. In the follow-ing, we briefly explore some of these de-composition processes.

The simplest decomposition rule is onethat permits aggregation of identically-valued cells in only one dimension. It as-signs a priority ordering to the variousdimensions and then fixes the coordinatevalues of all but one of the dimensions, sayi, and then varies the value of the ith co-ordinate and aggregates all adjacent cellsbelonging to the same object into a one-dimensional block. This technique is com-monly used in image processing applica-tions where the image is decomposed intorows which are scanned from top to bot-tom, and each row is scanned from leftto right while aggregating all adjacentpixels with the same value into a block.It is useful in image transmission as weonly have to transmit the pixels wherea change in value takes place. The ag-gregation into one-dimensional blocks isthe basis of runlength encoding [Rutovitz1968]. Similar techniques are applicableto higher-dimensional data where, for ex-ample in the case of three-dimensionaldata, one would scan the image one 2-dimensional hyperplane at a time whereeach hyperplane would be scanned inraster scan order. Techniques analogousto runlength encoding form the basis ofthe vertex representation for represent-ing axis-parallel polygons of arbitrarydimension [Esperanca and Samet 1994;Esperanca 1995].

The drawback of the decomposition intoone-dimensional blocks described above isthat all but one side of each block mustbe of unit width. The most general decom-position removes this restriction along allof the dimensions, thereby permitting ag-gregation along all dimensions. In otherwords, the decomposition is arbitrary. Theblocks need not be uniform or similar. Theonly requirement is that the blocks spanthe space of the environment. This gen-eral decomposition has the potential ofrequiring less space. However, its draw-back is that the determination of optimalpartition points may be a computationally

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 14: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

172 H. Samet

Fig. 4. Arbitrary block decom-position for the collection of ob-jects and cells in Figure 1. Blockscorresponding to object O are la-beled Oi and the blocks that arenot in any of the objects as Wi us-ing the suffix i to distinguish be-tween them in both cases.

expensive procedure. We assume that theblocks are disjoint although this need notbe the case. We also assume that the blocksare rectangular as well as orthogonal (e.g.,Figure 4). although again this is not abso-lutely necessary as there exist decomposi-tions using other shapes as well (e.g., tri-angles, etc.).

It is easy to adapt the explicit represen-tation to deal with blocks resulting froman arbitrary decomposition (which also in-cludes the one that yields one-dimensionalblocks). In particular, instead of associat-ing a set with each object o that containsthe location in space of each cell that com-prises o, we need to associate with eachobject o the locations in space and size ofeach block that comprises o. This can bedone by specifying the coordinate valuesof the upper-left corner of each block andthe sizes of its sides. Without loss of gen-erality, we use this format for the explicitrepresentation of all of the block decompo-sitions described in this section.

Using the explicit representation ofblocks, both the feature and locationqueries are answered in essentially thesame way as they were for unit-sized cells.The only difference is that for the locationquery instead of checking if a particularlocation l in space is a member of one ofthe sets of cells associated with the vari-ous objects, we must check if l is covered

by one of the blocks in the sets of blocks ofthe various objects. This is a fairly simpleprocess as we know the location in spaceand size of each of the blocks.

Implementing an arbitrary decomposi-tion (which also includes the one that re-sults in one-dimensional blocks) using animplicit representation is also quite easy.We build an index based on an easily iden-tifiable location in each block such as itsupper-left corner. We make use of the sametechniques that were presented in the dis-cussion of the implicit representation forunit-sized cells in Section 2. The only dif-ference is that we must also record the sizeof each block along with the address indi-cating the physical location where the in-formation about the object associated withthe locations in space corresponding to theblock is stored.

As in the case of unit-size cells, regard-less of which access structure is used toimplement the index, we determine theobject o associated with a cell at locationl by finding the block b that covers l . Ifb is an empty block, then we exit. Oth-erwise, we return the object o associatedwith b. Notice that the search for the blockthat covers l may be quite complex in thesense that the access structures may notnecessarily achieve as much pruning ofthe search space as in the case of unit-sized cells. In particular, this is the casewhenever the space ordering and the blockdecomposition method to whose resultsthe ordering is being applied do not havethe property that all of the cells in eachblock appear in consecutive order. In otherwords, given the cells in the block e withminimum and maximum values in the or-dering, say u and v, there exists at leastone cell in block f distinct from e whichis mapped to a value w where u < w < v.Thus, supposing that the index is imple-mented using a tree-like access structure,a search for the block b that covers l mayrequire that we visit several subtrees of aparticular node in the tree.

As we saw in the description of the algo-rithm for responding to query 2, the draw-back of the arbitrary decomposition intoblocks is that since there is no rule for theformation of the blocks, there is also no

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 15: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 173

Fig. 5. (a) Block decomposition resulting from the imposition of a grid with partitionlines at arbitrary positions on the collection of objects and cells in Figure 1 yielding anirregular grid, (b) the array access structure, (c) the linear scale for the x coordinatevalues, and (d) the linear scale for the y coordinate values. Blocks corresponding toobject O are labeled Oi and the blocks that are not in any of the objects as Wi using thesuffix i to distinguish between them in both cases.

easy rule for accessing them. The irregulargrid is one way to overcome this drawbackby making use of a very simple decomposi-tion rule that partitions a d -dimensionalspace having coordinate axes xi into d -di-mensional blocks by use of hi hyperplanesthat are parallel to the hyperplane formedby xi = 0 (1 ≤ i ≤ d ). The result is a col-lection of

∏di=1(hi +1) blocks. These blocks

form a grid of irregular-sized blocks as thepartition lines are at arbitrary positionsin contrast to the uniform grid [Franklin1984] where the partition lines are po-sitioned so that all of the resulting gridcells are congruent. Observe that there isno recursion involved in the decomposi-tion process. For example, Figure 5(a) isan example block decomposition using hy-perplanes parallel to the x and y axes forthe collection of objects and cells given inFigure 1.

The block decomposition resulting fromthe use of an irregular grid is handled byan explicit representation in the same wayas the arbitrary decomposition. Findinga suitable implicit representation is a bitmore complex as we must define an ap-propriate access structure. Although theblocks are not congruent, we can still im-pose an array access structure on them byadding d access structures termed linearscales. The linear scales indicate the posi-

tion of the partitioning hyperplanes thatare parallel to the hyperplane formed byxi = 0 (1 ≤ i ≤ d ). Thus given a locationl in space, say (a,b) in two-dimensionalspace, the linear scales for the x and ycoordinate values indicate the column androw, respectively, of the array access struc-ture entry which corresponds to the blockthat contains l .

For example, Figure 5(b) is the ar-ray access structure corresponding to theblock decomposition in Figure 5(a), whileFigures 5(c) and 5(d) are the linear scalesfor the x and y axes, respectively. In thisexample, the linear scales are shown as ta-bles (i.e., array access structures). In fact,they can be implemented using tree accessstructures. The representation describedhere is an adaptation for regions of thegrid file [Nievergelt et al. 1984] data struc-ture for points.

Our implementation of the access struc-tures for the irregular grid yields a repre-sentation that is analogous to an indirectuniform grid in the sense that given a cellat location l we need to make d + 1 array-like accesses (analogous to the two mem-ory references involved with indirect ad-dressing in computer instruction formats)to obtain the object o associated with itinstead of just one array access when thegrid is uniform (i.e., all the blocks are

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 16: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

174 H. Samet

Fig. 6. (a) Block decomposition and (b) its tree representation for the collection ofobjects and cells in Figure 1. Blocks corresponding to object O are labeled Oi and theblocks that are not in any of the objects as Wi using the suffix i to distinguish betweenthem in both cases.

congruent and cell-sized). The first d ac-cesses find the identity of the array ele-ment (i.e., block b) that contains l , whilethe last access determines the object o as-sociated with b. Once we have found blockb, we examine the adjacent blocks to ob-tain the rest of the cells comprising ob-ject o, thereby completing the response tothe location query, by employing the samemethods as we used for the array accessstructure for the uniform-sized cells. Theonly difference is that every time we finda block b in the array access structure as-sociated with o, we must examine b’s cor-responding entries in the linear scales todetermine b’s size so that we can report thecells that comprise b as parts of object o.

Perhaps the most widely known decom-positions into blocks are those referredto by the general terms quadtree and oc-tree [Samet 1990a; Samet 1990b]. Theyare usually used to describe a class ofrepresentations for two and three-dimen-sional data (and higher as well), respec-tively, that are the result of a recursivedecomposition of the environment (i.e.,space) containing the objects into blocks(not necessarily rectangular) until thedata in each block satisfies some condi-tion (e.g., with respect to its size, the na-ture of the objects that comprise it, thenumber of objects in it, etc.). The positionsand/or sizes of the blocks may be restrictedor arbitrary. It is interesting to note thatquadtrees and octrees may be used withboth interior-based and boundary-based

representations. Moreover, both explicitand implicit aggregations of the blocks arepossible.

There are many variants of quadtreesand octrees, and they are used in numer-ous application areas including high en-ergy physics, VLSI, finite element analy-sis, and many others. Below, we focus onregion quadtrees [Klinger 1971] and re-gion octrees [Hunter 1978; Meagher 1982].They are specific examples of interior-based representations for two and three-dimensional region data (variants for dataof higher dimension also exist), respec-tively, that permit further aggregation ofidentically-valued cells.

Region quadtrees and region octrees areinstances of a restricted-decompositionrule where the environment containingthe objects is recursively decomposed intofour or eight, respectively, rectangularcongruent blocks until each block is ei-ther completely occupied by an object oris empty (such a decomposition process istermed regular). For example, Figure 6(a)is the block decomposition for the regionquadtree corresponding to Figure 1. No-tice that in this case, all the blocks aresquare, have sides whose size is a powerof 2, and are located at specific positions.In particular, assuming an origin at theupper-left corner of the image correspond-ing to the environment containing the ob-jects, then the coordinate values of theupper-left corner of each block (e.g., (i, j )in two dimensions) of size 2s × 2s satisfy

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 17: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 175

the property that a mod 2s = 0 and b mod2s = 0.

A region quadtree can be implementedusing an explicit representation by associ-ating a set with each object o that containsits constituent blocks. Each block is speci-fied by numbers corresponding to the coor-dinate values of its upper-left corner andthe size of one of its sides. These numbersare stored in the set in the form (i, j ) : kwhere (i, j ) and k correspond to the coor-dinate values of the upper-left corner anddepth, respectively, of the block. For ex-ample, the explicit representation of thecollection of blocks in Figure 1 is givenby the sets A = {(0,0):2, (0,4):0, (1,4):0, (2,4):0,(3,4):0}, B = {(5,1):0, (6,0):1}, and C = {(2,6):1,(4,6):1, (6,6):1}, which correspond to blocks{A1,A2,A3,A4,A5}, {B1,B2}, and {C1,C2,C3}, re-spectively.

A region quadtree implementation thatmakes use of an implicit representation isquite different. First, we allocate an ad-dress a in storage for each block b whichstores an identifier that indicates the iden-tity of the object (or objects) of which b isa member. Second, it is necessary to im-pose an access structure on the collectionof blocks in the same way as the arraywas imposed on the collection of unit-sizedcells. Such an access structure enablesus to determine easily the value associ-ated with any point in the space coveredby a cell without resorting to exhaustivesearch. Note that depending on the natureof the access structure, it’s not always nec-essary to store the location and size of eachblock with a.

There are many possible access struc-tures. Interestingly, using an array as anaccess structure is not particularly usefulas it defeats the rationale for the aggrega-tion of cells into blocks unless, of course, allthe blocks are of a uniform size in whichcase we have the analog of a two-level grid.

The traditional, and most natural, ac-cess structure for a region quadtree cor-responding to a d -dimensional image is atree with a fanout of 2d (e.g., Figure 6(b)corresponding to the collection of two-dimensional objects in Figure 1 whosequadtree block decomposition is given inFigure 6(a)). Each leaf node in the tree

corresponds to a different block b and con-tains the address a in storage where anidentifier is stored that indicates the iden-tity of the object (or objects) of which b is amember. As in the case of the array, wherewe could store the object identifier o in thearray element itself instead of allocating aseparate address a for o, we could achievethe same savings by storing o in the leafnode of the tree. Each nonleaf node f cor-responds to a block whose volume is theunion of the blocks corresponding to the2d children of f . In this case, the tree isa containment hierarchy and closely par-allels the decomposition in the sense thatthey are both recursive processes and theblocks corresponding to nodes at differentdepths of the tree are similar in shape.

Answering the location query using thetree structure is different from using anarray where it is usually achieved by atable lookup having an O(1) cost (unlessthe array is implemented as a tree, whichis a possibility [DeMillo et al. 1978]). Incontrast, the location query is usually an-swered in a tree by locating the block thatcontains the location in space correspond-ing to the desired cell. This is achievedby a process that starts at the root of thetree and traverses the links to the chil-dren whose corresponding blocks containthe desired location. This process has anO(n + F ) cost where the environment hasa maximum of n levels of subdivision (e.g.,an environment all of whose sides are oflength 2n), and F is the cardinality of theanswer set.

Using a tree with fanout 2d as an ac-cess structure for a regular decompositionmeans that there is no need to record thesize and location of the blocks. This infor-mation can be inferred from knowledge ofthe size of the underlying space as the 2d

blocks that result from each subdivisionstep are congruent. For example, in twodimensions, each level of the tree corre-sponds to a quartering process that yieldsfour congruent blocks (rectangular here,although a triangular decomposition pro-cess could also be defined which yields fourequilateral triangles; however, in such acase, we are no longer dealing with rect-angular cells). Thus as long as we start

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 18: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

176 H. Samet

from the root, we know the location andsize of every block.

There are a number of alternative ac-cess structures to the tree with fanout2d . They are all based on finding a map-ping from the domain of the blocks to asubset of the integers (i.e., to one dimen-sion) and then applying one of the fa-miliar tree-like access structures (e.g., abinary search tree, range tree, B+-tree,etc.). There are many possible mappings(e.g., [Samet 1990a]). The simplest is touse the same technique that we applied tothe collection of blocks of arbitrary size. Inparticular, we can apply one of the order-ings of space shown in Figure 2 to obtaina mapping from the coordinate values ofthe upper-left corner u of each block to theintegers.

Since the size of each block b in the re-gion quadtree can be specified with a sin-gle number indicating the depth in thetree at which b is found, we can simplifythe representation by incorporating thesize into the mapping. One mapping sim-ply concatenates the result of interleav-ing the binary representations of the co-ordinate values of the upper-left corner(e.g., (a, b) in two dimensions) and i of eachblock of size 2i so that i is at the right. Theresulting number is termed a locationalcode and is a variant of the Morton order(Figure 2(c)). Assuming such a mappingand sorting the locational codes in increas-ing order yields an ordering equivalent tothat which would be obtained by travers-ing the leaf nodes (i.e., blocks) of the treerepresentation (e.g., Figure 6(b)) in the or-der NW, NE, SW, SE.

As the dimensionality of the space (i.e.,d ) increases, each level of decompositionin the region quadtree results in manynew blocks as the fanout value 2d is high.In particular, it is too large for a prac-tical implementation of the tree accessstructure. In this case, an access struc-ture termed a bintree [Knowlton 1980;Samet and Tamminen 1988; Tamminen1984] with a fanout value of 2 is used. Thebintree is defined in a manner analogousto the region quadtree except that at eachsubdivision stage, the space is decomposedinto two equal-sized parts. In two dimen-

sions, at odd stages we partition along they axis and at even stages we partitionalong the x axis. Of course, in d dimen-sions, the depth of the tree may increaseby a factor of d .

The region quadtree, as well as thebintree, is a regular decomposition. Thismeans that the blocks are congruent—that is, at each level of decomposition, allof the resulting blocks are of the sameshape and size. We can also use decom-positions where the sizes of the blocksare not restricted in the sense that theonly restriction is that they be rectangularand be a result of a recursive decomposi-tion process. In this case, the representa-tions that we described must be modifiedso that the sizes of the individual blockscan be obtained. An example of such astructure is an adaptation of the pointquadtree [Finkel and Bentley 1974] to re-gions. Although the point quadtree was de-signed to represent points in a higher di-mensional space, the blocks resulting fromits use to decompose space do correspondto regions. The difference from the regionquadtree is that in the point quadtree, thepositions of the partitions are arbitrary,whereas they are a result of a partition-ing process into 2d congruent blocks (e.g.,quartering in two dimensions) in the caseof the region quadtree.

As the dimensionality d of the spaceincreases, each level of decomposition inthe point quadtree results in many newblocks since the fanout value 2d is high.In particular, it is too large for a practicalimplementation of the tree access struc-ture. Therefore, we use a k-d tree [Bentley1975] which is an access structure havinga fanout of 2 that has the same relation-ship to the point quadtree as the bintreehas to the region quadtree. As in the pointquadtree, although the k-d tree was de-signed to represent points in a higher di-mensional space, the blocks resulting fromits use to decompose space do correspondto regions. In other words, the bintree is aregular decomposition k-d tree.

The k-d tree can be further generalizedso that the partitions take place on thevarious axes at an arbitrary order, and,in fact, the partitions need not be made

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 19: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 177

on every coordinate axis. In this case, ateach nonleaf node of the k-d tree, we mustalso record the identity of the axis thatis being split. We use the term general-ized k-d tree to describe this structure. Thegeneralized k-d tree is really an adapta-tion to regions of the adaptive k-d tree[Friedman et al. 1977] and the LSDtree [Henrich et al. 1989] which were orig-inally developed for points. It can alsobe regarded as a special case of the BSPtree (denoting Binary Space Partition-ing) [Fuchs et al. 1980]. In particular, inthe generalized k-d tree, the partitioninghyperplanes are restricted to be parallelto the axes, whereas in the BSP tree theyhave an arbitrary orientation. The BSPtree is used in computer graphics to facil-itate viewing.

One of the shortcomings of the general-ized k-d tree is the fact that we can onlydecompose the space into two parts alonga particular dimension at each step. If wewish to partition a space into p parts alonga dimension i, then we must perform p−1successive partitions on dimension i. Oncethese p−1 partitions are complete, we par-tition along another dimension. The puz-zletree [Dengel 1991] is a further gener-alization of the k-d tree that decomposesthe space into two or more parts along aparticular dimension at each step so thatno two successive partitions use the samedimension. In other words, the puzzletreecompresses all successive partitions on thesame dimension in the generalized k-dtree.

The puzzletree is motivated by a de-sire to overcome the rigidity in the shape,size, and position of the blocks that re-sult from the bintree (and to an equivalentextent, the region quadtree) partitioningprocess (because of its regular decompo-sition). In particular, in many cases, thedecomposition rules ignore the homogene-ity present in certain regions on accountof the need to place the partition lines inparticular positions as well as a possiblelimit on the number of permissible par-titions along each dimension at each de-composition step. Often, it is desirable forthe block decomposition to follow the per-ceptual characteristics of the objects as

well as reflect their dominant structuralfeatures.

For example, consider a front view of ascene containing a table and two chairs.Figures 7(a) and 7(b) are the block decom-positions resulting from the use of a bin-tree and a puzzletree, respectively, for thisscene, while Figure 7(c) is the tree accessstructure corresponding to the puzzletreein Figure 7(b). Notice the natural decom-position in the puzzletree of the chair intothe legs, seat, and back, and of the ta-ble into the top and legs. On the otherhand, the blocks in the bintree (and to agreater extent in the region quadtree, al-though not shown here) do not have thisperceptual coherence. Of course, we areaided here by the separability of the ob-jects; however, this does not detract fromthe utility of the representation as it onlymeans that the objects can be decomposedinto fewer parts.

4. IMAGE-BASED HIERARCHICALINTERIOR-BASED REPRESENTATIONS(PYRAMIDS)

Our goal here is to be able to take an ob-ject o as input and return the cells thatit occupies (the feature query) when us-ing a representation that stores with eachcell the identities of the objects of whichit is a part. The most natural hierarchythat can be imposed on the cells to enableus to answer this query is one that aggre-gates every q cells regardless of the valuesassociated with them into larger congru-ent blocks (unlike the aggregation of iden-tically valued cells into multidimensionalblocks as in the region quadtree). This pro-cess is repeated recursively so that groupsof q blocks are repeatedly aggregated intoone block until there is just one block left.The value associated with the block b is theunion of the names (i.e., object identifiers)of the objects associated with the cells orblocks that comprise block b. The identi-ties of the cells and blocks that are ag-gregated depends, in part, on how thecollection of the cells is represented. Forexample, assuming a two-dimensional un-derlying space, if the cells are representedas one long list consisting of the cells of

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 20: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

178 H. Samet

Fig. 7. Block decomposition for the (a) bintree and (b) puzzletree corre-sponding to the front view of a scene containing a table and two chairs;(c) is the tree access structure for the puzzletree in (b).

the first row, followed by those of the sec-ond row, etc., then one possible aggrega-tion combines every successive q cells. Inthis case, the blocks are really one-dimen-sional entities.

The process that we have just outlinedcan be described more formally as follows.We make the following assumptions:

—The blocks are rectangular with sidesparallel to the coordinate axes.

—Each block contains q cells or q blocksso that, assuming d dimensions, q =∏d

j=1 r j where the block has width r j

for dimension j (1 ≤ j ≤ d ) measuredin cells or blocks depending on the levelin the hierarchy at which the block isfound.

—All blocks at a particular level in the hi-erarchy are congruent with the differentlevels forming a containment hierarchy.

—There are S cells in the underlyingspace, and let n be the smallest powerof q such qn ≥ S.

—The underlying space can be enlargedby adding L empty cells so that qn =

S + L and that each side of the un-derlying space along dimension j is ofwidth r j

n.

The hierarchy consists of the set of sets{Ci} (0 ≤ i ≤ n) where Cn correspondsto the original collection of cells havingS+L elements, Cn−1 contains (S + L)/q el-ements corresponding to the result of theinitial aggregation of q cells into (S + L)/qcongruent blocks, and C0 is a set consist-ing of just one element corresponding toa block of size S + L. Each element e ofCi (0 ≤ i ≤ n − 1) is a congruent blockwhose value is the union of the values (i.e.,sets of object identifiers) associated withthe blocks of the q elements of Ci+1. Thevalue of each element of Cn is the objectidentifier(s) corresponding to the object(s)of which its cell is a part.

The resulting hierarchy is known asa cell pyramid (e.g., Aref and Samet[1990], Burt [1980], Burton et al. [1984],Dyer [1981, 1982], Ichikawa [1981], Millerand Stout [1985], Rosenfeld [1980, 1984],Shaffer and Samet [1987], Srihari [1984],Solntseff and Wood [1977], and Tanimoto

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 21: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 179

and Pavlidis [1975]3 and is frequentlycharacterized as a multiresolution repre-sentation since the original collection ofobjects is described at several levels of de-tail by using cells that have different sizes,although similar in shape. It is importantto distinguish the cell pyramid from the re-gion quadtree which, as we recall, is an ex-ample of an aggregation into square blockswhere the basis of the aggregation is thatthe cells have identical values (i.e., are as-sociated with the same object, or objectsif object overlap is permitted). The regionquadtree is an instance of what is termed avariable-resolution representation, which,of course, is not limited to blocks that aresquare. In particular, it can be used with alimited number of nonrectangular shapes(most notably, triangles in two dimensions[Bell et al. 1983; Samet 1990b]).

It is quite difficult to use the cell pyra-mid, in the form that we have described,to respond to the feature query and to thecomplete location query (i.e., to obtain allof the contiguous cells that make up theobject associated with the query location)due to the absence of an access structure.This can be remedied by implementing aset of arrays Ai in a one-to-one correspon-dence to Ci (0 ≤ i ≤ n) where Ai is a d -dimensional array of side length r j

i fordimension j (1 ≤ j ≤ d ). Each of theelements of Ai corresponds to a d -dimen-sional block of side length r j

n−i for dimen-sion j (1 ≤ j ≤ d ) assuming a total under-lying space of side length r j

n. The result isa stack of arrays Ai, termed an array pyra-mid, which serves as an access structure tocollections Ci (0 ≤ i ≤ n). The array pyra-mid is an instance of an implicit interior-based representation consisting of arrayaccess structures. Of course, other repre-sentations are possible through the use ofalternative access structures (e.g., differ-ent types of trees).

We illustrate the array pyramid for twodimensions with r1 = 2 and r2 = 2. As-sume that the space in which the original

3Actually, the qualifier cell is rarely used. However,we use it here to avoid confusion with other variantsof the pyramid which are based on a hierarchy ofobjects rather than cells as discussed in Section 5.

Fig. 8. Array pyramid for the collection of objectsand cells in Figure 1. (a) Array A2. (b) Array A1.(c) Array A0. The block decomposition in Figure 1corresponds to Array A3.

collection of cells is found is of size 2n ×2n.Let Cn correspond to the original collectionof cells. The hierarchy of arrays consistsof the sequence Ai (0 ≤ i ≤ n) so thatelements of Ai access the correspondingelements in Ci. We obtain Cn−1 by form-ing an array of size 2n−1 × 2n−1 with 22n−2

elements so that each element e in Cn−1corresponds to a 2×2 square consisting of4 elements (i.e., cells) in Cn and has a valueconsisting of the union of the names (i.e.,labels) of the objects that are associatedwith these 4 cells. This process is appliedrecursively to form Ci (0 ≤ i ≤ n−1) whereC0 is a collection consisting of just one el-ement whose value is the set of namesof all the objects associated with at leastone cell. The arrays are assumed to bestored in memory using sequential allo-cation with conventional orderings (e.g.,lexicographically), and are accessed by useof the d -dimensional coordinate values ofthe cells. For example, Figure 8 is the ar-ray pyramid for the collection of objects inFigure 1.

Using the array pyramid, it is very easyto respond to the feature query, as we justexamine the relevant parts of the stack ofarrays. For example, suppose that we wantto determine the locations that compriseobject o, and we use the array pyramidconsisting of arrays Ai (0 ≤ i ≤ n) in a two-dimensional space of size 2n×2n where theblocks are squares of side length 2n−i. Westart with A0, which consists of just oneelement e, and determine if o is a mem-ber of the set of values associated with e.If it is not, then we exit and the answeris negative. If it is, then we examine thefour elements in A1 that correspond to eand repeat the test. At this point, we know

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 22: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

180 H. Samet

Fig. 9. Cell-tree pyramid for the collection of objects and cells in Figure 1.

that o is a member of at least one of themas otherwise o could not have been a mem-ber of the set of values associated with ele-ment e of A0. This process is applied recur-sively to elements of Aj that contained o(i.e., the appropriate elements of Aj+1 areexamined for 1 ≤ j ≤ n − 1) until encoun-tering An at which time the process stops.The advantage of this method is that el-ements of Aj+1 are not examined unlessobject o is guaranteed to be a member ofthe set of values associated with at leastone of them.

The array pyramid uses a sequence ofarrays as an access structure. An alterna-tive implementation is one that imposesan access structure in the form of a tree Ton the elements of the hierarchy {Ci}. Onepossible implementation is a tree of fanoutq where the root T0 corresponds to C0,nodes {Tij} at depth i to Ci (1 ≤ i ≤ n − 1),while the leaf nodes {Tnj} correspond toCn. In particular, element t in the tree atdepth j corresponds to element e of Cj(0 ≤ j ≤ n − 1) and t contains q point-ers to its q children in Tj+1 correspondingto the elements of Cj+1 that are containedin e. The result is termed a cell-tree pyra-mid. Figure 9 shows the cell-tree pyramidcorresponding to the collection of objectsin Figure 1 where the cells are labeled asin Figure 6(a). This example makes useof two-dimensional data with r1 = 2 andr2 = 2. In this case, notice the similaritybetween the cell-tree pyramid and the re-

gion quadtree implementation that usesan access structure which is a tree with afanout of 4 (Figure 6(b)).

Using the term quadtree in its mostgeneral sense (i.e., d -dimensional blockswhose sides need not be powers of twonor be of the same length), the cell-treepyramid can be viewed as a completequadtree (i.e., where no aggregation takesplace at the deepest level, or, equivalently,all leaf nodes with no children are atthe maximum depth of the tree). Never-theless, there are some very importantdifferences. The first difference, as wepointed out before, is that the quadtree is avariable-resolution representation, whilethe cell-tree pyramid is a multiresolutionrepresentation. The second, and most im-portant, difference is that in the case ofthe quadtree, the nonleaf nodes serve onlyas an access structure. They do not in-clude any information about the objectspresent in the nodes and cells below them.This is why the quadtree, like the array, isnot useful for answering the feature query.Of course, we could also devise a vari-ant of the quadtree (termed a truncated-tree pyramid [Samet 1995]) which usesthe nonleaf nodes to store informationabout the objects present in the cells andnodes below them (e.g., Figure 10). Notethat both the cell-tree pyramid and thetruncated-tree pyramid are instances ofan implicit interior-based representationwith a tree access structure.

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 23: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 181

Fig. 10. Truncated-tree pyramid for the collection of objects and cellsin Figure 1.

Our definition of the pyramid was madein a bottom-up manner in the sense thatwe started with a block size and an under-lying space size. Next, we expanded thesize of the underlying space so that a con-tainment hierarchy of congruent blocks ateach level and similar blocks at differentlevels could be formed. We can also de-fine a variant of the pyramid where therequirements of block congruence at eachlevel and block similarity at different lev-els are relaxed. This is a bit easier if wedefine the pyramid in a top-down manneras we can calculate the number of cells bywhich the underlying space needs to be ex-panded as the block sizes at the differentlevels are defined. It should be clear thatthe congruence requirement is more re-strictive than the similarity requirement.If we relax the requirement that the blocksat different levels are similar, but retainthe requirement that the blocks at thesame level are congruent, then we muststore at each level i the size of the blockqi (i.e., the values of the individual compo-nents rij of qi = ∏d

j=1 rij for dimension j(1 ≤ j ≤ d )).

If we relax both the requirement thatthe blocks at the different levels are sim-ilar and the requirement that the blocksat each level are congruent while still re-quiring that they form a containment hi-erarchy, then we are in effect permittingpartitioning hyperplanes (i.e., lines in two

dimensions) at arbitrary positions. In thiscase, we get a more general pyramid if weuse a top-down definition as now we canhave a different partition at each level.In this case, we have an irregular grid ateach level, and thus we must store thepositions of the partitioning hyperplanes(i.e., lines in two dimensions) at each level.We call the result an irregular grid pyra-mid. If the irregular grid is implementedwith an array access structure, then theresult is called an irregular grid arraypyramid.

Other pyramid variants are also possi-ble. For example, the dynamically quan-tized pyramid (DQP) [O’Rourke and SloanJr. 1984; Sloan Jr. 1981] is a two-dimen-sional containment hierarchy where theblocks at the different levels are neithersimilar nor congruent. It differs from theirregular pyramid in that the there isa possibly different 2 × 2 grid partitionat each block at each level rather thanone grid partition at each level. Noticethe close similarity to a complete pointquadtree [Finkel and Bentley 1974]. TheDQP finds use in cluster detection as wellas multidimensional histogramming. Ofcourse, even more general variants arepossible. In particular, we could use anyone of the other recursive and nonrecur-sive decompositions described in Section 3at each block with the appropriate accessstructure.

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 24: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

182 H. Samet

5. OBJECT-BASED HIERARCHICALINTERIOR-BASED REPRESENTATIONS(R-TREES)

Our goal here is to be able to take a loca-tion a as input and return the objects inwhich a is a member (the location query)when using a representation that storeswith each object the addresses of the cellsthat comprise it (i.e., an explicit represen-tation). The most natural hierarchy thatcan be imposed on the objects that wouldenable us to answer this query is one thataggregates every M objects (that are hope-fully in close spatial proximity, althoughthis is not a requirement) into larger ob-jects. This process is repeated recursivelyuntil there is just one aggregated objectleft. Since the objects may have differentsizes and shapes, it is not easy to computeand represent the aggregate object. More-over, it is similarly difficult to test eachone of them (and their aggregates) to de-termine if they contain a since each onemay require a different test by virtue ofthe different shapes. Thus, it is useful touse a common aggregate shape and point-inclusion test to prune the search.

The common aggregate shape and point-inclusion test that we use assumes theexistence of a minimum enclosing box(termed a bounding box) for each object.This bounding box is part of the data as-sociated with each object and aggregate ofobjects. In this case, we reformulate ourobject hierarchy to be in terms of bound-ing boxes. In particular, we aggregate thebounding boxes of every M objects intoa box (i.e., block) of minimum size thatcontains them. This process is repeatedrecursively until there is just one blockleft. The value associated with the bound-ing box b is its location (e.g., the coordi-nate values of its diagonally opposite cor-ners for two-dimensional data). It shouldbe clear that the bounding boxes serve asa filter to prune the search for an objectthat contains a.

In this section, we expand on hier-archies of objects which actually ag-gregate the bounding boxes of the ob-jects. Section 5.1 gives an overview ofobject hierarchies and introduces the

general concepts of an object pyramidand an object-tree pyramid, which pro-vides a tree access structure for theobject pyramid. Sections 5.2–5.4 presentseveral aggregation methods. In par-ticular, Section 5.2 discusses ordering-based aggregation methods. Section 5.3discusses extent-based aggregation tech-niques which result in the R-tree rep-resentation, while Section 5.4 describesthe R∗-tree which is the best of theextent-based aggregation methods. Next,Section 5.5 discusses methods of updat-ing or loading an object-tree pyramidwith a large number of objects at once,termed bulk insertion and bulk loading,respectively. Section 5.6 concludes thepresentation by reviewing some of theshortcomings of the object-tree pyramidand discussing some of the solutions thathave been proposed to overcome them. Fora comparative look at these different ag-gregation methods, see the VASCO JAVAapplets found at http://www.cs.umd.edu/~hjs/quadtree/index.html [Brabec andSamet 1998]. The VASCO system also in-cludes many other indexing techniques forpoints, lines, rectangles, and regions thatare based on space decomposition (see alsothe sp-GiST system [Aref and Ilyas 2001]).Other libraries that are based on object hi-erarchies include GiST [Hellerstein et al.1995] and XXL [van den Bercken et al.2001].

5.1. Overview

The nature of the aggregation (i.e., us-ing bounding boxes), the number of objectsthat are being aggregated at each step (aswell as whether it can be varied), and,most importantly, deciding which objectsto aggregate is quite arbitrary althoughan appropriate choice can make the searchprocess much more efficient. The decisionas to which objects to aggregate assumesthat we have a choice in the matter. Itcould be that the objects have to be aggre-gated in the order in which they are en-countered. This could lead to poor searchperformance when the objects are not en-countered in an order that correlates with

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 25: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 183

spatial proximity. Of course, this is not anissue as long as we just have ≤M objects.

It should be clear that the issue of choiceonly arises if we know the identities ofall the objects before starting the aggre-gation process (unless we are permittedto rebuild the hierarchy each time we en-counter a new object or delete an object),and if we are permitted to reorder them sothat objects in aggregate i need not neces-sarily have been encountered prior to theobjects in aggregate i + 1, and vice versa.This is not always the case (i.e., a dynamicversus a static database), although for themoment we do assume that we know theidentities of all of the objects before start-ing the aggregation, and that we may ag-gregate any object with any other object.Observe also that the bounding boxes inthe hierarchy are not necessarily disjoint.In fact, the objects may be configured inspace in such a way that no disjoint hi-erarchy is possible. By the same reason-ing, the objects themselves need not bedisjoint.

The process that we have just outlinedcan be described more formally as follows.Assume that there are N objects in thespace and let n be the smallest power of Msuch that M n ≥ N . Assume that all aggre-gates contain M elements with the excep-tion of the last one at each level which maycontain less than M as M n is not neces-sarily equal to N . The hierarchy of objectsconsists of the set D of sets {Di} (0 ≤ i ≤ n)where Dn corresponds to the set of bound-ing boxes of the individual objects, Dn−1corresponds to the result of the initial ag-gregation of the bounding boxes of M ob-jects into N/M aggregates of objects andconsists of N/M bounding boxes, and D0is a set containing just one element corre-sponding to the aggregations of all of theobjects and is a bounding box that enclosesall of the objects. We term the resultinghierarchy an object pyramid. Once again,we have a multiresolution representationas the original collection of objects is de-scribed at several levels of detail by virtueof the number of objects whose boundingboxes are grouped at each level. This is incontrast with the cell pyramid where thedifferent levels of detail are distinguished

by the sizes of the cells that comprise theelements at each level.

Searching an object pyramid consistingof sets Di (0 ≤ i ≤ n) for the object con-taining a particular location a (i.e., the lo-cation query) proceeds as follows. We startwith D0, which consists of just one bound-ing box b, and determine if a is inside b.If it is not, then we exit and the answeris negative. If it is, then we examine theM elements in D1 that are covered byb and repeat the test using their bound-ing boxes. Note that unlike the cell pyra-mid, at this point, a is not necessarily in-cluded in the M bounding boxes in D1 asthese M bounding boxes are not requiredto cover the entire space spanned by b. Inparticular, we exit if a is not covered byat least one of the bounding boxes at thislevel. This process is applied recursivelyto all elements of D j for 0 ≤ j ≤ n untilall elements of Dn have been processed atwhich time the process stops. The advan-tage of this method is that elements of D j(1 ≤ j ≤ n) are not examined unless a isguaranteed to be covered by at least one ofthe elements of D j−1.

The bounding boxes serve to distinguishbetween occupied and unoccupied space,thereby indicating whether the search forthe objects that contain a particular loca-tion (i.e., the location query) should pro-ceed further. At a first glance, it wouldappear that the object pyramid is ratherinefficient for responding to the locationquery as in the worst case all of the bound-ing boxes at all levels must be examined.However, the maximum number of bound-ing boxes in the object pyramid, and hencethe maximum number that will have to beinspected, is

∑nj=0 M j ≤ 2N .

Of course, we may also have to exam-ine the actual sets of locations associ-ated with each object when the boundingbox does not result in any of the objectsbeing pruned from further considerationsince the objects are not necessarily rect-angular in shape (i.e., boxes). Thus us-ing the hierarchy provided by the objectpyramid results in at most an additionalfactor of two in terms of the number ofbounding box tests while possibly savingmany more tests. Therefore, the maximum

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 26: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

184 H. Samet

Fig. 11. (a) Object-tree pyramid for a collection of rectangle objects with M = 3, and(b) the spatial extents of the objects and the bounding boxes of the nodes in (a) withbroken lines denoting the bounding boxes of the corresponding leaf nodes. Noticethat the leaf nodes in the index also store bounding boxes although this is shownonly for the nonleaf nodes.

amount of work to answer the locationquery with the hierarchy is of the sameorder of magnitude to that which wouldhave been needed had the hierarchy notbeen introduced.

As we can see, the way in which weintroduced the hierarchy to form the ob-ject pyramid did not necessarily enable usto make more efficient use of the explicitinterior-based representation to respondto the location query. The problem wasthat once we determined that location awas covered by one of the bounding boxes,say b, in D j (0 ≤ j ≤ n−1), we had no wayto access the bounding boxes comprisingb without examining all of the boundingboxes in D j+1. This is easy to rectify byimposing an access structure in the formof a tree T on the elements of the hierar-chy D. One possible implementation is atree of fanout M where the root T0 corre-sponds to the bounding box in D0. T0 hasM links to its M children {T1k} which cor-respond to the M bounding boxes in D1that comprise D0. The set of nodes {Tik} atdepth i correspond to the bounding boxesin Di (0 ≤ i ≤ n), while the set of leaf nodes{Tnk} correspond to Dn. In particular, nodet in the tree at depth j corresponds tobounding box b in D j (0 ≤ j ≤ n − 1), andt contains M pointers to its M children inTj+1 corresponding to the bounding boxes

in D j+1 that are contained in b. We use theterm object-tree pyramid to describe thisstructure.

Figure 11(a) is an example object-treepyramid for a simple collection of 9 rectan-gle objects with M = 3 (and thus n = 2).Figure 11(b) shows the spatial extents ofthe objects and the bounding boxes of thenodes in Figure 11(a), with broken linesdenoting the bounding boxes correspond-ing to the leaf nodes. Note that the object-tree pyramid is not unique. Its structuredepends heavily on the order in which theindividual objects and their correspondingbounding boxes are aggregated.

The object-tree pyramid that we havejust described still has a worst case wherewe may have to examine all of the bound-ing boxes in D j (1 ≤ j ≤ n) when exe-cuting the location query or its variants(e.g., a window query). This is the caseif query location a is contained in everybounding box in D j−1. Such a situation, al-though rare, can arise in practice becausea may be included in the bounding boxesof many objects (termed a false hit), as thebounding boxes are not disjoint, while a iscontained in a much smaller number of ob-jects. Equivalently, false hits are caused bythe fact that a spatial object may be spa-tially contained in full or in part in sev-eral bounding boxes or nodes while being

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 27: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 185

associated with just one node or boundingbox.

However, unlike the object pyramid, theobject-tree pyramid does guarantee thatonly the bounding boxes that contain awill be examined and no others. Thus wehave not improved on the worst-case of theobject pyramid in that we may still haveto examine 2N bounding boxes, althoughwe have reduced its likelihood. It is inter-esting to observe that the object pyramidand the object-tree pyramid are instancesof an explicit interior-based representa-tion since it is still the case that associ-ated with each object o is a set containingthe addresses of the cells that comprise it.Note also that the access structure facil-itates only the determination of the ob-ject associated with a particular cell andnot which cells are contiguous. Thus theobject-tree pyramid is not an instance ofan implicit interior-based representation.

The decision as to which objects toaggregate plays an important factor inthe efficiency of the object-tree pyramidin responding to the location query. Theefficiency of the object-tree pyramid forsearch operations depends on its abilitiesto distinguish between occupied space andunoccupied space, and to prevent a nodefrom being examined needlessly due to afalse overlap with other nodes.

The extent to which these efficienciesare realized is a direct result of how wellour aggregation policy is able to satisfythe following two goals. The first goal is tominimize the number of aggregated nodesthat must be visited by the search. Thisgoal is accomplished by minimizing thearea common to sibling aggregated nodes(termed overlap). The second goal is toreduce the likelihood that sibling aggre-gated nodes are visited by the search. Thisis accomplished by minimizing the totalarea spanned by the bounding boxes ofthe sibling aggregated nodes (termed cov-erage). A related goal to that of minimiz-ing the coverage is one of minimizing thearea in sibling aggregated nodes that isnot spanned by the bounding boxes of anyof the children of the sibling aggregatednodes (termed dead area). Dead area isusually decreased by minimizing coverage

Fig. 12. (a) Four bounding boxes and the aggrega-tions that would be induced, (b) by minimizing thetotal area (i.e., coverage) of the covering boundingboxes of the two nodes and (c) by minimizing thearea common (i.e., overlap) to the covering bound-ing boxes of the two nodes. The dead area for thetwo possible aggregations is shown shaded.

and thus minimizing dead area is often nottaken into account explicitly. Another wayof interpreting these goals is that they aredesigned to ensure that objects that arespatially close to each other are stored inthe same node. Of course, at times, thesegoals may be contradictory.

For example, consider the four boundingboxes in Figure 12(a). The first goal is sat-isfied by the aggregation in Figure 12(c),while the second goal is satisfied by the ag-gregation in Figure 12(b). The dead area isshown shaded in Figures 12(b) and 12(c).Note that the dead area in Figure 12(b) isconsiderably smaller than the dead areain Figure 12(c) on account of the smalleramount of coverage in the children inFigure 12(b). Also observe that the deadarea for the bounding box of one aggre-gated node is not part of the boundingboxes of children a sibling aggregated nodeas seen in Figure 12(b).

The aggregation techniques describedabove take the space (i.e., volume) occu-pied by (termed extent of ) the boundingboxes of the individual spatial objects intoaccount. They are described in Sections 5.3and 5.4. An alternative is to order theobjects prior to performing the aggrega-tion. However, in this case, the only choicethat we may possibly have with respectto the identities of the objects which areaggregated is when the number of objects(or bounding boxes) that are being aggre-gated at each step is permitted to vary.The most obvious order, although not par-ticularly interesting or useful, is one thatpreserves the order in which the objects

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 28: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

186 H. Samet

were initially encountered (i.e., objects inaggregate i have been encountered beforethose in aggregate i + 1). The more com-mon orders are based on proximity or onthe values of a small set of parameters de-scribing a common property that is hope-fully related to the proximity (and to alesser degree to the shape and extent) ofthe objects or their bounding boxes in oneor all of the dimensions of the space inwhich they lie [Kamel and Faloutsos 1993;Moitra 1993; Roussopoulos and Leifker1985]. Ordering-based aggregation tech-niques are discussed in Section 5.2.

5.2. Ordering-Based AggregationTechniques

The most frequently used ordering tech-nique is based on mapping the boundingboxes of the objects to a representativepoint in a lower, the same, or a higher-dimensional space and then applying oneof the space-ordering techniques describedin Section 2 and shown in Figure 2. We usethe term object number to refer to the re-sult of the application of space ordering.4Some possible representative points fortwo-dimensional rectangle objects includethe following (e.g., Samet [1990b]):

(1) The centroid.(2) The centroid and the horizontal and

vertical extents (i.e., the horizontaland vertical distances from the cen-troid to the relevant sides).

(3) The x and y coordinate values ofthe two diagonally opposite cornersof the rectangle (e.g., the upper-left andlower-right corners).

(4) The x and y coordinate values of thelower-right corner of the rectangle andits height and width.

4Interestingly, we will see that no matter which ofthe implementations of the object-tree pyramid is be-ing deployed, the ordering is used primarily to buildthe object-tree pyramid, although it is used for split-ting in some cases such as the Hilbert R-tree [Kameland Faloutsos 1994]. The actual positions of the ob-jects in the ordering (i.e., the object numbers) are notusually recorded in the object-tree pyramid which issomewhat surprising as this could be used to speedup operations such as point location, etc.

Fig. 13. A collection 22 rectangle objects wherethe numbers associated with the rectangles de-note the relative times at which they werecreated.

For example, consider the collection of22 rectangle objects given in Figure 13where the numbers associated with therectangles denote the relative times atwhich they were created. Figure 14 showsthe result of applying a Morton order(Figure 14(a)) and Peano–Hilbert order(Figure 14(b)) to the collection of rectangleobjects in Figure 13 using their centroidsas the representative points.

Once the N objects have been ordered,the hierarchy D is built in the order Dn,Dn−1, . . . , D1, D0 where n is the small-est power of M such that M n ≥ N .Dn consists of the set of original ob-jects and their bounding boxes. Thereare two ways of grouping the items toform the hierarchy D: one-dimensionaland multidimensional.

In the one-dimensional groupingmethod, Dn−1 is formed as follows: Thefirst M objects and their correspondingbounding boxes form the first aggre-gate, the second M objects and theircorresponding bounding boxes form thesecond aggregate, etc. Dn−2 is formed byapplying this aggregation process againto the set Dn−1 of N/M objects and theirbounding boxes. This process is continuedrecursively until we obtain the set D0 con-taining just one element correspondingto a bounding box that encloses all of

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 29: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 187

Fig. 14. The result of applying (a) a Morton order, and (b) a Peano–Hilbert order to thecollection of 22 rectangle objects in Figure 13 using their centroids as the representativepoints.

Fig. 15. The bounding boxes corresponding to the first level of aggregation for the (a) Hilbertpacked R-tree, (b) Morton packed R-tree, and (c) packed R-tree (using a Peano–Hilbert orderfor the initial ordering) for the collection of 22 rectangle objects in Figure 13 with M = 6.The numbers associated with the rectangle objects in (a) and (b) denote the positions of theircorresponding centroids in the order.

the objects. Note however, that whenthe process is continued recursively, theelements of the sets Di(0 ≤ i ≤ n − 1)are not necessarily ordered in the samemanner as the elements of Dn.

There are several implementations ofthe object-tree pyramid using the one-di-mensional grouping methods. For exam-ple, the Hilbert packed R-tree [Kamel andFaloutsos 1993] is an object-tree pyramidthat makes use of a Peano–Hilbert order.It is important to note that only the leafnodes of the Hilbert packed R-tree are or-dered using the Peano–Hilbert order. Thenodes at the remaining levels are ordered

according to the time at which they werecreated. For example, Figure 15(a) showsthe bounding boxes corresponding to thefirst level of aggregation for the Hilbertpacked R-tree for the collection of 22 rect-angle objects in Figure 13 with M = 6.Similarly, Figure 15(b) shows the sameresult were we to build the same struc-ture using a Morton order (i.e., a Mortonpacked R-tree), again with M = 6. No-tice that there is quite a bit of overlapamong the bounding boxes as the aggrega-tion does not take the extent of the bound-ing boxes into account when forming thestructure.

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 30: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

188 H. Samet

A slightly different approach is em-ployed in the packed R-tree [Roussopoulosand Leifker 1985] which is another in-stance of an object-tree pyramid. Thepacked R-tree is based on ordering the ob-jects on the basis of some criterion suchas increasing value of the x coordinate orany of the space-ordering methods shownin Figure 2. Once this order has beenobtained, the leaf nodes in the packedR-tree are filled by examining the ob-jects in increasing order where each leafnode is filled with the first unprocessedobject and its M − 1 nearest neighborswhich have not yet been inserted in otherleaf nodes. Once an entire level of thepacked R-tree has been obtained, the al-gorithm is reapplied to add nodes at thenext level using the same nearest neighborcriterion, terminating when a level con-tains just one node. The only differencebetween the ordering that is applied atthe levels containing the nonleaf nodesfrom that used at the level of the leafnodes is that in the former case we areordering the bounding boxes while in thelatter case we are ordering the actualobjects.

Besides the difference in the way non-leaf nodes are formed, we point out thatthe packed R-tree construction processmakes use of a proximity criterion in thedomain of the actual data rather than thedomain of the representative points whichis the case of the Hilbert packed R-tree.This distinction is quite important as itmeans that the Hilbert packed R-tree con-struction process makes no attempt to re-duce or minimize coverage and overlapwhich, as we shall soon see, are the realcornerstones of the R-tree data structure[Guttman 1984]. Therefore, as we pointout below, this makes the Hilbert packedR-tree (and, to a lesser extent, the packedR-tree) much more like a B-tree that isconstructed by filling each node to ca-pacity. For example, Figure 15(c) showsthe bounding boxes corresponding to thefirst level of aggregation for the packedR-tree for the collection of 22 rectangleobjects in Figure 13. In this case, the ob-jects were initially ordered using a Peano–Hilbert order.

The STR method (denoting sort-tile-recurse) of Leutenegger et al. [1997] is anexample of the multidimensional groupingmethod. Our explanation assumes, with-out loss of generality, that the underly-ing space is two-dimensional although theextension of the method to higher dimen-sions is straightforward. Assuming a totalof N rectangles and a node capacity of Mrectangles per leaf node, Dn−1 is formedby constructing a tiling of the underlyingspace consisting of s vertical slabs whereeach slab contains s tiles. Each tile cor-responds to an object-tree pyramid leafnode which is filled to capacity. Note thatthe result of this process is that the un-derlying space is being tiled with rect-angular tiles thereby resembling a grid,but, most importantly, unlike a grid, thehorizontal edges of horizontally adjacenttiles (i.e., with a common vertical edge)do not form a straight line (i.e., are notconnected). Using this process means thatthe underlying space is tiled with approx-imately

√N/M ×√

N/M tiles and resultsin approximately N/M object-tree pyra-mid leaf nodes. The tiling process is ap-plied recursively to these N/M tiles toform Dn−2, Dn−3, . . . etc. until obtainingjust one node.

The STR method builds the object-treepyramid in a bottom-up manner. The ac-tual mechanics of the STR method are asfollows. Sort the rectangles on the basis ofone coordinate value of some easily iden-tified point that is associated with them,say the x coordinate value of their cen-troid. Aggregate the sorted rectangles into√

N/M groups of√

NM rectangles each ofwhich forms a vertical slab containing allrectangles whose centroid’s x coordinatevalue lies in the slab. Next, for each verti-cal slab v, sort all rectangles in v on the ba-sis of their centroid’s y coordinate value.Aggregate the

√NM sorted rectangles in

each slab v into√

N/M groups of M rect-angles each. Recall that the elements ofthese groups form the leaf nodes of theobject-tree pyramid. Notice that the mini-mum bounding boxes of the rectangles ineach tile are usually larger than the tiles.The process of forming a grid-like tilingis now applied recursively to the N/M

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 31: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 189

minimum bounding boxes of the tiles withN taking on the value of N/M until thenumber of tiles is no larger than M , inwhich case all of the tiles fit in the rootnode and we are done.

A couple of items are worthy of furthernote. First, the minimum bounding boxesof the rectangles in each tile are usuallylarger than the tiles. This means that thetiles at each level will overlap. Thus, we donot have a true grid in the sense that theelements at each level of the object-treepyramid are usually not disjoint. Second,the ordering that is applied is quite simi-lar to a row order (actually column orderto be precise) as illustrated in Figure 2(a)where the x coordinate value serves asa primary key to form the vertical slabswhile the y coordinate value serves as thesecondary key to form the tiles from thevertical slabs. Nevertheless, the orderingserves only to determine the partitioninglines to form the tiles but is not used toorganize the collection of tiles.

Notice that the STR method is a bottom-up technique. However, the same ideacould also be applied in a top-down man-ner so that we originally start with Mtiles which are then further partitioned.In other words, we start with

√M vertical

slabs containing√

M tiles apiece. This isinstead of the initial

√N/M vertical slabs

containing√

N/M tiles in the bottom-upmethod. The disadvantage of the top-downmethod is that it requires that we makeroughly 2 logM N passes over all of thedata whereas the bottom-up method hasthe advantage of making just two passesover the data (one for the x coordinatevalue and one for the y coordinate value)since all recursive invocations of the algo-rithm deal with centroids of the tiles.

The top-down method can be viewed asan ordering technique in the sense that theobjects are partitioned, thereby creating apartial ordering, according to their rela-tive position with respect to some criterionsuch as a value of a statistical measure forthe set of objects as a whole. For example,in the VAMSplit R-tree of White and Jain[1996a], which is applied to point data, thesplit axis (i.e., x or y or z, etc.) is chosen onthe basis of having the maximum variance

from the mean in the distribution of thepoint data. Once the axis is chosen, the ob-jects are split is into two equally-sized setsconstrained so that the resulting nodes areas full as possible. This process is appliedrecursively to the resulting sets.

The top-down method is also used byGarcıa et al. [1998a] with a different parti-tioning strategy. For each dimension, thisstrategy applies a user-defined function todecide on the quality or penalty incurredby the split (e.g., coverage, overlap, etc.).Assuming N objects and a bucket capac-ity M (which is also the fanout of packednonleaf nodes), the partitioning algorithmuses a heuristic that considers O(M ) splitpositions and selects the one among thosethat yields the minimum cost or penalty.

The algorithm proceeds as follows. Atthe initial step, it sorts the N objectsalong each dimension, and then groupsthe sorted objects into M groups of l =�N/M objects each. It then constructs theminimum bounding box of each group inO(N ) time. Next, it processes the bound-ing boxes of the groups in increasing or-der and forms a bounding box for the firsttwo groups, the first three groups, . . . , upto the first l − 1 groups. The algorithmconsiders many different orderings, andchooses the best split among all the dif-ferent orderings. The suggested orderingsare the min, max, and center of the bound-ing boxes for each dimension—that is, fortwo-dimensional data, the suggested algo-rithm may consider between two, four, orsix different orderings at each pass. Thesame process is applied to the boundingboxes in decreasing order. This can be donein O(M ) time. At this point, the algorithmfinds the optimal split position by con-sidering all possible O(M ) split positions,which can also be done in O(M ) time.This results in two buckets containing i · land N − i · l where i is between 1 andM −1. This process is then applied to eachbucket that contains more than l objects,which may also require that the objects besorted again. Once all buckets have ≤ lobjects, we have completed the first levelof the R-tree. Next, this process is appliedrecursively to the subtrees at the nextlevel.

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 32: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

190 H. Samet

If at each step of the algorithm, thenodes resulting from the split contain ap-proximately the same number of boundingboxes, then the sorting component of thealgorithm performs a minimum number ofcomparisons as the maximum sizes of thebuckets are minimized. In order to analyzethe execution time of the algorithm, we as-sume a worst-case scenario for each split(which means that i = 1 or equivalentlyi = M − 1) at each stage for each level.It can be shown that in such a case thetotal execution time for sorting (which isthe dominant cost factor in the algorithm)is O(c · d · N · (log N )2 · M/ log M ) wherec is the number of possible orderings and dis the dimensionality of the data [Alborziand Samet, 2004]. It is important to notethat use of this method does not neces-sarily result in a minimum cost partitionsince it does not take into account all of thepossible groupings of the N objects, whichis exponential in N (i.e., O(2N )).

Regardless of how the objects are ag-gregated, the object-tree pyramid is anal-ogous to a height-balanced M -ary treewhere only the leaf nodes contain data (ob-jects in this case), and all of the leaf nodesare at the same level. Thus, the object-treepyramid is good for static data sets. How-ever, in a dynamic environment where ob-jects are added and deleted at will, theobject-tree pyramid needs to be rebuilt ei-ther entirely or partially to maintain thebalance, order, and node size constraints.In the case of binary trees, this issue is ad-dressed by making use of a B-tree, or a B+-tree if we wish to restrict the data (i.e., theobjects) to the leaf nodes as is the case inour application. Below, we show how to usethe B+-tree to make the object-tree pyra-mid dynamic.

When the aggregation in the object-treepyramid is based on ordering the objects,the objects and their bounding boxes canbe stored directly in the leaf nodes of theB+-tree. We term the result an object B+-tree. The key difference between the ob-ject B+-tree and the object-tree pyramidis that the B+-tree (and likewise the ob-ject B+-tree) permits the number of objectsand nodes that are aggregated at each stepto vary (i.e., the number of children per

node). This is captured by the order of theB+-tree, where for an order (m, M ) B+-tree, this number usually ranges betweenm ≥ �M/2 and M with the root having atleast 2 children unless it is a leaf node. Theonly modification to the B+-tree definitionis in the format of the nodes of the objectB+-tree. In particular, the format of eachnonleaf node p is changed so that if p hasj children, then p contains the following3 items of information for each child s:

(1) A pointer to s.(2) The maximum object number associ-

ated with any of the children of s (anal-ogous to a key in the conventional B+-tree).

(3) The bounding box b for s (e.g., the co-ordinate values of a pair of diagonallyopposite corners of b).

Notice that j bounding boxes are stored ineach node corresponding to the j childreninstead of just one bounding box as calledfor in the definition of the object-tree pyra-mid. This is done to speed up the point-inclusion tests necessary to decide whichchild to descend when executing the loca-tion query. In particular, it avoids a diskaccess when the nodes are stored on disk.

A leaf node p in the object B+-tree hasa similar format with the difference thatinstead of having pointers to j childrenwhich are nodes in the tree, p has jpointers to records corresponding to the jobjects that it represents. Therefore, pcontains the following three items of in-formation for each object s:

(1) A pointer to the actual object corre-sponding to s.

(2) The object number associated with s.(3) The bounding box b for s (e.g., the co-

ordinate values of a pair of diagonallyopposite corners of b).

Observe that unlike the object-treepyramid, the object B+-tree does store ob-ject numbers in both the leaf and non-leaf nodes in order to facilitate updates.The update algorithms (i.e., data struc-ture creation, insertion, and deletion) foran object B+-tree are identical to thosefor a B+-tree with the added requirement

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 33: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 191

Fig. 16. The bounding boxes corresponding to the first level of aggregation forthe (a) Hilbert R-tree, and (b) Morton R-tree for the collection of 22 rectangleobjects in Figure 13 with m = 3 and M = 6.

of maintaining the bounding box informa-tion, while the search algorithms (e.g., thelocation query, window queries, etc.) areidentical to those for an object-tree pyra-mid. The performance of the object B+-treefor answering range queries is enhancedif the initial tree is built by inserting theobjects in sorted order filling each nodeto capacity, subject to the minimum occu-pancy constraints, thereby resulting in atree with minimum depth. Of course, suchan initialization will cause subsequent in-sertions to be more costly as they willinevitably result in node split operationswhereas this would not necessarily be thecase if the nodes were not filled to capac-ity initially. The Hilbert R-tree [Kamel andFaloutsos, 1994] is an instance of an ob-ject B+-tree that applies a Peano–Hilbertspace ordering (Figure 2(d)) to the cen-troid of the bounding boxes of the ob-jects. The Hilbert R-tree is closely relatedto the Hilbert tree [Lea 1988] which ap-plies the same ordering to a set of pointsand then stores the result in a height-balanced binary tree (see also Tropf andHerzog [1981], which makes use of aMorton order and a 1–2 brother tree[Ottmann and Wood 1980]).

Figure 16(a) shows the bounding boxescorresponding to the first level of aggrega-tion for the Hilbert R-tree for the collectionof 22 rectangle objects in Figure 13 withm = 3 and M = 6 when the objects are

inserted in the order in which they werecreated (i.e., their corresponding numberin Figure 13. Similarly, Figure 16(b) showsthe corresponding result when using aMorton order instead of a Peano–Hilbertorder. Notice that for pedagogical reasons,the trees were not created by inserting theobjects in sorted order as suggested aboveas in this case the resulting trees wouldbe the same as the Hilbert packed R-treeand Morton packed R-tree in Figures 15(a)and 15(b), respectively.

Observe that use of ordering-based ag-gregation methods can lead to substan-tial overlap between the bounding boxesof the nodes. Rearranging the objects thatare aggregated in each node can allevi-ate this problem, but only to a very lim-ited extent as the order of the leaf nodesmust be maintained—that is, all elementsof leaf node i must have a Peano–Hilbert(Morton) order number that is less thanall elements of leaf node i + 1. Thus, allwe can do is change the number of el-ements that are aggregated in the nodesubject to the node capacity constraints.Of course, this means that the result-ing trees are not unique. For example, inFigure 16(a), we could aggregate objects13–17 into one nonleaf node and objects18–22 into another nonleaf node which re-sults in less overlap. However, the realshortcoming is that it could be the casethat objects 2 and 20 should be aggregated

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 34: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

192 H. Samet

Fig. 17. (a) R-tree for the same collection of rectangle objects given inFigure 11 with m = 2 and M = 3, and (b) the spatial extents of the objectsand the bounding boxes of the nodes in (a) with broken lines denoting thebounding boxes of the corresponding leaf nodes. Notice that the leaf nodesin the index also store bounding boxes, although this is shown only for thenonleaf nodes.

(actually object 2 with objects 18–22) butthis is impossible as their correspondingpositions in the Peano–Hilbert order are sofar apart. The problem is caused, in part,by the presence of objects with nonzero ex-tent and the fact that neither the extentof the objects nor their proximity is takeninto account in the ordering-based aggre-gation techniques (i.e., they do not try tominimize coverage and/or overlap whichare the cornerstones of the R-tree). Thisdeficiency was also noted earlier for theHilbert packed R-tree.

5.3. Extent-Based Aggregation Techniques

When the objects are to be aggregated onthe basis of their extent (i.e., the space oc-cupied by their bounding boxes), then gooddynamic behavior is achieved by makinguse of an R-tree [Guttman, 1984]. An R-tree is a generalization of the object-treepyramid where, for an order (m, M ) R-tree, the number of objects or boundingboxes that are aggregated in each node ispermitted to range between m ≤ �M/2and M while it is always M for the object-tree pyramid. The root node in an R-treehas at least two entries unless it is a leafnode, in which case it has just one entrycorresponding to the bounding box of anobject. The R-tree is usually built as theobjects are encountered rather than wait-ing until all objects have been input. Of

the different variations on the object-treepyramid that we discussed, the R-tree isthe one that is used most frequently, espe-cially in database applications.

Figure 17(a) is an example R-tree for thesame collection of nine rectangle objectsgiven in Figure 11 with m = 2 and M = 3.Figure 17(b) shows the spatial extents ofthe objects and the bounding boxes of thenodes in Figure 17 with broken lines de-noting the bounding boxes correspondingto the leaf nodes, and gray lines denot-ing the bounding boxes corresponding tothe subtrees rooted at the nonleaf nodes.Note that the R-tree is not unique. Itsstructure depends heavily on the order inwhich the individual objects were insertedinto (and possibly deleted from) the tree.

Given that each R-tree node can containa varying number of objects or boundingboxes, it is not surprising that the R-treewas inspired by the B-tree. This meansthat nodes are viewed as analogous to diskpages. Thus the parameters defining thetree (i.e., m and M ) are chosen so that asmall number of nodes is visited during aspatial query (i.e., variants of the locationquery), which means that m and M areusually quite large.

The need to minimize the number ofdisk accesses also effects the format ofeach R-tree node. Recall that in the def-inition of the object-tree pyramid, eachnode p contains M pointers to p’s children

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 35: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 193

and one bounding box corresponding to theunion of the bounding boxes of p’s chil-dren. This means that in order to decidewhich of node p’s children should be de-scended, we must access the nodes corre-sponding to these children to perform thepoint-inclusion test. Each such access re-quires a disk I/O operation. In order toavoid these disk I/O operations, the for-mat of R-tree node p is modified so thatp contains k (m ≤ k ≤ M ) pointers top’s children and the k bounding boxes ofp’s children instead of containing just onebounding box corresponding to the unionof the bounding boxes of p’s children asis the case for the object-tree pyramid.5Recall that this format is also used inthe definition of a node in the object B+-tree. Once again, we observe that the kpoint-inclusion tests do not require anydisk I/O operations at the cost of beingable to aggregate a smaller number of ob-jects in each node since m and M are nowsmaller assuming that the page size isfixed.

As long as the number of objects in eachR-tree leaf node is between m and M , noaction needs to be taken on the R-treestructure other than adjusting the bound-ing boxes when inserting or deleting anobject. If the number of objects in a leafnode decreases below m, then the nodeis said to underflow. In this case, the ob-jects in the underflowing nodes must bereinserted, and bounding boxes in nonleafnodes must be adjusted. If these nonleafnodes also underflow, then the objects intheir leaf nodes must also be reinserted.If the number of objects in a leaf node in-creases above M , then the node is said tooverflow. In this case, it must be split andthe M + 1 objects that it contains mustbe distributed in the two resulting nodes.Splits are propagated up the tree.

5The A-tree [Sakurai et al. 2000] is somewhat of acompromise in that it stores quantized approxima-tions of the k bounding boxes of p’s children wherethe locations of the bounding boxes of p’s children arespecified relative to the location of the bounding boxof p thereby enabling them to be encoded with just asmall number of bits. This idea was first proposed byHenrich [1998] and is also used in the hybrid tree ofChakrabarti and Mehrotra [1998, 1999].

Underflows in an R-tree are handled inan analogous manner to the way they aredealt with in a B-tree. In contrast, theoverflow situation points out a significantdifference between an R-tree and a B-tree.Recall that overflow is a result of attempt-ing to insert an item t in node p and deter-mining that node p is too full. In a B-tree,we usually don’t have a choice as to thenode p that is to contain t since the treeis ordered. Thus once we determine thatp is full, we must either split p or apply arotation (also known as deferred splitting)process. On the other hand, in an R-tree,we can insert t in any node p, as long as pis not full. However, once t is inserted in p,we must expand the bounding box associ-ated with p to include the space spannedby the bounding box b of t. Of course, wecan also insert t in a full node p, in whichcase we must also split p.

The need to expand the bounding box ofp has an effect on the future performanceof the R-tree, and thus we must make awise choice with respect to p. As in the caseof the object-tree pyramid, the efficiency ofthe R-tree for search operations dependson its abilities to distinguish between oc-cupied space and unoccupied space, and toprevent a node from being examined need-lessly due to a false overlap with othernodes. Again, as in the object-tree pyra-mid, the extent to which these efficienciesare realized is a direct result of how wellwe are able to satisfy our goals of mini-mizing coverage and overlap. These goalsguide the initial R-tree creation process aswell subject to the previously mentionedconstraint that the R-tree is usually builtas the objects are encountered rather thanwaiting until all objects have been input.

In the original definition of the R-tree [Guttman 1984] the goal of minimiz-ing coverage is the one that is followed.In particular, an object t is inserted by arecursive process that starts at the rootof the tree and chooses the child whosecorresponding bounding box needs to beexpanded by the smallest amount to in-clude t. As we will see, other researchersmake use of other criteria such asminimizing overlap with adjacent nodesand even perimeter (e.g., in the R∗-tree

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 36: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

194 H. Samet

[Beckmann et al. 1990] as described inSection 5.4). [Theodoridis and Sellis 1993,1994] try to minimize the value of an objec-tive function consisting of a linear combi-nation of coverage, overlap, and dead areawith equal weights. Garcıa et al. [1998a]also make use of a similar objective func-tion to build the entire R-tree in a top-down manner.

Not surprisingly, these same goals alsoguide the node-splitting process. In thissituation, one goal is to distribute the ob-jects among the nodes so that the likeli-hood that the two nodes will be visitedin subsequent searches will be reduced.This is accomplished by minimizing thetotal area spanned by the bounding boxesof the resulting nodes (equivalent to whatwe termed coverage). The second goal is toreduce the likelihood that both nodes areexamined in subsequent searches. Thisgoal is accomplished by minimizing thearea common to both nodes (equivalent towhat we termed overlap). Again, we ob-serve that, at times, these goals may becontradictory.

Several node-splitting policies havebeen proposed that take these goals intoaccount. They are differentiated on the ba-sis of their execution-time complexity andby the number of these goals that they at-tempt to meet. An easy way to see the dif-ferent complexities is to look at the follow-ing three algorithms [Guttman 1984], allof which are based on minimizing the cov-erage. The simplest is an exhaustive algo-rithm [Guttman 1984] that tries all pos-sibilities. In such a case, the number ofpossible partitions is 2M −1. This is unrea-sonable for most values of M (e.g., M = 50for a page size of 1024 bytes).

The exhaustive approach can be appliedto obtain an optimal node split accord-ing to an arbitrary cost function that cantake into account coverage, overlap, andother factors. Interestingly, although wepointed out earlier that there are O(2M )possible cases to be taken into account,the exhaustive algorithm can be imple-mented in such a way that it need not re-quire O(2M ) time. In particular, Beckeret al. [1992] present an implementationthat takes only O(M 3) time for two-dimen-

sional data and O(dM log M + d2M 2d−1)time for d -dimensional data.

Garcıa et al. [1998b] present an imple-mentation of the exhaustive approach thatuses the same insight as the implementa-tion of Becker et al. [1992], which is thatsome of the boundaries of the two result-ing minimum bounding boxes are sharedwith the minimum bounding box of theoverflowing node. This insight constrainsthe number of possible groupings of theM objects in the node that is being split.The algorithm is flexible in that it can usedifferent cost functions for evaluating theappropriateness of a particular node split.However, the cost function is restrictedto being “extent monotone” which meansthat the cost function increases monoton-ically as the extent of one of the sides ofthe two bounding rectangles is increased(this property is also used by Becker et al.[1992], although the property is statedsomewhat differently).

Although the implementations ofBecker et al. [1992] and Garcıa et al.[1998b] both find optimal node splits,the difference between them is that theformer has the added benefit of guaran-teeing that the node split satisfies somebalancing criteria, which is a requirementin most R-tree implementations. Therationale, as we recall, is that in thisway the nodes are not too full, whichwould cause them to overflow again. Forexample, in many R-tree implementationsthere is a requirement that the split besuch that each node receives exactly halfthe rectangles, or that each receives atleast 40% of the rectangles. Satisfying thebalancing criteria is more expensive, ascould be expected, and in two dimensions,the cost of the algorithm of Becker et al.[1992] is O(M 3) as opposed to O(M 2) forthe algorithm of Garcıa et al. [1998a].

Garcıa et al. [1998b] found that identify-ing optimal node splits yielded only mod-est improvements in query performancewhich led them to introduce another im-provement to the insertion process. Thisimprovement is based on trying to fit oneof the two groups resulting from a split ofnode e into one of e’s siblings instead ofcreating a new node for every split. In

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 37: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 195

particular, one of the groups is insertedinto the sibling s for which the cost in-crease, using some predefined cost func-tion, resulting from movement into s isminimized. Once a sibling s has beenchosen, we move the appropriate groupand reapply the node splitting algorithmif the movement caused s to overflow.This process is applied repeatedly as longas there is overflow while requiring thatwe choose among the siblings that havenot been modified by this process. If wefind that there is overflow in node i andthere is no unmodified sibling left, thena new node is created containing one ofthe new groups resulting from the split ofi. Even if each node overflows, this pro-cess is guaranteed to terminate as at eachstep there is one less sibling candidate formotion.

The process described above is some-what similar to what is termed forced rein-sertion in the R∗-tree (see Section 5.4) withthe difference that forced reinsertion re-sults in reinsertion of the individual en-tries (i.e., objects in the case of leaf nodesand minimum bounding boxes in the caseof nonleaf nodes) at the root instead of asa group into one of the siblings. This rein-sertion into siblings is also reminiscent ofrotation (i.e., “deferred splitting”) in con-ventional B-trees with the difference beingthat there is no order in the R-tree whichis why motion into all unmodified siblingshad to be considered. This strategy wasfound to increase the node utilization andthereby improve query performance (byas much as 120% in experiments [Garcıaet al. 1998b] compared to the Hilbert R-tree [Kamel and Faloutsos 1994]).

The remaining two node-splitting al-gorithms have a common control struc-ture that consists of two stages. The firststage “picks” a pair of bounding boxes jand k to serve as “seeds” for the two re-sulting nodes, while the second stage re-distributes the remaining bounding boxesinto the nodes corresponding to j and k.The redistribution process tries to min-imize the “growth” of the area spannedby j and k. Thus, the first and secondstages can be described as “seed-picking”and “seed-growing”, respectively.

The first of these “seed-picking” algo-rithms [Guttman 1984] is a quadraticcost algorithm that initially finds the twobounding boxes that would waste the mostarea were they to be in the same node.This is determined by subtracting the sumof the areas of the two bounding boxesfrom the area of the covering boundingbox. These two bounding boxes are placedin the separate nodes, say j and k. Next,the remaining bounding boxes are exam-ined, and for each bounding box, say i, dijand dik are computed, which correspondto the increases in the area of the coveringbounding boxes of nodes j and k, respec-tively, when i is added to them. Now, thebounding box r such that |drj − drk| is amaximum is found, and r is added to thenode with the smallest increase in area.This process is repeated for the remainingbounding boxes. The motivation for select-ing the maximum difference |drj−drk| is tofind the bounding box having the greatestpreference for a particular node j or k.

The second of of these “seed-picking” al-gorithms [Guttman 1984] is a linear costalgorithm that examines each dimensionand finds the two bounding boxes with thegreatest separation. Recalling that eachbounding box has a low and a high edgealong each axis, these two bounding boxesare the one whose high edge is the lowestalong the given axis and the one whose lowedge is the highest along the same axis.The separations are normalized by divid-ing the actual separation by the width ofthe bounding box of the overflowing nodealong the corresponding axis. The final“seeds” are the two bounding boxes havingthe greatest normalized separation amongthe d pairs that we found. The remainingbounding boxes are processed in arbitraryorder and placed in the node whose bound-ing box (i.e., of the entries added so far) isincreased the least in area as a result oftheir addition. Empirical tests [Guttman1984] showed that there was not much dif-ference between the three node-splittingalgorithms in the performance of a win-dow search query (i.e., in CPU time and inthe number of disk pages accessed). Thus,the faster linear cost node-splitting algo-rithm was found preferable for this query

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 38: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

196 H. Samet

Fig. 18. (a) Example collection of rectangles demonstrating a linear node splittingalgorithm that ensures the most even distribution of bounding boxes, and the twopossible splits caused by associating each bounding box with its closest face causedby a partition along the (b) x axis, and (c) y axis.

even though the quality of the splits wassomewhat inferior.

An alternative node splitting policy isbased on minimizing the overlap. Onetechnique which has a linear cost [Angand Tan 1997] applies d partitions (one foreach of the d dimensions) to the boundingboxes in the node t being split thereby re-sulting in 2d sets of bounding boxes. Inparticular, we have one set for each faceof the bounding box b of t. The partitionis based on associating each bounding boxo in t with the set corresponding to theclosest face along dimension i of b.6 Oncethe 2d partitions have been constructed(i.e., each bounding box o has been asso-ciated with d sets), select the partitionthat ensures the most even distribution ofbounding boxes. In case of a tie, choose thepartition with the least overlap. In caseof another tie, choose the partition withthe least coverage. For example, considerthe four bounding boxes in Figure 18(a).The partition along the x axis yields thesets {1,2} and {3,4} (Figure 18(b)) while thepartition along the y axis yields the sets{1,3} and {2,4} (Figure 18(c)). Since bothpartitions yield sets that are evenly dis-

6Formally, each bounding box o has two faces foil andfoih that are parallel to the respective faces fbil andfbih of b where l and h correspond to the low and highvalues of coordinate or dimension i. For each dimen-sion i, there are two sets Sil and Sih correspondingto faces fbil and fbih of b, and the algorithm insertso into Sil if xi( foil)−xi( fbil) < xi( fbih)−xi( foih) andinto Sih otherwise where xi( f ) is the ith coordinatevalue of face f .

tributed, we choose the one that minimizesoverlap (i.e., along the y axis).

The algorithm is linear as it examineseach bounding box once along each di-mension (actually, it is O(dM) for M ob-jects but d is usually much smaller thanM ). Experiments with randomly gener-ated rectangles [Ang and Tan 1997] re-sulted in lower coverage and overlap thanthe linear and quadratic algorithms de-scribed above [Guttman 1984] that arebased on minimizing the coverage. Thewindow search query was also found to beabout 16% faster with the linear algorithmbased on minimizing overlap than thequadratic algorithm based on minimizingcoverage. The drawback of this linear al-gorithm (i.e., Ang and Tan [1997]) is that itdoes not guarantee that the two nodes re-sulting from the partition will contain anequal number of bounding boxes. This isbecause the partitions are based on prox-imity to the borders of the bounding box ofthe node being split. In particular, whenthe data is not uniformly distributed, al-though the resulting nodes are likely tohave little overlap (as they are likely topartition the underlying space into twoequal areas), they will most likely containan uneven number of bounding boxes.

5.4. R*-tree

Better decompositions in terms of lessnode overlap and lower storage require-ments than those achieved by the linearand quadratic node-splitting algorithms

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 39: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 197

have also been reported in [Beckmannet al. 1990] where three significantchanges have been made to the R-tree con-struction algorithm including a differentnode-splitting strategy. An R-tree that isbuilt using these changes is termed anR*-tree [Beckmann et al. 1990].7 Thesechanges are described below. Interest-ingly, these changes also involve using anode splitting policy that, at times, triesto minimize both coverage and overlap.

The first change is the use of an intel-ligent object insertion procedure that isbased on minimizing overlap in the case ofleaf nodes, while minimizing the increasein area (i.e., coverage) in the case of non-leaf nodes. The distinction between leafand nonleaf nodes is necessary as the in-sertion algorithm starts at the root andmust process nonleaf nodes before encoun-tering the leaf node where the object willultimately be inserted. Thus, we see thatthe bounding box b for an object o is in-serted into the leaf node p for whom theresulting bounding box has the minimumincrease in the amount of overlap with thebounding boxes of p’s siblings (children ofnonleaf node s). This is in contrast to the R-tree where b is inserted into the leaf nodep for whom the increase in area is a mini-mum (i.e., based on minimizing coverage).This part of the R∗-tree object insertion al-gorithm is quadratic in the number of en-tries in each node (i.e., O(M 2) for an or-der (m, M ) R∗-tree where the number ofobjects or bounding boxes that are aggre-gated in each node is permitted to rangebetween m ≤ �M/2) as the overlap mustbe checked for each leaf node child p ofthe selected nonleaf node s with all of p’sO(M ) siblings.

The second change is that when a node pis found to overflow in an R∗-tree, insteadof immediately splitting p as is done inthe R-tree, first, an attempt is made to seeif some of the objects in p could possiblybe more suited to being in another node.This is achieved by reinserting a fraction

7The ‘*’ is used to signify its “star”-like performance[Seeger 1990] in comparison with R-trees built usingthe other node-splitting algorithms as can be seen inexamples such as Figures 19 and 20.

(30% has been found to yield good perfor-mance [Beckmann et al. 1990]) of theseobjects in the tree (termed forced reinser-tion). Forced reinsertion is similar in spiritto rotation (also known as “deferred split-ting”) in a conventional B-tree, which wasalso a technique developed to avoid split-ting a node.

There are several ways of determiningthe objects to be reinserted. One sugges-tion is to sort the bounding boxes in p ac-cording to the distance of the centers oftheir bounding boxes from the center ofthe bounding box of p, and to reinsert thedesignated fraction that are the farthest.Once we have determined the objects tobe reinserted, we need to choose an orderin which to reinsert them. There are twoobvious choices: from farthest to closest(termed far-reinsert) or from closest to far-thest (termed close-reinsert). Beckmannet al. [1990] make a case for using close-reinsert on the basis of results of experi-ments. One possible explanation is that ifthe reinsertion procedure places the firstobject to be reinserted in p, then the sizeof the bounding box of p is likely to beincreased more if “far-reinsert” was usedrather than “close-reinsert” thereby in-creasing the likelihood of the remainingobjects being reinserted in p as well. Thishas the effect of defeating the motivationfor the introduction of the reinsertion pro-cess which is to try to reorganize the nodes.However, it could also be argued that us-ing “far-reinsert” is more likely to resultin the farthest object being reinserted ina node other than p, on account of thesmaller amount of overlap, which is one ofthe goals of the reinsertion process. Thus,the question of which method to use is notcompletely settled.

The sorting step in forced reinsertiontakes O(M log M ) time. However, this costis greatly overshadowed by the fact thateach invocation of forced reinsertion canresult in the reinsertion of O(M ) objectsthereby increasing the cost of insertion bya factor of O(M ). One problem with forcedreinsertion is that it could lead to over-flow in the same node p again when allof the bounding boxes are reinserted inp, or even to overflow in another node q

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 40: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

198 H. Samet

at the same depth. This could lead to aninfinite loop. In order to prevent the oc-currences of such a situation, forced rein-sertion is applied only once at each depthfor a given object. Note also that forcedreinsertion is applied in a bottom-up man-ner in the sense that resolving overflow inthe leaf nodes may also lead to overflow ofthe nonleaf nodes, in which case we applyforced reinsertion to the nonleaf nodes aswell. When applying forced reinsertion toa nonleaf node p at depth l , we reinsertonly the elements in p and at depth l .

Forced reinsertion is quite important asusually an R-tree is built by inserting theobjects one by one as they are encoun-tered in the input. Thus, we don’t usuallyhave the luxury of processing the objectsin sorted order. This could lead to somebad decompositions in the sense that theredistribution stage may prefer one of the“seed” nodes over the other in a consistentmanner. Of course, this can be overcome bytaking into account the bounding boxes ofall of the objects before building the R-tree;but now the representation is no longer dy-namic. Forced reinsertion is a compromisein the sense that it permits us to periodi-cally rebuild part of the R-tree as a meansof compensating for some bad node place-ment decisions.

The third change involves the mannerin which an overflowing node p is split.Again, as in the original R-tree node-splitting algorithm, a two-stage process isused. The difference is in the nature ofthe stages. The process follows closely anapproach presented in an earlier study ofthe R-tree [Greene 1989] which did not re-sult in the coining of a new name for thedata structure! In particular, in contrastto the original R-tree node-splitting strat-egy [Guttman 1984] where the first stage“picks” two “seeds” for the two resultingnodes which are subsequently “grown” bythe second stage, in the R∗-tree (as wellas in the approach described in Greene[1989]), the first stage determines the axis(i.e., hyperplane) along which the split isto take place, while the second stage deter-mines the position of the split. In two di-mensions, for example, the split positioncalculated in the second stage serves as

the boundary separating the left (or anequivalent alternative is the right) sidesof the bounding boxes of the objects thatwill be in the left and right nodes result-ing from the split.

Note that the result of the calculationof the split position in the second stagehas the same effect as the redistributionstep in the linear and quadratic cost R-tree node-splitting algorithms as it indi-cates which bounding boxes are associatedwith which node. In particular, as we willsee below, the first stage makes use of theresult of sorting the faces of the bound-ing boxes along the various dimensions.Moreover, it would appear that the firstand last bounding boxes in the sort se-quence play a somewhat similar role tothat of the “seeds” in the original R-treenode-splitting algorithms. However, thiscomparison is false as there is no “grow-ing” process in the second stage. In par-ticular, these “seeds” do not “grow” in anindependent manner in the sense that thebounding boxes bi that will be assigned totheir groups are determined by the rela-tive positions of the corresponding faces ofbi (e.g., in two dimensions, the sorted orderof their left, right, top, or bottom sides).

This two-stage process is implementedby performing 2d sorts (two per axis) of thebounding boxes of the objects in the over-flowing node p. For each axis a, the bound-ing boxes are sorted according to their twoopposite faces that are perpendicular to a.The positions of the faces of the boundingboxes in the sorted lists serve as the candi-date split positions for the individual axes.There are several ways of using this in-formation to determine the split axis andsplit position along the axis.

Beckmann et al. [1990] choose the splitaxis as the axis a for which the averageperimeter of the bounding boxes of thetwo resulting nodes for all of the possiblesplits along a is the smallest while stillsatisfying the constraint posed by m andM . An alternative approach (although notnecessarily yielding the desired result asshown below) is one that chooses the splitaxis as the axis a for which the perime-ter of the two resulting nodes is a mini-mum. Basing the choice on the value of

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 41: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 199

the perimeter is related to the goal of min-imizing coverage by favoring splits that re-sult in nodes whose bounding boxes havea square-like shape. Basing the choice ofthe split axis on the minimum averageperimeter results in giving greater weightto the axis where the majority of the pos-sible splits result in nodes whose bound-ing boxes have square-like shapes. Thisstage takes O(d M log M ) time as the sorttakes O(M log M ) time for each axis whilethe average perimeter computation can bedone in O(M ) time for each axis whenscanning the faces of the bounding boxesin sorted order.

The position of the split along the axis aselected by the first stage is calculated byexamining the two sorted lists of possiblesplit positions (i.e., faces of the boundingboxes) for a and choosing the split positionfor which the amount of overlap betweenthe bounding boxes of the two resultingnodes is the smallest while still satisfyingthe constraint posed by m and M . Ties areresolved by choosing the position whichminimizes the total area of the resultingbounding boxes thereby reducing the cov-erage. Minimizing the overlap reduces thelikelihood that both nodes will be visitedin subsequent searches. Thus, we see thatthe R∗-tree’s node-splitting policy tries toaddress the issues of minimizing both cov-erage and overlap. Determining the splitposition requires O(M ) overlap computa-tions when scanning the bounding boxesin sorted order. Algorithms that employthis sort-and-scan paradigm are knownas plane-sweep techniques [Baird 1976;Preparata and Shamos 1985; Shamos andHoey 1976].

Figure 19 shows the bounding boxescorresponding to the first level of aggre-gation for an R∗-tree in comparison tothat resulting from the use of an R-treethat deploys the exhaustive (Figure 20(a)),linear cost (Figure 20(b)), quadraticcost (Figure 20(c)), and the linear costof Ang and Tan [1997] (Figure 20(d)) node-splitting algorithms for the collection of22 rectangles in Figure 14. It is quite clearfrom the figure, at least for this exam-ple data set, that the combined criterionused by the R∗-tree node-splitting algo-

rithm that chooses the split which min-imizes the sum of the perimeters of thebounding boxes of the two resulting nodes,as well as their overlap, seems to be work-ing. Whether this is indeed the change inthe definition that leads to this behavioris unknown.

Empirical studies have shown that useof the R∗-tree node-splitting algorithminstead of the conventional linear andquadratic cost R-tree node-splitting algo-rithms leads to a reduction in the space re-quirements (i.e., improved storage utiliza-tion) ranging from 10 to 20% [Beckmannet al. 1990; Hoel and Samet 1995] whilerequiring significantly more time to buildthe R∗-tree [Hoel and Samet 1995]. Theeffect of the R∗-tree node-splitting algo-rithms vis-a-vis the conventional linearand quadratic cost node-splitting algo-rithms on query execution time is not soclear due to the need to take factors suchas paging activity, node occupancy, etc.into account [Beckmann et al. 1990; Hoeland Samet 1995; Moitra 1993].

Although the definition of the R∗-treemakes three changes to the original R-tree definition [Guttman 1984], it can beargued that the main distinction, from aconceptual point of view rather than fromits effect on performance, is in the wayan overflowing node is split, and in theway the bounding boxes are redistributedin the two resulting nodes.8 In particu-lar, the original R-tree node splitting algo-rithms [Guttman 1984] determine “seeds”while the R∗-tree algorithm determinesa split axis and an axis split value. Thebounding boxes of the objects are redis-tributed about these “seeds” and axis, re-spectively. At this point, it is important tore-emphasize that the motivation for theseredistribution strategies is to avoid the ex-haustive search solution which looks at allpossible partitions.

The R∗-tree redistribution method firstsorts the boundaries of the bounding boxes

8On the other hand, it could also be argued thatforced reinsertion is the most important distinctionas it has the ability to undo the effect of some inser-tions that may have caused undesired increases inoverlap and coverage.

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 42: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

200 H. Samet

Fig. 19. The bounding boxes correspond-ing to the first level of aggregation for anR∗-tree for the collection of 22 rectanglesin Figure 14.

along each of the axes and then uses thisinformation to find the split axis a (withrespect to the minimum average perime-ter of the bounding boxes of the resultingnodes) and split position (with respect tothe minimal overlap once the split axiswas chosen). This is a heuristic that at-tempts to approximate the solution to thed -dimensional problem (i.e., optimal par-titioning with minimal coverage or over-lap) with an approximation of the optimalone-dimensional solution along one of theaxes. Intuitively, the validity of this ap-proximation would appear to decrease asd (i.e., the dimensionality of the underly-ing space) increases since more and moresplits are eliminated from consideration.However, the opposite conclusion might betrue as it could be argued that althoughthe number of eliminated splits grows ex-ponentially with d , the majority of theeliminated splits are bad anyway. This isa problem for further study.

The remaining changes involving forcedreinsertion and intelligent object inser-tion could have also been used in theR-tree construction algorithms. In partic-ular, although the original R-tree defini-tion [Guttman 1984] opts for minimizingcoverage in determining the subtree inwhich an object is to be inserted, it doesleave it open as to whether minimizingcoverage or overlap is best. Similarly, us-

ing forced reinsertion does not change theR-tree definition. It can be applied regard-less of how a node is split and which policyis used to determine the node into whichthe object is to be inserted. The evalua-tion of the R∗-tree conducted in Beckmannet al. [1990] involves all three of thesechanges. An evaluation of R-trees con-structed using these remaining changes isalso of interest.

The node splitting rules that form thebasis of the R∗-tree have also been usedin conjunction with some of the methodsfor constructing instances of the object-tree pyramid such as the Hilbert packedR-tree. In particular, DeWitt et al. [1994]suggest that it is not a good idea to filleach leaf node of the Hilbert packed R-treeto capacity. Instead, they pack each leafnode i, say up to 75% of capacity, and thenfor each additional object x to be placedin i, they check if the bounding rectan-gle of i needs to be enlarged by too much(e.g., more than 20% in area [DeWitt et al.1994]) in order to contain x, in which casethey start packing another node. In addi-tion, whenever a node has been packed,the contents of a small number (e.g.,3 [DeWitt et al. 1994]) of the most re-cently created nodes are combined into alarge node which is then resplit using theR∗-tree splitting methods. Although ex-periments show that these modificationslead to a slower construction time thanthat for the conventional Hilbert packedR-tree, the query performance is often im-proved (e.g., up to 20% in the experiments[DeWitt et al. 1994]).

5.5. Bulk Insertion and Bulk Loading

Up to now, our discussion of building theobject-tree pyramid has been differenti-ated on the basis of whether it is done ina static or a dynamic environment. Thestatic methods exemplified by the variouspacking methods such as the packed R-tree, Hilbert packed R-tree, and the STRmethod were primarily motivated by adesire to build the structure as fast aspossible. This is in addition to the sec-ondary considerations of maximizing stor-age utilization and possibly faster query

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 43: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 201

Fig. 20. The bounding boxes corresponding to the first level of aggregationfor an R-tree built using different node splitting policies: (a) exhaustive, (b)linear, (c) quadratic), and (d) the linear algorithm of [Ang and Tan 1997] forthe collection of 22 rectangles in Figure 14.

performance as a result of a shallowerstructure since each node is filled to ca-pacity thereby compensating for the factthat these methods may result in morecoverage and overlap. The dynamic meth-ods exemplified by the R-tree and the R∗-tree were motivated equally by a desire toavoid rebuilding the structure as updatesoccur (primarily as objects are added and,to a lesser extent, deleted), and by a de-sire for faster query performance due to areduction of coverage and overlap.

At times, it is desired to update an exist-ing object-tree pyramid with a large num-ber of objects at once. Performing theseupdates one object at a time using the im-plementations of the dynamic methods de-scribed above can be expensive. The CPU

and I/O costs can be lowered by group-ing the input objects prior to the insertion.This technique is known as bulk insertion.It can also be used to build the object-treepyramid from scratch in which case it isalso known as bulk loading. In fact, wehave seen several such techniques alreadyin our presentation of the static methodswhich employ packing. The difference isthat although the bulk loading methodsthat we discuss below are based on group-ing the input objects prior to using one ofthe dynamic methods of constructing theobject-tree pyramid, the grouping does notinvolve sorting the input objects which isa cornerstone of the bulk loading methodsthat employ packing. We discuss the bulkinsertion methods first.

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 44: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

202 H. Samet

A simple bulk insertion idea is to sort allof the m new objects to be inserted accord-ing to some order (e.g., Peano–Hilbert) andthen insert them into an existing object-tree pyramid in this order [Kamel et al.1996]. This approach is used in the cube-tree [Roussopolos et al. 1997], a packedR-tree like structure for data warehous-ing and OLAP (denoting online analyticprocessing [Chaudhuri and Dayal 1997])applications. The rationale for sorting thenew objects is to have each new object berelatively close to the previously insertedobject so that most of the time the nodeson the insertion path are likely to be thesame, which is even more likely to be thecase if some caching mechanism is em-ployed. Thus, the total number of I/O op-erations is reduced. This technique worksfine when the number of objects being in-serted is small relative to the total num-ber of objects. Also, it may be the bestchoice when the collection of new objectsis spread over a relatively large portion ofthe underlying space, as in such cases theuse of other methods (see below) may leadto excessive overlap (but see discussion ofGBI [Choubey et al. 1999] below). It can beused with any of the methods of buildingan object-tree pyramid.

Another related bulk insertion method,due to Kamel et al. [1996], first orders thenew objects being inserted according to thePeano–Hilbert order, and then aggregatesthem into leaf nodes where each node isfilled to a predetermined percentage of thecapacity (e.g., 70%) as if we are buildingjust the leaf nodes of a Hilbert packed R-tree for the new objects. These leaf nodesare inserted into an object-tree pyramid inthe order in which they were built.

The STLT (denoting Small-Tree-Large-Tree) method of Chen et al. [1998] canbe viewed as a generalized variant of themethod of Kamel et al. [1996] in that in-stead of inserting the leaf nodes of theobject-tree pyramid T of the new data(which has been built using any construc-tion algorithm) in the existing tree E,it just inserts the root of T so that theleaf nodes of T will be at the same depthas the leaf nodes of E. Although thismethod will lead to a faster insertion time

than dynamic insertion, it will result inpoorer query performance due to a sig-nificant overlap between the nodes in Tand E. In order to overcome this prob-lem Choubey et al. [1999] introduce a newmethod (termed Generalized Bulk Inser-tion (GBI)) that uses cluster analysis todivide the new data into clusters. Smallclusters (e.g., containing just one point)are inserted using a regular dynamic in-sertion method, whereas for larger clus-ters, a tree is built and inserted using theSTLT method. In other words, the STLTmethod is really a sub-component of theGBI method. This reduces the amount ofoverlap, which can be very high for theSTLT method.

The bulk loading methods that we de-scribe [Arge et al. 1999; van den Berckenet al. 1997] insert the individual objectsusing dynamic insertion methods. In par-ticular, as we pointed out above, the ob-jects are not preprocessed (e.g., via an ex-plicit sorting step or aggregation into adistinct object-tree pyramid) prior to in-sertion as is the case for the bulk inser-tion methods. In particular, the sorting isdeferred as much as possible although atthe end of the bulk loading process, thedata is ordered on the basis of the under-lying tree structure, and hence can be con-sidered to be sorted. These bulk loadingmethods are general in that they are notjust applicable to the R-tree; instead, theyare applicable to most balanced tree datastructures which resemble B-trees. Theyare based on the general concept of thebuffer tree [Arge 1996], wherein each in-ternal node of the buffer tree contains abuffer of records stored on disk.

The basic idea behind the methodsbased on the buffer tree is that insertionsinto each nonleaf node of the buffer treeare batched. In particular, insertions oc-cur into the buffer associated with the rootnode and slowly trickle down the tree asbuffers are emptied when they are full.The buffers enable the effective use ofavailable main memory, thereby resultingin large savings in I/O cost over the reg-ular dynamic insertion method (althoughthe CPU cost may be higher, in part, dueto the large fanout when using one of the

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 45: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 203

methods [van den Bercken et al. 1997]as we point out below). Nevertheless, itcould be the case that the actual execu-tion could be slower in comparison to anon bulk-loading method in the case thatmany overflowing buffers need to be trick-led down.

In the method proposed by van denBercken et al. [1997], the R-tree is builtrecursively bottom-up. At each stage, anintermediate tree structure is built wherethe lowest level corresponds to the nextlevel of the final R-tree. The nonleaf nodesin the intermediate tree structures havea high fanout (determined by availableinternal memory) as well as a bufferthat receives insertions. Arge et al. [1999]achieve a similar effect by using a regu-lar R-tree structure (i.e., where the non-leaf nodes have the same fanout as theleaf nodes which is the size of a disk page)and only attaching buffers to nodes at cer-tain levels of the tree. The advantages ofthe method of Arge et al. [1999] over themethod of van den Bercken et al. [1997]are that it is more efficient as it does notbuild intermediate structures, and it re-sults in a better space partition. More-over, the method of Arge et al. [1999]yields the same R-tree as would have beenobtained using conventional dynamic in-sertion methods without buffering (withthe exception of the R∗-tree where the useof forced re-insertion is difficult to incor-porate in the buffering approach), whilethis is not the case for the method ofvan den Bercken et al. [1997]. In addition,the method of Arge et al. [1999] supportsbulk-insertions (as opposed to just initialbulk-loading as in van den Bercken et al.[1997]) and other bulk-queries includingintermixed insertions and queries.

5.6. Shortcomings and Solutions

In this section, we point out some of theshortcomings of the object-tree pyramid aswell as point out some of the solutions. Aswe are dealing with the representationsof objects, which are inherently of low di-mension, we do not discuss the shortcom-ings and solutions for high-dimensionaldata (e.g., the X-tree [Berchtold et al. 1996]

which attempts to address the problemarising when there is much overlap amongthe nodes corresponding to the partitionsthat result from a node split). One of thedrawbacks of the object-tree pyramid (i.e.,the R-tree as well as its variants suchas the R∗-tree) is that as the node size(i.e., page size—that is, M ) gets large,the performance starts to degrade. Thisis somewhat surprising as according toconventional wisdom, performance shouldincrease with node size as the depth ofthe tree decreases thereby requiring fewerfewer disk accesses. The problem is that asthe node size increases, operations on eachnode take more CPU time. This is espe-cially true if the operation involves search(e.g., finding the nearest object to a point)as the bounding boxes in each node are notordered [Hoel and Samet 1992].

This problem can be overcome by order-ing the bounding boxes in each node us-ing the same ordering-based aggregationtechniques that were used to make theobject-tree pyramid more efficient in re-sponding to the location query. For exam-ple, we could order the bounding boxes byapplying a Morton or Peano–Hilbert spaceordering to their centroids. We term the re-sult an ordered R-tree. Interestingly, theordered R-tree can be viewed as a hybridbetween an object B+-tree and an R-treein the sense that nodes are ordered inter-nally (i.e., their constituent bounding boxelements) using the ordering of the objectB+-tree, while they are ordered externally(i.e., vis-a-vis each other) using an R-tree.

Although the R-tree is height-balanced,the branching factor of each node is notthe same. Recall that each node containsbetween m and M objects or boundingboxes. This has several drawbacks. First,it means that the nodes are not fully occu-pied thereby causing the tree structure tobe deeper than it would be had the nodesbeen completely full. Therefore, the num-ber of data objects in the leaf nodes ofthe descendants of sibling nonleaf nodesis not the same and, in fact, can vary quitegreatly, thereby leading in imbalance interms of the number of objects stored indifferent subtrees. This could have a detri-mental effect on the efficiency of retrieval.

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 46: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

204 H. Samet

Second, satisfying the branching factorcondition often requires compromising thegoal of minimizing total coverage, over-lap, and perimeter. The packed R-tree andthe Hilbert packed R-tree are some waysto overcome this problem as they initiallyhave a branching factor of M at all but thelast node at each level. However, they arenot necessarily designed to meet our goalsof minimizing total coverage, overlap, andperimeter.

The S-tree [Aggarwal et al. 1997] isan approach to overcome the above draw-backs of the R-tree and its packed variantsby trading off the height-balanced prop-erty in return for reduced coverage, over-lap, and perimeter in the resulting mini-mum bounding boxes. The S-tree has theproperty that each node that is not a leafnode or a penultimate node (i.e., a nodewhose children are all leaf nodes) has Mchildren. In addition, for any pair of sib-ling nodes (i.e., with the same parent) s1and s2 with Ns1 and Ns2 objects in theirdescendants, respectively, we have thatp ≤ Ns1/Ns2 ≤ 1/p (0 < p ≤ 0.5), wherep, termed the skew factor, is a parame-ter that is related to the skewness of thedata and governs the amount of tradeoffthereby providing a worst-case guaranteeon the skewness of the descendants of thenode. In particular, the number of objectsin the descendants of each of a pair of sib-ling nodes is at least a fraction p of the to-tal number of objects in the descendants ofboth nodes. This guarantee is fairly tightwhen p is close to 0.5, while it is quite loosewhen p is small. In other words, whenp = 0.5, the difference in the number ofobjects that will be found in the subtreesof a pair of sibling nodes is within a factorof 2, whereas this ratio can get arbitrarilylarge in a conventional R-tree.

The cost-based unbalanced R-tree(CUR-tree) of Ross et al. [2001] is anothervariant of an R-tree where the height-balanced requirement is relaxed in orderto improve the performance of point andwindow queries in an environment whereall the data is in main memory. The CUR-tree makes use of a cost model for the datastructure (i.e., the R-tree) that accountsfor operations such as reading the node

and making the comparisons needed tocontinue the search. In particular, uponevery insertion and deletion (rather thanjust upon overflow), every node on theinsertion path is examined to determineif its entries should be rearranged tolower the cost function. The result is thatnodes can be split, and their entries canbe promoted or demoted, at the cost of aslower update time.

Garcia et al. [1999] propose to improvethe query performance of R-trees by re-structuring the tree. Such restructuringcan be performed after an R-tree has beenbuilt or dynamically as insertions takeplace. The key idea is to select a nodee to be restructured and then apply therestructuring process to e and its ances-tors by merging and resplitting siblingnodes. In the dynamic case, the restruc-turing is applied with some fixed proba-bility for each insertion thereby ensuringthat the restructuring does not happen ateach insertion. Although at a first glance,restructuring seems similar to forced rein-sertion in the case of an R∗-tree (seeSection 5.4), they are quite different uponcloser scrutiny. In particular, forced rein-sertion takes individual node entries andreinserts them at the root. In contrast, re-structuring operates on groups of node en-tries by repeatedly merging and resplit-ting them, as necessary, in order to obtainbetter query performance through greaterstorage utilization and less overlap amongsibling nodes.

A shortcoming of all of the representa-tions that are based on object hierarchies(i.e., including all of the R-tree variants)is that when the objects are not hyper-rectangles, use of the bounding box ap-proximation of the object eliminates onlysome objects from consideration when re-sponding to queries. In other words, theactual execution of many queries requiresknowledge of the exact representation ofthe object (e.g., the location query). Infact, the execution of the query may bequite complex using this exact represen-tation. At times, these queries may beexecuted more efficiently by decomposingthe object further into smaller pieces suchas triangles, trapezoids, convex polygons,

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 47: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 205

etc. (e.g., Brinkhoff et al. [1994] andKriegel et al. [1991]). For example, theTR∗-tree [Brinkhoff et al. 1994; Schneiderand Kriegel 1991] is such a representationwhere each object in an R∗-tree is decom-posed into a collection of trapezoids. TheDR-tree [Lee and Chung 2001] is a relatedapproach where the minimum boundingbox is recursively decomposed into mini-mum bounding boxes until the volume ofeach box is less than predefined fractionof the volume of the initial bounding box.The result of the decomposition processis represented as a binary tree which isstored separately from the hierarchy thatcontains the minimum bounding boxes ofthe objects and can be processed in mem-ory once it has been loaded.

6. DISJOINT OBJECT-BASEDHIERARCHICAL INTERIOR-BASEDREPRESENTATIONS (K-D-B-TREES,R++-TREES, AND CELL TREES)

In our descriptions of the object pyramidand the object-tree pyramid in Section 5,we observed that we may have to exam-ine all of the bounding boxes at all levelswhen attempting to determine the iden-tity of the object o that contains location a(i.e., the location query). This was causedby the fact that the bounding boxes cor-responding to different nodes may over-lap. The fact that each object is associatedwith only one node while being containedin possibly many bounding boxes (e.g., inFigure 17, rectangle 1 is contained in itsentirety in R1, R2, R3, and R5) means thatthe location query may often require sev-eral nonleaf nodes to be visited before de-termining the object that contains a. Thisproblem also arises in the R-tree as seenin the following example.

Suppose that we wish to determine theidentity of the rectangle object(s), in thecollection of rectangles given in Figure 17that contains point Q at coordinate values(22, 24). We first determine that Q is in R0.Next, we find that Q can be in both or eitherof R1 or R2, and thus we must search bothof their subtrees. Searching R1 first, wefind that Q could be contained only in R3.Searching R3 does not lead to the rectangle

that contains Q even though Q is in a por-tion of rectangle D that is in R3. Thus, wemust search R2 and we find that Q can becontained only in R5. Searching R5 resultsin locating D, the desired rectangle. Thedrawback of the R-tree as well as otherrepresentations that make use of an objectpyramid is that unlike those based on thecell pyramid, they do not result in a dis-joint decomposition of space. Recall thatthe problem is that an object is associatedwith only one bounding bounding box (e.g.,rectangle D in Figure 17 is associated withbounding box R5, yet it overlaps boundingboxes R1, R2, R3, and R5). In the worst case,this means that when we wish to respondto the location query (e.g., given a point,determining the containing rectangle in arectangle database, or an intersecting linein a line segment database, etc. in the two-dimensional space from which the objectsare drawn), we may have to search the en-tire database. Thus, what we need is a hi-erarchy of disjoint bounding boxes.

An obvious way to overcome this draw-back is to use one of the hierarchicalimage-based representations described inSection 4. Recall that these representa-tions made use of a hierarchy of disjointcells that completely spanned the underly-ing space. The hierarchy consists of a set ofsets {Cj } (0 ≤ j ≤ n) where Cn correspondsto the original collection of cells, and C0corresponds to one cell. The sets differedin the number and size of the constituentcells at the different depths, although eachset was usually a containment hierarchyin the sense that a cell at depth i usuallycontained all of the cells below it at depthi + 1. The irregular grid pyramid is an ex-ample of such a hierarchy.

A simple way to adapt the irregular gridpyramid to our problem is to overlay thedecomposition induced by Cn−1 (i.e., thenext to the deepest level) on the boundingboxes {bi} of the objects {oi} thereby decom-posing the bounding boxes and associateeach part of the bounding box with thecorresponding covering cell of the irregu-lar grid pyramid. Note that we use the setat the next to the deepest level (i.e., Cn−1)rather than the set at the deepest level(i.e., Cn) as the deepest level contains the

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 48: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

206 H. Samet

original collection of unit-sized cells cnkand thus does not correspond to any aggre-gation. The cells cjk at the remaining levelsj (0 ≤ j ≤ n − 2) are formed in the sameway as in the irregular grid pyramid—thatis, they contain the union of the objects cor-responding to the portions of the boundingboxes associated with the cells comprisingcell cjk. Using our terminology, we termthe result an irregular grid bounding-boxpyramid. It should be clear that the depthof the irregular grid bounding-box pyra-mid is one less than that of the correspond-ing irregular grid pyramid.

The definition of the irregular gridpyramid as well as the other hierar-chical image-based representations stipu-lates that each unit-sized cell is containedin its entirety in one or more objects.Equivalently, a cell cannot be partially inobject o1 and partially in object o2. Thesame restriction also holds for block de-compositions which are not hierarchical(see Section 3). In contrast, in the case ofthe irregular grid bounding-box pyramid,the fact that the bounding boxes are justapproximations of the objects enables usto relax this restriction in the sense thatwe allow a cell (or a block in the case ofthe block decompositions of Section 3) tocontain parts of the bounding boxes of sev-eral objects. In other words, cell (or block)b can be partially occupied by part of thebounding box b1 of object o1, by part of thebounding box b2 of object o2, and may evenbe partially empty.

The irregular grid bounding-box pyra-mid is a hierarchy of grids, albeit thatthe grid sizes are permitted to vary inan arbitrary manner between levels. Thisdefinition is still overly restrictive in thesense that we want to be able to aggregatea varying, but bounded, number of cellsat each level (in contrast to a predefinednumber) that depends on the number ofbounding boxes or objects that are asso-ciated with them so that we can have aheight-balanced dynamic structure in thespirit of the B-tree. We also wish to usea hierarchy that makes use of a differentblock decomposition rule (e.g., a k-d tree,generalized k-d tree, point quadtree, bin-tree, region quadtree, etc. thereby form-

ing a variant of the cell-tree pyramid) in-stead of a grid as in the case of an irregularbounding-box pyramid.

Our solution is equivalent to a marriageof the bounding-box cell-tree pyramid hi-erarchy with one of the block decomposi-tions described in Section 3. This is doneby choosing a value M for the maximumnumber of cells (actually blocks) that canbe aggregated and a block decompositionrule (e.g., a generalized k-d tree). As weare propagating the identities of the ob-jects associated with the bounding boxesup the hierarchy rather than the space oc-cupied by them, we use an object-basedvariant of the block decomposition rule.This means that a block is decomposedwhenever it contains the bounding boxesof more than M objects rather than onthe basis of the absence of homogeneity.Note that the occupied space is implicitto the block decomposition rule and thusneed not be explicitly propagated up thehierarchy.

It should be clear that each object’sbounding box can appear only once in eachblock as the objects are continuous. If morethan M of the bounding boxes overlapeach other in block b (i.e., they all haveat least one point in common), then thereis no point in attempting to decompose bfurther as we will never be able to findsubblocks bi of b so that each of bi doesnot have at least one point in commonwith the overlapping bounding boxes. Ob-serve also that although the block decom-positions yield a partition of space intodisjoint blocks, the bounding boxes at thelowest level of the hierarchy may not nec-essarily be disjoint. For example, considera database of line segment objects andthe situation of a vertex where five of theline segments meet. It is impossible for thebounding boxes of the line segments to bedisjoint.

The object-based variants of the blockdecomposition rules are quite differentfrom their image-based counterparts thatwere discussed in Section 3 which basedthe decomposition on whether the spacespanned by the block was completely cov-ered by an object. It is important to re-iterate that the blocks corresponding to

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 49: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 207

Fig. 21. (a) k-d-B-tree for the same collection of rectangle objects given in Figure 11with m = 2 and M = 3, and (b) the spatial extents of the nodes. Notice that only in the leafnodes are the bounding boxes minimal.

the leaf nodes do not represent hyper-rectangular aggregates of identically val-ued unit-sized cells as in the conventionalpyramid. Instead, they represent hyper-rectangular aggregates of bounding boxesof objects or pieces thereof.

Without loss of generality, assuming ageneralized k-d tree block decompositionrule, the hierarchy of sets {Hj } (1 ≤ j ≤ n)is defined as follows. H0 consists of oneblock. H1 consists of a subset of the nodesof a generalized k-d tree decompositionZ of the underlying space so that Z hasa maximum of M elements whose corre-sponding blocks span the entire underly-ing space. H2 is formed by removing fromZ all nodes corresponding to members ofH1 and their ancestors, and then apply-ing the same rule that was used to formH1 to each of the blocks in H1 with re-spect to Z . In other words, H2 consists ofgeneralized k-d tree decompositions of theblocks h1k (1 ≤ k ≤ M ) that comprise H1.Each element of H2 contains no more thanM blocks for a a maximum of M 2 blocks.This process is repeated at each succes-sive level down to the leaf level at depthn − 1. The nodes at the leaf level containthe bounding boxes of the objects or partsof the bounding boxes of the objects. Thepyramid means that the hierarchy mustbe height-balanced with all leaf nodes atthe same level, and that the cells at depthj are disjoint and that they span the spacecovered by the cells at the immediatelylower level at depth j + 1.

We term the resulting data structure ageneralized k-d tree bounding-box cell-treepyramid on account of the use of the gener-alized k-d tree as the building block of thepyramid and the use of a tree access struc-ture, although it is more commonly knownas a k-d-B-tree [Robinson 1981] on accountof the similarity of the node structure tothat of a B-tree. If we would have used thepoint quadtree or the bintree as the build-ing blocks of the hierarchy, then we wouldhave termed the result a point quadtreebounding-box cell-tree pyramid or a bin-tree bounding-box cell-tree pyramid, re-spectively. It is interesting to note thatthe k-d-B-tree was originally developed forstoring point-like objects although the ex-tension to objects with extent is relativelystraightforward as shown here.

Figure 21 is an example of one possiblek-d-B-tree for the collection of 9 rectangleobjects given in Figure 11. Broken lines de-note the leaf nodes, and thin lines denotethe space spanned by the subtrees rootedat the nonleaf nodes. Of course, other vari-ations are possible since the k-d-tree isnot unique. This particular tree is of order(2,3) (i.e., having a minimum and maxi-mum of 2 and 3 entries, respectively) al-though in general it is not possible to al-ways guarantee that all nodes will have aminimum of 2 entries, nor is the minimuma part of the definition of the k-d-B-tree.Notice that rectangle object D appears inthree different nodes, while rectangle ob-jects A, B, E, and G appear in two different

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 50: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

208 H. Samet

nodes. Observe also that the example usesa partition scheme that cycles through theaxes in the order x, y , x, y , etc. although,as we shall see below, this cycling is notguaranteed to hold once objects are in-serted and deleted.

Our definition of the structure was givenin a top-down manner. In fact, the struc-ture is often built in a bottom-up mannerby inserting the objects one-at-a-time. Ini-tially, the hierarchy contains just one nodecorresponding to the bounding box of thesingle object. As each additional object ois processed, we insert o’s bounding box binto all of the leaf nodes which overlap it.If any of these nodes become too full, thenwe split these nodes using an appropriateblock decomposition rule and determine ifthe parent is not too full so that it can sup-port the addition of a child. If not, then werecursively apply the same decompositionrule to the parent. The process stops atthe root in which case overflow will usu-ally cause the hierarchy to grow by onelevel.

Variants of the bounding-box cell-treepyramid such as the k-d-B-tree are goodfor answering both the feature and loca-tion queries. However, in the case of thelocation query, they act only as a partitionof space. They do not distinguish betweenoccupied and unoccupied space. Thus, inorder to determine if a particular loca-tion a is occupied by one of the objectsassociated with cell c, we need to checkeach of the objects associated with c, whichcan be time consuming, especially if M islarge. We can speed this process by modify-ing the general definition of the bounding-box cell-tree pyramid so that a boundingbox is stored in each node r in the hier-archy, regardless of r ’s depth, that coversthe bounding boxes of the cells that com-prise r. Thus associated with each node r isthe union of the objects associated with thecells comprising r as well as a boundingbox of the union of their bounding boxes.We term the result a disjoint object pyra-mid. Recall that the depth of any vari-ant of the bounding-box pyramid is oneless than that of the corresponding con-ventional pyramid, and the same is truefor the disjoint object pyramid.

A key difference between the disjointobject pyramid and variants of the con-ventional pyramid, and to a lesser extentthe bounding-box pyramid, is that the el-ements of the hierarchy of the disjoint ob-ject pyramid are also parts of the bound-ing boxes of the objects rather than justthe cells that make up the objects which isthe case for both variants of the bounding-box and conventional pyramids. The rep-resentation of the disjoint object pyramid,as well as variants of the bounding-boxpyramid such as the k-d-B-tree, is alsomuch simpler as they both just decom-pose the objects until a a criterion in-volving the number of objects that arepresent in the block is satisfied ratherthan one based on the homogeneity ofthe block. This results in avoiding someof the deeper levels of the hierarchy thatare needed in variants of the conventionalpyramid.

There are many variants of the disjointobject pyramid. They differ according towhich of the block decomposition rules de-scribed in Section 3 is used. They are usu-ally referred to by the general term R+-tree [Faloutsos et al. 1987; Sellis et al.1987; Stonebraker et al. 1986] on accountof the similarity to the R-tree since theyboth store a hierarchy of bounding boxes.However, the block decomposition rule isusually left unspecified although a gener-alized k-d tree block decomposition rule isoften suggested. An alternative is not touse any decomposition rule in which caseeach node is just a collection of blocks asin Figure 4.

R+-trees are built in the same incremen-tal manner as any of the bounding-boxcell-tree pyramids that we described (e.g.,the k-d-B-tree, etc.). Again, as each addi-tional object o is processed, we insert o’sbounding box b into all of the leaf nodeswhich overlap it. If any of these nodes be-come too full, then we split these nodesusing the appropriate block decompositionrule and determine if the parent is not toofull so that it can support the addition ofa child. If not, then we recursively applythe same decomposition rule to the parent.The process stops at the root in which casethe R+-tree may grow by one level. The

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 51: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 209

Fig. 22. (a) R+-tree for the same collection of rectangle objects given in Figure 11 withm = 2 and M = 3, and (b) the spatial extents of the bounding boxes. Notice that the leafnodes in the index also store bounding boxes although this is shown only for the nonleafnodes.

difference from the method used in thebounding-box cell-tree pyramids is thatwe also propagate the minimum boundingbox information up the hierarchy. The en-tire process is analogous to that used ina B-tree upon overflow. The difference isthat at times, as is also the case for thek-d-B-tree, the decomposition at a nonleafnode may result in the introduction of anew partition that may force the reparti-tioning of nodes at deeper levels in the R+-tree.

Figure 22 is an example of one possibleR+-tree for the same collection of nine rect-angles given in Figure 11. Broken linesdenote the bounding boxes correspondingto the leaf nodes, and thin lines denotethe bounding boxes corresponding to thesubtrees rooted at the nonleaf nodes. Inthis case, we simply took the k-d-B-treeof Figure 21 and added bounding boxesto the nonleaf nodes. This particular treeis of order (2,3) although in general it isnot possible to always guarantee that allnodes will have a minimum of two entries.Notice that rectangle D appears in threedifferent nodes, while rectangles A. B, E,and G appear in three different nodes. Ofcourse, other variants are possible sincethe R+-tree is not unique.

The cell tree of Gunther [1987] andGunther and Bilmes [1991] is similar tothe R+-tree. The difference is that the non-leaf nodes of the cell tree are convex poly-hedra instead of bounding rectangles. Thechildren of each node, say P , form a binary

space partition (BSP) [Fuchs et al. 1980]of P . The cell tree is designed to deal withpolyhedral data of arbitrary dimension. Asin the R+-tree, the polyhedral data that isbeing represented may be stored in morethan one node. When the decompositioncauses an object to be split too many times,Gunther and Noltemeier [1991] store theobject in what they term oversized shelveswhich are associated with nonleaf nodesin the structure. This is somewhat simi-lar in spirit to the X-tree [Berchtold et al.1996] which is a variant of an R-tree wherea node is not split upon overflow if toomuch overlap would result among its chil-dren (in which case, the node is termed asupernode).

7. CONCLUDING REMARKS

We have reviewed a number of hierarchi-cal image and object representations witha focus on hierarchical methods whoseuse enables us to answer the fundamen-tal queries of what and where. As we haveseen, there are two key classes of meth-ods, and we distinguished between themon the basis of whether they were image-based or object-based. To get the mostpower in terms of the queries that can behandled, we can either use object hierar-chies, which employ aggregates of bound-ing boxes, or space hierarchies, whichemploy a disjoint decomposition of the un-derlying space that is spanned by the ob-jects. Neither method can be considered as

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 52: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

210 H. Samet

being the best. Each method has its advan-tages and disadvantages.

The drawback of object hierarchies (ofwhich the R-tree is the most commonlyused example) is that they do not yielda disjoint decomposition of the underly-ing space. This leads to the multiple cov-erage problem in the sense that the areacontaining a particular location a in ob-ject o may be spanned by several R-treenodes while o is contained in just one R-tree node. Thus just because object o wasnot found in the search of one path in thetree whose bounding box contains a, doesnot mean that o would not be found in thesearch of another path in the tree. Thismakes search in an R-tree somewhat inef-ficient as in the worst case we may haveto examine all of the bounding boxes at alllevels of the hierarchy when attempting todetermine the identity of the object o thatcontains location a.

The conventional alternative solution isto make use of a disjoint decomposition ofthe underlying space as provided by manystructures such as the R+-tree and vari-ants of the quadtree/pyramid. These solu-tions do not suffer from the multiple cov-erage problem. However, their drawbackis that an object o may need to be de-composed into several pieces and hencereported as satisfying the query severaltimes as the area spanned by o may becontained in several blocks. For example,suppose that we want to retrieve all the ob-jects that overlap a particular region (i.e.,a window query) rather than a point. Inthis case, we could report the same objectas many times as it has been decomposedinto blocks. We can avoid reporting the ob-ject several times when using these meth-ods by removing the duplicate objects be-fore reporting the final answer. Removingthe duplicate objects usually requires in-vocation of some variant of a sorting algo-rithm. Interestingly, there has been somework in developing algorithms for certainclasses of objects and different data struc-tures which are based on a disjoint decom-position that avoid reporting duplicate ob-jects (e.g., Aref and Samet [1992, 1994b]and Dittrich and Seeger [2000]) withoutresorting to sorting.

The BV-tree [Freeston 1995] is an alter-native solution that makes use of an objecthierarchy similar to that of the R-tree anda more restricted form of a containment hi-erarchy where any pair of bounding boxesof two children a and b of node r must beeither disjoint or one child is completelycontained in the other child (i.e., a is inb or b is in a). At a first glance, the BV-tree would appear to also be afflicted bythe multiple coverage problem. However,the BV-tree overcomes the multiple cover-age problem by making use of the conceptof a guard which is carried along duringthe search process as the tree is descendedthereby ensuring that only one path is fol-lowed in any search. The drawback of theBV-tree is that it is not balanced althoughthe depth is bounded based on the maxi-mum number of data points or objects. Itis worth noting that the key idea in theBV-tree is the decoupling of the decompo-sition hierarchy from the directory hierar-chy (i.e., the manner in which the variousnodes are aggregated) [Samet 2004]. ThePK-tree [Wang et al. 1998] applies similarideas to image hierarchies, with the samedrawback of possibly being unbalanced.

We now point out a few more consid-erations which should be taken into ac-count. Object-based methods such as theR-tree and the R+-tree have the advantageof being able to distinguish between occu-pied and unoccupied space for a particu-lar data set. However, they cannot corre-late occupied space in two different datasets. In other words, the bounding boxesof the two data sets are not in registrationwhich means that more intersection op-erations must be performed between thetwo sets when executing operations suchas a spatial join if no preprocessing sort-ing step has been applied (e.g., Arge et al.[1998], Koudas and Sevcik [1997], andPatel and DeWitt [1996]), although a num-ber of good algorithms have been devisedfor spatial joins for object-based represen-tations (e.g., Lo and Ravishankar [1994,1996]). In contrast, disjoint image-basedmethods that make use of a regular de-composition of the underlying space suchas the region quadtree and the pyramidare good when operating on two different

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 53: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 211

data sets as the occupied space in the twosets is correlated thereby simplifying thespatial join algorithms which makes thempreferable to disjoint image-based meth-ods that do not employ regular decompo-sition such as the R+-tree (e.g., Hoel andSamet [1995]). Nevertheless, there is thecost of dealing with duplicate answers (asmentioned above) which is incurred re-gardless of which disjoint method is used.Thus there is no one best or optimal rep-resentation. Ultimately, users make theirdecision on the basis of what is impor-tant to them, possibly making use of costmodels (e.g., Aref and Samet [1994a] andTheodoridis et al. [2000]).

ACKNOWLEDGMENTS

I thank Houman Alborzi, Frantisek Brabec, and GisliR. Hjaltason for generating the screenshots of theresults of the applets and drawing the figures, andEdwin Jacox for help in the final preparation forpublication. I would also like to thank the anony-mous reviewers for their detailed comments whichgreatly sharpened the presentation of the ideas inthis article.

REFERENCES

ABEL, D. J. AND SMITH, J. L. 1983. A data structureand algorithm based on a linear key for a rectan-gle retrieval problem. Computer Vision, Graph-ics, and Image Processing 24, 1 (Oct.), 1–13.

AGGARWAL, C., WOLF, J., YU, P., AND EPELMAN, M.1997. The S-tree: An efficient index for mul-tidimensional objects. In Advances in Spa-tial Databases—5th International Symposium,SSD’97 (Berlin, Germany). M. Scholl andA. Voisard, Eds. 350–373.

ALBORZI, H. AND SAMET, H. 2004. Execution timeanalysis of a top-down r-tree construction algo-rithm. University of Maryland Computer Sci-ence TR-4623, College Park, MD.

ANG, C. H. AND TAN, T. C. 1997. New linear nodesplitting algorithm for R-trees. In Advances inSpatial Databases—5th International Sympo-sium, SSD’97 (Berlin, Germany). M. Scholl andA. Voisard, Eds. 339–349.

AREF, W. G. AND ILYAS, I. F. 2001. Sp-gist: An exten-sible database index for supporting space parti-tioning trees. Journal of Intelligent InformationSystems 17, 2–3 (Dec.), 215–240.

AREF, W. G. AND SAMET, H. 1990. Efficient process-ing of window queries in the pyramid data struc-ture. In Proceedings of the 9th ACM SIGACT-SIGMOD-SIGART Symposium on Principles ofDatabase Systems (PODS) (Nashville, TN). 265–

272. Also Proceedings of the 5th Brazilian Sym-posium on Databases (Rio de Janeiro, Brazil,Apr.). 1990, 15–26.

AREF, W. G. AND SAMET, H. 1992. Uniquely report-ing spatial objects: Yet another operation forcomparing spatial data structures. In Proceed-ings of the 5th International Symposium on Spa-tial Data Handling (Charleston, SC). 178–189.

AREF, W. G. AND SAMET, H. 1994a. A cost model forquery optimization using r-trees. In Proceedingsof the 2nd ACM Workshop on Geographic Infor-mation Systems (Gaithersburg, MD). N. Pissinouand K. Makki, Eds. 60–67.

AREF, W. G. AND SAMET, H. 1994b. Hashing by prox-imity to process duplicates in spatial databases.In Proceedings of the 3rd International Confer-ence on Information and Knowledge Manage-ment (CIKM) (Gaithersburg, MD). 347–354.

ARGE, L. 1996. Efficient external-memory datastructures and applications. Ph.D. dissertation.Computer Science Department, University ofAarhus, Aarhus, Denmark. Also BRICS Disser-tation Series, DS-96-3.

ARGE, L., HINRICHS, K. H., VAHRENHOLD, J., AND VITTER,J. S. 1999. Efficient bulk operations on dy-namic R-trees. In Proceedings of the 1st Work-shop on Algorithm Engineering and Experi-mentation (ALENEX’99) (Baltimore, MD). M. T.Goodrich and C. C. McGeoch, Eds. 328–348.

ARGE, L., PROCOPIUC, O., RAMASWAMY, S., SUEL, T., AND

VITTER, J. S. 1998. Scalable sweeping-basedspatial join. In Proceedings of the 24th Inter-national Conference on Very Large Data Bases(VLDB) (New York). A. Gupta, O. Shmueli, andJ. Widom, Eds. 570–581.

ARMAN, R. AND AGGARWAL, J. K. 1993. Model-basedobject recognition in dense range images—A re-view. ACM Computing Surveys 25, 1 (Mar.), 5–43.

ASANO, T., RANJAN, D., ROOS, T., WELZL, E., AND WID-MAYER, P. 1997. Space-filling curves and theiruse in the design of geometric data structures.Theoretical Computer Science 181, 1 (July), 3–15.

BAIRD, H. S. 1976. Design of a family of algorithmsfor large scale integrated circuit mask artworkanalysis. Tech. Rep. PRRL-76-TR-062, RCA Lab-oratories, Princeton, NJ. May.

BALLARD, D. H. 1981. Strip trees: A hierarchicalrepresentation for curves. Communications ofthe ACM 24, 5 (May), 310–321. Also corrigen-dum, Communications of the ACM 25, 3 (Mar.),213, 1982.

BAUMGART, B. G. 1975. A polyhedron representa-tion for computer vision. In Proceedings of the1975 National Computer Conference. Vol. 44.Anaheim, CA, 589–596.

BECKER, B., FRANCIOSA, P. G., GSCHWIND, S., OHLER, T.,THIEMT, G., AND WIDMAYER, P. 1992. Enclosingmany boxes by an optimal pair of boxes. In Pro-ceedings of the 9th Annual Symposium on The-oretical Aspects of Computer Science (STACS)

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 54: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

212 H. Samet

(Cachan, France). A. Finkel and M. Jantzen,Eds. 475–486.

BECKMANN, N., KRIEGEL, H.-P., SCHNEIDER, R., AND

SEEGER, B. 1990. The R∗-tree: An efficient androbust access method for points and rectangles.In Proceedings of the ACM SIGMOD Conference(Atlantic City, NJ). 322–331.

BELL, S. B. M., DIAZ, B. M., HOLROYD, F., AND JACKSON,M. J. 1983. Spatially referenced methods ofprocessing raster and vector data. Image and Vi-sion Computing 1, 4 (Nov.), 211–220.

BENTLEY, J. L. 1975. Multidimensional binarysearch trees used for associative searching. Com-munications of the ACM 18, 9 (Sept.), 509–517.

BERCHTOLD, S., KEIM, D. A., AND KRIEGEL, H.-P.1996. The X-tree: An index structure for high-dimensional data. In Proceedings of the 22nd In-ternational Conference on Very Large Data Bases(VLDB) (Mumbai (Bombay), India). T. M. Vija-yaraman, A. P. Buchmann, C. Mohan, and N. L.Sarda, Eds. 28–39.

BOHM, C., BERCHTOLD, S., AND KEIM, D. A. 2001.Searching in high-dimensional spaces: Indexstructures for improving the performance ofmultimedia databases. ACM Comput. Surv. 33, 3(Sept.), 322–373.

BRABEC, F. AND SAMET, H. 1998. Visualizing and an-imating R-trees and spatial operations in spa-tial databases on the worldwide web. In Vi-sual Database Systems (VDB4), Y. Ioannidisand W. Klas, Eds. Chapman and Hall, London,United Kingdom, 123–140.

BRABEC, F., SAMET, H., AND YILMAZ, C. 2003. VASCO:Visualizing and animating spatial constructsand operations. In Proceedings of the 19th An-nual Symposium on Computational Geometry(San Diego, CA). 374–375.

BRINKHOFF, T., KRIEGEL, H.-P., SCHNEIDER, R., AND

SEEGER, B. 1994. Multi-step processing of spa-tial joins. In Proceedings of the ACM SIGMODConference (Minneapolis, MN). 197–208.

BRODSKY, A., LASSEZ, C., LASSEZ, J., AND MAHER, M. J.1995. Separability of polyhedra for optimal fil-tering of spatial and constraint data. In Pro-ceedings of the 14th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of DatabaseSystems (PODS) (San Jose, CA). 54–65.

BURT, P. J. 1980. Tree and pyramid structuresfor coding hexagonally sampled binary images.Computer Graphics and Image Processing 14, 3(Nov.), 271–280.

BURTON, F. W., KOLLIAS, J. G., AND ALEXANDRIDIS, N. A.1984. Implementation of the exponential pyra-mid data structure with application to deter-mination of symmetries in pictures. ComputerVision, Graphics, and Image Processing 25, 2(Feb.), 218–225.

CHAKRABARTI, K. AND MEHROTRA, S. 1998. High di-mensional feature indexing using hybrid trees.Tech. Rep. TR-MARS-98-14, Department of In-formation and Computer Science, University ofCalifornia, Irvine, CA. July.

CHAKRABARTI, K. AND MEHROTRA, S. 1999. The hy-brid tree: An index structure for high dimen-sional feature spaces. In Proceedings of the 15thIEEE International Conference on Data Engi-neering (Sydney, Australia). 440–447.

CHAUDHURI, S. AND DAYAL, U. 1997. Data ware-housing and olap technology. SIGMODRECORD 26, 1 (Mar.), 65–74.

CHEN, L., CHOUBEY, R., AND RUNDENSTEINER, E. A.1998. Bulk-insertions into R-trees using theSmall-Tree-Large-Tree approach. In Proceed-ings of the 6th International Symposium onAdvances in Geographic Information Systems(Washington, DC). R. Laurini, K. Makki, andN. Pissinou, Eds. 161–162.

CHOUBEY, R., CHEN, L., AND RUNDENSTEINER, E. A.1999. GBI: A generalized R-tree bulk-insertionstrategy. In Advances in Spatial Databases—6thInternational Symposium, SSD’99 (Hong Kong).R. H. Guting, D. Papadias, and F. H. Lochovsky,Eds. 91–108.

DE BERG, M., VAN KREVELD, M., OVERMARS, M., AND

SCHWARZKOPF, O. 1997. Computational Geom-etry: Algorithms and Applications. Springer-Verlag, Berlin, Germany.

DEMILLO, R. A., EISENSTAT, S. C., AND LIPTON, R. J.1978. Preserving average proximity in arrays.Communications of the ACM 21, 3 (Mar.), 228–231.

DENGEL, A. 1991. Self-adapting structuring andrepresentation of space. Tech. Rep. RR-91-22,Deutsches Forschungszentrum fur KunstlicheIntelligenz, Kaiserslautern, Germany. Sept.

DEWITT, D. J., KABRA, N., LUO, J., PATEL, J. M., AND

YU, J. B. 1994. Client-server paradise. In Pro-ceedings of the 20th International Conference onVery Large Data Bases (VLDB) (Santiago, Chile).J. Bocca, M. Jarke, and C. Zaniolo, Eds. 558–569.

DI PASQUALE, A., FORLIZZI, L., JENSEN, C. S.,MANOLOPOULOS, Y., NARDELLI, E., PFOSER, D.,PROIETTI, G., SALTENIS, S., THEODORIDIS, Y.,TZOURAMANIS, T., AND VASSILAKOPOULOS, M. 2003.Acces methods and query processing tech-niques. In Spatiotemporal Databases: TheCHOROCHRONOS Approach (Berlin, Ger-many). M. Koubarakis, T. K. Sellis, A. U. Frank,S. Grumbach, R. H. Guting, C. S. Jensen,N. A. Lorentzos, Y. Manolopoulos, E. Nardelli, B.Pernici, H.-J. Schek, M. Scholl, B. Theodoulidis,and N. Tryfona, Eds. Chap. 6, 203–261.

DILLENCOURT, M. B., SAMET, H., AND TAMMINEN,M. 1992. A general approach to connected-component labeling for arbitrary image repre-sentations. Journal of the ACM 39, 2 (Apr.), 253–280. Also Corrigendum, Journal of the ACM,39(4):985, October 1992; University of MarylandComputer Science TR-2303.

DITTRICH, J.-P. AND SEEGER, B. 2000. Data redun-dancy and duplicate detection in spatial join pro-cessing. In Proceedings of the 16th IEEE Inter-national Conference on Data Engineering. (SanDiego, CA). 535–546.

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 55: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 213

DORI, D. AND BEN-BASSAT, M. 1983. Circumscribinga convex polygon by a polygon of fewer sides withminimal area addition. Computer Vision, Graph-ics, and Image Processing 24, 2 (Nov.), 131–159.

DOUGLAS, D. H. 1990. It makes me so CROSS. InIntroductory Readings in Geographic Informa-tion Systems, D. J. Peuquet and D. F. Marble,Eds. Taylor & Francis, London, United King-dom, Chapter 21, 303–307. (Originally pub-lished as Internal Memorandum, Harvard Uni-versity Laboratory for Computer Graphics andSpatial Analysis, 1974.)

DOUGLAS, D. H. AND PEUCKER, T. K. 1973. Algo-rithms for the reduction of the number of pointsrequired to represent a digitized line or its cari-cature. The Canadian Cartographer 10, 2 (Dec.),112–122.

DYER, C. R. 1981. A VLSI pyramid machine for hi-erarchical parallel image processing. In Proceed-ings of the IEEE Conference on Pattern Recogni-tion and Image Processing’81 (Dallas, TX). 381–386.

DYER, C. R. 1982. Pyramid algorithms and ma-chines. Multicomputers and Image ProcessingAlgorithms and Programs, 409–420.

ESPERANCA, C. AND SAMET, H. 1994. Representingorthogonal multidimensional objects by ver-tex lists. In Aspects of Visual Form Process-ing, C. Arcelli, L. P. Cordella, and G. S.di Baja, Eds. World Scientific, Singapore, 209–220.

ESPERANCA, C. 1995. Orthogonal objects and theirapplication in spatial databases. Ph.D. thesis,Computer Science Department, University ofMaryland, College Park, MD. Also ComputerScience TR-3566.

ESPERANCA, C. AND SAMET, H. 1997. Orthogonalpolygons as bounding structures in filter-refinequery processing strategies. In Advances inSpatial Databases—5th International Sympo-sium, SSD’97 (Berlin, Germany). M. Scholl andA. Voisard, Eds. 197–220.

FALOUTSOS, C. AND ROSEMAN, S. 1989. Fractals forsecondary key retrieval. In Proceedings of the8th ACM SIGACT-SIGMOD-SIGART Sympo-sium on Principles of Database Systems (PODS)(Philadelphia, PA). 247–252.

FALOUTSOS, C., SELLIS, T., AND ROUSSOPOULOS, N. 1987.Analysis of object oriented spatial access meth-ods. In Proceedings of the ACM SIGMOD Con-ference. (San Francisco, CA). 426–439.

FINKEL, R. A. AND BENTLEY, J. L. 1974. Quad trees:A data structure for retrieval on composite keys.Acta Informatica 4, 1, 1–9.

FOLEY, J. D., VAN DAM, A., FEINER, S. K., AND

HUGHES, J. F. 1990. Computer Graphics: Prin-ciples and Practice, Second ed. Addison-Wesley,Reading, MA.

FRANKLIN, W. R. 1984. Adaptive grids for geometricoperations. Cartographica 21, 2&3 (Summer &Autumn), 160–167.

FREEMAN, H. 1974. Computer processing of line-drawing images. ACM Computing Surveys 6, 1(Mar.), 57–97.

FREESTON, M. 1995. A general solution of the n-dimensional B-tree problem. In Proceedings ofthe ACM SIGMOD Conference. (San Jose, CA).80–91.

FRIEDMAN, J. H., BENTLEY, J. L., AND FINKEL, R. A.1977. An algorithm for finding best matches inlogarithmic expected time. ACM Transactions onMathematical Software 3, 3 (Sept.), 209–226.

FUCHS, H., KEDEM, Z. M., AND NAYLOR, B. F. 1980.On visible surface generation by a priori treestructures. Computer Graphics 14, 3 (July), 124–133. Also Proceedings of the SIGGRAPH’80 Con-ference (Seattle, WA, July).

GAEDE, V. 1995. Optimal redundancy in spa-tial database systems. In Advances in Spa-tial Databases—4th International Symposium,SSD’95 (Portland, ME). M. J. Egenhofer and J. R.Herring, Eds. 96–116.

GARCıA, Y. J., LOPEZ, M. A., AND LEUTENEGGER, S. T.1998a. A greedy algorithm for bulk loading R-trees. In Proceedings of the 6th InternationalSymposium on Advances in Geographic Infor-mation Systems (Washington, DC). R. Laurini,K. Makki, and N. Pissinou, Eds. 163–164. (Alsoextended version University of Denver, Com-puter Science Department Technical Report 97-02.)

GARCıA, Y. J., LOPEZ, M. A., AND LEUTENEGGER, S. T.1998b. On optimal node splitting for R-trees. InProceedings of the 24th International Conferenceon Very Large Data Bases (VLDB) (New York).A. Gupta, O. Shmueli, and J. Widom, Eds. 334–344.

GARCIA, Y. J., LOPEZ, M. A., AND LEUTENEGGER, S. T.1999. Post-optimization and incremental re-finement of r-trees. In Proceedings of the 7thInternational Symposium on Advances in Geo-graphic Information Systems (Kansas City, MO).C. B. Medeiros, Ed. 91–96.

GARGANTINI, I. 1982. An effective way to representquadtrees. Communications of the ACM 25, 12(Dec.), 905–910.

GOTTSCHALK, S., LIN, M. C., AND MANOCHA, D. 1996.OBBTree: A hierarchical structure for rapid in-terference detection. In Proceedings of the SIG-GRAPH’96 Conference (New Orleans, LA). 171–180.

GRAY, F. 1953. Pulse code communication. UnitedStates Patent Number 2632058 March 17, 1953.

GREENE, D. 1989. An implementation and perfor-mance analysis of spatial data access methods.In Proceedings of the 5th IEEE InternationalConference on Data Engineering. Los Angeles,606–615.

GUIBAS, L. J. AND STOLFI, J. 1985. Primitives for themanipulation of general subdivisions and thecomputation of Voronoi diagrams. ACM Trans.Graphics 4, 2 (Apr.), 74–123. (Also Proceedingsof the 15th Annual ACM Symposium on the

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 56: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

214 H. Samet

Theory of Computing, pages 221–234, Boston,April 1983.)

GUNTHER, O. 1987. Efficient structures for geomet-ric data management. Ph.D. dissertation, Uni-versity of California at Berkeley, Berkeley, CA.(Also Lecture Notes in Computer Science 337,Springer-Verlag, Berlin, West Germany, 1988;UCB/ERL M87/77.)

GUNTHER, O. AND BILMES, J. 1991. Tree-based ac-cess methods for spatial databases: Implemen-tation and performance evaluation. IEEE Trans-actions on Knowledge and Data Engineering 3, 3(Sept.), 342–356. (Also Department of ComputerScience TRCS88-23, University of California atSanta Barbara, Santa Barbara, CA.)

GUNTHER, O. AND NOLTEMEIER, H. 1991. Spatialdatabase indices for large extended objects. InProceedings of the 7th IEEE International Con-ference on Data Engineering (Kobe, Japan). 520–526.

GUTTMAN, A. 1984. R-trees: A dynamic index struc-ture for spatial searching. In Proceedings of theACM SIGMOD Conference (Boston, MA). 47–57.

HELLERSTEIN, J. M., NAUGHTON, J. F., AND PFEFFER, A.1995. Generalized search trees for databasesystems. In Proceedings of the 21st InternationalConference on Very Large Data Bases (VLDB),U. Dayal, P. M. D. Gray, and S. Nishio, Eds.Zurich, Switzerland, 562–573.

HENRICH, A. 1998. The LSDh-tree: An access struc-ture for feature vectors. In Proceedings of the14th IEEE International Conference on Data En-gineering. Orlando, FL, 362–369.

HENRICH, A., SIX, H. W., AND WIDMAYER, P. 1989. TheLSD tree: Spatial access to multidimensionalpoint and non-point data. In Proceedings of the15th International Conference on Very LargeDatabases (VLDB) (Amsterdam, The Nether-lands). P. M. G. Apers and G. Wiederhold, Eds.45–53.

HILBERT, D. 1891. Ueber stetige abbildung einerlinie auf flachenstuck. Mathematische An-nalen 38, 459–460.

HOEL, E. G. AND SAMET, H. 1992. A qualitativecomparison study of data structures for largeline segment databases. In Proceedings of theACM SIGMOD Conference (San Diego, CA), M.Stonebraker, Ed. 205–214.

HOEL, E. G. AND SAMET, H. 1995. Benchmarkingspatial join operations with spatial output. InProceedings of the 21st International Confer-ence on Very Large Data Bases (VLDB) (Zurich,Switzerland). U. Dayal, P. M. D. Gray, andS. Nishio, Eds. 606–618.

HUBBARD, P. M. 1995. Collision detection for in-teractive computer graphics applications. IEEETransactions on Visualization and ComputerGraphics 1, 3 (Sept.), 218–230.

HUBBARD, P. M. 1996. Approximating polyhedrawith spheres for time-critical collision detection.ACM Trans. Graphics 15, 3 (July), 179–210.

HUNTER, G. M. 1978. Efficient computation anddata structures for graphics. Ph.D. thesis, De-partment of Electrical Engineering and Com-puter Science, Princeton University, Princeton,NJ.

ICHIKAWA, T. 1981. A pyramidal representation ofimages and its feature extraction facility. IEEETransactions on Pattern Analysis and MachineIntelligence 3, 3 (May), 257–264.

JAGADISH, H. V. 1990a. Linear clustering of objectswith multiple attributes. In Proceedings of theACM SIGMOD Conference (Atlantic City, NJ).332–342.

JAGADISH, H. V. 1990b. Spatial search with polyhe-dra. In Proceedings of the 6th IEEE InternationalConference on Data Engineering (Los Angeles,CA). 311–319.

JOY, K. I., LEGAKIS, J., AND MACCRACKEN, R. 2003.Data structures for multiresolution represen-tation of unstructured meshes. In Hierarchicaland Geometrical Methods in Scientific Visualiza-tion, G. Farin, B. Hamman, and H. Hagen, Eds.Springer-Verlag, Berlin, Germany, 143–170.

KAMEL, I. AND FALOUTSOS, C. 1993. On packingR-trees. In Proceedings of the 2nd InternationalConference on Information and Knowledge Man-agement (CIKM) (Washington, DC). 490–499.

KAMEL, I. AND FALOUTSOS, C. 1994. Hilbert R-tree:An improved R-tree using fractals. In Proceed-ings of the 20th International Conference onVery Large Data Bases (VLDB) (Santiago, Chile).J. Bocca, M. Jarke, and C. Zaniolo, Eds. 500–509.

KAMEL, I., KHALIL, M., AND KOURAMAJIAN, V. 1996.Bulk insertion in dynamic R-trees. In Pro-ceedings of the 7th International Symposiumon Spatial Data Handling, M. J. Kraak andM. Molenaar, Eds. International GeographicalUnion Commission on Geographic InformationSystems, Association for Geographical Informa-tion, Delft, The Netherlands, 3B.31–3B.42.

KATAYAMA, N. AND SATOH, S. 1997. The SR-tree:An index structure for high-dimensional near-est neighbor queries. In Proceedings of the ACMSIGMOD Conference (Tucson, AZ). J. Peckham,Ed. 369–380.

KLINGER, A. 1971. Patterns and search statis-tics. In Optimizing Methods in Statistics, J. S.Rustagi, Ed. Academic Press, New York, 303–337.

KLOSOWSKI, J. T., HELD, M., MITCHELL, J. S. B.,SOWIZRAL, H., AND ZIKAN, K. 1998. Efficient col-lision detection using bounding volume hierar-chies of k-DOPs. IEEE Transactions on Visual-ization and Computer Graphics 4, 1 (Jan.), 21–36.

KNOWLTON, K. 1980. Progressive transmission ofgrey-scale and binary pictures by simple effi-cient, and lossless encoding schemes. Proceed-ings of the IEEE 68, 7 (July), 885–896.

KNUTH, D. E. 1997. The Art of Computer Program-ming: Fundamental Algorithms, Third ed. Vol. 1.Addison-Wesley, Reading, MA.

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 57: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 215

KNUTH, D. E. 1998. The Art of Computer Program-ming: Sorting and Searching, Second ed. Vol. 3.Addison-Wesley, Reading, MA.

KOUDAS, N. AND SEVCIK, K. C. 1997. Size separationspatial join. In Proceedings of the ACM SIGMODConference (Tucson, AZ). J. Peckham, Ed. 324–335.

KRIEGEL, H.-P., HORN, H., AND SCHIWIETZ, M. 1991.The performance of object decomposition tech-niques for spatial query processing. In Ad-vances in Spatial Databases—2nd Symposium,SSD’91, O. Gunther and H.-J. Schek, Eds.Zurich, Switzerland, 257–276.

LEA, D. 1988. Digital and hilbert k-d trees. Infor-mation Processing Letters 27, 1 (Feb.), 35–41.

LEE, Y.-J. AND CHUNG, C.-W. 2001. The DR-tree: Amain memory data structure for complex multi-dimensional objects. Geoinformatica 5, 2 (June),181–207.

LEUTENEGGER, S. T., LOPEZ, M. A., AND EDGINGTON, J.1997. STR: A simple and efficient algorithm forR-tree packing. In Proceedings of the 13th IEEEInternational Conference on Data Engineering,A. Gray and P.-A. Larson, Eds. Birmingham,United Kingdom, 497–506.

LIU, X. AND SCHRACK, G. F. 1998. A new orderingstrategy applied to spatial data processing. Inter-national Journal Geographical Information Sci-ence 12, 1, 3–22.

LO, M.-L. AND RAVISHANKAR, C. V. 1994. Spatialjoins using seeded trees. In Proceedings of theACM SIGMOD Conference (Minneapolis, MN).209–220.

LO, M.-L. AND RAVISHANKAR, C. V. 1996. Spatialhash-joins. In Proceedings of the ACM SIGMODConference (Montreal, Ont., Canada). 247–258.

MEAGHER, D. 1982. Geometric modeling using oc-tree encoding. Computer Graphics and ImageProcessing 19, 2 (June), 129–147.

MILLER, R. AND STOUT, Q. F. 1985. Pyramid com-puter algorithms for determining geomet-ric properties of images. In Proceedings ofthe Symphosium on Computational Geometry.Baltimore, MD, 263–269.

MOITRA, A. 1993. Spatio-temporal data manage-ment using R-trees. In Proceedings of the ACMWorkshop on Advances in Geographic Informa-tion Systems, N. Pissinou, Ed. Arlington, VA, 28–33.

MORTON, G. M. 1966. A computer oriented geodeticdata base and a new technique in file sequencing.IBM Ltd., Ottawa, Canada.

NIEVERGELT, J., HINTERBERGER, H., AND SEVCIK, K. C.1984. The grid file: An adaptable, symmetricmultikey file structure. ACM Transactions onDatabase Systems 9, 1 (Mar.), 38–71.

OMOHUNDRO, S. M. 1989. Five balltree constructionalgorithms. Tech. Rep. TR-89-063, InternationalComputer Science Institute, Berkeley, CA. Dec.

ORENSTEIN, J. A. AND MERRETT, T. H. 1984. A classof data structures for associative searching. In

Proceedings of the 3rd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of DatabaseSystems (PODS) (Waterloo, Ontario, Canada).181–190.

O’ROURKE, J. AND SLOAN JR., K. R. 1984. Dynamicquantization: Two adaptive data structures formultidimensional spaces. IEEE Transactions onPattern Analysis and Machine Intelligence 6, 3(May), 266–280.

OTTMANN, T. AND WOOD, D. 1980. 1-2 brother treesor AVL trees revisited. Computer Journal 23, 3(Aug.), 248–255.

PARK, C. M. AND ROSENFELD, A. 1971. Connectivityand genus in three dimensions. Computer Sci-ence TR-156, University of Maryland, CollegePark, MD. May.

PATEL, J. M. AND DEWITT, D. J. 1996. Partitionbased spatial-merge join. In Proceedings ofthe ACM SIGMOD Conference (Montreal, Ont.,Canada). 259–270.

PEANO, G. 1890. Sur une courbe qui remplit touteune aire plaine. Mathematische Annalen 36,157–160.

PREPARATA, F. P. AND SHAMOS, M. I. 1985. Compu-tational Geometry: An Introduction. Springer–Verlag, New York.

REDDY, D. R. AND RUBIN, S. 1978. Representationof three–dimensional objects. Computer Sci-ence Department CMU–CS–78–113, Carnegie–Mellon University, Pittsburgh. Apr.

RIGAUX, P., SCHOLL, M., AND VOISARD, A. 2001. Spa-tial Database Management Systems: Applica-tions to GIS. Morgan-Kaufmann, San Francisco.

ROBINSON, J. T. 1981. The K-D-B-tree: A searchstructure for large multidimensional dynamicindexes. In Proceedings of the ACM SIGMODConference. Ann Arbor, MI, 10–18.

ROSENFELD, A. 1980. Quadtrees and pyramids. InProceedings of the 5th International Conferenceon Pattern Recognition. Miami Beach, 802–811.

ROSENFELD, A. 1984. Some useful properties ofpyramids. In Multiresolution Image Processingand Analysis, A. Rosenfeld, Ed. Springer-Verlag,Berlin, West Germany, 2–5.

ROSENFELD, A. AND PFALTZ, J. L. 1966. Sequentialoperations in digital picture processing. Journalof the ACM 13, 4 (Oct.), 471–494.

ROSS, K. A., SITZMANN, I., AND STUCKEY, P. J. 2001.Cost-based unbalanced R-trees. In Proceedingsof the 13th International Conference on Scientificand Statistical Database Management. Fairfax,VA, 203–212.

ROUSSOPOLOS, N., KOTIDIS, Y., AND ROUSSOPOLOS, M.1997. Cubetree: Organization of and bulk in-cremental updates on the data cube. In Pro-ceedings of the ACM SIGMOD Conference, J.Peckham, Ed. Tucson, AZ, 89–111.

ROUSSOPOULOS, N. AND LEIFKER, D. 1985. Direct spa-tial search on pictorial databases using packedR-trees. In Proceedings of the ACM SIGMODConference. Austin, TX, 17–31.

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 58: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

216 H. Samet

RUTOVITZ, D. 1968. Data structures for operationson digital images. In Pictorial Pattern Recogni-tion, G. C. Cheng, R. S. Ledley, D. K. Pollock,and A. Rosenfeld, Eds. Thompson Book Co.,Washington, DC, 105–133.

SAALFELD, A. 1987. It doesn’t make me nearly asCROSS, some advantages of the point-vectorrepresentation of line segments in automatedcartography. International Journal of Geograph-ical Information Systems 1, 4, 379–386.

SAGAN, H. 1994. Space-Filling Curves. Springer-Verlag, New York.

SAKURAI, Y., YOSHIKAWA, M., UEMURA, S., AND KOJIMA,H. 2000. The A-tree: An index structure forhigh-dimensional spaces using relative approx-imation. In Proceedings of the 26th Interna-tional Conference on Very Large Data Bases(VLDB) (Cairo, Egypt). A. El Abbadi, M. L.Brodie, S. Chakravarthy, U. Dayal, N. Kamel,G. Schlageter, and K.-Y. Whang, Eds. 516–526.

SAMET, H. 1990a. Applications of Spatial DataStructures: Computer Graphics, Image Process-ing, and GIS. Addison-Wesley, Reading, MA.

SAMET, H. 1990b. The Design and Analysis of Spa-tial Data Structures. Addison-Wesley, Reading,MA.

SAMET, H. 1995. Spatial data structures. In Mod-ern Database Systems, The Object Model, Inter-operability and Beyond, W. Kim, Ed. ACM Pressand Addison-Wesley, New York, 361–385.

SAMET, H. 2004. Decoupling partitioning andgrouping: Overcoming shortcomings of spatialindexing with bucketing. ACM Trans. DatabaseSyst. 29, 4 (Dec.). Also University of MarylandComputer Science TR-4526.

SAMET, H. 2005. Foundations of Multidimensionaland Metric Data Structures. To appear.

SAMET, H. AND AREF, W. G. 1995. Spatial data mod-els and query processing. In Modern DatabaseSystems, The Object Model, Interoperability andBeyond, W. Kim, Ed. ACM Press and Addison-Wesley, New York, 338–360.

SAMET, H. AND TAMMINEN, M. 1988. Efficient compo-nent labeling of images of arbitrary dimensionrepresented by linear bintrees. IEEE Transac-tions on Pattern Analysis and Machine Intelli-gence 10, 4 (July), 579–586.

SCHIWIETZ, M. 1993. Organization and query pro-cessing of spatial objects. Ph.D. thesis, Ludwig-Maximilians Universitat Munchen, Munich,Germany. In German.

SCHIWIETZ, M. AND KRIEGEL, H.-P. 1993. Query pro-cessing of spatial objects: Complexity versus re-dundancy. In Advances in Spatial Databases—3rd International Symposium, SSD’93, D. Abeland B. C. Ooi, Eds. Singapore, 377–396.

SCHNEIDER, R. AND KRIEGEL, H.-P. 1991. The TR∗-tree: A new representation of polygonal objectssupporting spatial queries and operations. InComputational Geometry—Methods, Algorithms

and Applications, H. Bieri and H. Noltemeier,Eds. Bern, Switzerland, 249–264.

SCHRACK, G. AND LIU, X. 1995. The spatial u-orderand some of its mathematical characteristics.In Proceedings of the Pacific Rim Conference onCommunications, Computers, and Signal Pro-cessing (Victoria, British Columbia, Canada).416–419.

SELLIS, T., ROUSSOPOULOS, N., AND FALOUTSOS, C.1987. The R+–tree: A dynamic index formulti-dimensional objects. In Proceedings of the13th International Conference on Very LargeDatabases (VLDB), P. M. Stocker and W. Kent,Eds. Brighton, United Kingdom, 71–79. (AlsoComputer Science TR-1795, University of Mary-land, College Park, MD.)

SHAFFER, C. A. AND SAMET, H. 1987. An in–core hi-erarchical data structure organization for a ge-ographic database. Computer Science Depart-ment TR 1886, University of Maryland, CollegePark, MD. July.

SHAMOS, M. I. AND HOEY, D. 1976. Geometric in-tersection problems. In Proceedings of the 17thIEEE Annual Symposium on Foundations ofComputer Science (Houston, TX). 208–215.

SHEKHAR, S. AND CHAWLA, S. 2003. SpatialDatabases: A Tour. Prentice-Hall, EnglewoodCliffs, NJ.

SLOAN JR., K. R. 1981. Dynamically quantizedpyramids. In Proceedings of the 6th Interna-tional Joint Conference on Artificial Intelligence(Vancouver, Canada). 734–736.

SOLNTSEFF, N. AND WOOD, D. 1977. Pyramids: Adata type for matrix representation in PASCAL.BIT 17, 3, 344–350.

SRIHARI, S. N. 1984. Pyramid representation forsolids. Information Sciences 34, 25–46.

STONEBRAKER, M., SELLIS, T., AND HANSON, E. 1986.An analysis of rule indexing implementations indata base systems. In Proceedings of the 1st In-ternational Conference on Expert Database Sys-tems. Charleston, SC, 353–364.

SUETENS, P., FUA, P., AND HANSON, A. J. 1992. Com-putational strategies for object recognition. ACMComputing Surveys 24, 1 (Mar.), 5–61.

TAMMINEN, M. 1984. Comment on quad- and oct-trees. Communications of the ACM 27, 3 (Mar.),248–249.

TANIMOTO, S. L. AND PAVLIDIS, T. 1975. A hierarchi-cal data structure for picture processing. Com-puter Graphics and Image Processing 4, 2 (June),104–119.

THEODORIDIS, Y. AND SELLIS, T. 1993. Optimiza-tion issues in R-tree construction. Knowl-edge and Database Systems Laboratory ReportKDBSLAB-TR-93-08, National Technical Uni-versity of Athens, Athens, Greece. Oct.

THEODORIDIS, Y. AND SELLIS, T. 1994. Optimiza-tion issues in R-tree construction. In IGIS’94:Geographic Information Systems, InternationalWorkshop on Advanced Research in Geographic

ACM Computing Surveys, Vol. 36, No. 2, June 2004.

Page 59: Object-Based and Image-Based Object …hjs/pubs/sametcsur04.pdfObject-Based and Image-Based Object Representations 161 a cell contains the identity of the relevant object (or objects),

Object-Based and Image-Based Object Representations 217

Information Systems, J. Nievergelt, T. Roos, H.-J.Schek, and P. Widmayer, Eds. Monte Verita, As-cona, Switzerland, 270–273.

THEODORIDIS, Y., STEFANAKIS, E., AND SELLIS, T. K.2000. Efficient cost models for spatial queriesusing R-trees. IEEE Transactions on Knowledgeand Data Engineering 12, 1 (January/February),19–32.

TOMLIN, C. D. 1990. Geographic information sys-tems and cartographic modelling. Prentice Hall,Englewood Cliffs, NJ.

TROPF, H. AND HERZOG, H. 1981. Multidimensionalrange search in dynamically balanced trees.Angewandte Informatik 23, 2 (Feb.), 71–77.

VAN DEN BERCKEN, J., BLOHSFELD, B., DITTRICH, J.-P.,KRAMER, J., SCHAFER, T., SCHNEIDER, M., AND

SEEGER, B. 2001. XXL—a library approachto supporting efficient implementations of ad-vanced database queries. In Proceedings of the27th International Conference on Very LargeData Bases (VLDB), P. M. G. Apers, P. Atzeni,S. Ceri, S. Paraboschi, K. Ramamohanarao,and R. T. Snodgrass, Eds. Roma, Italy, 39–48.

VAN DEN BERCKEN, J., SEEGER, B., AND WIDMAYER, P.1997. A generic approach to bulk loading mul-tidimensional index structures. In Proceedingsof the 23rd International Conference on VeryLarge Data Bases (VLDB), M. Jarke, M. J. Carey,K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos,and M. A. Jeusfeld, Eds. Athens, Greece, 406–415.

VAN OOSTEROM, P. 1990. Reactive data structuresfor geographic information systems. Ph.D. the-

sis, Department of Computer Science, LeidenUniversity, Leiden, The Netherlands.

VAN OOSTEROM, P. AND CLAASSEN, E. 1990. Orienta-tion insensitive indexing methods for geometricobjects. In Proceedings of the 4th InternationalSymposium on Spatial Data Handling. Vol. 2.Zurich, Switzerland, 1016–1029.

WANG, W., YANG, J., AND MUNTZ, R. 1998. PK-tree:A spatial index structure for high dimensionalpoint data. In Proceedings of the 5th Interna-tional Conference on Foundations of Data Orga-nization and Algorithms (FODO), K. Tanaka andS. Ghandeharizadeh, Eds. Kobe, Japan, 27–36.

WANG, W., YANG, J., AND MUNTZ, R. R. 1997. STING:A statistical information grid approach to spa-tial data mining. In Proceedings of the 23rdInternational Conference on Very Large DataBases (VLDB), M. Jarke, M. J. Carey, K. R. Dit-trich, F. H. Lochovsky, P. Loucopoulos, and M. A.Jeusfeld, Eds. Athens, Greece, 186–195.

WHITE, D. A. AND JAIN, R. 1996a. Algorithms andstrategies for similarity retrieval. Technical Re-port VCL-96-101, Visual Computing Laboratory,University of California, San Diego, CA. (seehttp://vision.ucsd.edu/papers/simret).

WHITE, D. A. AND JAIN, R. 1996b. Similarity index-ing with the SS-tree. In Proceedings of the 12thIEEE International Conference on Data Engi-neering (New Orleans, LA). S. Y. W. Su, Ed. 516–523.

WHITE, M. 1982. N-trees: Large ordered indexesfor multi-dimensional space. Statistical researchdivision, US Bureau of the Census, Washington,DC.

Received January 2000; accepted August 2004

ACM Computing Surveys, Vol. 36, No. 2, June 2004.