
An Approach to Indexing Databases of Solid Models

David McWherter
William C. Regli

Geometric and Intelligent Computing Laboratory
Department of Mathematics and Computer Science
Drexel University
3141 Chestnut Street
Philadelphia, PA 19104
http://gicl.mcs.drexel.edu/

Abstract

Computer-Aided Designs and Solid Models are media of immense importance across a wide spectrum of engineering design and manufacturing disciplines. Computer-Aided Design (CAD) data, and the engineers who use it, have unique data access constraints and requirements. Previous work on multimedia databases has developed a repertoire of technologies for similarity-based and semantic indexing of conventional multimedia data (e.g., images, video, audio). While great strides have been made in the management of conventional media, little work exists on how to perform content-based indexing, clustering and mining of engineering data, CAD data and solid models.

This paper presents our approach and preliminary experimental results toward a general indexing scheme for solid models of mechanical designs. To achieve this, we create a mapping of solid model boundary representations and engineering attributes into Model Signature Graphs. We employ spectral graph theoretic techniques to project the model signature graphs into Model Comparison Spaces: high-dimensional vector spaces which can be used to compute metric distances among CAD models. These distance measures, called EigenDistances, are used to create a spatial index of the model comparison space using an [?]-tree data structure. We develop heuristics that exploit properties of the model comparison space to ensure that the [?]-tree indexes remain balanced.

In order to validate our approach, we applied this methodology to the contents of the National Design Repository (http://www.designrepository.org). The National Design Repository is a large public collection of over 55,000 solid models from a variety of real-world Computer-Aided Design domains. First, we assess the quality of the EigenDistance measure in pairwise comparison of solid models. Second, we evaluate the [?]-tree indexes with a sampling of range queries using real industrial CAD data. Our implementation testbed is based on the PostgreSQL RDBMS. We believe that this work is of significance to both the database and engineering communities: we provide a novel approach to manage a unique media type and answer queries of practical importance to engineering design and manufacturing enterprises.

1 Introduction

Computer-Aided Design media and Solid Models are of immense importance across the full spectrum of engineering disciplines. CAD data, and the engineering consumers of CAD data, have unique data access constraints and requirements: large individual database elements (e.g., 50 megabyte boundary representation models for individual assembly elements); multi-disciplinary components (e.g., mechanical, electrical, etc.) with unique access criteria; and the lack of an accepted set of readily identifiable features that can be used for model indexing and clustering. While the database and data mining communities have made great strides in management of image, spatial, audio and video media, little work exists on how to perform engineering content-based indexing, clustering and mining of CAD data and solid models (such as those shown in Figure 1).

Figure 1: Common Examples from the Design Repository: 3D solid models of machined parts (some shown as wire-frames to reveal hidden topological details) as well as mechanical assemblies. Panels: (a) Machined Pump Cover (155KB); (b) Cast Pump Housing (242KB); (c) Machined Chamber Cover (153KB); (d) DoE Team Benchmark Part (89KB).

Author footnotes: URL: http://www.mcs.drexel.edu/~udmcwher; Email: [email protected]. URL: http://www.mcs.drexel.edu/~regli; Email: [email protected].

For Submission to SIGMOD 2001.

This paper presents our approach and preliminary experimental results toward a general indexing scheme for solid models of mechanical designs. To achieve this, we create a mapping of solid model boundary representations and engineering attributes into Model Signature Graphs. We employ spectral graph theoretic techniques to project the model signature graphs into Model Comparison Spaces: high-dimensional vector spaces which can be used to compute metric distances among CAD models. These distance measures, called EigenDistances, are used to create a spatial index of the model comparison space using an [?]-tree data structure. We develop heuristics that exploit properties of the model comparison space to ensure that the [?]-tree indexes remain balanced.

In order to validate our approach, we applied this methodology to the models contained in the National Design Repository (http://www.designrepository.org). The National Design Repository is a large public collection of over 55,000 solid models from a variety of real-world Computer-Aided Design domains. First, we assess the quality of the EigenDistance measure in pairwise comparison of solid models. Second, we evaluate the [?]-tree indexes with a sampling of range queries using real industrial CAD data. Our implementation testbed is based on the PostgreSQL RDBMS.

We believe that this work is of significance to both the database and engineering communities: we provide a novel approach to manage a unique media type and answer queries of practical importance to engineering design and manufacturing enterprises. Our methods address a unique space of database and data management problems that are beyond the present scope of existing multimedia and spatial database technologies. The methodology we present can be used as a basis for future work in indexing databases of CAD and engineering data, and for building databases that handle 3D models as a media type. Our ultimate goal is to enable comprehensive and flexible indexing of engineering design data and meta-data to allow engineers to execute knowledge-rich queries about device structure (e.g., shape), behavior (e.g., physical properties and performance) and function (e.g., design rationale and intent).

This paper is organized as follows: Section 2 provides a brief overview of the relevant background literature from the engineering design, modeling, shape recognition, and database communities. Section 3 introduces our technical approach to mapping solid models to graph-based data structures, constructing distance measures for these structures, and indexing the models based on these measures. Section 4 presents some of our experimental results with the National Design Repository. Lastly, Section 5 gives our conclusions and discusses our research contributions and areas for future work.

2 Background and Related Work

Multimedia Databases. In the past decade, there has been extremely rapid growth in the availability and quantity of digitized multimedia and audio/visual content. In order to manage and provide access to this material, the database community has been developing techniques to make use of the semantic and internal structure inherent to popular data formats. For instance, a wide range of techniques have been developed to enable the efficient search and retrieval of 2D photo-realistic images, real-time video, map data in GIS applications, audio, and other conventional multimedia data types. These techniques typically exploit inherent structure and features of the media as well as the expected access characteristics of the model. Unfortunately, little progress has been made in developing indexing techniques with the same level of sophistication for engineering 3D CAD data and solid models. One of the most directly related disciplines involves indexing databases of 2D images, frequently based on the projective images of 3D data.

In the 2D shape matching and image retrieval literature, the indexing and shape matching process usually follows one of four common approaches: (1) textual query, based on keywords stored for each image in the database; (2) query by example, which uses similarity measures derived from a set of query images provided as input; (3) query by sketch, which looks for image segments matching the sketched profile; (4) iconic queries, which use templates representing critical aspects of the desired image to identify images with similar features. Methods (2-4) employ image processing and computer vision techniques to identify relevant features (e.g., sunrises, people, trees, etc.) in 2D images (i.e., GIF, JPEG, etc.).

There are several reasons why techniques from computer vision and 2D image retrieval do not directly apply to the 3D solid models and assemblies of Figure 1; we list the central aspects which our work addresses. First, we are dealing with an explicit and exact representation of the 3D objects in CAD databases. In 3D shape-recognition techniques based on 2D views of objects, however, recognition is performed using an approximate shape abstraction generated from sensor or range data. A significant challenge in the shape recognition community has been the development of highly refined model recovery algorithms and matching techniques that compensate for the noisy nature of the data. Additionally, two distinct models may become indistinguishable, even if they exhibit many dissimilarities. The approach that we take to 3D model indexing and retrieval is based on the assumption that the complete model description is present, which makes the recognition problem significantly different.

Figure 2: With origin coordinate axes shown in the center, two identical solid model BReps with differing locations and orientations. Determining that these two models exactly match is computationally intensive and prone to numeric errors.

Comparing models by simply comparing the contents of their BRep or other model representation can lead to significant problems. Transformations (scale, rotate, etc.) can introduce roundoff errors that exacerbate the problems posed by the inexact nature of floating point arithmetic. Even simple translations or rotations of a model in 3D space can make direct comparison calculations computationally infeasible: the complete configuration space of rotations, translations, and other transformations must be searched to determine that the object is the same as another. Figure 2 shows an example of a simple part under two different transformations; it is difficult to determine that the two part instances are transformed variations on the same model.
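To make the difficulty concrete, the following minimal Python sketch (not taken from the paper) moves the vertices of a hypothetical block by a rotation and a translation. The raw coordinates no longer match, while a transformation-invariant quantity, here the sorted list of pairwise vertex distances, still agrees within a tolerance. The function names, the sample part, and the tolerance are assumptions made for illustration, and agreement of pairwise distances is only a necessary condition for two shapes to be the same.

```python
import math
from itertools import combinations


def rotate_z(point, angle):
    """Rotate a 3D point about the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def translate(point, offset):
    """Shift a 3D point by the given offset vector."""
    return tuple(p + o for p, o in zip(point, offset))


def pairwise_distances(points):
    """Sorted distances between all vertex pairs (unchanged by rigid motion)."""
    return sorted(math.dist(a, b) for a, b in combinations(points, 2))


def may_match_under_rigid_motion(points_a, points_b, tol=1e-9):
    """Necessary (not sufficient) test: compare invariants within a tolerance."""
    da = pairwise_distances(points_a)
    db = pairwise_distances(points_b)
    return len(da) == len(db) and all(
        math.isclose(x, y, abs_tol=tol) for x, y in zip(da, db)
    )


# Vertices of a hypothetical 2 x 1 x 3 block.
block = [
    (0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0), (0.0, 1.0, 0.0),
    (0.0, 0.0, 3.0), (2.0, 0.0, 3.0), (2.0, 1.0, 3.0), (0.0, 1.0, 3.0),
]
# The same block at a different orientation and location.
moved = [translate(rotate_z(p, 0.7), (5.0, -2.0, 1.5)) for p in block]

print(block == moved)                               # direct comparison fails
print(may_match_under_rigid_motion(block, moved))   # invariant comparison agrees
```

The tolerance is what absorbs the roundoff introduced by the transformation; an exact equality test on the distances would be fragile for the reasons given above.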

Much like in conventional image retrieval and indexing, however, there is no universally acceptable set of features on which model comparisons can be based. Evaluation measures for solid models will depend on the application intent of the engineers using the database. For instance, process engineers may need to query based on manufacturing process features, but an industrial designer may need to consider shape aspects of an object. The capricious and ill-defined semantics of the engineering design and manufacturing domain require that we create a more flexible methodology for model comparisons which can be customized to consider the criteria required by different end users. It is our goal to develop an architecture which will enable a wide range of similarity measures, which can be used when necessary.

Engineering Databases. In engineering practice, indexing of parts and part families had been done with group technology coding [37]. Group technology (GT) facilitated process planning and cell-based manufacturing by imposing a classification scheme on individual machined parts. GT codes specify classes using alphanumeric strings. These techniques were developed prior to the advent of inexpensive computer technology, hence they are not rigorously defined and are intended for human, not machine, interpretation.

In order to implement GT, one must have a concise coding scheme for describing products and a method for grouping (or classifying) similar products, such as the popular Opitz, DCLASS [2], and MICLASS [21, 30] schemes. In each case the basic idea is for the designer to use a set of tables and rules to capture critical design and manufacturing attributes of a part in an alphanumeric string that is assigned to that part. The GT code for a model can be used as an easily stored and compared index in a database system in order to perform design retrieval, variant process planning, and other manufacturing applications.
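As a rough illustration of how a GT code can serve as an easily compared index, the sketch below scores two fixed-length codes by the fraction of positions that agree and retrieves catalog entries above a threshold. The codes, part names, scoring rule, and threshold are all hypothetical; they are not drawn from the Opitz, DCLASS, or MICLASS tables.

```python
def gt_code_similarity(code_a, code_b):
    """Fraction of code positions on which two equal-length GT codes agree."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    matches = sum(1 for a, b in zip(code_a, code_b) if a == b)
    return matches / len(code_a)


def retrieve(query_code, catalog, threshold=0.8):
    """Names of catalog parts whose code is similar enough to the query."""
    return sorted(
        name
        for name, code in catalog.items()
        if gt_code_similarity(query_code, code) >= threshold
    )


# Hypothetical five-position codes; each position would encode one attribute.
catalog = {
    "bracket-1": "13204",
    "bracket-2": "13214",
    "shaft-7": "20531",
}

print(retrieve("13204", catalog))
```

Because each position takes only a handful of values, many distinct parts collapse onto the same or nearly the same code, which is one way to see why short strings yield a coarse classification.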

Although the GT approach has been used with some success during the last 35 years, it has several limitations which are becoming increasingly important. Describing designs as short strings creates a coarse classification scheme, which limits the kinds of real-world retrieval problems for which the approach can be useful. Moreover, since GT coding was intended to be human interpretable, the typical encodings describe somewhat subjective human impressions of 2D drawings. This can cause difficulty in automating the generation of GT codes [35, 18, 39], especially if the resulting codes must remain human interpretable. GT coding schemes have been used primarily for the classification and retrieval of mechanical parts, although our research group has done some work on extending them to electro-mechanical devices as well [5, 6, 22].

At one level, we see our research as augmenting traditional group technology coding schemes by providing a completely digital process for storage and comparison of solid models. It should be noted, however, that at other levels the approach we advocate is not limited to categorization of solid models of machined parts. By using a graph-based representation as the primary indexing technique, the database indexing techniques we present can be applied to any domain which can be represented as an undirected graph. As a result of the representational power of graphs, we believe that our approach will have wide applicability.

Computational tools for managing engineering databases have been an area of active study for many years. Delivering smarter part databases will require augmenting designs with design histories and functionality information, behavioral features, and other semantics. Developing algorithms to intelligently act on such databases and retrieve parts based on similarity requires access to realistic example data. Towards this end, Will et al. are pursuing an ontology-based approach to managing engineering catalogs [25]. At present this work is not tightly coupled with geometric data, such as is found in a CAD system, or with representation of tolerances and features. In the domain of database support for civil engineering and architecture-engineering-construction (AEC), Eastman et al. [9, 12, 11, 10] have been developing data structures to link design entities (such as windows, doors, stairs) with semantic information in order to manage design constraints among multiple users who operate on a project simultaneously.

Hardwick et al. [16] have merged databases describing STEP-based CAD data with the Common Object Request Broker Architecture (CORBA) in order to provide CAD services over a network. This work, however, focuses primarily on the software-engineering problem of providing access to functional information describing parts and models, rather than on the semantic search and retrieval of models. A more complete survey on geometric databases in general can be found in [24].

Shape Matching. A considerable amount of work has been done in the field of computer vision on shape recognition of solid objects. Typically, the approaches here assume that there is some sensed input of the shape or object being recognized, and a database of representations of solid models. Geometric Hashing [27] is a technique frequently used in shape recognition, which compares a simple representation of the structure of the shape being examined to the representations of a set of models available in a database. For example, some hash constructed from a 2D image may be used to index a database of other such hashes of pictures of people's faces. If two hashes are similar enough, it may be determined that the query picture is one of the people in the database.

A common approach to computing model similarity in the field of computer vision naturally involves making use of a projection of the solid model into a 2-dimensional space (such as a photograph), where the input signal is compared to the projections of model objects in a database. Given that most information about the 3-dimensional (or higher dimensional) properties of the model is thrown away in the projection phase, techniques that make more use of the system have been developed.

Jain et al. [15] have developed a methodology to index multimedia image databases by using feature vectors containing information such as the color, density, or intensity patterns found in the image. They have extended this work to handle 3D CAD data to create sets of 3D feature vectors. However, the features they store are still simply image-related properties of the 3D model, and do not capture the structural properties of the models.

Aspect graphs are a common mechanism for dealing with models whose geometry and configuration are completely known. This approach computes all possible appearances that an object may take when viewed by an observer (commonly using a projection from 3D to 2D space, for instance). These views are stored in a graph whose edges associate views when a simple spatial transformation may directly transform one view into another.

Aspect graphs, however, are difficult to compute for arbitrary solid models, and are typically restricted to objects whose boundaries can be described by simple polygonal structures or surfaces of revolution. Practical models are much more complicated, having combinations of smooth and polygonal surfaces. Known techniques for dealing with these more complicated systems are computationally intractable [32].


Techniques Based on Geometric Properties. Another possible basis for classifying designs is to use geometric properties of CAD models. However, progress to date in this area has been rather limited. There is an appealing analogy between the constructive solid geometry (CSG) primitives used in some geometric modeling systems and the material volumes that machining operations can remove. However, there are substantial difficulties in generating a unique CSG tree for a design whose primitives actually correspond to the operations that would be used to manufacture the design [28]; and as far as we know, no CSG-based similarity measures exist.

Wysk et al. [40] have described a similarity measure for solids based on properties of their boundary representations. The approach attempts to match the faces of two solids so that matching faces have similar area and orientation, and then calculates a similarity measure between 0 and 1. Though interesting, this new measure of "relaxed" geometrical similarity has several difficulties as a design classification scheme. The method works only with polyhedral objects, and the similarity measure is not symmetric (the similarity of one solid model to a second is not equal to the similarity of the second to the first). This leads to undesirable results when such a measure is used in practical applications involving database indexing. Additionally, the method does not incorporate (or reflect) manufacturing considerations, such as approachability, fixturing, and operation interference, and we do not see any obvious way to add them.

Figure 3: The distinction between geometric and topological information within the BRep used by the ACIS Solid Modeler [38]. Diagram labels: ENTITY, Topology, Geometry, BODY, LUMP, SHELL, SUBSHELL, FACE, LOOP, COEDGE, EDGE, VERTEX, WIRE, CURVE, SURFACE, POINT, PCURVE, TRANSFORM, ELLIPSE, SPHERE, CONE, SPLINE, STRAIGHT, INTCURVE, PLANE, TORUS.

Solid Modeling, Computer-Aided Design and Spatial Databases. There are three broad classes of schemes for representation of solid models [29, 33]:

1. Decomposition approaches that model a solid as collections of primitive objects connected in some way. Examples of these data structures include quad-trees and oct-trees [34], which represent space as collections of primitive cells (usually cuboidal). These representations have found great use in vision and spatial database applications (such as GIS). Their inexact nature, however, has made them a derivative data structure for CAD, finding use in engineering analysis, collision detection, finite element methods, etc. In these contexts, the decomposition is generated from CAD data stored with one of the following two approaches.

2. Constructive approaches that model a solid as a combination of primitive solid templates. For example, a common approach is Constructive Solid Geometry (CSG), which represents a solid as a boolean expression on some set of primitive solids. In a CSG tree, the leaves of the tree contain primitive solids (usually blocks, prisms, cylinders, etc.) and the interior nodes contain operators on the primitives (for example, boolean subtraction).

3. Boundary-based approaches that model a solid using a data structure that represents the geometry and topology of its bounding faces. In recent years the boundary-representation (BRep) approach has emerged as the dominant representation scheme in solid modeling and CAD systems. This is due in large part to the representational power and flexibility of boundary models, as well as to recent advances in numeric computation that overcame earlier problems with models becoming unstable and inconsistent. The models from Figure 1 are all examples of BRep-based CAD data.
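The constructive approach in item 2 can be sketched in a few lines of Python. The classes below are illustrative assumptions rather than any CAD system's API: leaves are primitive solids with a point-membership test, and interior nodes combine their children with a boolean operator.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Block:
    """Axis-aligned box primitive given by two opposite corners."""
    lo: Point
    hi: Point

    def contains(self, p: Point) -> bool:
        return all(l <= c <= h for l, c, h in zip(self.lo, p, self.hi))


@dataclass(frozen=True)
class Cylinder:
    """Cylinder primitive whose axis is parallel to z."""
    center_xy: Tuple[float, float]
    radius: float
    z_lo: float
    z_hi: float

    def contains(self, p: Point) -> bool:
        dx = p[0] - self.center_xy[0]
        dy = p[1] - self.center_xy[1]
        inside_disc = dx * dx + dy * dy <= self.radius ** 2
        return inside_disc and self.z_lo <= p[2] <= self.z_hi


@dataclass(frozen=True)
class Op:
    """Interior CSG node: a boolean operator applied to two sub-solids."""
    kind: str  # "union", "intersection", or "difference"
    left: "Solid"
    right: "Solid"

    def contains(self, p: Point) -> bool:
        a, b = self.left.contains(p), self.right.contains(p)
        if self.kind == "union":
            return a or b
        if self.kind == "intersection":
            return a and b
        if self.kind == "difference":
            return a and not b
        raise ValueError(f"unknown operator: {self.kind}")


Solid = Union[Block, Cylinder, Op]

# A hypothetical plate with a drilled hole: block minus cylinder.
plate = Op(
    "difference",
    Block((0.0, 0.0, 0.0), (4.0, 4.0, 1.0)),
    Cylinder((2.0, 2.0), 0.5, 0.0, 1.0),
)

print(plate.contains((0.5, 0.5, 0.5)))  # point in the solid material
print(plate.contains((2.0, 2.0, 0.5)))  # point inside the hole
```

The same plate could equally be written as a union of smaller blocks around the hole, which illustrates why a CSG tree for a given shape is not unique.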


A BRep solid model creates a unique and unambiguous representation of the exact shape of an artifact. While the exact details of the BRep will vary among implementations and systems, BReps have become the dominant representation schema for modern CAD systems used in mechanical design. BReps are used extensively for engineering analysis, simulation, collision detection, animation and manufacturing planning. Data formats used by other CAD industries, such as the Architecture/Engineering/Construction (AEC) market, are often surface or wire-frame models: similar to a solid model BRep, but not required to store a topologically bounded solid entity. Further, in the case of a wire-frame model, only the edges and their connectivity are stored.

In particular, a BRep usually consists of a graph structure that models an entity's topology. The connections between the nodes in the BRep graph represent the connections between the topological components of the entity's boundary. These topology nodes then contain pointers to their underlying geometric entities; for example, a face of a solid is a topological entity (represented as a collection of bounding edges) and it has associated with it a surface (represented as an equation). An illustration of the distinction between geometric and topological information from the ACIS Solid Modeling Kernel is given in Figure 3.
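The separation between topology nodes and the geometry they point to can be sketched as follows. This is a deliberately small, hypothetical data model (it omits loops, coedges, shells, and the rest of a real kernel such as ACIS); the class and function names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Curve:
    """Geometry: the kind of function describing a curve."""
    kind: str  # e.g. "straight", "ellipse"


@dataclass
class Surface:
    """Geometry: the kind of function describing a surface."""
    kind: str  # e.g. "plane", "cone"


@dataclass
class Edge:
    """Topology: a bounding edge that points to its underlying curve."""
    name: str
    curve: Curve


@dataclass
class Face:
    """Topology: a face bounded by edges, pointing to its underlying surface."""
    name: str
    surface: Surface
    edges: List[Edge] = field(default_factory=list)


def adjacent_faces(faces):
    """Pairs of face names that share at least one bounding edge."""
    owners = {}
    for face in faces:
        for edge in face.edges:
            owners.setdefault(edge.name, []).append(face.name)
    pairs = set()
    for names in owners.values():
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                pairs.add(tuple(sorted((names[i], names[j]))))
    return sorted(pairs)


# Three faces of a hypothetical part; e1 and e3 are shared edges.
e1, e2 = Edge("e1", Curve("straight")), Edge("e2", Curve("straight"))
e3, e4 = Edge("e3", Curve("ellipse")), Edge("e4", Curve("ellipse"))
faces = [
    Face("top", Surface("plane"), [e1, e2]),
    Face("side", Surface("cone"), [e1, e3]),
    Face("bottom", Surface("plane"), [e3, e4]),
]

print(adjacent_faces(faces))
```

Connectivity questions such as face adjacency are answered from the topology alone, while the surface and curve kinds are only consulted when geometric attributes are needed.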

One of the most popular of these BRep structures is the winged-edge representation [20] and its variations. For more information on boundary representation data structures, interested readers are referred to [20, 29, 41, 42].

BReps and other CAD representations are distinctly different from shape models developed by the computer vision community in several important ways. Vision-based representations, such as superquadrics and deformable shape models, are not designed for exact modeling of shapes; rather, they are employed to reconstruct shapes based on approximate data taken from sensors and cameras. These approximated shapes can be used as a basis of comparison among 3D objects; however, such comparisons are limited to analysis of geometric moments and gross shape properties. Hence, they are not directly suitable for use in answering the kinds of questions that design and manufacturing engineers wish to pose about CAD models.

3 Technical Approach

Figure 4: Overview of our approach to model indexing: a solid model is transformed into a Model Signature Graph, over which a metric space is constructed for database indexing. Diagram stages: Solid Model, Model Signature Graph, Metric Space, Database Indexing.

3.1 Problem Formulation

Our goal is to provide a means through which a large database of solid models can be automatically maintained and queried in a semantically meaningful way. The approach, depicted in Figure 4, consists of first constructing a mapping from solid models to graph-based data structures called Model Signature Graphs. We develop a set of distance (or similarity) functions between these graphs in order to measure the structural similarity between the models. These distance functions define metric spaces or high-dimensional vector spaces over the set of solid models. Finally, we use spatial indexing techniques for metric spaces to make range queries and nearest-neighbor retrieval of solid models more efficient.


3.2 Model Signature Graphs

We make use of a specialized graph structure to represent solid models for similarity comparisons. This structure, called a Model Signature Graph (MSG), is constructed from the boundary representation of a solid model in a manner similar to that of Wysk et al. [40] and the Attributed Adjacency Graph (AAG) structures used to perform feature recognition from solid models [23].

The boundary representation (BRep) [20] essentially consists of a set of edges and a set of faces used by the solid modeler to describe the shape of the model in 3D space. The MSG for a solid model is defined as a labeled graph, $G = (V, E)$. Each face $f$ in the BRep is represented by a vertex $v$ in $V$. The vertex is labeled with attributes that describe the properties of the face within the solid model. The attributes that we currently keep track of in the MSG include:

1. topological identifier for the face $f$ (planar, conical, etc.);

2. underlying geometric representation of the surface, i.e. the type of function describing the surface;

3. surface area of the face $f$;

4. set of surface normals or aspects for $f$.

Edges in the MSG correspond to the solid model edges that connect faces in the model BRep. An edge $(v_i, v_j) \in E$ exists whenever the faces $f_i$ and $f_j$, which correspond to vertices $v_i$ and $v_j$ respectively, share a BRep edge $e$. MSG edges are labeled with a number of features, including:

1. topological identifier for the edge $e$ in the model;

2. concavity/convexity of $e$;

3. underlying geometric representation of the curve of $e$, i.e. the type of function describing the curve;

4. the length of the curve of $e$.

The asymptotic complexity of the transformations we apply from solid models to MSGs is linear with respect to the number of faces and edges in the BRep of the solid model. In practice, the translation is done in moments on our hardware (Sun Ultra 60 workstations), and we will ignore this cost in further analysis.
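A minimal sketch of this construction, assuming the BRep has already been read into plain Python dictionaries, is given below. The container layout, attribute keys, and function name are hypothetical; the point is that one pass over the faces and one pass over the edges suffice, which matches the linear cost stated above.

```python
from dataclasses import dataclass, field


@dataclass
class MSG:
    """Labeled graph: one vertex per BRep face, one edge per shared BRep edge."""
    vertex_labels: dict = field(default_factory=dict)  # face id -> attributes
    edge_labels: dict = field(default_factory=dict)    # {fi, fj} -> [attributes]


def build_msg(faces, edges):
    """Build an MSG from face and edge records in a single pass over each.

    faces: {face_id: attribute dict}
    edges: iterable of (face_id_i, face_id_j, attribute dict), one per BRep edge
    """
    graph = MSG()
    for face_id, attrs in faces.items():
        graph.vertex_labels[face_id] = dict(attrs)
    for fi, fj, attrs in edges:
        if fi not in faces or fj not in faces:
            raise KeyError(f"edge refers to an unknown face: {fi}, {fj}")
        # Two faces may share several BRep edges, so labels are kept in a list.
        graph.edge_labels.setdefault(frozenset((fi, fj)), []).append(dict(attrs))
    return graph


# Hypothetical attribute values for a fragment of a part.
faces = {
    "f1": {"type": "planar", "surface": "plane", "area": 8.0},
    "f2": {"type": "planar", "surface": "plane", "area": 4.0},
    "f3": {"type": "conical", "surface": "cone", "area": 3.1},
}
edges = [
    ("f1", "f2", {"convexity": "convex", "curve": "straight", "length": 4.0}),
    ("f2", "f3", {"convexity": "concave", "curve": "ellipse", "length": 3.1}),
]

msg = build_msg(faces, edges)
print(len(msg.vertex_labels), len(msg.edge_labels))
```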

Figure 5: The spinner3 model and its transformation into a Model Signature Graph. Panels: (a) Solid Model; (b) Model Signature Graph.

Figure 6: Representations of the crankcse (651KB) model as both a solid model and MSG. Panels: (a) Hidden Lines Removed; (b) Model Signature Graph.

MSG Example. Figure 5 illustrates the transformation from the BRep of a simple solid model into a Model Signature Graph. Figure 6 depicts the transformation from the BRep of a crankcase to a Model Signature Graph. The complexity of the relationships between the faces and edges in this model's BRep gives a good idea of the difficulty in assessing shape similarity based on BRep data. Effective practical algorithms in the research literature for determining the similarity of adjacency structures of this form are scarce. MSGs extract the essential topological data in the BRep for use in comparison, but the resulting graph is still exceedingly complex. Making use of MSGs for similarity assessment proves to be an extremely challenging problem to solve.

3.3 Metric Spaces of Solid Models

Once the solid models have been transformed into MSGs, we construct a metric space over the data set. A metric space is a collection of objects along with a distance function, $d(x, y)$, known as the metric, which computes a distance between any two elements in the set. The distance function $d(x, y)$ must satisfy a few conditions:

$d(x, y) = 0 \iff x = y$ (Identity)
$d(x, y) \geq 0$ (Positivity)
$d(x, y) = d(y, x)$ (Symmetry)
$d(x, y) + d(y, z) \geq d(x, z)$ (Triangle Inequality)
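The four conditions can be checked empirically on a finite sample of objects. The sketch below is an illustrative helper (its name and tolerance are assumptions); passing the check on a sample does not prove that a function is a metric, but a reported violation shows that it is not.

```python
import math
from itertools import product


def check_metric(points, d, tol=1e-12):
    """Return the names of the metric conditions violated on a finite sample."""
    violations = set()
    for x, y in product(points, repeat=2):
        dxy = d(x, y)
        if dxy < -tol:
            violations.add("positivity")
        if (abs(dxy) <= tol) != (x == y):
            violations.add("identity")
        if abs(dxy - d(y, x)) > tol:
            violations.add("symmetry")
    for x, y, z in product(points, repeat=3):
        if d(x, y) + d(y, z) < d(x, z) - tol:
            violations.add("triangle inequality")
    return sorted(violations)


def squared_euclidean(a, b):
    """Squared Euclidean distance: a common measure that is not a metric."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


sample = [(0.0,), (1.0,), (2.0,)]

print(check_metric(sample, math.dist))          # Euclidean distance
print(check_metric(sample, squared_euclidean))  # fails the triangle inequality
```

For the squared distance, the points 0, 1, and 2 give $1 + 1 < 4$, so the triangle inequality fails even though the other three conditions hold.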

We attemptto constructa distancemeasurethatoperatesover thespaceof MSGsthatexhibits theseproperties.Itis possible,andin factlikely, thatmorethanonesuchdistancefunctionexists,leadingto anumberof differentmetricspacesthatcanbeconstructedover thesamedatabaseof models.We believe thatin practicalsettings,morethanonedistancefunctionwill bedirectly applicableto any givendataset,dependingon theparticularquestionsbeingsoughtby theuserat agiventime.

The constraints provided by the distance metric provide a significant amount of structure that can be exploited in a collection of objects for the purpose of organization. Recently, a large amount of research has been put forth to store data in metric spaces using various tree structures [4, 8, 14] for efficient search and retrieval. These indexing techniques are tailored for efficient handling of spatial queries such as nearest-neighbor searches in high-dimensional vector spaces, or for data sets which do not conform to a vector space.

In addition, clustering and knowledge-discovery techniques such as $k$-means or $k$-median regularly make use of metric distance functions to organize and examine statistical distributions of data. Metric or near-metric distance functions for solid models have the potential for use in a wide range of applications.

We believe that a number of metric distance functions can be used to perform similarity assessment of the shapes of solid models. These functions constitute a "plug-in" module that can be tailored to perform a wide range of similarity measures based on the nature of the application. At present, we focus primarily on similarity measures based on topological and structural properties of solid models.

In order to make use of distance metrics in a practical system, they must be relatively easy to compute. The algorithmic complexity of computing distance metrics for arbitrarily-formed graph structures is unfortunately an open question. The graph-isomorphism problem, that is, deciding whether two graphs are identical under some permutation of the vertices, is Turing-reducible to the computation of a distance metric between two graph data structures: two isomorphic graphs have a metric distance of zero, and vice versa. The asymptotic computing time for an algorithm to compute a graph-based distance metric will therefore be bounded below by that of an algorithm for computing graph isomorphism.

Graph isomorphism is a problem that has been studied for decades due to its wide range of applications in computer science and other fields of research. Despite the large amount of energy that has been expended on the problem, no algorithm with polynomial worst-case running time has been developed. Surprisingly, it is still unknown whether the problem is even NP-hard, or whether polynomially-bounded algorithms are possible.

The difficulty of understanding and constructing efficient solutions to the graph isomorphism/graph distance metric problem dictates that practical approximation algorithms must be used to index a graph-based data structure. We investigate three approximate distance metrics over a collection of MSGs: VertexDistance, ITVDistance, and EigenDistance. These distance measures were constructed in order to approximate the behavior of a proper distance metric over the space of graphs that we plan to index. While they are only approximate distance metrics, they appear to perform generally well. Although it is easy to construct examples where the distance between two objects is zero but the objects are not actually identical, we have not found examples of practical models where the other metric constraints fail to hold.

3.3.1 VertexDistance

The first of our distance measures is extremely simplistic. The VertexDistance between two solid models is defined as the difference in the number of faces of each model. This corresponds to the difference in the number of vertices in the models' MSGs. Intuitively, the measure gives a rough estimate of the difference in complexity of two models: models with more faces will tend to be somewhat more complex than those with fewer surfaces. The complexity of computing this distance measure is trivial; it is linear in the number of vertices in the graph, or equivalently, in the number of faces in the model BRep.

Given that VertexDistance provides little or no structural information about the edge structure of the model, we suspect that this measure will perform poorly in providing human users with satisfying measures of similarity. We intend our other distance measures, ITVDistance and EigenDistance, to be superior in terms of meeting human expectations, as well as efficiently indexing the space.

3.3.2 ITVDistance

VertexDistance makes use of a particular invariant property of graphs, the number of vertices. No matter how the graph's vertices are permuted, the number of vertices remains the same. The ITVDistance distance measure was constructed to further develop the use of easily computable graph invariants in order to characterize the graph. In turn, they characterize the nature of the solid model from which the graph was constructed.

To compute the ITVDistance, we construct a vector that consists of a number of graph invariants. The invariants that we concentrate on are directly related to the degree sequence of a graph. While there are stronger graph invariants than the degree sequence, it remains easily computable, and does provide general structural information about the graph. We construct the elements of these vectors such that they are normalized for scale, such as the number of vertices or the number of edges, and are independent of one another, to enable easy comparison.

The current set of invariants that we make use of is as follows:


1. Vertex and Edge Count: Trivially computable graph invariants. Unfortunately, they reveal practically no information about the graph structure, and may be weighted too strongly in the comparisons we perform, skewing our similarity metrics.

2. Minimum and Maximum Degree: Provide a measurement of the maximum and minimum number of adjacencies between one face and the other faces in the model.

3. Median and Mode Degree: Provide a measurement that represents statistically the most common and "average" cardinalities of adjacencies between faces in a model.

4. Diameter: The longest path of edge/face crossings between any two faces in the model.

5. Type Histogram: Each model BRep contains a set of faces and edges which have various types and features. The analysis and statistical breakdown of these component types is used to further generate a representation of the model.

We will refer to a vector that contains these elements as an Invariant Topology Vector (ITV). At present, we track a total of 29 continuous (floating-point) and discrete statistics in the ITVs. As a result, an instance of an ITV gives the position of its corresponding solid model in a 29-dimensional vector space. We evaluate distance between models as the distance of their corresponding ITVs in this space using the standard Euclidean metric. The size of this space indicates that traditional database indexing techniques, such as R-Trees, would be extremely inefficient in indexing the space.
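The construction of an ITV from a graph can be sketched as follows. This is a simplified illustration under stated assumptions rather than our implementation: the graph is assumed to be connected and given as a dictionary mapping each vertex to its set of neighbours, only seven of the invariants listed above are included (the type histogram and the scale normalization are omitted), and the names `invariant_vector` and `itv_distance` are hypothetical. The diameter is computed here with one breadth-first search per vertex.

```python
import math
from collections import Counter, deque
from statistics import median


def eccentricity(adj, source):
    """Longest shortest-path length (in edges) from source; assumes a connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def invariant_vector(adj):
    """Small vector of degree-sequence invariants for a graph {vertex: set(neighbours)}."""
    degrees = sorted(len(nbrs) for nbrs in adj.values())
    counts = Counter(degrees)
    top = max(counts.values())
    mode = min(d for d, c in counts.items() if c == top)  # smallest value on ties
    return [
        float(len(adj)),                              # vertex count
        sum(degrees) / 2.0,                           # edge count
        float(degrees[0]),                            # minimum degree
        float(degrees[-1]),                           # maximum degree
        float(median(degrees)),                       # median degree
        float(mode),                                  # mode degree
        float(max(eccentricity(adj, v) for v in adj)),  # diameter
    ]


def itv_distance(adj_a, adj_b):
    """Euclidean distance between the invariant vectors of two graphs."""
    va, vb = invariant_vector(adj_a), invariant_vector(adj_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(va, vb)))


path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(invariant_vector(path4), itv_distance(path4, cycle4))
```

Because no normalization is applied in this sketch, the vertex and edge counts can dominate the distance for large graphs, which is the skew noted for the first invariant above.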

The computational complexity of the ITVDistance measure is dominated by the calculation of the diameter of the graph, which can be done in time cubic in the number of faces in the BRep. We have been investigating the possibility of removing the added cost of computing the diameter of the graph explicitly. Instead, we may be able to make use of an upper bound derived from the eigenvalue spectrum of the Laplacian of a graph $G$ (see Section 3.3.3) with $n$ vertices and the smallest eigenvalue $\lambda_1$:

\[
\mathrm{Diameter}(G) \le \left\lceil \frac{\log(n - 1)}{\log \frac{1}{1 - \lambda_1}} \right\rceil
\]

This may be advantageous if both the EigenDistance distance measure described in Section 3.3.3 and the ITVDistance are being used simultaneously in an application. The computing times for both the diameter and the eigenvalues are cubic in the number of vertices, so the elimination of one of the computations will provide a practical improvement in performance.

ITV Example. The following table depicts the graph invariant attributes associated with the MSG for spinner3 (see Figure 5). The table omits the type histogram data, which is trivial for this part (the variety of surfaces in this model is extremely limited):

Attribute                Value
Vertex Count             24
Edge Count               63
Degree Max               17
Degree Min               2
Degree Mode              4
Degree Median            4.00
Degree Std. Deviation    3.27
Graph Diameter           3


3.3.3 EigenDistance

We have developed another graph comparison technique rooted in the field of spectral graph theory. Spectral graph theory is the study of the adjacency matrices of graphs using techniques from linear algebra. In particular, the eigenvalues (characteristic values) corresponding to the adjacency matrix are examined, and correlated to other properties and structure of a graph.

The sorted eigenvalues of the adjacency matrix of a graph are referred to as the graph spectrum:

\[
\mathrm{Spectrum} = \{\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n\}
\]

The graph spectrum retains a tremendous amount of information related to the structure and topology of a graph, and is one of the strongest known polynomial-time computable graph invariants. Properties of the eigenvalues and the eigenvectors of a graph have been used to develop graph partitioning techniques that minimize edge-cuts [19], graph isomorphism tests, geometric hashing in vision recognition [36], and other applications. Numerous relations are now known relating components of the spectrum to graph properties such as the graph diameter or volume [7].

A variety of forms of the adjacency matrix are used in differing areas of spectral graph theory. Each of these forms leads to different relationships between the eigenvalue spectrum and graph properties, so care must be taken to ensure that the appropriate form is used. Biggs [1], for instance, bases his work on a common form of the adjacency matrix:

\[
A(u, v) =
\begin{cases}
1 & \text{if } u \text{ is adjacent to } v, \\
0 & \text{otherwise.}
\end{cases}
\]

Chung [7] proposes the use of an alternative "normalized" form of the graph adjacency matrix for spectral decomposition. This form takes into consideration the degrees of the vertices in the graph when determining the entries in the matrix. This form, known as the graph Laplacian, helps eliminate scaling factors introduced in the eigenvalues as a result of unevenly distributed edges in a graph. All of the eigenvalues lie between 0.0 and 2.0, regardless of the size or number of edges in the graph. In addition, the Laplacian introduces an attractive relationship between the combinatorial nature of graphs and continuous mathematics; readers are referred to Chung's work for further information [7]. The Laplacian is formally defined as follows:

\[
\mathcal{L}(u, v) =
\begin{cases}
1 & \text{if } u = v \text{ and } d_v \neq 0, \\
-\dfrac{1}{\sqrt{d_u d_v}} & \text{if } u \text{ and } v \text{ are adjacent}, \\
0 & \text{otherwise,}
\end{cases}
\]

where $d_v$ denotes the degree of vertex $v$.

As a result of its normalizing properties, we primarily focus on the use of eigenvalue spectra derived from the Laplacian matrix of Model Signature Graphs in order to compute distances between solid models. The Laplacian is a symmetric, real-valued matrix, which implies that all of its eigenvalues are real-valued as well; it is also positive semidefinite, so its eigenvalues are nonnegative.

Currently, we compute all of the eigenvalues of the adjacency matrix using a Householder-QR matrix diagonalization algorithm, which efficiently generates the eigenvalues in time cubic in the number of vertices in the graph.
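The computation of a Laplacian spectrum can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not our implementation: it builds the normalized Laplacian from a graph given as a dictionary of neighbour sets (no self-loops), and for brevity it substitutes the cyclic Jacobi rotation method for the Householder-QR algorithm named above. Both apply to real symmetric matrices; a production system would more likely call an existing routine such as `numpy.linalg.eigvalsh`. The function names are hypothetical.

```python
import math


def normalized_laplacian(adj):
    """Normalized Laplacian of an undirected graph {vertex: set(neighbours)}.

    Assumes no self-loops; rows and columns follow sorted vertex order.
    """
    verts = sorted(adj)
    deg = [len(adj[v]) for v in verts]
    n = len(verts)
    lap = [[0.0] * n for _ in range(n)]
    for i, u in enumerate(verts):
        for j, v in enumerate(verts):
            if i == j:
                lap[i][j] = 1.0 if deg[i] != 0 else 0.0
            elif v in adj[u]:
                lap[i][j] = -1.0 / math.sqrt(deg[i] * deg[j])
    return lap


def jacobi_eigenvalues(matrix, max_sweeps=50, tol=1e-20):
    """Ascending eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:  # off-diagonal mass is negligible; the diagonal holds the spectrum
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes a[p][q] and a[q][p].
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4.0
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q (A times J)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q (J transposed times the result)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
spectrum = jacobi_eigenvalues(normalized_laplacian(triangle))
print(spectrum)
```

For the triangle graph the normalized Laplacian has eigenvalues 0, 3/2, and 3/2, which the sketch reproduces up to floating-point error; all values fall in the interval from 0.0 to 2.0 as stated above.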

Given the eigenvalue spectra for two solid models, we can compare them for similarity using conventional vector distance metrics. The only caveat is that the number of eigenvalues generated will depend on the number of faces in the solid model. A scheme must be developed for comparing sets of eigenvalues of different sizes. We have been investigating a number of techniques for dealing with this problem. In one case we simply throw away the largest elements of the longer vector, based on the assumption that the smallest eigenvalues are more important in shape matching. This judgment is made on the basis that the smaller eigenvalues have a close relationship with so-called Spectral Graph Partitioning techniques, which make use of the eigenvectors corresponding to the smallest eigenvalues of the graph.

Throwing away the information contained in the eigenvalues, however, is an undesirable prospect. We have also been investigating ways to pad the shorter set of eigenvalues to make it longer. We have experimented with padding the set with a set of constant eigenvalues of 0.0, 1.0, or 2.0. When the set is padded with zeros, the largest elements in the longer vector contribute in full, leading to an inflated distance computation. Consider the Euclidean distance between the vectors $V_1$ and $\mathrm{pad}(V_2)$, for some positive integers $n, k$ such that $k < n$:

\[
\begin{aligned}
V_1 &= [\lambda_{1,1}, \lambda_{1,2}, \ldots, \lambda_{1,n}] \\
V_2 &= [\lambda_{2,1}, \lambda_{2,2}, \ldots, \lambda_{2,n-k}] \\
\mathrm{pad}(V_2) &= [\lambda_{2,1}, \lambda_{2,2}, \ldots, \lambda_{2,n-k}, 0, \ldots, 0]
\end{aligned}
\]

The Euclidean distance is the same as that computed by the truncated eigenvalue vector technique, with an additional component governed by the largest eigenvalues in the longer vector.

We have investigated using another padding technique, which limits the growth of the distance measure. In this case, we pad the eigenvalue vectors with 2.0 instead of 0.0. Given that the eigenvalues are bounded from above by 2.0 and the largest eigenvalues are more likely to be closer to 2.0 than they are to 0.0, it would seem as if this technique would make more intuitive sense.
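The truncation and padding strategies described above can be compared directly on two small spectra. The following is a minimal sketch with a hypothetical function name and signature (`eigen_distance`, `mode`, `pad_value`); it assumes each spectrum is supplied as a list of eigenvalues and appends the padding after the sorted shorter vector, as in the definition of the padded vector above.

```python
import math


def eigen_distance(spec_a, spec_b, mode="truncate", pad_value=0.0):
    """Euclidean distance between two eigenvalue spectra of possibly different length.

    mode="truncate" drops the largest eigenvalues of the longer spectrum;
    mode="pad" appends pad_value to the shorter spectrum.
    """
    a, b = sorted(spec_a), sorted(spec_b)
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    if mode == "truncate":
        long_ = long_[:len(short)]
    elif mode == "pad":
        short = short + [pad_value] * (len(long_) - len(short))
    else:
        raise ValueError("mode must be 'truncate' or 'pad'")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(short, long_)))


triangle_spectrum = [0.0, 1.5, 1.5]   # normalized Laplacian spectrum of a triangle
edge_spectrum = [0.0, 2.0]            # normalized Laplacian spectrum of a single edge

print(eigen_distance(triangle_spectrum, edge_spectrum))                      # truncated
print(eigen_distance(triangle_spectrum, edge_spectrum, "pad", pad_value=0.0))
print(eigen_distance(triangle_spectrum, edge_spectrum, "pad", pad_value=2.0))
```

In this toy case the zero-padded distance is the largest of the three, since the padded zero is compared against an eigenvalue of 1.5, while padding with 2.0 keeps the extra component small.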

It should be noted that the number of eigenvalues in the graph spectrum is equal to the number of faces in a model. The models present in the National Design Repository range from those with fewer than 6 faces to those with nearly two thousand faces. The dimensionality of this space is variable, but in general is extremely high for conventional spatial database indexing techniques.

We intend to further refine the EigenDistance metric to make use of more of the information provided in the MSG to perform distance measures. We wish to partition the MSG into smaller subgraphs that have semantic or structural cohesiveness. After recursively partitioning the graphs, each of the subgraphs' eigenvalues can be compared using Euclidean metrics as described above. These distances can be scaled and compared based on a wide range of parameters, which would effectively construct a comparison function that operates entirely outside of a vector space.

Eigenvalue Spectrum Example. Figure 7 depicts the Laplacian matrix generated from the Model Signature Graph for the spinner3 model (see Figure 5). Note that it is a real-valued symmetric matrix, and all of the eigenvalues are real and nonnegative.

3.4 MSG Space Indexing

Given the distance measures introduced in Sections 3.3.1, 3.3.2, and 3.3.3, we can investigate techniques to index the metric space. To further understand the global properties of these metric spaces, we applied the measures to random pairs of graphs uniformly chosen from the National Design Repository. We then analyzed the distribution of the results of the distance calculations. Figure 8 reflects the nature of the spaces defined by the VertexDistance, ITVDistance, and EigenDistance measures. The distribution of these distances is an extremely important aspect of the metric space, and will determine how well the best indexing technique will be able to organize the data. In the worst case, the distribution looks like a delta distribution, in which $d(x, y) \in \{0, k\}$ for some constant $k$. In this case, no information can be extracted from a distance comparison except for the fact that the two elements are the same or are not, and a full search over all elements must be done to locate a given object in a set. In the best case, the distribution is a uniform distribution, and each measurement provides a lot of information about the space.
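The sampling procedure described above can be sketched as a small routine that draws random pairs and bins their distances. This is an illustrative sketch only: the function name `distance_histogram`, its parameters, and the toy data are assumptions, and the items would be MSGs with one of the distance measures in an actual experiment.

```python
import random


def distance_histogram(items, dist, pairs=1000, bins=10, seed=0):
    """Estimate the distribution of distances between randomly chosen pairs.

    Returns a list of `bins` relative frequencies covering [0, max sampled distance].
    """
    rng = random.Random(seed)
    samples = [dist(*rng.sample(items, 2)) for _ in range(pairs)]
    hi = max(samples) or 1.0  # avoid dividing by zero if every distance is 0
    counts = [0] * bins
    for d in samples:
        counts[min(int(bins * d / hi), bins - 1)] += 1
    return [c / pairs for c in counts]


numbers = list(range(100))
# A spread-out distribution: absolute difference between integers.
print(distance_histogram(numbers, lambda a, b: abs(a - b)))
# The worst case discussed above: every pair of distinct items is equally far apart.
print(distance_histogram(numbers, lambda a, b: 5.0))
```

In the second call all of the probability mass lands in a single bin, which is the delta-like situation in which a distance comparison carries no information for an index to exploit.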

Closer examination of Figure 8 reveals that each of the distributions that we are investigating is generally Gaussian. The extremely heavy spike in the VertexDistance probability density function indicates that almost all distances are in the range of 0-200 on a scale that reaches to nearly 20,000 in our particular sampling. This indicates an extremely small amount of entropy in the system, and indicates that the benefits of indexing the space will be generally limited [4]. The ITVDistance plot indicates that the addition of other graph invariants can be used to make the Model Signature Graph space slightly easier to index effectively. The distribution of the ITVDistance measure is more evenly distributed than the VertexDistance measure, although it still has a fairly strongly defined peak in the comparison space. Each of the EigenDistance-based measures has a slightly different distance distribution. In general, the measure appears to be more uniformly distributed than the other measures, although peaks are still clearly defined. The version of the measure that truncates the eigenvalue vectors appears to produce the largest spikes in the measure. When padding the distance vectors, our empirical evidence suggests that padding the vectors with 0.0 leads to the most evenly distributed distance measure, although this distribution is not significantly different from that found when padding with the value of 2.0. The fact that the EigenDistance measures exhibit the highest degrees of uniformity drove us to pursue database indexing based on these measures. The other measures may yet prove to be useful in applied engineering databases, however, and their performance should be investigated as well.

Figure 7: Laplacian matrix and eigenvalue spectrum computed for the spinner3 model.

Further refinements may be made to these distance measures in order to enable them to provide more information to the user. On the other hand, we must be careful in that the distance measure can be made "too good," leading to distances that do not have a meaning that is easily understood by human users.

Spatial Indexes. Spatial indexing techniques are designed to cluster objects close to one another in a vector or metric space on nearby database pages, in order to improve the query processing time of range and nearest-neighbor queries. Conventional spatial indexes such as kd-Trees and R-Trees, however, fail to scale to handle data sets that exhibit an extremely high dimensionality [31]. The central problem is that the amount of space enclosed by the regions into which these indexes partition the space increases exponentially with the dimensionality of the data set. As a result, the index structures typically become extremely sparse, and the cost of maintaining the index dominates the query time.

Another criticism of conventional spatial indexes is that they require the data to fit in a vector space. Many forms of data, such as video streams or graphs, do not have natural mappings to vector spaces. As a result, the data cannot be directly handled by these spatial indexes.

Several effective indexing techniques exist to handle large metric spaces and high-dimensional vector spaces. Most of these techniques store data in a recursive tree-like structure based on distances computed between objects. Most of these systems attempt to minimize the number of distance computations that are done when executing a range query, or when trying to locate a given object. In general, each of these techniques achieves nearly logarithmic improvements in storage space, depending on how balanced the tree is.


Figure 8: Distribution of distances within the spaces defined by the VertexDistance, ITVDistance, and EigenDistance measures over a sampling of models from the National Design Repository. Each panel plots probability against distance: (a) VertexDistance; (b) ITVDistance; (c) EigenDistance (Truncated); (d) EigenDistance (Padded with 0.0); (e) EigenDistance (Padded with 1.0); (f) EigenDistance (Padded with 2.0).


Figure 9: A graphical depiction of an M-Tree data structure.

The VP-Tree is an early approach to metric space indexes. Each node in a VP-Tree contains a set of elements. A representative element (a "vantage point") is chosen, and a sphere is constructed around it of the appropriate size to divide the elements into two evenly sized sets: those within the sphere, and those outside the sphere. While this constructs a balanced tree structure, the effects of dynamic insertion and deletion operations on the tree cause difficulties in maintaining the balanced nature of the tree. Without performing costly balancing operations, the tree suffers from the problem that the space is broken into asymmetric pieces, since the area outside of each sphere is typically much larger than the area enclosed within. There has been some work done to improve the behavior of this scheme by making use of multiple vantage point objects [3].

Another metric tree structure known as the GNAT [4] makes use of "generalized hyperplanes" to partition the space. At each node in the tree, two or more representative elements are chosen. The remaining elements are put into subtrees corresponding to each of the representatives, depending on which they are closest to. Each node therefore breaks up the space of elements into Dirichlet domains (Voronoi diagrams in planar spaces). Unfortunately, maintaining a balanced GNAT can lead to poor performance.

Given that we expect a practical database of solid models to undergo nearly constant changes as models are added, removed, and updated, we require that an index structure perform well under dynamic insertions and deletions. We have chosen to make use of an indexing technique known as the Metric Tree (M-Tree) [8]. This structure has been developed to provide acceptable performance even in the face of standard updates to the tree's structure. As illustrated in Figure 9, each node in the M-Tree consists of a model data element $e$ and a range $r$ in a tuple $(e, r)$. The tree maintains the property that every element $e'$ stored under the node $nd$ satisfies $\mathrm{distance}(e', nd.e) \le nd.r$. Heuristic balancing techniques can be applied to ensure that the tree does not grow too deep in practical applications.
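The covering-range property is what allows a range query to discard whole subtrees. The following is a simplified in-memory sketch, not the structure of Ciaccia et al. [8] nor our PostgreSQL implementation: the `Node` class, the `range_query` function, and the use of plain numbers with absolute difference as the metric are illustrative assumptions. Leaf entries are modeled as nodes with range 0 and no children.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Node:
    element: Any                      # routing object e (or the stored object in a leaf)
    radius: float = 0.0               # covering range r; 0 for a leaf entry
    children: List["Node"] = field(default_factory=list)


def range_query(node: Node, query: Any, query_range: float,
                dist: Callable[[Any, Any], float],
                out: Optional[list] = None) -> list:
    """Collect every stored element within query_range of query."""
    out = [] if out is None else out
    d = dist(query, node.element)
    # Triangle inequality: every element e' below this node satisfies
    # dist(query, e') >= d - node.radius, so the subtree can be skipped.
    if d > query_range + node.radius:
        return out
    if not node.children:
        out.append(node.element)      # leaf entry with radius 0, so d <= query_range
    for child in node.children:
        range_query(child, query, query_range, dist, out)
    return out


def leaf(x):
    return Node(x)


tree = Node(50, 50, [
    Node(10, 10, [leaf(0), leaf(10), leaf(20)]),
    Node(80, 15, [leaf(70), leaf(80), leaf(95)]),
])
metric = lambda a, b: abs(a - b)
print(range_query(tree, 12, 3, metric))
print(range_query(tree, 75, 20, metric))
```

In the first query the subtree routed at 80 is rejected after a single distance computation, because its routing object is farther from the query than the query range plus the covering range.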

We implement our M-Tree indexing structure in the freely available PostgreSQL RDBMS. Our implementation makes use of the Generalized Search Trees (GiST) [17] indexing API provided in PostgreSQL. The GiST architecture consists of a generalized approach to database indexing, in which new indexes are constructed by specializing a set of operators:

1. Consistent(element, tree): Return false if element cannot be found under tree.

2. Union(elements[]): Construct and return a predicate that can be used to describe a set of elements.

3. Compress(element): Return a representation of an element in the index (may be lossy).

4. Decompress(element): Given a compressed representation, reconstruct the database element.

5. Penalty(element, tree): Return a cost of how bad it would be to insert element under the tree tree.

6. PickSplit(elements[]): Split a set of elements into two sets.

7. Same(pred1, pred2): Return true if the two predicates are the same, false otherwise.


The representation that we use in our implementation is conceptually similar to that used by Ciaccia et al. [8]. Query predicates are of the form $\{(G, r) \mid r \in \mathbb{R}^{+}\}$, and consist of a model graph $G$ and a positive-valued range $r$.

Unlike many approaches to storing graphs in databases, we chose to implement graphs as database "BLOBS," opaque data structures for which the database system has special operators to compare and inspect, instead of storing the graphs as relations in the database. We believe that the use of distance functions between graphs will provide enough information to enable the database system to adequately manage the graphs. To this end, we have constructed a system through which PostgreSQL can handle graphs stored in a popular graph file format (LEDA [26]) as first-rate database objects such as integers or strings.

The Consistent(G, tree) operator returns whether the distance between the graph $G$ and the graph in the predicate describing the children of the tree is less than the range specified in the predicate.

The Union(elements[]) operator constructs a predicate to describe the set of elements by heuristically choosing an element from the set (it may be either a predicate or a graph element itself) and promoting it to a routing object. The range is computed to be large enough to enclose all of the elements in the set. This involves computing the maximum of the distances of the routing object to the graphs in the set and of the distances to the graphs in predicates summed with the ranges contained within those predicates. The heuristics that we make use of are discussed later.

The Compress(element) operator leaves predicates unchanged, and converts leaf graphs into predicates themselves with zero ranges. The Decompress(element) operator does nothing to the element.

We have been experimenting with different Penalty(element, tree) operators. Our current implementation computes the distance between the graph in the element and the graph in the predicate describing the tree. The heuristic is then defined to be the distance when the distance is less than the range in the predicate, and otherwise the distance summed with the distance from the element graph to the edge of the range of the predicate.

The PickSplit(elements[]) operator is trivially defined. It merely divides the set of elements into two sets, without any extensive computation. In the future, more work must be spent on deciding optimal splitting techniques to ensure a more efficient tree. The Same(pred1, pred2) operator returns true when the graph and the range in the two predicates are identical. Given that our implementation uses references to graph objects in the index, we compare only to see if the references are the same, and never perform a full graph isomorphism test between the graphs, making this test constant time.
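The range computation in Union and the Penalty heuristic described above reduce to a few lines of arithmetic. The sketch below is written as standalone Python functions rather than against the PostgreSQL GiST C interface; the function names, argument layout, and the numeric stand-ins for graphs are hypothetical, and the comparison at the boundary of the range is treated as inclusive.

```python
def union_range(routing, members, dist):
    """Covering range for a routing object over (graph, range) pairs.

    Leaf graphs carry range 0, so the result is the maximum over members of
    the distance to the member plus the member's own range.
    """
    return max(dist(routing, graph) + rng for graph, rng in members)


def penalty(element, pred_graph, pred_range, dist):
    """Cost of inserting element under a predicate (pred_graph, pred_range)."""
    d = dist(element, pred_graph)
    if d <= pred_range:
        return d                      # element already lies inside the range
    return d + (d - pred_range)       # add the overshoot past the edge of the range


metric = lambda a, b: abs(a - b)      # numbers stand in for graphs here
members = [(0, 0.0), (10, 0.0), (20, 5.0)]
print(union_range(10, members, metric))
print(penalty(12, 10, 5.0, metric), penalty(20, 10, 5.0, metric))
```

Under this penalty an element that falls outside a predicate's range is charged twice for the part of its distance that lies beyond the range, which steers insertions toward predicates that need no enlargement.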

One of the central differences between our implementation and that of Ciaccia et al. is that the currently available implementation of the latter targets only the GiST database simulation package, rather than a practical database for engineering work. The differences between the GiST APIs provided in each of these packages are somewhat significant, and the formulation of the M-Tree index must be done slightly differently. Our formulation of the M-Tree in the PostgreSQL GiST environment makes the reuse of the distance calculations slightly more difficult than it is in the implementation provided by Ciaccia. Our current implementation simply recalculates distances as needed, without reusing previous distance results. Proper reuse of distance calculations would lead to a significant savings in computing time, especially when the tree contains large and complex models.

In addition, our implementation makes use of a number of heuristics for eigenvalue spectrum-based indexing to improve its performance, primarily in the Union GiST operator.

First, in order to compare two graphs $G_1$ and $G_2$, the eigenvalues corresponding to the graphs must first be computed and compared. If the eigenvalue computation is being done on the fly when comparisons are needed (as it is in our implementation), computing time on the order of $O(|V[G_1]|^3 + |V[G_2]|^3)$ is required, where $|V[G]|$ is the number of vertices in the graph $G$. In addition, regardless of whether the computation is done on- or off-line, $O(\min(|V[G_1]|, |V[G_2]|))$ computing time is required to perform the eigenvalue-vector comparison. Each node in the M-Tree contains a representative "routing node," and every search passing through this node will need to have its distance from the routing node computed. As a result, we select routing objects such that the number of vertices in the routing graph is minimized. Although we currently do not precompute and store eigenvalue computations for the graphs, we have found that eigenvalue computations begin to dominate the indexing and retrieval times when the graphs are large. We are currently investigating a means to precompute graph eigenvalues and store them in the index.

A second heuristic that we make use of concerns the size of the ranges in the predicates. We make the assumption that predicates with excessively large ranges make good candidates to insert objects below, as they are more likely to already be large enough to contain the new object and objects inserted under it. After first sorting the candidates for promotion to routing objects in the Union operation, we then sort based on predicate range.

4 Experimental Results

We validate our approach by using it to index the set of solid models that make up the National Design Repository. This database consists of a set of more than 55,000 solid models taken from practical engineering CAD environments. The database is hand-indexed by parameters such as the vendor that published the models, the model format, and other external attributes.

Our experimentation consisted of first precomputing the Model Signature Graphs for each of the models in the collection. This step is conceptually straightforward; unfortunately, the translation process is an inaccurate transformation. A number of models in the database have errors and aberrations that lead to inappropriate Model Signature Graphs being generated. Fortunately, these cases are relatively easy to detect and relatively infrequent, making them easy to eliminate from the computations. These graphs are generated in a LEDA file format, and are associated with the solid model that they were generated from.

The second step in the experimentation phase consists of loading these graph files, and the model files from which the graphs were generated, into the PostgreSQL database, and constructing the M-Tree index over the data set. We evaluate the construction of the index by maintaining a count of the number of distance calculations that are performed while the index is built. This is an extremely accurate measure of the index construction performance, as the distance computations we perform are the more costly operations in both asymptotic and practical terms. It is not unusual for an eigenvalue decomposition of a graph for the distance computation to require several minutes to compute. In order to further compare our work with the M-Tree work done by Ciaccia et al., we also compute the number of unique distance computations that we perform. This helps to normalize our results to reflect how important the caching of previously computed distances is to performance.

Figure 10, part (a) depicts the number of distance computations performed by our indexing software as the size of the database grows. The line for build-index-log represents the number of computations performed by our system, whereas the line build-index-uniq-log represents the number of unique distance calculations done. It should be noted that while the number of computations that are redone becomes increasingly significant as the size of the database grows, both plots appear to exhibit asymptotically similar behavior.

We also evaluate our system by performing a number of range queries over the database after it has been indexed. We have found that the cost of an individual range query can vary widely depending on the range of the query, as well as the graph used as a basis for the query, so we examine the average performance of a query. Figure 10, part (b) depicts the number of distance computations performed while executing a range query in the database. Without an index, the query executor must examine each model in the database and perform a distance calculation between it and the query graph. The number of computations is illustrated by the plot of 'range-query-raw-log' in the figure, which naturally grows linearly with the number of models in the database. 'range-query-log' indicates the number of distance calculations done by our software with the M-Tree index enabled. It is interesting to note that the cost of the query does not seem to decrease until the number of models in the database reaches more than 2^9 (512). After this point, the number of distance calculations with the index is consistently several hundred fewer than without. Another important point to note is that the number of recomputed distances during the range query execution phase is extremely small. As a result, caching distance calculations seems much less beneficial during runtime; most of the performance benefit will be had during the construction of the database. Consequently, if the database is generally static, caching may not be a necessary optimization.
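The difference between the unindexed and indexed query costs can be illustrated with a much simpler structure than the M-Tree. The sketch below is an assumption-laden stand-in: it filters candidates against a single precomputed pivot using the triangle inequality, which is the same pruning principle that metric trees apply at each routing object, but it is not the M-Tree algorithm itself. All function names are hypothetical.

```python
def counted(dist):
    """Return a wrapped distance function and a one-element call counter."""
    calls = [0]

    def wrapper(a, b):
        calls[0] += 1
        return dist(a, b)

    return wrapper, calls


def linear_range_query(items, query, radius, dist):
    """Baseline without an index: one distance computation per item."""
    return [x for x in items if dist(query, x) <= radius]


def build_pivot_table(items, pivot, dist):
    """Precompute d(pivot, x) for every item (paid once, at index time)."""
    return [(x, dist(pivot, x)) for x in items]


def pivot_range_query(table, pivot, query, radius, dist):
    """Range query that skips items the triangle inequality rules out."""
    d_qp = dist(query, pivot)
    results = []
    for x, d_px in table:
        # |d(q, p) - d(p, x)| <= d(q, x), so a large gap proves x is too far.
        if abs(d_qp - d_px) > radius:
            continue
        if dist(query, x) <= radius:
            results.append(x)
    return results


if __name__ == "__main__":
    items = list(range(100))
    metric = lambda a, b: abs(a - b)  # stand-in for an expensive graph distance
    scan, scan_calls = counted(metric)
    linear_range_query(items, 10, 3, scan)
    table = build_pivot_table(items, 0, metric)
    probe, probe_calls = counted(metric)
    pivot_range_query(table, 0, 10, 3, probe)
    print(scan_calls[0], probe_calls[0])
```

With a one-dimensional stand-in metric the pruning is unusually effective; for spectra of real Model Signature Graphs the savings depend on how the data are distributed, which is consistent with the index only paying off beyond a certain database size.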

5 Conclusions

Contributions. This paper presented our approach and preliminary experimental results toward a general indexing scheme for solid models of mechanical designs. We showed how to map CAD data to Model Signature Graphs, creating Model Comparison Spaces in which we can compute metric distances among CAD models. This EigenDistance measure enabled us to build a spatial index using an M-tree data structure and develop heuristics that exploit properties of the model comparison space to ensure that the M-tree indexes remain balanced.


[Figure 10. Panel (a), Building the Index: series 'build-index-log' and 'build-index-uniq-log'. Panel (b), Performing a Range Query: series 'range-query-log', 'range-query-uniq-log', and 'range-query-raw-log'. Both panels plot Number of Distance Computations against 2^k Models in Database, for k from 7 to 11.]

Figure 10: The number of distance computations performed during the indexing and querying phases of solid model databases of varying size. When distance computations are expensive, they dominate the complexity of the indexing and retrieval operations.


We validated our approach with the models contained within the National Design Repository¹ and present results that assess the quality of the EigenDistance measure in pairwise comparison of solid models as well as the M-tree.

This work represents a contribution of significance to both the database and engineering communities, addressing a unique space of database and data management problems that are beyond the present scope of existing multimedia and spatial database technologies. Our contributions include a novel approach to manage a unique media type and answer queries of practical importance to engineering design and manufacturing enterprises. We see this research as a foundation to enable comprehensive and flexible indexing of engineering design data and meta-data to allow engineers to execute knowledge-rich queries about device structure (e.g., shape), behavior (e.g., physical properties and performance) and function (e.g., design rationale and intent).

Future Work. Further work can be done on the EigenDistance distance measure in a number of ways. The Model Signature Graphs can be recursively partitioned into smaller connected components, and eigenvalue spectra can be computed for these subgraphs for use in the comparison operation. Another path for the future development of this method is through the use of multi-scale analysis. Small, less significant aspects of the solid model may be ignored at early stages of the comparison process in order to enable the comparison to more closely approximate human intuition. This type of approach is likely to handle cases in which one model is very similar to another, except for a significant amount of excessive detail in the other. We believe that many of the techniques found in 2D image retrieval can be applied to more intelligently make use of the data available in solid models.
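As a rough illustration of computing spectra per subgraph, the sketch below splits an undirected graph into its connected components and computes the eigenvalues of each component's adjacency matrix. Several things here are assumptions rather than part of the method described in this paper: the partitioning step that would break a connected Model Signature Graph into pieces is not shown (the input is taken to be already disconnected), the adjacency matrix is used although another graph matrix could be intended, and a plain Jacobi iteration is used only to keep the example free of external libraries. All function names are hypothetical.

```python
import math
from collections import defaultdict


def connected_components(vertices, edges):
    """Group vertices into connected components with a depth-first search."""
    adjacent = defaultdict(set)
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacent[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


def adjacency_matrix(component, edges):
    """Dense 0/1 adjacency matrix restricted to one component."""
    index = {v: i for i, v in enumerate(component)}
    matrix = [[0.0] * len(component) for _ in component]
    for u, v in edges:
        if u in index and v in index:
            matrix[index[u]][index[v]] = 1.0
            matrix[index[v]][index[u]] = 1.0
    return matrix


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def component_spectra(vertices, edges):
    """Sorted adjacency spectrum of every connected component."""
    return [symmetric_eigenvalues(adjacency_matrix(c, edges))
            for c in connected_components(vertices, edges)]
```

In practice a numerical library would replace the hand-written eigenvalue routine; the point of the sketch is only that each subgraph yields its own, smaller spectrum that could then be compared separately.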

Integration with feature-based indexing is essential to construct practical databases of solid models and CAD data. While no canonical set of CAD features exists, a number of practical systems that detect and make use of features have been created and deployed in industry. Feature recognizers for CAD data can translate BRep information into a set of semantically relevant features for machining or otherwise building a model. Information about the locations, dimensions, and orientations of features may be used to improve the performance of our similarity measures. By leveraging some earlier work [13], we hope to incorporate additional feature attribute information into our indexing and comparison algorithms. In addition, information about engineering tolerances, surface finishes, constraints and parametrics, etc. all can be used to augment the basic techniques presented here.

We believe that the graph-based similarity indexing approach that we have presented is applicable to other 3D media domains: solid free-form (SFF) fabrication techniques often use Stereolithography (STL) format files; VRML is a common modeling format for web data; video games, such as Doom and Quake, employ discrete 3D world models. In each of these cases, there are geometry, topology, shape and semantics which can be used to create database access methods that can store the data in order to enable meaningful queries.

In this paper, we have provided a mechanism that can be used to implement "query by example" search and retrieval of models in a database. Use of this method alone is insufficient for fully characterizing the queries that we expect to be common in engineering domains. We can conceive of a process engineer looking for machinable parts with similar process plans; a corporate manager looking for shape, function, or process-defined part families across product lines; and production engineers posing questions about the manufacturing process and assembly considerations in order to equip a factory. Research is needed on how to best formalize and specify queries to databases so that they may be useful in practical settings.

Acknowledgements. This work was supported in part by National Science Foundation (NSF) Knowledge and Distributed Intelligence in the Information Age (KDI) Initiative Grant CISE/IIS-9873005; CAREER Award CISE/IIS-9733545 and Grant ENG/DMI-9713718.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the other supporting government and corporate organizations.

¹ http://www.designrepository.org


References

[1] Norman L. Biggs. Algebraic Graph Theory. Cambridge Tracts in Mathematics 67. Cambridge University Press, 1974.

[2] A. Bond and R. Jain. The formal definition and automatic extraction of group technology codes. In ASME Design Technical Conferences, Computers in Engineering Conference, pages 537–542, August 1988.

[3] Tolga Bozkaya and Z. Meral Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. In Joan Peckham, editor, SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, May 13-15, 1997, Tucson, Arizona, USA, pages 357–368. ACM Press, 1997.

[4] S. Brin. Near neighbor search in large metric spaces. In Proceedings of VLDB 1995, pages 574–584, 1995.

[5] A. Candadai, J. W. Herrmann, and I. Minis. A group technology-based variant approach for agile manufacturing. In Proceedings of the ASME International Mechanical Engineering Congress and Exposition, San Francisco, California, November 1995. ASME Press.

[6] A. Candadai, J. W. Herrmann, and I. Minis. Applications of group technology in distributed manufacturing. Journal of Intelligent Manufacturing, 7:271–291, 1996.

[7] Fan R. K. Chung. Spectral Graph Theory. Number 92 in Regional Conference Series in Mathematics. American Mathematical Society, 1997.

[8] P. Ciaccia, M. Patella, and P. Zezula. M-tree: An efficient access method for similarity search in metric spaces. In Proceedings of the 23rd VLDB, August 1997.

[9] C. M. Eastman, A. H. Band, and S. C. Chase. A formal approach for product model information. Research in Engineering Design, 2(2), 1991.

[10] C. M. Eastman and T. Jeng. A database supporting evolutionary product model development for design. Automation and Construction, 1997.

[11] C. M. Eastman, D. S. Parker, and T. Jeng. Managing the integrity of design data generated by multiple applications: The principle of patching. Research in Engineering Design, 1997.

[12] Charles M. Eastman. Managing integrity in design information flows. International Journal of Computer Aided Design, 28(6/7):551–565, June-July 1996.

[13] Alexei Elinson, Dana S. Nau, and William C. Regli. Feature-based similarity assessment of solid models. In Christoph Hoffman and Wim Bronsvoort, editors, Fourth Symposium on Solid Modeling and Applications, pages 297–310, New York, NY, USA, May 14-16 1997. ACM, ACM Press. Atlanta, GA.

[14] A. Fu, P. Chan, Y.L. Cheung, and Y.S. Moon. Dynamic vp-tree indexing for n-nearest neighbor search given pair-wise distances. VLDB Journal, 9:154–173, 2000.

[15] Amarnath Gupta and Ramesh Jain. Visual information retrieval. Communications of the ACM, 40(5):71–79, May 1997.

[16] Martin Hardwick, David L. Spooner, Tom Rando, and K. C. Morris. Sharing manufacturing information in virtual enterprises. Communications of the ACM, 39(2):46–54, February 1996. Special issue on Computer Science in Manufacturing edited by Michael Wozny and William Regli.

[17] J. Hellerstein, J. Naughton, and A. Pfeffer. Generalized search trees for database systems. In VLDB'95, Proceedings of the 21st International Conference on Very Large Databases, 1995.

[18] M. Henderson and S. Musti. Automated group technology part coding from a three-dimensional CAD database. Journal of Engineering for Industry, 110(3):278–287, 1988.


[19] Bruce Hendrickson and Robert Leland. An improved spectral graph partitioning algorithm for mapping parallel computations. SIAM Journal on Scientific Computing, 16(2):452–469, 1995.

[20] Christopher M. Hoffmann. Geometric and Solid Modeling: An Introduction. Morgan Kaufmann Publishers, Inc., California, USA, 1989.

[21] A. Houtzeel. Miclass, a classification system based on group technology. Technical Report Working Paper MS #75-721, Society of Manufacturing Engineers (SME), 1975.

[22] S. Iyer and R. Nagi. Automated retrieval and ranking of similar parts in agile manufacturing. IIE Transactions on Design and Manufacturing, 29(10):859–876, 1997. Special Issue on Agile Manufacturing.

[23] S. Joshi and T. C. Chang. Graph-based heuristics for recognition of machined features from a 3D solid model. Computer-Aided Design, 20(2):58–66, March 1988.

[24] Alfons Kemper and Mechtild Wallrath. An analysis of geometric modeling in database systems. ACM Computing Surveys, 19(1):1–45, March 1987.

[25] Jihie Kim, S. Ringo Ling, and Peter Will. Ontology engineering for active catalog. Technical report, The University of Southern California, Information Sciences Institute, 1997.

[26] Kurt Mehlhorn and Stefan Naher. LEDA: A Platform for Combinatorial and Geometric Computing. Cambridge University Press, 1999.

[27] Y. Lamdan and H. J. Wolfson. Geometric hashing: a general and efficient model-based recognition scheme. In Second International Conference on Computer Vision, pages 238–249, 1988.

[28] Y. C. Lee and K. S. Fu. Machine understanding of CSG: Extraction and unification of manufacturing features. IEEE Computer Graphics and Applications, pages 20–32, January 1987.

[29] Martti Mantyla. An Introduction to Solid Modeling. Computer Science Press, College Park, MD, 1988.

[30] Organization for Industrial Research. OIR Multi-M code book and conventions. Waltham, MA, 1986.

[31] M. Otterman. Approximate matching with high dimensionality R-trees. M.Sc. Scholarly paper, Dept. of Computer Science, Univ. of Maryland, College Park, MD, 1992.

[32] Sylvain Petitjean. The enumerative geometry of projective algebraic surfaces and the complexity of aspect graphs. International Journal of Computer Vision, 19(3):1–27, 1996.

[33] Aristides A. G. Requicha. Representation for rigid solids: Theory, methods, and systems. Computing Surveys, 12(4):437–464, December 1980.

[34] H. Samet. The quadtree and related hierarchical data structures. ACM Computing Surveys, 16(3):287–260, 1984.

[35] J. J. Shah and A. Bhatnagar. Group technology classification from feature-based geometric models. Manufacturing Review, 2(3):204–213, 1989.

[36] Ali Shokoufandeh, Sven J. Dickinson, Kaleem Siddiqi, and Steven W. Zucker. Indexing using a spectral encoding of topological structure. Computer Vision and Pattern Recognition, 2, 1999.

[37] C. S. Snead. Group Technology: Foundations for Competitive Manufacturing. Van Nostrand Reinhold, New York, 1989.

[38] Spatial Technology Inc., Three-Space Ltd., and Applied Geometry Corp., 2425 55th Street, Building A, Boulder, CO 80301. ACIS© Geometric Modeler Application Guide, v1.6 edition, November 1994.


[39] A. Srikantappa and R. Crawford. Automatic part coding based on interfeature relationships. In Jami Shah, Martti Mantyla, and Dana Nau, editors, Advances in Feature Based Manufacturing, pages 215–237. Elsevier/North Holland, 1994.

[40] Tien-Lung Sun, Chuan-Jun Su, Richard J. Mayer, and Richard A. Wysk. Shape similarity assessment of mechanical parts based on solid models. In Rajit Gadh, editor, ASME Design for Manufacturing Conference, Symposium on Computer Integrated Concurrent Design, pages 953–962. ASME, Boston, MA. September 17-21. 1995.

[41] Kevin Weiler. Edge-based data structures for solid modeling in curved-surface environments. IEEE Computer Graphics and Applications, 5(1):21–40, January 1985.

[42] Tony C. Woo. A combinatorial analysis of boundary data structure schemata. IEEE Computer Graphics and Applications, pages 19–27, March 1985.
